I would have expected that by now someone would have written a comparative analysis of the scholarly writing on the Canadian campus fossil fuel divestment (CFFD) movement: for instance, engaging with both Joe Curnow’s 2017 dissertation and mine from 2022.
So, I gave both public texts to NotebookLM to have it generate an audio overview. It wrongly assumes that Joe Curnow is a man throughout, and mangles the pronunciation of “Ilnyckyj” in a few different ways — but at least it acts like it has read about the texts and cares about their content.
It is certainly muddled in places (though perhaps in ways I have also seen in scholarly literature). For example, it treats the “enemy naming” strategy as something that arose through the functioning of CFFD campaigns, whereas it was really part of 350.org’s “campaign in a box” from the beginning.
This hints to me at how large language models are going to be transformative for writers. Finding an audience is hard, and finding an engaged audience willing to share their thoughts back is nigh-impossible, especially if you are dealing with scholarly texts hundreds of pages long. NotebookLM will happily read your whole blog and then have a conversation about your psychology and interpersonal style, or read an unfinished manuscript and provide detailed advice on how to move forward. The AI isn’t doing the writing, but providing a sort of sounding board which has never existed before: almost infinitely patient, and not inclined to make its comments all about its social relationship with the author.
I wonder what effect this sort of criticism will have on writing. Will it encourage people to hew more closely to the mainstream view, by providing a critique that comes from a general-purpose LLM? Or will it help people dig ever-deeper into a perspective that almost nobody shares, because the feedback comes from systems which are always artificially chirpy and positive, and because getting feedback this way removes real people from the process?
And, of course, what happens when the flawed output of these sorts of tools becomes public material that other tools are trained on?
AI models fed AI-generated data quickly spew nonsense
Researchers gave successive versions of a large language model information produced by previous generations of the AI — and observed rapid collapse.
https://www.nature.com/articles/d41586-024-02420-7
“Training artificial intelligence (AI) models on AI-generated text quickly leads to the models churning out nonsense, a study has found. This cannibalistic phenomenon, termed model collapse, could halt the improvement of large language models (LLMs) as they run out of human-derived training data and as increasing amounts of AI-generated text pervade the Internet.
“The message is, we have to be very careful about what ends up in our training data,” says co-author Zakhar Shumaylov, an AI researcher at the University of Cambridge, UK. Otherwise, “things will always, provably, go wrong”, he says. The team used a mathematical analysis to show that the problem of model collapse is likely to be universal, affecting all sizes of language model that use uncurated data, as well as simple image generators and other types of AI.”
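The mechanism in that quote can be seen in a toy form that has nothing to do with language models. The sketch below is only an illustration under loud assumptions, not the experiment from the study: the "model" is a single Gaussian, "training" is a maximum-likelihood fit of its mean and spread, and each generation is fitted only to a small sample drawn from the previous generation. The function name `simulate_collapse` and its parameters are hypothetical, chosen for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    with the original data distribution (mean 0, standard deviation 1).
    This is a toy stand-in for training on model-generated data.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the original "human" data
    history = [std]
    for _ in range(generations):
        # Each generation sees only samples drawn from the previous model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Training" is a maximum-likelihood fit of mean and spread.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 1, 10, 50, 100, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.6f}")
```

With small samples, each refit tends to underestimate the spread a little and the errors compound, so the fitted distribution narrows toward a single point: the variety in the original data is gradually forgotten. Real language models are vastly more complicated, so this should be read as an intuition for the quoted claim rather than evidence for it.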