How AI-generated text is poisoning the internet

It’s been a wild year for AI. If you’ve spent much time online, you’ve probably come across images generated by AI systems like DALL-E 2 or Stable Diffusion, or jokes, essays, or other text written by ChatGPT, the latest incarnation of OpenAI’s large language model GPT-3.
Sometimes it’s obvious when an image or a piece of text was created by an AI. But increasingly, the output these models generate can easily fool us into thinking it was made by a human. And large language models in particular are confident bullshitters: they create text that sounds correct but may in fact be full of falsehoods.
That’s harmless if it’s just a bit of fun, but using AI models to offer unfiltered health advice, or to provide other forms of important information, can have serious consequences. AI systems could also make it stupidly easy to produce reams of misinformation, abuse, and spam, distorting the information we consume and even our sense of reality. That could be particularly worrying around elections, for example.
The proliferation of these easily accessible large language models raises an important question: How will we know whether what we read online was written by a human or a machine? I just published a story looking at the tools we currently have for spotting AI-generated text. Spoiler alert: Today’s detection toolset is woefully inadequate against ChatGPT.
But there is a more serious long-term implication. We may be witnessing, in real time, the birth of a snowball of bullshit.
Large language models are trained on data sets built by scraping the internet for text, including all the toxic, silly, false, and malicious things people have written online. The finished AI models regurgitate these falsehoods as fact, and their output is spread all over the internet. Tech companies scrape the internet again, scooping up AI-written text that they use to train bigger, more convincing models, which humans can then use to generate even more nonsense before it is scraped again and again, ad nauseam.
This problem of AI feeding on itself and producing increasingly polluted output extends to images. “The internet is now forever infected with images made by AI,” Mike Cook, an AI researcher at King’s College London, told my colleague Will Douglas Heaven in his new piece on the future of generative AI models.
“The images we made in 2022 will be part of any model made from now on.”