Nobody is creating content anymore
Something strange is happening on the internet. In 2025, AI-generated text surpassed human-written output for the first time: more than half of newly published articles online are now written by machines. Ahrefs found that 74.2% of new web pages contain detectable AI content. Europol warns that up to 90% of online content could be synthetically generated by 2026. The internet is filling up with words that no human ever thought or felt. And the real problem isn't just the flood of mediocre content. It's what happens next, when AI starts learning from itself.
The content flood
The numbers are hard to ignore. Graphite, an SEO firm, tracked more than 60,000 articles published between 2020 and 2025, and the growth curve it found is steep: within twelve months of ChatGPT's launch, AI-generated articles accounted for nearly 39% of all new publications; by late 2024, their quantity had surpassed human-written ones; by 2025, they were the majority. This isn't limited to throwaway blog spam. Companies are leaning heavily on AI for marketing copy, documentation, support articles, social media posts, and even journalism. Neil Patel's data from 500 companies shows businesses increasingly choosing AI over human writers for text-based content, though most still use humans to edit the AI's output. ARK Invest likewise highlighted that the total volume of AI-generated written content exceeded human-written output in 2025. We've reached the point where machines are the internet's most prolific authors.
When AI eats its own tail
Here's where things get genuinely concerning. AI models learn by finding patterns in training data, the vast majority of which has historically been scraped from the internet. But if the internet is now mostly AI-generated content, future models will inevitably train on the output of previous models. Researchers at Oxford University, led by Ilia Shumailov, published a landmark paper in Nature in 2024 that gave this problem a name: model collapse. Their finding is stark. When generative models are trained on data produced by earlier generations of models, the output degrades irreversibly. The tails of the original data distribution (the rare, unusual, and diverse content) disappear first. Over successive generations, the model converges toward bland, homogeneous output that bears little resemblance to the original training data. Shumailov compared it to photocopying a photocopy: each generation loses fidelity, until you're left with a dark, unreadable square. The technical explanation involves three compounding sources of error: statistical approximation error, because each generation trains on a finite sample; functional approximation error, because the model's capacity is limited; and, most critically, the error that accumulates when models are recursively trained on synthetic data. Each generation amplifies the distortions of the last.
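The dynamic is easy to reproduce in miniature. In the sketch below, a deliberately toy illustration rather than the paper's actual experiment, each "model" is nothing more than a Gaussian fitted to the previous generation's samples, and the tail loss that real generative models exhibit is mimicked by trimming the most extreme 10% of the data before refitting. The standard deviation, a crude proxy for diversity, shrinks with every generation.

```python
import random
import statistics

def train_generation(data, rng, keep=0.90):
    """One 'generation': fit a toy model (a Gaussian) to the data, then
    sample fresh data from it. Like real generative models, the fit
    under-represents rare events; we mimic that by trimming the most
    extreme samples before estimating the parameters."""
    data = sorted(data)
    cut = int(len(data) * (1 - keep) / 2)        # drop both tails
    core = data[cut:len(data) - cut]
    mu, sigma = statistics.fmean(core), statistics.stdev(core)
    return [rng.gauss(mu, sigma) for _ in range(len(data))], sigma

rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(10_000)]  # 'human' data, std = 1.0
for gen in range(1, 11):
    data, sigma = train_generation(data, rng)
    print(f"generation {gen:2d}: std = {sigma:.3f}")
```

After ten generations the fitted standard deviation has dropped to roughly a tenth of its original value: the tails are gone, and the output has converged on a narrow, homogeneous core. That is the same qualitative behavior the Nature paper reports for language models.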
The data wall
This problem is compounded by a more fundamental constraint: we're running out of fresh human data. Stanford's 2026 AI Index Report warned that available real data for AI training could be depleted within six years. Earlier research from Epoch AI predicted that high-quality text data would be exhausted before 2026 if training trends continued. The irony is almost poetic. The very success of AI in generating content is destroying the ecosystem it depends on. Every AI-written article that displaces a human-written one reduces the pool of genuinely original material available for future training. It's a kind of intellectual strip-mining, extracting value from human creativity without replenishing it. Forbes reported on the phenomenon with a blunt framing: AI models generate content, that content floods the internet, future models train on this synthetic output, and each generation produces lower-quality, more homogeneous results. The tails of distributions disappear. Diversity collapses.
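The reasoning behind these depletion estimates is simple compounding: a roughly fixed stock of high-quality tokens, consumed by training runs whose appetite for data grows geometrically. The sketch below shows the shape of that calculation; the figures are invented placeholders for illustration, not Epoch AI's or Stanford's actual numbers.

```python
def years_until_exhaustion(stock, annual_use, growth_rate):
    """Back-of-envelope 'data wall' model: count the years until a fixed
    stock of high-quality training tokens is used up, assuming each
    year's training runs consume geometrically more data than the last."""
    years = 0
    while stock >= annual_use:
        stock -= annual_use
        annual_use *= 1 + growth_rate
        years += 1
    return years

# Hypothetical figures: a 300-trillion-token stock, 30T tokens consumed
# this year, demand doubling annually -> exhausted after 3 years.
print(years_until_exhaustion(stock=300e12, annual_use=30e12, growth_rate=1.0))
```

The point is the shape of the curve, not the particular numbers: with demand doubling every year, even a tenfold larger stock buys only about three more years.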
The dead internet, slowly becoming real
The "dead internet theory," once dismissed as a fringe conspiracy, is starting to look prescient. The original claim, that most internet activity consists of bots and automated content, was speculative when it emerged around 2016. But a Stanford and Imperial College London collaboration recently studied how much web text is AI-generated using the Internet Archive's Wayback Machine, comparing pages from 2022 to 2025. The trajectory they found is alarming. We're not at a fully dead internet yet. The historical archive of human-created content is still vast. But the balance is shifting fast. New content creation is increasingly dominated by machines, and genuine human voices are becoming harder to find in the noise. Google has responded by improving its ability to detect and deprioritize AI-generated content, with its latest algorithm updates reportedly 40% better at identification. Audiences are noticing too. Content marketing platforms report that purely AI-generated articles have significantly lower engagement rates compared to human-written pieces. People can sense when something lacks the messiness and authenticity of a real human perspective.
Why this matters beyond AI
The implications reach far beyond the AI industry. If models can only improve by training on high-quality human data, and that data is being diluted or displaced, we face a ceiling on AI capability. More compute and bigger models won't solve a problem rooted in the quality of the input. But there's a cultural dimension too. The internet was built on the promise of human connection, of people sharing ideas, experiences, and knowledge. When the majority of published content is generated by systems optimizing for engagement metrics rather than expressing genuine thought, something fundamental changes about what the internet is for. Creative workers are already feeling the effects. A BBC report found that more than two-thirds of workers in the creative industries believe AI has undermined their job security. Half of novelists worry AI could replace them. Visual artists, illustrators, and graphic designers report that AI is being used to lower wages, degrade work quality, and in some cases replace human creators entirely. Fewer human creators means less original content. Less original content means less high-quality training data. Less high-quality training data means worse AI models. It's a vicious cycle.
What breaks the loop
Some researchers are exploring synthetic data anchored in human truth, using AI to scale and augment human-created datasets rather than replace them entirely. Others advocate for human-in-the-loop annotation as a safeguard against model collapse. The idea is that human judgment needs to remain in the pipeline, defining what "good" looks like, setting standards, and catching the drift that purely automated systems introduce. There's also growing recognition that the value of human-created content is about to increase dramatically. If AI content is becoming a commodity, cheap and everywhere, then authentic human perspective becomes scarce and therefore more valuable. The economics may eventually self-correct, as audiences, search engines, and platforms learn to reward originality over volume. But self-correction takes time, and the damage in the interim could be significant. The models being trained right now, on today's increasingly synthetic internet, will shape AI capabilities for years to come.
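The anchoring idea can be bolted onto the earlier toy simulation. Below, a "replace" corpus is refit each generation on nothing but the previous generation's synthetic output, while an "anchored" corpus always mixes the fixed pool of original human data back in. The setup and numbers are illustrative assumptions, not any specific paper's method, but the qualitative result matches the mitigation described above: keeping real human data in the loop arrests the runaway decay.

```python
import random
import statistics

rng = random.Random(1)
human = [rng.gauss(0, 1) for _ in range(5_000)]    # fixed pool of human data

def fit_and_sample(data, n):
    """Same toy 'model' as before: trim the extreme 10% (tail loss),
    fit a Gaussian, and sample n synthetic points from the fit."""
    data = sorted(data)
    cut = int(len(data) * 0.05)
    core = data[cut:len(data) - cut]
    mu, sigma = statistics.fmean(core), statistics.stdev(core)
    return [rng.gauss(mu, sigma) for _ in range(n)], sigma

replace_corpus, anchored_corpus = human[:], human[:]
for gen in range(1, 11):
    # Replace: train only on the previous generation's synthetic output.
    replace_corpus, s_rep = fit_and_sample(replace_corpus, 5_000)
    # Anchor: train on synthetic output plus the original human pool.
    synthetic, s_anc = fit_and_sample(anchored_corpus, 5_000)
    anchored_corpus = human + synthetic
    print(f"gen {gen:2d}: replace std = {s_rep:.3f}   anchored std = {s_anc:.3f}")
```

The replacement corpus decays geometrically toward zero diversity, while the anchored corpus loses some spread to the toy model's tail-trimming and then stabilizes. That gap is, in miniature, the argument for keeping humans and their data in the pipeline.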
The uncomfortable question
We're heading toward a future where AI systems are primarily learning from each other rather than from us. The photocopies are getting blurrier with each generation, and we're making fewer originals. The uncomfortable truth is that AI's greatest vulnerability isn't a technical problem. It's a human one. The systems only work as long as humans keep creating, keep thinking, keep writing things that are messy, surprising, and real. The moment we stop, the machines start talking to themselves. And as the research makes clear, that conversation degrades quickly. "Nobody is creating content anymore" might be an exaggeration today. But the trend line is pointing in a direction that should make everyone, especially the companies building these systems, deeply uncomfortable.
References
- Shumailov, I. et al. "AI models collapse when trained on recursively generated data." Nature 631, 755-759 (2024)
- Graphite. "More Articles Are Now Created by AI Than Humans." graphite.io
- Ahrefs. "AI Content Prevalence: 74.2% of new web pages contain AI content." ahrefs.com
- Stanford University. "2026 AI Index Report." hai.stanford.edu
- Forbes. "AI May Be Running Out Of Data, Stanford Report Warns." forbes.com
- Forbes. "Nobody Is Talking About Synthetic Data In AI." forbes.com
- Europol. "Facing Reality: Law Enforcement and the Challenge of Deepfakes." europol.europa.eu
- Fast Company. "Is the 'dead internet' theory coming true? New Stanford research calculates exactly how far we are." fastcompany.com
- BBC. "We're creatives, this is what AI has done to our jobs." bbc.com
- Invisible Tech. "AI training in 2026: anchoring synthetic data in human truth." invisibletech.ai