Entering the matrix
Somewhere between 2022 and now, the internet crossed a strange threshold. The majority of new text published online is no longer written by humans. It is written by AI, trained on human writing, which will soon be scraped to train the next generation of AI. The ouroboros is eating its own tail, and we are watching it happen in real time. The phrase "entering the matrix" used to be a sci-fi metaphor. Now it feels more like a status update. AI is generating content for AI to consume. The loop is closed. And if you squint, it starts to look like we have already stepped inside a reality that is being authored by machines.
The feedback loop nobody asked for
Large language models learn by ingesting enormous amounts of text from the internet. Until recently, that text was overwhelmingly human-written. Messy, contradictory, creative, deeply human. But as generative AI tools flooded the web with synthetic content, the composition of the internet shifted. By late 2025, studies estimated that over half of all new articles published online were AI-generated. That means the next wave of AI models is being trained on a web that is already majority-synthetic. Each generation of models learns from the output of the last, and the signal gets a little weaker each time. Researchers call this model collapse, a degenerative process where AI trained on AI-generated data progressively loses the diversity and accuracy of the original human-generated training set. A landmark 2024 paper in Nature by Shumailov et al. demonstrated that models trained recursively on their own output begin to lose information about rare events and edge cases first. Over successive generations, the output converges toward a narrow, flattened distribution that bears little resemblance to the original data. Think of it like photocopying a photocopy. Each pass loses a bit of detail. Eventually, you are staring at a smudge.
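The dynamic is easy to reproduce in miniature. Below is a toy sketch, not a real training pipeline: the "model" is just an empirical token-frequency table, each generation is "trained" by sampling a new corpus from the previous generation's output, and the long tail of rare tokens is the first thing to disappear. All names and sizes here are invented for illustration.

```python
# Toy model-collapse simulation. The "model" is the empirical token
# frequencies of the current corpus; each generation publishes a new
# corpus sampled from that model. Rare tokens go extinct first and,
# once gone, can never come back, so the vocabulary steadily shrinks.
import random

random.seed(42)

# Generation 0: a "human" corpus with a long tail of rare tokens
# (tok0 appears 101 times, tok99 only twice).
corpus = [f"tok{i}" for i in range(100) for _ in range(101 - i)]

for gen in range(1, 21):
    # Sampling uniformly from the corpus with replacement is the same
    # as sampling from its empirical token distribution.
    corpus = random.choices(corpus, k=len(corpus))
    print(f"generation {gen:2d}: {len(set(corpus)):3d} / 100 token types survive")
```

Nothing in this loop ever reintroduces a token once it disappears. Like the photocopied photocopy, detail lost at each pass is lost for good.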
AI slop and the pollution of the commons
The practical consequence of this feedback loop has a name: AI slop. It refers to the flood of low-quality, AI-generated content that now saturates search results, social media feeds, and content farms. Written not for readers, but for algorithms. Optimized not for insight, but for clicks. The problem is not just that this content is bland. It is that it actively degrades the information environment that all of us, humans and machines alike, depend on. MIT Technology Review described the dynamic bluntly: AI models regurgitate falsehoods as fact, their output gets scraped, and the next generation of models treats that output as ground truth. The noise compounds. The signal fades. For anyone trying to do real research, learn something new, or just find a trustworthy answer to a simple question, the internet is becoming harder to navigate. The commons is being polluted, and the polluters are running on autopilot.
We are already inside
Here is where the "matrix" metaphor gets uncomfortably literal. If you interact with AI assistants, read AI-summarized articles, consume AI-curated feeds, and make decisions based on AI-generated analysis, then a meaningful portion of your reality is already being mediated by machines. You are not just using AI. You are living inside an information environment that AI is actively shaping. And it is not a one-way street. Your interactions with AI, the queries you type, the content you create in response, the feedback you provide, all of that feeds back into the system. You are both consumer and training data. The boundary between the human and the synthetic is not a wall. It is a gradient, and it is getting blurrier. Writer Shreya Shankar captured this tension well: "If we're asking AI to analyze our data, the AI decides what patterns are worth surfacing, what anomalies matter, what questions are worth asking. We see the world through a filter we didn't choose and can't fully inspect." That is the matrix. Not a simulation run by sentient machines, but a subtler enclosure, an information environment where the things you see, the answers you get, and the ideas you encounter have been pre-processed by systems that are, themselves, increasingly trained on their own output.
What is at stake
The risks are not hypothetical. They cascade across several domains:
- Knowledge quality. If AI models lose the ability to represent rare but important information, entire fields of knowledge become harder to access. Medical research, legal precedent, scientific edge cases, all of these live in the tails of the distribution that model collapse erodes first.
- Cultural diversity. Human expression is messy, local, idiosyncratic. AI-generated content trends toward the generic. As synthetic text dominates the web, the long tail of human culture risks being flattened into a smooth, featureless average.
- Epistemic autonomy. When your information diet is curated and summarized by AI, you lose some ability to form independent judgments. You outsource not just the labor of reading, but the act of deciding what matters.
- Trust. The more AI-generated content floods the web, the harder it becomes to distinguish real from synthetic. Trust in online information erodes, and with it, the shared epistemic foundation that democratic societies depend on.
So what do we do?
There is no clean fix. But there are directions worth moving in:
- Preserve human-generated data. Researchers writing in the Harvard Journal of Law & Technology have argued for protecting pre-AI datasets, comparing clean human-written data to "low-background steel," a rare resource made before nuclear testing that remains essential for sensitive scientific instruments. The analogy is apt: human-written text from before the AI flood may become one of the most valuable datasets in existence. A toy illustration of why such an anchor matters follows this list.
- Invest in human-in-the-loop systems. Synthetic data is not inherently bad. When used strategically and reviewed by humans, it can augment training sets without triggering collapse. The key is keeping humans in the loop, not as an afterthought, but as a structural safeguard.
- Build better filters. Detection tools for AI-generated content are improving, though they remain imperfect. The goal is not to ban synthetic content, but to make it identifiable, so that both humans and models can make informed decisions about what to trust.
- Stay curious. On a personal level, the best defense against living inside the matrix is to maintain your own taste, your own judgment, and your own willingness to seek out primary sources. Read things that surprise you. Seek out voices that are clearly, unmistakably human. Do not let the algorithm decide what is interesting.
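Here is a minimal sketch of that anchor, continuing the hypothetical token-resampling toy from earlier. The 10% mixing ratio is an arbitrary choice for illustration, not a tuned recommendation: each generation now trains on a reserved slice of the original "human" corpus plus synthetic output.

```python
# Same toy loop as before, but every generation's training data mixes
# in a reserved slice of pristine pre-AI "human" data (the hypothetical
# 10% ratio is illustrative). The reserve keeps reintroducing rare
# tokens, so the vocabulary holds roughly steady instead of collapsing.
import random

random.seed(42)
human = [f"tok{i}" for i in range(100) for _ in range(101 - i)]  # pre-AI corpus
corpus = list(human)
n, reserve = len(human), len(human) // 10  # 10% pristine data per generation

for gen in range(1, 21):
    synthetic = random.choices(corpus, k=n - reserve)  # output of the last gen
    # The "low-background steel": a fresh draw from the preserved
    # human corpus, mixed into every generation's training set.
    corpus = random.sample(human, k=reserve) + synthetic
    print(f"generation {gen:2d}: {len(set(corpus)):3d} / 100 token types survive")
```

Real training pipelines are vastly more complicated than this, but the intuition carries: uncontaminated human data acts as a fixed point the feedback loop cannot erase, which is exactly what the low-background steel analogy is getting at.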
The matrix is not a metaphor
We tend to think of "the matrix" as a dramatic, all-or-nothing scenario. A simulation you are either inside or outside of. But the real version is more gradual. It is an information environment that is slowly being reshaped by systems that feed on their own output, that flatten diversity, that optimize for engagement over truth. We are not plugged into pods. But we are swimming in a sea of synthetic content, making decisions based on AI-curated information, and contributing data that trains the next generation of models. The loop is closed. The question is not whether we are in the matrix. The question is whether we can keep enough of reality in view to notice.
References
- Shumailov, I., et al. (2024). "AI models collapse when trained on recursively generated data." Nature.
- "How AI-generated text is poisoning the internet." MIT Technology Review (2022).
- "Over 50 Percent of the Internet Is Now AI Slop, New Data Finds." Futurism (2025).
- Shankar, S. "On the Consumption of AI-Generated Content at Scale."
- "AI Slop I: Pollution in Our Communication Environment." Khazanah Research Institute.
- "AI Slop III: Society and Model Collapse." Khazanah Research Institute.
- Burden, J., et al. "Model Collapse and the Right to Uncontaminated Human-Generated Data." Harvard Journal of Law & Technology.
- "What Is Model Collapse?" IBM.
- "ChatGPT and generative AI have polluted the internet, and may have broken themselves." Digital Watch Observatory.