We can no longer review code
Something shifted in software engineering, and most of us felt it before we could name it. AI coding agents now write thousands of lines of code in minutes. Pull requests arrive fat with machine-generated diffs. And somewhere between the thrill of shipping faster and the quiet dread of the review queue, a new reality settled in: we can no longer meaningfully review the code we ship. This is not a complaint about AI quality. The code is often fine, sometimes even elegant. The problem is more fundamental. Code review was designed for a world where humans wrote code at human speed. That world is gone.
The asymmetry problem
Joel Spolsky observed back in 2000 that it is harder to read code than to write it. That was true when humans wrote every line. It is dramatically more true now. AI agents can generate hundreds of lines while you are still thinking about a variable name. A developer using Claude Code, Cursor, or Copilot can produce in an afternoon what used to take a week. But review speed has not changed. Your brain still processes code at the same rate. You still need to hold context in working memory, trace control flow, and reason about edge cases. The result is a widening gap. According to Opsera's 2026 AI Coding Impact Benchmark Report, AI-generated pull requests wait 4.6 times longer in review than human-written ones. That number tells you everything. We are generating code at machine speed and reviewing it at human speed, and the queue is only growing.
The cognitive load trap
Reviewing AI-generated code is not just slower, it is cognitively harder. When you write code yourself, you build a mental model as you go. You know why you chose this data structure, why you handled that edge case, why the function lives in this file. The review is partly a formality because you already understand the decisions. AI-generated code arrives without that context. It compiles. It passes tests. It looks plausible. But you have no insight into the reasoning behind it, because there was no reasoning. The AI optimized for syntactic correctness, not for communicating intent. A security auditor who spent a week reading through an AI-heavy Node.js codebase described it well: every function did exactly what it was supposed to do, but nothing more. There was no shared understanding of why things were structured the way they were. Comments explained what the code did, not why. The auth middleware was written three different ways in three different places, all slightly different, all working. The code was correct but opaque. CodeRabbit's research puts it bluntly: reviewing AI-authored code is often more cognitively demanding than writing it from scratch. Their analysis found AI-generated PRs contained roughly 1.7 times more issues overall compared to human-written PRs, with more critical and major findings. The code looks right. The bugs hide in the gaps between looking right and being right.
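To make the auditor's anecdote concrete, here is a hypothetical sketch of that kind of duplication: three auth checks, written in three styles, that all pass the obvious happy path but silently disagree on an edge case. The function names, the minimal request type, and the "Bearer token" logic are all invented for illustration, not taken from the audited codebase:

```typescript
// Hypothetical illustration: three "working" auth checks, each written
// slightly differently -- the duplication an AI-heavy codebase accumulates
// when no shared module captures the intent.

type Req = { headers: Record<string, string | undefined> };

// Variant 1: early-return style.
function requireAuthA(req: Req): boolean {
  const token = req.headers["authorization"];
  if (!token) return false;
  return token.startsWith("Bearer ");
}

// Variant 2: single-expression style with a regex.
function requireAuthB(req: Req): boolean {
  const token = req.headers["authorization"];
  return typeof token === "string" && /^Bearer\s+\S+/.test(token);
}

// Variant 3: split-and-inspect style.
function requireAuthC(req: Req): boolean {
  const parts = (req.headers["authorization"] ?? "").split(" ");
  return parts.length === 2 && parts[0] === "Bearer" && parts[1].length > 0;
}

// All three agree on the happy path...
const ok: Req = { headers: { authorization: "Bearer abc123" } };
console.log(requireAuthA(ok), requireAuthB(ok), requireAuthC(ok)); // true true true

// ...but "Bearer " with an empty token is accepted only by variant A.
// That disagreement is the gap between looking right and being right.
const empty: Req = { headers: { authorization: "Bearer " } };
console.log(requireAuthA(empty), requireAuthB(empty), requireAuthC(empty)); // true false false
```

Each variant reads as plausible in isolation; only a reviewer who knows all three exist, and knows which behavior is intended, can spot the inconsistency.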
The trust paradox
Stack Overflow's 2025 developer survey revealed a striking contradiction. Over 84% of developers were using or planning to use AI tools, but only 29% said they trusted AI output, down 11 percentage points from the previous year. More people are using AI. Fewer people trust it. Everyone ships it anyway. Cloudsmith's research found that only 67% of developers review AI-generated code before every deployment, which means roughly a third of teams are at least sometimes shipping code nobody has vetted. This is not because developers are lazy. It is because the volume overwhelms the process. When you are staring down your fifth 2,000-line AI-generated PR of the day, the temptation to skim becomes overwhelming. Experienced developers on Reddit describe this tension vividly. One wrote: "You just traded writing for reviewing, and reviewing is arguably harder because you need to catch what the AI got subtly wrong." Management wants delivery velocity. Engineers know the debt is accumulating. The review process, the one thing standing between plausible code and correct code, is buckling under the weight.
Peter Steinberger said it out loud
Peter Steinberger, the creator of OpenClaw and founder of PSPDFKit, made headlines with a phrase that crystallized what many developers were privately thinking: "I ship code I don't read." In his conversation with Gergely Orosz on The Pragmatic Engineer, Steinberger described a workflow centered entirely on AI agents. He calls it "agentic engineering," distinguishing it from the more dismissive "vibe coding." He spends about 20% of his time on refactoring, all done by agents. He does not manually review every line. This is not carelessness. Steinberger is a veteran engineer with decades of experience. His point is that the old model, where a human reads every line before it ships, does not scale when agents produce code at the rate they do now. The question is not whether you review every line. The question is what replaces line-by-line review when line-by-line review becomes impossible. Boris Cherny, the creator and head of Claude Code at Anthropic, echoed a version of this from the toolmaker's side. In multiple interviews, he has described a future where traditional IDEs like VS Code and Xcode become obsolete, replaced by agent-driven workflows. If the person building the most prominent AI coding tool believes we are moving past the write-then-review loop, it is worth paying attention.
What 10,000-line PRs actually mean
The discourse around massive AI-generated PRs often focuses on the wrong thing. The problem is not that the code is bad. The problem is that a 10,000-line PR is fundamentally unreviewable by a single human in any meaningful way. Matt Watson, writing on LinkedIn, put it sharply: "10,000 lines of code you can't explain or debug is useless in software development." The best developers do not just write code. They understand the architecture decisions and tradeoffs they made. They know why they built it that way. When an AI generates the code, that "why" disappears. Francisco Trindade, writing about the enterprise reality in early 2026, described what many organizations are experiencing: software teams are drowning in PRs. In the past, producing 10,000 lines of code meant a human had to sit down and write them. Now it takes a few prompts. Maintaining quality is a challenge because there is simply too much code to review, authors often do not review their own AI-generated output, and the pressure naturally leads to mistakes. Thoughtbot's practical guide to reviewing AI-generated PRs acknowledges the core tension directly: "AI will always be able to write code faster than you can review it." Their advice is to be strategic about what you spend time commenting on and be prepared to let minor things go. That is good pragmatic advice. It is also an admission that comprehensive review is no longer possible.
The role is changing whether we like it or not
The industry consensus forming in 2026 is that the developer role is shifting from "writer of code" to "reviewer and orchestrator of AI-generated code." Multiple surveys and trend analyses point to this framing. Developers become supervisors. Debugging and code inspection skills become more valuable than syntax knowledge. Understanding patterns and anti-patterns matters more than typing speed. But this framing has a problem. If reviewing AI code is harder and more time-consuming than writing it was, and if the volume of code to review keeps growing, then simply relabeling the job does not solve anything. You cannot scale human attention the way you can scale token generation. The Opsera report found that AI-generated code introduces 15 to 18% more security vulnerabilities than human-written code. Senior engineers realize nearly five times the productivity gains of junior engineers. The gap is not closing. It is widening. The developers who can actually evaluate AI output are the ones who have years of experience understanding why code should be written a certain way, the very understanding that AI does not encode.
Where this leaves us
We are in an awkward transition. The tools for generating code leaped ahead. The tools and practices for verifying code did not. The result is a growing trust deficit, more code in production that no human fully understands, and a review process that was designed for a different era. Some teams are responding by using AI to review AI-generated code, fighting fire with fire. CodeRabbit, Greptile, and others are building automated review tools that can catch patterns humans miss at scale. This helps, but it introduces its own questions about trust and verification. Other teams are rethinking the unit of review. Instead of reviewing diffs line by line, they review behavior: does the system do what it should? Does it handle the edge cases that matter? This shifts the focus from code comprehension to specification and testing, a different skill set entirely. The honest answer is that nobody has fully figured this out yet. We are generating code faster than we can understand it, and the old safeguards are not keeping up. The developers who will navigate this well are the ones who recognize the shift and adapt their workflows, not by reading every line (that ship has sailed), but by getting better at asking the right questions about what the code should do, and building systems that verify the answers. The era of comprehensive human code review is ending. What replaces it is still being written, probably by an AI agent, in a PR that nobody will fully read.
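One way to picture reviewing behavior instead of diffs: write down the properties the code must satisfy and check the implementation against them, without reading the implementation closely. The sketch below is entirely hypothetical; the `slugify` function stands in for some AI-generated code, and the property list stands in for the reviewer's specification:

```typescript
// Hypothetical behavior-level review: instead of reading the AI's diff
// line by line, the reviewer encodes the contract as executable checks.

// Imagine an agent generated this; we deliberately do not study it.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The "review" is a specification: properties that matter, not lines of code.
const spec: Array<[string, (out: string) => boolean]> = [
  ["output is lowercase", (out) => out === out.toLowerCase()],
  ["no spaces survive", (out) => !out.includes(" ")],
  ["no leading or trailing dashes", (out) => !/^-|-$/.test(out)],
  ["idempotent: slugifying a slug is a no-op", (out) => slugify(out) === out],
];

// Run every input through every property; collect violations.
function reviewBehavior(inputs: string[]): string[] {
  const failures: string[] = [];
  for (const input of inputs) {
    const out = slugify(input);
    for (const [name, check] of spec) {
      if (!check(out)) failures.push(`"${input}" violates: ${name}`);
    }
  }
  return failures;
}

console.log(reviewBehavior(["  Hello, World!  ", "Already-a-slug"])); // []
```

The point is the shift in artifact: the spec and its edge-case inputs are what the human author maintains and reasons about, while the generated function is allowed to churn underneath as long as the checks keep passing.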
References
- Opsera, "AI Coding Impact 2026 Benchmark Report" https://opsera.ai/resources/report/ai-coding-impact-2026-benchmark-report/
- CodeRabbit, "It's harder to read code than to write it, especially when AI writes it" https://coderabbit.ai/blog/its-harder-to-read-code-than-to-write-it-especially-when-ai-writes-it
- CodeRabbit, "AI vs human code gen report: AI code creates 1.7x more issues" https://coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report
- Stack Overflow, "Mind the gap: Closing the AI trust gap for developers" https://stackoverflow.blog/2026/02/18/closing-the-developer-ai-trust-gap/
- Cloudsmith, "2025 Artifact Management & AI Risks Report" https://cloudsmith.com/blog/ai-is-now-writing-code-at-scale-but-whos-checking-it
- Gergely Orosz, "The creator of Clawd: I ship code I don't read," The Pragmatic Engineer https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code
- Lenny Rachitsky, "Head of Claude Code: What happens after coding is solved," Lenny's Newsletter https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens
- Francisco Trindade, "Will Humans Still Review Code?" https://franciscomt.medium.com/will-humans-still-review-code-a6f7d3f0c39c
- Thoughtbot, "How to review AI generated PRs" https://thoughtbot.com/blog/how-to-review-ai-generated-prs
- ShiftMag, "AI Hasn't Made Developers Faster, It's Made Their Review Queues Longer" https://shiftmag.dev/ai-hasnt-made-developers-faster-its-made-their-review-queues-longer-8935/
- Illya Yalovoy, "The Senior Engineer's Job in 2026 Is Code Review, Not Code Writing" https://medium.com/@yalovoy/the-senior-engineers-job-in-2026-is-code-review-not-code-writing-f804036c55ab