Clean room development
On March 31, 2026, Anthropic accidentally shipped a source map file inside version 2.1.88 of its Claude Code npm package. The file pointed to an archive on Cloudflare R2 storage containing nearly 2,000 TypeScript files and over 500,000 lines of unobfuscated source code. Security researcher Chaofan Shou spotted the exposure and posted it publicly. Within hours, mirrors had spread across GitHub, accumulating over 41,500 forks before Anthropic could respond. What happened next was arguably more interesting than the leak itself. A Korean developer named Sigrid Jin woke up at 4 AM, sat down, and ported the core architecture to Python from scratch using an AI orchestration tool. The resulting project, claw-code, became the fastest GitHub repository in history to surpass 30,000 stars. And the legal theory behind it rests on a concept that has shaped the tech industry for over four decades: clean room development.
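It's worth noting how little work a leaked source map leaves an attacker. A `.map` file is plain JSON, and when the build tool populates the `sourcesContent` field, it embeds every original file verbatim alongside the compiled bundle. A minimal sketch of the recovery step (the function name and file paths here are illustrative, not taken from the actual leak):

```python
import json
from pathlib import Path

def extract_sources(map_path: str, out_dir: str) -> list[str]:
    """Recover original files embedded in a source map's sourcesContent."""
    source_map = json.loads(Path(map_path).read_text())
    recovered = []
    # "sources" lists original file paths; "sourcesContent", when present,
    # holds each file's full text in a parallel array.
    for name, content in zip(source_map.get("sources", []),
                             source_map.get("sourcesContent") or []):
        if content is None:
            continue  # the build tool omitted this file's content
        target = Path(out_dir) / name.replace("../", "").lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        recovered.append(str(target))
    return recovered
```

Nothing about this is exotic; any developer tools panel does the same thing. That is why a single stray `.map` file in an npm tarball is equivalent to publishing the source tree itself.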
What clean room development actually is
Clean room development, sometimes called clean room design or the Chinese wall technique, is a method of recreating a piece of software without infringing on the copyrights of the original. The core idea is straightforward: you reverse-engineer the behavior and specifications of an existing system, then have a completely separate team implement it from scratch, with no access to the original source code. The process typically works in two stages. First, an analysis team examines the original system and documents what it does, its inputs, outputs, behaviors, and functional requirements. A lawyer reviews this specification to ensure no copyrighted expression has been carried over, only functional descriptions. Then a second team, one that has never seen the original code, builds a new implementation based solely on that specification. The legal foundation is a distinction in copyright law between ideas and expression. Copyright protects the specific way something is written, the particular arrangement of code, not the underlying functionality or behavior. Two programs can do the exact same thing, but if they were written independently, they are considered distinct creative works.
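The idea-expression distinction is easy to see in miniature. These two functions (an illustrative toy, not drawn from any of the projects discussed here) satisfy the same specification, returning the n-th Fibonacci number, yet share essentially no expression:

```python
def fib_iterative(n: int) -> int:
    # One expression of the spec: walk the sequence forward.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_matrix(n: int) -> int:
    # A different expression: repeated squaring of the 2x2 matrix
    # [[1, 1], [1, 0]], whose n-th power contains F(n).
    def mul(x, y):
        return (x[0]*y[0] + x[1]*y[2], x[0]*y[1] + x[1]*y[3],
                x[2]*y[0] + x[3]*y[2], x[2]*y[1] + x[3]*y[3])
    result, base = (1, 0, 0, 1), (1, 1, 1, 0)  # identity, generator
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[1]
```

A clean room process aims to guarantee exactly this situation at the scale of an entire system: identical observable behavior, provably independent expression.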
The precedent that built an industry
The most consequential use of clean room development happened in the early 1980s, and it fundamentally shaped the personal computer industry. When IBM released its original PC in 1981, most of the hardware used off-the-shelf components that anyone could buy. But the BIOS, the firmware that initialized the hardware and provided basic input/output services, was IBM's proprietary code. Any company wanting to build a compatible PC needed a BIOS that behaved identically to IBM's, but couldn't copy IBM's code without inviting a lawsuit. Compaq solved this problem with a clean room approach. They hired two teams: one to document every function and behavior of the IBM BIOS, and another, with no exposure to IBM's code, to write a new BIOS from that documentation. The result was a fully compatible BIOS that IBM couldn't claim as a copyright violation. Compaq's portable PC, released in 1983, was the first legally produced IBM-compatible clone. Phoenix Technologies took this even further. In July 1984, they announced a commercially licensable BIOS developed through a clean room process. They emphasized that the programmer who wrote it had never even worked with Intel microprocessors before, having previously been a TMS9900 programmer. This deliberate separation was the point: the cleaner the room, the stronger the legal defense. The floodgates opened. Clone manufacturers could license the Phoenix BIOS instead of risking their own reverse engineering efforts. PC prices dropped. Competition exploded. The affordable personal computer revolution that followed, the one that put a PC on every desk, was built on clean room development. Several companies that skipped the clean room process, including Corona Data Systems, Eagle Computer, and Handwell Corporation, were sued by IBM for copyright infringement and forced to rewrite their implementations. The legal line was clear: copy the behavior, not the code.
Claw-code and the AI twist
The claw-code project follows the same conceptual logic as the Compaq BIOS, but with a significant difference. Instead of human teams separated by a wall, the "clean room" was an AI coding tool. Sigrid Jin used oh-my-codex, a workflow layer built on top of OpenAI's Codex, to orchestrate the rewrite. The process captured the architectural patterns and structural design of Claude Code's agent harness, then reimplemented them in a completely different programming language, Python instead of TypeScript. The argument is that a language translation constitutes independent expression, even if the functional behavior is equivalent. The project's own documentation is careful with its framing. It describes itself as a "clean-room Python rewrite that captures the structural patterns" of Claude Code, rather than claiming to be a direct copy. The distinction matters legally. But this is where the traditional clean room model starts to strain. In the classic process, the implementation team has zero knowledge of the original code. They work from a specification only. In the claw-code case, the AI tool had access to the original TypeScript source and was asked to rewrite it in Python. Whether this constitutes a clean room in the legal sense is genuinely unclear. Anthropic responded aggressively, filing DMCA takedown requests that resulted in the removal of over 8,000 copies and adaptations from GitHub. The takedowns were aimed primarily at direct copies of the original TypeScript, but the legal status of claw-code as a derivative work remains an open question.
The legal gray zone
The key legal precedents for clean room development were established in an era when humans did all the work. The cases that matter, Sega v. Accolade, Atari v. Nintendo, and the IBM clone litigation, all involved human programmers writing code from specifications. The courts recognized reverse engineering as fair use when the process genuinely produced independent creative expression. AI rewrites complicate this framework in several ways. First, there's the question of whether AI-generated code is even copyrightable. Under current U.S. law, copyright requires human authorship. If Anthropic's own developers used AI to write Claude Code, as some have argued, then portions of that code may not be eligible for copyright protection in the first place. Casey Muratori, a well-known programmer, raised exactly this point publicly: "According to Anthropic itself, their devs do not write any code by hand. As far as I know, AI-generated code is not copyrightable under US law." Second, a traditional clean room works because the implementation team demonstrably never saw the original code. When an AI model does the rewriting, that separation is harder to prove. The model processes the original source and produces output based on it. Whether that output constitutes a "transformative" work or a derivative one is a question that copyright law hasn't answered yet. Third, the speed and ease of AI rewrites change the practical calculus. When a clean room process required months of human effort and significant investment, it was a serious undertaking that naturally limited its use. When an AI can rewrite 500,000 lines of code overnight, the barrier drops to nearly zero. This raises policy questions about whether the legal framework designed for one reality still makes sense in another.
The chardet incident earlier in 2026 foreshadowed these tensions. A maintainer used Claude to rewrite a popular open source Python library from scratch, then published the result under a different, more permissive license. The original LGPL-licensed code and the AI-rewritten MIT-licensed code did the same thing, but the question of whether the rewrite constituted a legitimate clean room effort or license laundering sparked a heated debate in the open source community.
What this means going forward
The Claude Code leak and its aftermath have surfaced a set of questions that the software industry will need to answer. For companies protecting proprietary code, the incident demonstrates that source code exposure now carries a new category of risk. A leak doesn't just mean someone can read your code; it means an AI can translate it to another language and redistribute it before your legal team finishes their morning coffee. Anthropic's 8,000 DMCA takedowns show the scale of the containment problem, and the fact that claw-code remains available suggests the limits of that approach. For developers building on reverse-engineered or AI-rewritten code, the legal uncertainty is real. Clean room development has a strong legal history when the process is followed rigorously. But AI-assisted rewrites blur the lines in ways that haven't been tested in court. Anyone building on projects like claw-code should factor that uncertainty into their decisions. For the legal system, the fundamental question is whether the idea-expression distinction that underpins clean room development still holds when AI can translate between expressions almost instantly. If the expression is what copyright protects, and AI can generate an infinite number of functionally equivalent expressions, then the practical value of that protection approaches zero. Clean room development was designed for a world where reimplementation was expensive and slow. It balanced the interests of original creators with the public benefit of interoperability and competition. In a world where AI makes reimplementation nearly free, that balance may need to be recalibrated. The IBM BIOS clean room took months of careful human work. Claw-code took one developer and an AI a few hours. The legal principle is the same. Everything else has changed.
References
- "Clean-room design." Wikipedia. https://en.wikipedia.org/wiki/Clean-room_design
- Olson, R. "The Claude Code leak in four charts: half a million lines, three accidents, forty tools." Dr. Randal S. Olson, April 2026. https://www.randalolson.com/2026/04/02/claude-code-leak-four-charts/
- "Claude's code: Anthropic leaks source code for AI software engineering tool." The Guardian, April 2026. https://www.theguardian.com/technology/2026/apr/01/anthropic-claudes-code-leaks-ai
- "Introduction." Claw Code Documentation. https://www.mintlify.com/instructkr/claw-code/introduction
- "Diving into Claude Code's Source Code Leak." Engineer's Codex. https://read.engineerscodex.com/p/diving-into-claude-codes-source-code
- Columbus, L. "In the wake of Claude Code's source code leak, 5 actions enterprise security leaders should take now." VentureBeat, April 2026. https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders
- "What Is claw-code? The Claude Code Rewrite Explained." WaveSpeed AI Blog. https://wavespeed.ai/blog/posts/what-is-claw-code/
- "AI can rewrite open source code, but can it rewrite the license, too?" Ars Technica. https://arstechnica.com/ai/2026/03/ai-can-rewrite-open-source-code-but-can-it-rewrite-the-license-too/
- "How Clean Room Reverse Engineering Built the Modern Tech Industry." NTARI. https://www.ntari.org/post/how-clean-room-reverse-engineering-built-the-modern-tech-industry
- "Claude Code Leak Exposes AI Supply Chain Threats." eSecurity Planet. https://www.esecurityplanet.com/artificial-intelligence/claude-code-leak-exposes-ai-supply-chain-threats/
- "Tales from 80s Tech: How Compaq's Clone Computers Skirted IBM's IP and Gave Rise to EISA." All About Circuits. https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/