Google just killed Micron
Google published a compression algorithm called TurboQuant on March 24, and within 48 hours, Micron had lost over 20% of its market value. SK Hynix fell 6%. Samsung dropped nearly 5%. Kioxia lost 6%. The memory chip sector, which had been riding the AI wave to all-time highs, suddenly looked vulnerable to a single research paper.

The premise is simple and, on the surface, terrifying for anyone long on memory stocks: TurboQuant can cut the memory required to run large language models by a factor of six, with zero accuracy loss. If AI needs dramatically less memory, the entire demand thesis for high-bandwidth memory chips reprices overnight. But the real story is more interesting than a stock selloff. It's about what happens when software optimization catches up to hardware expansion, and why the obvious conclusion is almost certainly wrong.
What TurboQuant actually does
TurboQuant is a vector quantization algorithm developed by Google Research, authored by Amir Zandieh and Vahab Mirrokni. It compresses the key-value (KV) cache, the high-speed memory store that lets an AI model reuse past calculations instead of reprocessing them, from the standard 16 bits per value down to just 3 bits. That is roughly a 6x reduction in memory footprint with, according to Google's benchmarks, zero accuracy loss.

The technical mechanism works in two stages. First, PolarQuant randomly rotates data vectors and converts them from standard Cartesian coordinates into polar coordinates. Instead of storing X, Y, Z positions, it stores a radius and angles. After rotation, the distribution of angles becomes highly predictable and concentrated, which means the system no longer needs to store expensive normalization constants for every block of data. It maps data onto a fixed circular grid whose boundaries are already known. Second, Quantized Johnson-Lindenstrauss (QJL) handles the residual error from the first stage using just 1 bit per value, acting as an unbiased corrector that removes systematic bias from the attention scores computed on compressed keys. The combination means TurboQuant achieves massive compression without the memory overhead that traditionally defeats the purpose of quantization.

Google tested TurboQuant across standard long-context benchmarks including LongBench, Needle In A Haystack, ZeroSCROLLS, and RULER, using open-source models such as Gemma and Mistral. On H100 GPUs, 4-bit TurboQuant achieved up to an 8x speedup in computing attention logits compared with unquantized keys. The paper will be formally presented at ICLR 2026 in late April.

This is real. It works. And it's not the whole story.
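To make the two-stage idea concrete, here is a toy sketch in Python of the general flavor: a random rotation, polar-coordinate quantization of 2-D blocks onto a fixed angle grid, and a crude 1-bit sign code for the residual standing in for QJL. This is my own illustration, not Google's implementation; the 2-D blocking, the 3-bit angle grid, the residual scheme, and the model configuration in the footprint arithmetic at the end are all assumptions, and the bit accounting does not match the paper's budget.

```python
# Toy sketch of a two-stage KV-cache quantizer in the spirit of the description
# above. This is NOT TurboQuant itself: the 2-D blocking, 3-bit angle grid, and
# sign-based residual are simplifications of my own, for illustration only.
import numpy as np

rng = np.random.default_rng(0)


def random_rotation(d: int) -> np.ndarray:
    """Random orthogonal matrix (QR of a Gaussian), used to smooth the data distribution."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def polar_quantize(x: np.ndarray, rot: np.ndarray, angle_bits: int = 3):
    """Stage 1 (PolarQuant-flavored): rotate, split into 2-D blocks, and store each
    block as a radius plus a coarse angle code on a fixed circular grid."""
    y = rot @ x
    blocks = y.reshape(-1, 2)                          # pairs of coordinates
    radii = np.linalg.norm(blocks, axis=1)             # one radius per block
    angles = np.arctan2(blocks[:, 1], blocks[:, 0])    # angle in [-pi, pi)
    levels = 2 ** angle_bits
    codes = np.round((angles + np.pi) / (2 * np.pi) * levels).astype(int) % levels
    return radii, codes, levels


def polar_dequantize(rot: np.ndarray, radii, codes, levels) -> np.ndarray:
    """Rebuild an approximation from radii + angle codes, then undo the rotation."""
    angles = codes / levels * 2 * np.pi - np.pi
    blocks = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    return rot.T @ blocks.reshape(-1)


def sign_residual(residual: np.ndarray):
    """Stage 2 stand-in for QJL: a 1-bit (sign) code of the residual plus one scale."""
    return np.sign(residual), np.mean(np.abs(residual))


# Demo on a single synthetic "key" vector.
d = 128
key = rng.standard_normal(d)
rot = random_rotation(d)

radii, codes, levels = polar_quantize(key, rot, angle_bits=3)
coarse = polar_dequantize(rot, radii, codes, levels)
signs, scale = sign_residual(key - coarse)
refined = coarse + scale * signs                       # coarse estimate + 1-bit correction

for name, est in [("polar only", coarse), ("polar + 1-bit residual", refined)]:
    rel_err = np.linalg.norm(key - est) / np.linalg.norm(key)
    print(f"{name:>24s}: relative error {rel_err:.3f}")

# Rough KV-cache footprint arithmetic for a hypothetical model (my numbers, not the
# article's): 32 layers, 8 KV heads, head_dim 128, 131,072-token context, K and V.
n_values = 32 * 8 * 128 * 131_072 * 2
for bits in (16, 3):
    print(f"KV cache at {bits:>2} bits/value: {n_values * bits / 8 / 1e9:.1f} GB")
```

The point of the sketch is the shape of the trade, not the exact numbers: the coarse polar code carries most of the information, the 1-bit residual cleans up what remains, and the footprint arithmetic at the end shows why shrinking bits per value matters so much at long context lengths.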
The market panic was immediate
Micron hit an all-time high of $471 on March 18. Six trading sessions later, it was down over 20%, its worst multi-session performance since the tariff shock selloff in April 2025. SanDisk fell more than 3.5% in each of four consecutive sessions. The Korean memory giants, SK Hynix and Samsung, followed with sharp declines in Seoul trading.

The logic driving the selloff is straightforward. These stocks entered 2026 pricing in the assumption that AI memory demand would scale linearly with model size and context length. TurboQuant complicates that assumption. If inference workloads can run with 6x less memory, the demand curve for HBM chips shifts, potentially dramatically.

But there's an important detail the market appeared to overlook in the initial panic: this is still a laboratory result. TurboQuant has not been deployed at production scale across any major AI infrastructure stack. An open-source release is expected in Q2 2026. The gap between a research paper and production adoption across the industry is measured in quarters, sometimes years.
The Jevons paradox argument
Morgan Stanley's semiconductor analyst Shawn Kim called the stock reaction excessive and made the case that TurboQuant could ultimately benefit memory makers over the longer term. The argument hinges on the Jevons paradox: when a resource becomes more efficient to use, total consumption tends to increase rather than decrease.

The historical precedent is compelling. JPEG compression did not reduce camera storage demand. It enabled digital photography to explode, which drove storage demand far higher than anyone anticipated. Video codecs did not reduce hard drive purchases. They enabled 4K streaming, which drove demand through the roof. When DeepSeek demonstrated efficient training in January 2025, triggering a similar selloff in NVIDIA and memory stocks, AI capital expenditure commitments from hyperscalers hit record highs within two quarters. The selloff proved to be an entry point, not a cycle turning point.

The mechanism is intuitive. Lower inference costs reduce the per-token cost of running AI services. Cheaper tokens mean more applications become economically viable. More applications mean more inference workloads. More inference workloads mean more memory demand, even if each individual workload uses less memory than before.

BofA Securities analyst Vivek Arya made the most direct case against the demand destruction thesis. Similar compression techniques have been in circulation since 2024, and NVIDIA alone has published four distinct KV cache efficiency methods over the past twelve months without altering hardware procurement at scale. The more telling evidence, Arya argued, sits in Google's own spending plans. Despite publishing TurboQuant, Google raised its calendar year 2026 capital expenditure outlook to approximately $180 billion, up 100% year over year and well above the prior consensus of roughly $127 billion. "The 6x improvement in memory efficiency," Arya wrote, "is likely to produce a 6x increase in accuracy and/or context length, rather than 6x decrease in memory."
Hardware moats are thinner than investors thought
The Jevons paradox argument is probably correct over a multi-year horizon. But it misses the more immediate and more interesting signal: the market is not just pricing in near-term adoption. It is pricing in the existence of a credible software pathway to lower memory intensity. That is a different, and harder, claim to dismiss.

This is the pattern that keeps repeating in AI infrastructure. Hardware advantages that looked structural turn out to be temporary. NVIDIA's dominance was supposed to be unshakeable until DeepSeek showed that frontier-level training could happen at a fraction of the expected cost. Memory demand was supposed to scale linearly with model size until Google showed that software compression could break that relationship.

The lesson is not that hardware does not matter. It clearly does. The lesson is that hardware moats are thinner than the market prices them to be. Software optimization moves faster than fabrication plants can be built, and every efficiency gain in software changes the economics of hardware in ways that are difficult to predict from supply chain models alone.

Google wins twice in this scenario. It reduces its own infrastructure costs by running more inference on existing hardware, and it destabilizes the economics of competitors who were counting on memory scarcity to maintain pricing power. That is a strategic advantage that goes beyond the technical achievement.
The real story is software eating hardware margins
Wells Fargo analyst Andrew Rocha acknowledged the dynamic directly: "TurboQuant is directly attacking the cost curve here." Lower memory requirements per AI workload quickly raise the question of how much total capacity the industry actually needs. Rocha stopped short of a bearish conclusion, noting that the demand destruction scenario requires broad adoption that has not yet occurred. But the question is now on the table in a way it was not two weeks ago.

Andrew Jackson of Ortus Advisors offered a counterpoint, noting that TurboQuant may make "little difference to demand given the extreme supply constraints" that currently characterize the AI memory market. When demand already outstrips supply by a wide margin, efficiency gains get absorbed into performance improvements rather than demand reduction. This is probably the right framing for the next 12 to 18 months. The AI memory market is supply-constrained. Micron plans capital expenditures exceeding $25 billion this year. Construction of fabrication plants in Idaho and New York continues. The company projects Q3 fiscal 2026 revenue of approximately $33.5 billion. These are not the numbers of a company facing demand destruction.

But zoom out further and the picture shifts. Every major AI lab is investing in efficiency. Google has TurboQuant. NVIDIA publishes KV cache optimization papers quarterly. Open-source quantization methods like GPTQ and AWQ, and formats like GGUF, have been driving inference efficiency for years. The trend is clear: the software layer is getting better at doing more with less hardware, and that trend is accelerating.

The parallel to the broader tech industry is instructive. Cloud computing did not reduce total spending on servers. It made servers more efficient, which made more workloads viable, which drove total demand higher. But it also compressed margins for hardware vendors and shifted value to the software and platform layers. Memory chip makers may face a similar dynamic: growing total demand but declining pricing power as software efficiency gives buyers more leverage.
Distribution and integration beat raw specs
The deeper point is one that applies across the AI stack. Raw hardware specifications are necessary but not sufficient for durable competitive advantage. What matters more is distribution, integration, and the ability to turn infrastructure into products that generate revenue.

Google does not just publish research papers. It operates one of the largest inference fleets in the world, serving Gemini across Search, Gmail, Workspace, Cloud, and Android. TurboQuant is not an academic exercise. It is a tool for reducing the cost of running AI at Google's scale, which directly improves margins on existing products and enables new ones.

Micron makes excellent memory chips. But Micron does not control how those chips are used, and it cannot prevent its customers from finding ways to use fewer of them per workload. The value in AI increasingly accrues to the companies that control the full stack, from silicon to software to distribution, rather than to component suppliers who sell into a market where efficiency improvements are a constant threat to unit economics.

This is not a new pattern. It is the same dynamic that played out in CPUs (Intel vs. ARM), storage (spinning disks vs. SSDs vs. cloud), and networking (dedicated hardware vs. software-defined). The component maker builds the foundation. The platform company captures the value.
What comes next
The TurboQuant open-source release is expected in Q2 2026. ICLR 2026 in late April will provide the first major public technical review. The question is not whether the compression works. Google's benchmarks are clear, and the underlying math (PolarQuant, QJL) has strong theoretical grounding. The question is adoption speed. Enterprise AI infrastructure moves slowly. Production deployments require extensive testing, integration with existing frameworks, and validation across diverse workloads. The gap between a Google Research blog post and industry-wide adoption is measured in years, not weeks.

In the meantime, Micron's fundamentals remain strong. Revenue is growing. AI-driven HBM demand is real and expanding. Supply constraints persist. The company is building capacity that will take years to come online. But the market is forward-looking, and what it saw in TurboQuant was not a single algorithm. It was a signal that software efficiency is catching up to hardware expansion faster than the consensus expected. That bell cannot be un-rung. Every future memory stock valuation will now carry an implicit discount for the possibility that software compression makes each chip go further than current demand models assume.

The stock price is not the story. The story is that the relationship between AI capability and hardware consumption is not linear, and every efficiency breakthrough widens the gap between what the models can do and how much silicon they need to do it. For memory makers, that is a more nuanced challenge than a simple demand curve. It is a structural shift in how value is distributed across the AI stack.
References
- Google Research, "TurboQuant: Redefining AI efficiency with extreme compression," March 24, 2026.
- CNBC, "A Google AI breakthrough is pressuring memory chip stocks from Samsung to Micron," March 26, 2026.
- Benzinga, "Micron Stock's Rally Looked Unstoppable, Until Google's TurboQuant Hit," March 27, 2026.
- Morningstar/MarketWatch, "Micron's stock is dropping. Is Google partly to blame?" March 25, 2026.
- Ars Technica, "Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x," March 25, 2026.
- VentureBeat, "Google's new TurboQuant algorithm speeds up AI memory 8x, cutting costs by 50% or more," 2026.
- TechCrunch, "Google unveils TurboQuant, a new AI memory compression algorithm," March 25, 2026.
- The Motley Fool, "Should You Buy the Dip on Micron?" March 26, 2026.
- Sherwood News, "Sandisk, Micron dive as Google Research unveils AI algorithm to reduce memory demands," 2026.
- Yahoo Finance, "Micron Reassesses AI Memory Outlook As TurboQuant And SK Hynix Reshape Sector," March 27, 2026.
- Digital Applied, "Google TurboQuant: 6x LLM Memory Compression Guide," 2026.
- Reuters, "How Big Tech's $630B AI splurge will fall short," March 26, 2026.