Google doesn't need Nvidia
For over a decade, Nvidia has owned the AI hardware story. Their GPUs power the majority of model training worldwide, and their CUDA ecosystem has created deep lock-in across research labs, startups, and hyperscalers alike. With roughly 80-90% of the AI accelerator market by revenue, their dominance has felt inevitable. But Google just made a move that suggests the future looks very different. With the launch of Ironwood, their seventh-generation TPU designed specifically for inference, Google isn't just building another chip. They're executing a vertical integration strategy that could fundamentally reshape who controls the economics of AI.
The long game that started in 2015
Google's custom silicon ambitions aren't new. The TPU story begins around 2013, when Jeff Dean, Jonathan Ross (now CEO of Groq), and the Google Brain team ran a projection that alarmed them. They calculated that if every Android user used Google's voice search for just three minutes a day, the company would need to double its entire global data center capacity to handle the compute load. Rather than buying more of someone else's hardware, Google chose to build its own. The first TPU shipped internally in 2015. By 2018, Cloud TPUs became available to external customers. Now in its seventh generation, the TPU line represents the culmination of that decade-long bet. Each generation has pushed further. TPU v2 and v3 added training capabilities. TPU v4 and v5 scaled to massive pod-level systems. And Ironwood marks a deliberate pivot: it's the first TPU generation designed with inference as the primary workload, not an afterthought.
Why inference is the real battleground
The AI industry's center of gravity is shifting. For years, the conversation revolved around training: who could build the biggest model, who had the most GPU clusters, who could burn the most compute on pre-training runs. That era isn't over, but it's no longer the whole story. Inference, the process of actually running trained models to generate responses, is rapidly becoming the dominant cost. Industry estimates suggest inference accounts for 80-90% of the lifetime cost of a production AI system, because training happens occasionally while inference runs continuously. Deloitte estimates that inference workloads accounted for about half of all AI compute in 2025 and will reach roughly two-thirds in 2026. By 2030, the inference market alone could reach $255 billion. This is the shift Google is positioning for. The question is no longer just "who can train the biggest model" but "who can serve it cheapest at scale." And that's a very different optimization problem, one where purpose-built silicon has a massive advantage over general-purpose GPUs.
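To make that arithmetic concrete, here's a minimal back-of-envelope sketch. Every number in it (training runs per year, cost per run, request volume, serving cost) is an illustrative assumption, not a published figure; the point is only that a workload running continuously dwarfs one that runs occasionally.

```python
# Back-of-envelope sketch: why inference dominates lifetime cost.
# All numbers are illustrative assumptions, not published figures.

training_runs_per_year = 1
cost_per_training_run = 50e6        # assume $50M per large training run

requests_per_day = 1e9              # assume 1B requests/day in production
cost_per_1k_requests = 1.00         # assume $1 per thousand requests served

annual_training = training_runs_per_year * cost_per_training_run
annual_inference = requests_per_day * 365 * cost_per_1k_requests / 1_000

total = annual_training + annual_inference
print(f"training:  ${annual_training / 1e6:,.0f}M ({annual_training / total:.0%})")
print(f"inference: ${annual_inference / 1e6:,.0f}M ({annual_inference / total:.0%})")
# With these assumptions inference is ~88% of annual spend, which is the
# shape of the 80-90% lifetime-cost estimates cited above.
```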
What makes Ironwood different
Ironwood isn't a minor iteration. Google claims a 10x peak performance improvement over TPU v5p, and more than 4x better performance per chip for both training and inference compared to the previous-generation TPU v6e (Trillium). A single Ironwood superpod scales to 9,216 chips, delivering 42.5 exaflops of FP8 compute, which Google says is more than 24x the compute power of the world's largest supercomputer. The architecture reflects lessons from a decade of internal deployment. Enhanced SparseCore technology targets the sparse computations common in recommendation systems and large language models. Increased HBM capacity and bandwidth address the memory bottleneck that limits how large a model you can serve efficiently. And improved inter-chip interconnect networking, using optical circuit switching, allows the system to dynamically reshape itself into optimized "slices" for different workload types. The cost story is equally compelling. Google claims TPU v6e already offered up to 4x better performance per dollar than Nvidia's H100 for large language model workloads, and Google Cloud committed-use discounts push pricing as low as $0.39 per chip-hour. Ironwood extends this advantage further.
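A quick sanity check on those pod-level numbers, as simple arithmetic over the figures quoted above rather than an independent benchmark; the roughly 1.7 FP64 exaflop figure for El Capitan, the current top-ranked supercomputer, is an outside reference point added here for scale.

```python
# Derive per-chip throughput from the pod-level figures quoted above.
chips_per_pod = 9_216
pod_flops = 42.5e18                  # 42.5 exaflops (FP8), per Google's announcement

per_chip = pod_flops / chips_per_pod
print(f"~{per_chip / 1e15:.1f} PFLOPS per chip (FP8)")        # ~4.6 petaflops

# El Capitan benchmarks at roughly 1.7 exaflops, but in FP64, so the
# ">24x the largest supercomputer" comparison mixes numeric precisions.
print(f"~{pod_flops / 1.7e18:.0f}x El Capitan's FP64 figure")  # ~25x
```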
Vertical integration is the real moat
The chip itself is only part of the story. What makes Google's position genuinely threatening to Nvidia is the full stack. Consider what Google owns: the data (Search, YouTube, Gmail, Maps), the models (Gemini), the chips (TPU), the cloud platform (Google Cloud), and the distribution (Android, Chrome, Workspace). Name another company that controls all five layers. This vertical integration creates a flywheel that's nearly impossible to replicate. Better models need more compute. Google's TPUs deliver that compute more cheaply. Cheaper compute lets Google train and serve better models. Better models attract more customers, more customers generate more data, and more data improves the models. The cycle reinforces itself at every layer. Critically, the same TPU fleet powers everything: Search, YouTube, Gemini training and inference, Ads, Workspace, Maps, and external Cloud customers like Anthropic and Midjourney. This means Google amortizes its chip investment across multiple massive revenue lines, dramatically reducing effective cost per workload. AWS, Azure, and standalone AI labs can't match this kind of utilization efficiency. Google also benefits from a single optimized software stack: one compiler and framework path (XLA, fed by JAX), one interconnect fabric, one model-serving system, one orchestration layer. When you control the hardware, the software, and the workloads, you can co-design across the entire stack in ways that a general-purpose chip vendor simply cannot.
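To show what "one compiler" buys in practice, here is a minimal, hypothetical JAX sketch (toy model, made-up shapes): the same user code is compiled by XLA for whichever accelerator backs the runtime, TPU, GPU, or CPU, with no device-specific kernels in sight.

```python
# Minimal JAX sketch: one code path, compiled by XLA for the local backend.
# The "model" and shapes here are made up purely for illustration.
import jax
import jax.numpy as jnp

@jax.jit                                  # traced once, compiled by XLA
def mlp(x, w1, w2):
    return jax.nn.relu(x @ w1) @ w2

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 1024))
w1 = jax.random.normal(key, (1024, 4096))
w2 = jax.random.normal(key, (4096, 1024))

print(jax.devices())                      # e.g. TPU cores on a TPU VM, CPU locally
print(mlp(x, w1, w2).shape)               # (8, 1024)
```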
Everyone is going custom
Google isn't alone in this move. The pattern is unmistakable: every major hyperscaler is investing in custom silicon. Amazon has Trainium, now in its third generation, purpose-built for AI training and inference on AWS. Microsoft just launched Maia 200, an inference accelerator built on TSMC's 3nm process, claiming three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance exceeding Google's seventh-generation TPU. Meta has been developing MTIA (Meta Training and Inference Accelerator) for its internal AI workloads. Even OpenAI has locked in a multiyear design partnership with Broadcom to develop custom accelerators. The common thread is margin pressure. Nvidia's GPUs are expensive, and when inference is your dominant cost, even small efficiency gains compound into billions. TrendForce projects that ASICs' share in AI servers will jump from 20.9% in 2025 to 27.8% in 2026, while GPU share shrinks from 75.9% to 69.7%. Broadcom, which has co-designed all seven generations of Google's TPU, is targeting $100 billion in AI chip revenue by the end of 2027. They're betting that the custom silicon trend isn't a blip but a structural shift.
Nvidia's position isn't as safe as it looks
None of this means Nvidia is in trouble tomorrow. Their current position is formidable. They still dominate training workloads, where their GPU architecture and CUDA ecosystem create genuine switching costs. Virtually every major AI breakthrough of the past decade was first developed on Nvidia hardware. Their Blackwell generation is selling faster than they can produce it, with a backlog reported at nearly $500 billion. But history offers a cautionary parallel. Intel's dominance of CPUs looked equally permanent. They had the manufacturing lead, the software ecosystem (x86), the enterprise relationships, and the margins. Then the world shifted to mobile, where ARM-based chips offered better performance per watt; Intel largely missed that transition and never regained its former position. Nvidia faces a similar structural risk. If inference becomes the dominant workload (which it is becoming), and if purpose-built chips serve inference more efficiently than general-purpose GPUs (which, for many workloads, they do), then the market could gradually move away from Nvidia's sweet spot. Not because their chips are bad, but because the economics favor specialization. The CUDA ecosystem is Nvidia's strongest defense. Researchers and engineers have built a decade of tooling, libraries, and muscle memory around CUDA. But Google is actively chipping away at that moat with initiatives like TorchTPU, which aims to make TPUs a first-class PyTorch target and reduce the friction of switching. And Meta's reported talks to use Google's TPUs for its AI models, a deal potentially worth billions, signal that even the largest GPU customers are looking for alternatives.
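For a sense of how small the user-visible switching cost can already be, here is a minimal sketch of the existing PyTorch-on-TPU route via the torch_xla package. The TorchTPU initiative mentioned above may expose a different interface, so treat this only as an illustration of the idea; the model and shapes are made up.

```python
# Sketch: an ordinary PyTorch training step on a TPU via torch_xla.
# (TorchTPU's exact interface may differ; shapes here are made up.)
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()                    # a TPU core when run on a TPU VM
model = nn.Linear(1024, 8).to(device)       # stock PyTorch module, moved like any device
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 1024, device=device)
y = torch.randint(0, 8, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
xm.mark_step()                              # flush the lazily built XLA graph for execution
```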
What this means for the broader ecosystem
If the hyperscalers succeed in building their own silicon, the downstream effects are significant. For AI startups, the news is mostly good. As inference costs drop, the barrier to building AI-powered products falls with them. More efficient hardware means cheaper API pricing, which means more viable AI applications at smaller scales. The companies building on top of AI models, rather than building the models themselves, stand to benefit enormously. For chip companies in the middle, the picture is harder. Companies that serve as intermediaries between Nvidia and end customers face a squeeze from both directions: hyperscalers building their own chips from above, and falling inference prices from below. For Nvidia specifically, the strategic response likely involves pushing further into networking, systems integration, and software platforms, the areas where their ecosystem advantage is hardest to replicate. Their acquisition of Mellanox and investments in full-stack data center solutions suggest they see this coming.
The bigger picture
Google's Ironwood launch isn't just a chip announcement. It's the clearest signal yet that the AI hardware market is entering a new phase, one where vertical integration matters more than raw performance, where inference economics trump training benchmarks, and where owning the full stack from silicon to software to services creates compounding advantages that no single-layer competitor can match. Nvidia built the foundation that made the AI revolution possible. But the companies that will capture the most value from AI going forward may not be the ones selling picks during the gold rush. They may be the ones that own the entire mine.
References
- Ironwood: The first Google TPU for the age of inference (Google Blog)
- TPU transformation: A look back at 10 years of our AI-specialized chips (Google Cloud Blog)
- Ironwood TPUs and new Axion-based VMs for your AI workloads (Google Cloud Blog)
- More compute for AI, not less (Deloitte Insights)
- Nvidia AI GPU Market Share 2026 (Silicon Analysts)
- Maia 200: The AI accelerator built for inference (Microsoft Blog)
- CES 2026: AI compute sees a shift from training to inference (Computerworld)