How Google’s TPUs are reshaping the economics of large-scale AI
For more than a decade, Nvidia’s GPUs have underpinned nearly every major advance in modern AI. That position is now being challenged.
Frontier models such as Google’s Gemini 3 and Anthropic’s Claude Opus 4.5 were trained not on Nvidia hardware, but on Google’s latest Tensor Processing Units, the Ironwood-based TPUv7. This signals that a viable alternative to the GPU-centric AI stack has already arrived — one with real implications for the economics and architecture of frontier-scale training.
Nvidia's CUDA (Compute Unified Device Architecture), the platform that provides access to the GPU's massive parallel architecture, and its surrounding tools have created what many have dubbed the "CUDA moat"; once a team has built pipelines on CUDA, switching to another platform is prohibitively expensive because of the dependencies on Nvidia’s software stack. This, combined with Nvidia's first-mover advantage, helped the company achieve a staggering 75% gross margin.
Unlike GPUs, TPUs were designed from day one as purpose-built silicon for machine learning. With each generation, Google has pushed further into large-scale AI acceleration, but now, as the hardware behind two of the most capable AI models ever trained, TPUv7 signals a broader strategy to challenge Nvidia’s dominance.
GPUs and TPUs both accelerate machine learning, but they reflect different design philosophies: GPUs are general-purpose parallel processors, while TPUs are purpose-built systems optimized almost exclusively for large-scale matrix multiplication. With TPUv7, Google has pushed that specialization further by tightly integrating high-speed interconnects directly into the chip, allowing TPU pods to scale like a single supercomputer and reducing the cost and latency penalties that typically come with GPU-based clusters.
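To make that concrete, here is a minimal sketch of how a TPU slice is programmed as if it were one large accelerator, using JAX on an assumed single-host slice; the array sizes and mesh axis name are placeholders, not anything specific to TPUv7. Arrays are sharded across a device mesh, and the compiler partitions the matrix multiplication across chips over the built-in interconnect.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# All TPU chips visible to this process, arranged as a 1D logical mesh.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Shard the rows of a large matrix across every chip in the slice.
x = jax.device_put(jnp.ones((8192, 8192)), NamedSharding(mesh, P("data", None)))

@jax.jit
def matmul(a, b):
    # XLA partitions this single expression across the mesh; cross-chip
    # communication runs over the TPU interconnect rather than host networking.
    return a @ b

y = matmul(x, x.T)
print(y.sharding)
```

The point of the sketch is that the programmer writes one logical computation; the pod-scale behavior comes from the compiler and the interconnect, not from hand-written cluster code.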
TPUs are "designed as a complete 'system' rather than just a chip," Val Bercovici, Chief AI Officer at WEKA, told VentureBeat.
Google's commercial pivot from internal to industry-wide
Historically, Google restricted access to TPUs solely through cloud rentals on the Google Cloud Platform. In recent months, Google has started offering the hardware directly to external customers, effectively unbundling the chip from the cloud service. Customers can now choose between treating compute as an operating expense (renting via cloud) or as a capital expenditure (purchasing hardware outright), removing a major friction point for large AI labs that prefer to own their hardware and effectively bypassing the "cloud rent" premium on the base hardware.
The centerpiece of Google's shift in strategy is a landmark deal with Anthropic, under which the Claude Opus 4.5 creator will receive access to up to 1 million TPUv7 chips — more than a gigawatt of compute capacity. Through Broadcom, Google's longtime physical design partner, approximately 400,000 chips are being sold directly to Anthropic; the remaining 600,000 are leased through traditional Google Cloud contracts. Anthropic's commitment represents billions of dollars in committed revenue for Google and locks one of OpenAI's key competitors into Google's ecosystem.
Eroding the "CUDA moat"
For years, Nvidia’s GPUs have been the clear market leader in AI infrastructure. In addition to its powerful hardware, Nvidia's CUDA ecosystem features a vast library of optimized kernels and frameworks. Combined with broad developer familiarity and a huge installed base, enterprises gradually became locked into the "CUDA moat," a structural barrier that made it impractically expensive to abandon a GPU-based infrastructure.
One of the key blockers preventing wider TPU adoption has been ecosystem friction. In the past, TPUs worked best with JAX, Google's own numerical computing library designed for AI/ML research. However, mainstream AI development relies primarily on PyTorch, an open-source ML framework that has been heavily optimized for CUDA.
Google is now directly addressing the gap. TPUv7 supports native PyTorch integration, including eager execution, full support for distributed APIs, torch.compile, and custom TPU kernel support under PyTorch’s toolchain. The goal is for PyTorch to run as easily on TPUs as it does on Nvidia GPUs.
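The TPUv7-specific integration details are not fully public, but a minimal sketch of the existing PyTorch/XLA path (the torch_xla package; the model, batch size, and hyperparameters here are placeholders) already looks like ordinary PyTorch pointed at a TPU device:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

# The TPU device visible to this process; on a multi-chip slice each
# process typically drives one device.
device = xm.xla_device()

model = nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 1024, device=device)
target = torch.randn(32, 1024, device=device)

loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
xm.mark_step()  # flushes the lazily traced graph to the TPU for execution
print(loss.item())
```

If integration work like this lands as promised, the TPU-specific surface area shrinks to device selection and a handful of runtime calls rather than a rewrite of the training loop.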
Google is also contributing heavily to vLLM and SGLang, two popular open-source inference frameworks. By optimizing these widely used tools for TPUs, Google ensures developers can switch hardware without rewriting their entire codebases.
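Under vLLM, for example, the switch is largely a deployment concern rather than a code change. A minimal sketch of the serving API is below; the model name and sampling settings are placeholders, and whether it runs on TPUs or GPUs depends on how the installed vLLM build was configured.

```python
from vllm import LLM, SamplingParams

# Hypothetical model choice; swap in whatever checkpoint you actually serve.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a TPU pod is."], params)
print(outputs[0].outputs[0].text)
```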
Advantages and disadvantages of TPUs versus GPUs
For enterprises comparing TPUs and GPUs for large-scale ML workloads, the trade-offs center primarily on cost, performance, and scalability. SemiAnalysis recently published a deep dive weighing the two technologies on both cost efficiency and technical performance.
Thanks to its specialized architecture and greater energy efficiency, TPUv7 offers significantly better throughput-per-dollar for large-scale training and high-volume inference. This allows enterprises to reduce operational costs related to power, cooling, and data center resources. SemiAnalysis estimates that, for Google's internal systems, the total cost of ownership (TCO) for an Ironwood-based server is approximately 44% lower than the TCO for an equivalent Nvidia GB200 Blackwell server. Even after factoring in the profit margins for both Google and Broadcom, external customers like Anthropic are seeing a ~30% reduction in costs compared to Nvidia. "When cost is paramount, TPUs make sense for AI projects at massive scale. With TPUs, hyperscalers and AI labs can achieve 30-50% TCO reductions, which could translate to billions in savings," Bercovici said.
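As a rough illustration of what those percentages mean in practice, here is a back-of-the-envelope sketch. The baseline GB200 server cost is a made-up placeholder; only the 44% and ~30% reduction figures come from the SemiAnalysis estimates cited above.

```python
# Hypothetical lifetime cost per GB200 server, USD (placeholder value).
gb200_server_tco = 3_000_000

ironwood_internal_tco = gb200_server_tco * (1 - 0.44)  # Google's internal cost, ~44% lower
ironwood_external_tco = gb200_server_tco * (1 - 0.30)  # external customers, ~30% lower after margins

print(f"Internal Ironwood TCO:  ${ironwood_internal_tco:,.0f}")
print(f"External customer TCO: ${ironwood_external_tco:,.0f}")
```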
This economic leverage is already reshaping the market. The mere existence of a viable alternative allowed OpenAI to negotiate a ~30% discount on its own Nvidia hardware. OpenAI is one of the largest purchasers of Nvidia GPUs; even so, earlier this year the company added Google TPUs via Google Cloud to support its growing compute requirements. Meta is also reportedly in advanced discussions to acquire Google TPUs for its data centers.
At this stage, it might seem like Ironwood is the ideal solution for enterprise architecture, but there are a number of trade-offs. While TPUs excel at specific deep learning workloads, they are far less flexible than GPUs, which can run a wide variety of algorithms, including non-AI tasks. If a new AI technique is invented tomorrow, a GPU will run it immediately. This makes GPUs more suitable for organizations that run a wide range of computational workloads beyond standard deep learning.
Migration from a GPU-centric environment can also be expensive and time-consuming, especially for teams with existing CUDA-based pipelines, custom GPU kernels, or dependencies on frameworks not yet optimized for TPUs.
Bercovici recommends that companies "opt for GPUs when they need to move fast and time to market matters. GPUs leverage standard infrastructure and the largest developer ecosystem, handle dynamic and complex workloads that TPUs aren't optimized for, and deploy into existing on-premises standards-based data centers without requiring custom power and networking rebuilds."
Additionally, the ubiquity of GPUs means that there is more engineering talent available. TPUs demand a rare skillset. "Leveraging the power of TPUs requires an organization to have engineering depth, which means being able to recruit and retain the rare engineering talent that can write custom kernels and optimize compilers," Bercovici said.
In practice, Ironwood’s advantages accrue mostly to enterprises with large, tensor-heavy workloads. Organizations requiring broader hardware flexibility, hybrid-cloud strategies, or HPC-style versatility may find GPUs the better fit. In many cases, a hybrid approach combining the two may offer the best balance of specialization and flexibility.
The future of AI architecture
The competition for AI hardware dominance is heating up, but it's far too early to predict a winner — or if there will even be a winner at all. With Nvidia and Google innovating at such a rapid pace and companies like Amazon joining the fray, the highest-performing AI systems of the future could be hybrid, integrating both TPUs and GPUs.
"Google Cloud is experiencing accelerating demand for both our custom TPUs and Nvidia GPUs,” a Google spokesperson told VentureBeat. “As a result, we are significantly expanding our Nvidia GPU offerings to meet substantial customer demand. The reality is that the majority of our Google Cloud customers use both GPUs and TPUs. With our wide selection of the latest Nvidia GPUs and seven generations of custom TPUs, we offer customers the flexibility of choice to optimize for their specific needs."