Inside Nvidia’s $20 Billion Gambit to Save Its Margins
For the past three years, Nvidia (NASDAQ:NVDA) has operated as the undisputed king of the AI era, fueled by a voracious global appetite for large-scale model training. However, the company’s recent strategic maneuver – a massive $20 billion licensing deal with startup Groq and the hiring of its top leadership – signals a pivotal shift in the Silicon Valley landscape. As the industry moves from the “build” phase to the “deploy” phase, Nvidia is beginning to hedge its bets against the looming “inference threat.” So what does this mean for the company?

From Model Creation to Model Deployment
To date, Nvidia’s dominance has been built on the back of training: the computationally intensive process of teaching an AI model using massive datasets. This requires the raw horsepower of high-end GPUs like the H100 and Blackwell. However, the long-term economic value of AI lies in inference—the process of a trained model actually answering user queries. One way to think about this: training is the CapEx phase of AI and has been the primary driver of Nvidia’s triple-digit growth so far, while inference is the OpEx phase and is projected to become the vastly larger and more sustainable market.
Nvidia’s Inference Hedge: The Groq Licensing Deal
We see the deal as an admission of sorts from Nvidia that the workloads of the future will be inference-based. In other words, Nvidia is positioning itself for a market where cost-per-token and latency matter more than raw training throughput.
Groq’s specialized Language Processing Units (LPUs) use SRAM (Static Random-Access Memory) to deliver lightning-fast responses, bypassing the memory bottlenecks that often slow down traditional GPUs. By absorbing Groq’s talent, including founder and former Google chip architect Jonathan Ross, Nvidia is effectively defending itself against specialized architectures that threaten to do inference better, faster, and cheaper than a general-purpose GPU.
The Groq deal follows a “quasi-acquisition” pattern recently seen with Microsoft and Inflection AI, or Meta and Scale AI. By licensing the tech and hiring the C-suite without a formal merger, Nvidia gains the intellectual property and the brains behind the fastest inference engine in the world while attempting to sidestep antitrust regulators. By leveraging Groq’s SRAM-based approach, Nvidia is potentially preparing for a world where AI models are “smaller” and “faster” rather than just “bigger.”
The Margin Question: Can Nvidia Sustain a 50% Net Margin?
Nvidia currently enjoys a staggering net margin of over 50%, a figure virtually unheard of in the hardware space. This profitability is driven by the scarcity and “must-have” nature of its training chips. However, the shift toward inference poses a direct threat to these historic margins.
The economics of inference are fundamentally different from training. In training, performance is everything, and big tech companies are willing to pay a “scarcity premium.” In inference, the primary metrics are cost-per-token and energy efficiency. Because inference workloads are less computationally “heavy” than training, they can often be run on less expensive, specialized silicon or even older-generation GPUs, which have much lower price points and margins. As inference gathers pace, Nvidia faces a “margin squeeze” from two sides:
First, startups like Groq (which Nvidia just spent $20 billion to neutralize, or effectively absorb) and Cerebras design chips that are 10x more efficient for running models than general-purpose GPUs. If Nvidia has to compete on cost-per-token rather than scarcity, it loses its pricing power.

Second, companies like Google, Meta, Amazon, and Microsoft are increasingly deploying in-house accelerators optimized for inference workloads, where cost per token, power efficiency, and utilization matter more than peak performance. These custom chips are “good enough” for large-scale inference and directly reduce dependence on Nvidia’s high-margin GPUs.
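To make the cost-per-token framing concrete, here is a minimal sketch of how a buyer might compare accelerators on inference economics. All figures below (chip prices, throughput, power draw, utilization) are illustrative assumptions, not vendor specifications or Nvidia/Groq data:

```python
# Hypothetical comparison of accelerators on cost-per-token.
# Every number below is an illustrative assumption, not a vendor spec.

def cost_per_million_tokens(chip_price_usd, useful_life_yrs, power_kw,
                            power_cost_per_kwh, tokens_per_sec, utilization):
    """Amortized hardware plus energy cost per one million generated tokens."""
    hours_per_year = 365 * 24
    tokens_per_year = tokens_per_sec * utilization * hours_per_year * 3600
    hw_cost_per_year = chip_price_usd / useful_life_yrs
    energy_cost_per_year = power_kw * hours_per_year * power_cost_per_kwh * utilization
    return (hw_cost_per_year + energy_cost_per_year) / tokens_per_year * 1e6

# A premium general-purpose GPU vs. a cheaper inference-specialized chip
gpu = cost_per_million_tokens(30_000, 4, 0.7, 0.10, 3_000, 0.6)
asic = cost_per_million_tokens(12_000, 4, 0.3, 0.10, 4_000, 0.6)
print(f"GPU:  ${gpu:.3f} per 1M tokens")
print(f"ASIC: ${asic:.3f} per 1M tokens")
```

With these made-up inputs, the specialized chip undercuts the GPU on cost-per-token despite lower peak performance — exactly the axis on which buyers optimizing for inference would compare hardware.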
Commoditization Risk and Product Mix Shift
If inference becomes a commodity service, the premium Nvidia charges for its ecosystem (CUDA) may lose its luster. Maintaining a 50% net margin is significantly harder when the market is optimized for cost-efficiency rather than raw power. Buyers increasingly compare chips on cost-per-token and energy efficiency, pushing demand toward cheaper accelerators or internal silicon. While Nvidia could see overall chip volumes grow given the potentially massive size of the AI inference market, the product mix could skew away from high-margin training GPUs toward lower-margin inference hardware and deployments.
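The mix-shift risk can be illustrated with simple blended-margin arithmetic. The revenue shares and segment margins below are hypothetical assumptions for illustration, not Nvidia’s reported figures:

```python
# Illustrative blended-margin arithmetic for a training-to-inference mix shift.
# Revenue shares and segment margins are hypothetical, not reported data.

def blended_margin(mix):
    """mix: dict of segment -> (revenue_share, net_margin)."""
    shares = sum(share for share, _ in mix.values())
    assert abs(shares - 1.0) < 1e-9, "revenue shares must sum to 100%"
    return sum(share * margin for share, margin in mix.values())

# Today: revenue dominated by high-margin training GPUs (assumed 55% margin)
today = {"training_gpus": (0.80, 0.55), "inference_hw": (0.20, 0.35)}
# Later: inference hardware (assumed 35% margin) takes the majority of revenue
later = {"training_gpus": (0.40, 0.55), "inference_hw": (0.60, 0.35)}

print(f"Blended margin today: {blended_margin(today):.0%}")  # prints 51%
print(f"Blended margin later: {blended_margin(later):.0%}")  # prints 43%
```

Even with overall volumes growing, a shift of revenue toward the lower-margin segment mechanically pulls the blended net margin down — the “margin squeeze” in miniature.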