
China Doesn't Need to Beat Nvidia Everywhere

  • Writer: Qu Yuan
  • Mar 30
  • 5 min read

The AI race is no longer only about who builds the best model. It is about who can deploy useful ones without asking permission.


Washington still talks about AI power as though it were mainly a question of denying China the summit: top-end compute, advanced manufacturing, high-bandwidth memory, and the cleanest path to frontier training. The U.S. lead on that terrain is substantial, but it does not constitute the whole strategic question.


A state does not need to dominate frontier training to make AI strategically useful. It needs only to embed machine judgment into administrative systems, industrial workflows, consumer platforms and enterprise software at a token cost low enough to sustain large-scale routine use. The relevant question is therefore whether China can build systems that are sufficiently usable, cheap and sovereign to matter. In other words, China may not need to match Nvidia across the full stack. It may only need to become good enough at inference, at scale, on hardware that Washington cannot meaningfully switch off.


Inference is what happens after the model has already been trained: it serves queries, ranks results, generates outputs, runs agents, and embeds capabilities inside products and institutions. The burden shifts from absolute performance toward the more bounded demands of latency, cost, throughput and repeatable execution. Training encourages a certain image of AI power: one ever-taller skyscraper, with strategic advantage concentrated at the summit. Deployment looks less like that than like a city with many bounded functions operating reliably together at scale. The question is not only who owns the highest floor but who can make the wider system run.


The market is already reorganizing around this. At its 2026 GTC event, Jensen Huang placed inference at the centre of Nvidia's next phase, splitting inference work across its own Vera Rubin systems and Groq-licensed technology. Google presented Ironwood as a TPU built for the age of inference, emphasizing serving workloads and energy efficiency over raw computational prestige. Alibaba's Token Hub is built around creating, delivering and applying tokens; Nvidia's own platform push is framed around lowering the cost of generating them. Tokens matter because they are how AI stops being a demonstration and starts being billed and budgeted.


Nvidia is simultaneously preparing an inference offering built around Groq technology that can be sold into the Chinese market, a sign that this part of the stack is becoming a site of political negotiation as well as technical competition. The Zhipu case shows what the transition looks like in practice: domestically available chips — Huawei Ascend, Moore Threads, Cambricon, Kunlunxin — are already used to run models even while the hardest training workloads still depend on Nvidia. That dependence is a real constraint. The stronger claim is not that China has already decoupled at every layer, but that inference is where decoupling begins, and that this beginning is enough to alter the strategic calculus.


The strategic value of AI does not lie in a single sovereign model that does everything. It lies in enough domestic serving capacity to sustain routine deployment across firms, platforms, and state systems. Under those conditions, organized sufficiency matters more than isolated brilliance.


As long as access to foreign compute could be treated as a commercial input, Chinese firms faced a relatively straightforward calculation: use the best available tools and delay the cost of substitution. Once access came to be understood as conditional and revocable, the choice was between a superior stack that might be interrupted and an inferior one that could at least be counted on. Restrictions hurt China. They are unlikely to halt Chinese AI progress altogether because they strengthen the incentives for domestic alternatives.


Huawei matters because it offers a migration path. It consistently presents Ascend not as a standalone chip line but as part of a broader environment — CANN as the optimization and runtime layer, MindSpore as the framework, ModelArts as the deployment platform — whose significance is systemic before it is competitive. The point is not that Huawei has surpassed the incumbent, but that an ecosystem now exists in which exiting is at least possible. Moore Threads, Cambricon and Kunlunxin are not individually decisive. Their main effect is to widen the domestic option set. A multi-vendor inference layer is weaker than a fully mature frontier stack, but more politically resilient than one whose indispensable components all sit beyond national control.


The strongest argument against easy accounts of Chinese substitution remains the software layer. CUDA is not merely a programming interface; it is the dense inheritance of libraries, toolchains, optimization routines, debugging habits, and developer familiarity accumulated over two decades.


Yet the software difficulty is not uniform across the stack. Training stresses everything at once: compiler, scheduler, memory system, distributed orchestration, and the long tail of performance tuning across giant clusters. Immaturity anywhere becomes expensive very quickly. Inference is narrower. The task is not to build the model from scratch across giant clusters, but to serve an existing model reliably and cheaply enough for routine use. A weaker runtime can be survivable at the serving layer in ways that would be punishing at the training frontier. Domestic chips are already appearing there even while Nvidia remains the preferred option for the hardest training workloads. That asymmetry is the shape of the transition itself: substitution arriving first where the software problem is bounded, the performance threshold is lower, and the need for autonomy is highest.


It would be a mistake to conclude that the frontier no longer matters. China remains constrained at the highest end of the semiconductor industry, and the frontier is one of the main places where the next round of efficiency gains is won — gains that later diffuse downward through the stack. A country denied smooth access to top-end hardware is therefore placed at a disadvantage in absorbing the design improvements that follow. But the frontier does not settle the whole strategic question. The claim here is not that domestic inference makes China independent of it, only that it reduces the share of strategically important AI activity that frontier denial alone can govern.


Export controls still slow frontier training and raise the price of substitution at the cutting edge. What is less clear is whether they continue to decide the more basic question of deployment. Recent moves to permit renewed sales of Nvidia's H200 into China sharpen the point. What is now being managed is selective access to a lagging tier of hardware that remains strategically valuable even below the frontier. The regime looks less like a hard cut-off than calibrated throttling, and a control regime calibrated mainly to the top of the stack will lose coercive force as the layers below it are domesticated.


Yet states do not derive strength from owning the summit in the abstract. They derive it from turning computation into a routine instrument of economic organization and administrative reach. China may remain behind at the frontier, but if it can serve capable models at scale on hardware Washington cannot meaningfully switch off, U.S. controls no longer decide the strategic outcome.


