Greetings from a world where…
if you’re in Brussels and looking for something to do this Thursday night, come out to my book talk!
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: For some Chinese chips, “no end in sight” to support the full-parameter version of DeepSeek
Context: There’s been a lot of noise about Chinese firms racing to adopt DeepSeek, but can Chinese AI chip companies support the deployment of DeepSeek models? This AItechtalk article (link to original Chinese) analyzes the capabilities of Chinese AI chip outfits (e.g., Cambricon, Moore Threads, Enflame) to support inference services for DeepSeek models. One crucial distinction: whether the chips support a distilled version of DeepSeek1 — a smaller model with a few billion to a few dozen billion parameters, which reduces deployment costs — or the full-parameter version of DeepSeek (with a parameter count as high as 671 billion).
Key Takeaways: With nearly 20 Chinese chip companies rushing to announce that their products could support DeepSeek’s models, one important point of clarification is needed: can your chips support the full-parameter version or just the distilled version?
Running the full-powered version of DeepSeek requires more than a single 8-card server, which presents challenges for Chinese chip firms that struggle to run multiple high-performance servers in parallel. On these interconnection problems, senior AI chip engineer Jack relays, “It will be difficult to do it, and there may be no end in sight to successfully adapting to the full-parameter version of DeepSeek.”
In contrast, Chinese AI chips face no issues with supporting the distilled version of DeepSeek. And perhaps we shouldn’t be so quick to discount the distilled versions. “I once thought that adapting a distilled version of DeepSeek model was not very valuable, and many engineers also preferred the full-blooded version of DeepSeek, but now my thoughts have changed.” Bo Lin, who has more than 20 years of experience in the chip industry, said, “The distilled version of the model can meet the chat needs of ordinary users, which is of great significance to the dissemination of AI.”
Despite the accuracy limitations, Jack also stated that distilled models can significantly boost the capabilities of edge AI: “With a distilled version of DeepSeek, for example, a particular application scenario that could only deploy a 7B model before can now achieve the performance of a 14B model.”
Why do Chinese AI chip companies trail Nvidia on this front? We’ve mentioned the issue of interconnections within and between server racks (Nvidia’s NVLink is a key strength here). Let’s get deeper into the details:
Chinese AI chips do not support FP8 data formats, a common way to reduce the memory footprint of AI applications. By contrast, Nvidia’s H100 chip already supported FP8 back in 2022. Since Chinese AI chips only support FP16, deploying DeepSeek requires roughly twice the storage and memory footprint, which in turn means more cards.
Bo Lin, the chip industry veteran, is very blunt about the fact that the latest Chinese AI chips do not support FP8: “This shows that many people who make AI chips in China do not understand AI.”
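To see why missing FP8 support is such a handicap, here is a rough back-of-the-envelope sketch in Python (my own illustration, not from the article; the 64 GB per card figure is hypothetical, and KV cache, activations, and other runtime overhead are ignored):

```python
import math

# Rough, illustrative estimate of how weight precision affects the number of
# accelerator cards needed just to hold DeepSeek-R1's full-parameter weights.
# Assumptions (mine, not the article's): 671B parameters, 64 GB of memory per
# card, and no allowance for KV cache, activations, or other runtime overhead.
PARAMS = 671e9                      # full-parameter DeepSeek-R1
CARD_MEMORY_GB = 64                 # hypothetical per-card memory
BYTES_PER_PARAM = {"FP8": 1, "FP16": 2}

for fmt, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    cards = math.ceil(weights_gb / CARD_MEMORY_GB)
    servers = math.ceil(cards / 8)  # assuming 8-card servers
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights -> at least {cards} cards "
          f"(~{servers} eight-card servers)")
```

Under these assumptions, FP16-only hardware needs roughly twice as many cards as an FP8-capable chip just to hold the weights, and either way the model spills past a single 8-card server, which is exactly where the interconnection problems above begin.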
From the article: “After DeepSeek exploded, we wanted to adapt it with a card from a domestic AI chip company.” Boyuan, a practitioner at a Chinese intelligent computing center, said, “But the reality is that if the (inference) performance of DeepSeek on an (Nvidia) A100 is 100 points, this domestic card only provides a few points of performance, and even if it is optimized, it only reaches around 10 percent of the A100’s performance.”
I’ll conclude with some granular figures to keep an eye on going forward.
One useful indicator is the inference speed of these chips when running DeepSeek models. To provide a satisfactory user experience, you want to get to at least 20 tokens per second, which corresponds to a first-word latency of 1 to 1.4 seconds.
AItechtalk learned that “current leading Chinese AI chip companies have only achieved 10 tokens/s…to adapt to the full-parameter version of DeepSeek.” That said, the piece also cites reports from some Chinese AI chip companies claiming they have reached 25 tokens per second when deploying the full-parameter version of DeepSeek in intelligent computing centers.
Many of AItechtalk’s sources suggested that Chinese companies wouldn’t get to the 25 tokens/s mark until the end of the month.
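To put those tokens-per-second figures in perspective, here is a quick, purely illustrative calculation (the 500-token answer length and the 1.2-second first-word latency are my own assumptions for the example; the 10/20/25 tokens/s figures come from the article):

```python
# Illustrative only: translate a decode speed into how long a user waits
# for a chat answer to finish streaming.
def response_time_s(tokens: int, tokens_per_s: float, first_word_latency_s: float) -> float:
    """Seconds until a response of `tokens` tokens has fully streamed."""
    return first_word_latency_s + tokens / tokens_per_s

for speed in (10, 20, 25):  # tokens/s figures mentioned in the piece
    total = response_time_s(tokens=500, tokens_per_s=speed, first_word_latency_s=1.2)
    print(f"{speed:>2} tokens/s -> ~{total:.0f} s to stream a 500-token answer")
```

Under these assumptions, going from 10 to 20-25 tokens/s roughly halves the wait for a typical answer, which is why the article treats 20 tokens/s as the floor for a satisfactory chat experience.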
Note: Jack, Bo Lin, and Boyuan in the article are all pseudonyms.
Full Translation: For some Chinese chips, "no end in sight" to support the full-blooded version of DeepSeek
ChinAI Links (Four to Forward)
Must-read: The women who made America’s microchips and the children who paid for it
For The Verge, Justine Calma revisits America’s first generation of semiconductor manufacturers:
The industry employed many women and young people from Hispanic and Asian immigrant families who’d previously worked in canneries that were closing up shop as Americans started importing more fresh fruit. The factories offered a new kind of assembly line work that you didn’t need a degree or much training to land. But a lack of appropriate safety measures left workers vulnerable to a slurry of chemicals that posed dire health risks. Over the years, many of the workers had miscarriages, including Yvette.
Now, as the U.S. is building out new semiconductor manufacturing factories, CHIPS Communities United — a coalition organizing for the safe and responsible implementation of the CHIPS Act — has “published a letter to semiconductor industry execs asking them to sign legally binding community benefit agreements when they build new fabs. They asked companies to replace chemicals that can cause cancer, miscarriages, birth defects, and fetal brain damage.”
Should-read: China R&D Funding Report 2024 (in Chinese)
The Intellectual [知识分子] did a helpful summary of Dalian University of Technology’s annual report on China’s R&D spending. The comparison of R&D intensity levels between China and G7 countries was especially useful. China sits in the middle of the G7 pack on this metric, with an R&D investment intensity of 2.54%, trailing the U.S., Japan, Germany, and the United Kingdom.
Should-read: How Candise Lin Became the Unofficial Ambassador of Chinese Internet Culture
Zeyi Yang’s portal-opening Wired article profiles Candise Lin, a California-based social media influencer who “scours the Chinese internet looking for a new celebrity feud, the hottest meme, or perhaps a viral college dorm challenge, which she then translates into English and explains in a minute-long video.”
Josef Burton, a former US diplomat who follows Lin on Instagram, captures the significance of this portal: “China is presented as this completely othered place where no one jokes around, this censored, barren hell space that’s all hyper propaganda … But no, people joke around. Daily life exists. Memes exist.”
Should-read: Why China may struggle to unlock the power of AI
Robyn Mak’s Reuters commentary applies some of the key takeaways from my argument about China’s “diffusion deficit.” One interesting tidbit: “Last year, more than 60% of 500 small and medium sized (Chinese) enterprises polled are only in the ‘early’ stages of digitisation, using basic data management and IT applications.”
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.
From the article: “A distilled DeepSeek model uses the data generated by DeepSeek-R1 to fine-tune other models. The parameters range from a few billion to dozens of billions, such as DeepSeek-R1-Distill-Qwen-1.5B/7B/14B/32B, DeepSeek R1-Distill-Llama-8B/70B.”