ChinAI #233: A public comment on the U.S.'s investment curbs in Chinese AI firms
Plus, Wudaokou Origins of China’s Large Models (Part 2)
Greetings from a world where…
Does school really start in a week?
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Reflections on U.S.'s investment curbs in Chinese AI firms
Regarding last week’s executive order that restricts new U.S. investment in subsets of the semiconductor, quantum, and AI fields, I will repeat what I wrote last fall during debates about the U.S.’s export ban on high-end chips to China:
When it comes to U.S.-China technology competition, the benefits of the “promote” plank will always outweigh the “protect” plank. And, when we reflect on moves like this one thirty years from now, there’s a decent chance that such “protect” actions were counterproductive.
What is the goal here? Here’s my blunt (and admittedly somewhat unfair) read of the Biden administration’s approach to technological competition with China: Let’s take Trump’s strategy of being tough on China for the sake of being tough on China, but we’ll add this brilliant twist — it’ll just be narrower and slightly less counterproductive.
What is the theory of victory here?
If it’s to slow down China’s military modernization efforts, then in the future, will there really be fewer Chinese companies that support military AI applications because of these investment curbs? I think there’s a reasonable case to be made in the other direction: AI is a general-purpose technology with so many applications, and smart U.S. capital would probably push Chinese companies toward more commercial end-uses.
More importantly, slowing down China’s military modernization is not the be-all and end-all of our national interest. The U.S. benefits from its investments in China. A world with a weak China may be less safe than one with a strong China. Same goes for a world with a China decoupled from the U.S. vs. one with a China that is still very much coupled. When it comes to all these points, smart people can disagree, but let’s recognize at the very least that there’s a debate to be had here.
I want to be very concrete: I think it’s a good thing that Baidu, Alibaba, Tencent, Bytedance, and so many other Chinese technology giants are “foreign-invested enterprises.” It is good for the U.S. national interest that the Chinese government struggles to measure “self-reliance” and “indigenous innovation” because of the existence of these hybrid firms (see my CLM piece); it is a good thing that these firms are beholden, to some extent, to international stakeholders. I would much rather it be these companies at the leading edge of Chinese AI research than the more traditional national champions.
My broader point is this: I think China policymakers in this administration, many of whom are the smartest and hardest-working people in this space, have spent so much of their time figuring out how to implement policy effectively, without having a full-throated debate about what policy goals actually matter.
A good example of this: In February of this year, CSET put out a very well-researched, detailed report on this topic of U.S. outbound investment into Chinese AI companies. The report does not address the end goals of investment controls, aside from one notable exception: buried on page 35, we get this exceptionally perceptive paragraph:
What is our goal? Are we trying to prevent the Chinese military from reaching its 2049 modernization goals? Are we working to stymie Beijing’s efforts to use technology to abuse human rights? Are we worried about China gaining a first-mover advantage in emerging technologies? However we choose to define national security will help to inform the type of outbound investment screening regime, as well as bolster its effectiveness.
Let me be very clear. This is not a critique of the authors or the rigor of the report — I previously recommended it as a must-read, and I’m glad we have researchers doing the hard work of investigating the outbound investment screening regime. I’m highlighting this report because I think it nicely illustrates a broader trend in the national security community. We’ve put the cart before the horse when it comes to policies like these recent investment curbs, failing to fully debate the goals these policies are meant to achieve and the trade-offs they necessitate.
We’ve left that for a single paragraph on page 35 — just barely an afterthought.
So, what should the U.S. prioritize in terms of its national interest? I’ve been very outspoken that the U.S. government is overly preoccupied with “China gaining a first-mover advantage in emerging technologies.” Instead, my Diffusion Deficit paper argues that the U.S. should prioritize policies that facilitate the widespread adoption of emerging technologies across the U.S. economy.
Just imagine: what if we took all the time, political capital, and talent that is being spent on crafting “protect” measures and invested it into “promote” measures that built up the U.S.’s diffusion capacity?
In my view, one thing is for sure. Our nation would be more secure.
Feature Translation: A Brief History of Large Models in Wudaokou (second half)
Context: In the previous issue, I started digesting an incredibly informative history of China’s large language model development, a longform Leiphone article by Caixian Chen (link to original Chinese). Last week we translated 4,000 words. This week, we knocked out 4,000 more. Here’s what we learned.
Key Passages: In my GovAI report on recent trends with China’s large language models, I called out the hype over BAAI’s Wudao 2.0; a common refrain in English-language coverage was that this model represented “bigger, stronger, faster AI from China.” This article fleshes out more detailed reasons why Wudao 2.0 was over-hyped and misperceived:
“But many Wudao members know that, in fact, in 2021, truly Chinese-produced large-scale models had not yet appeared. The upper layer of Wudao 2.0’s 100-billion and trillion-parameter models is a sparse architecture. Although sparsification enlarges the model’s parameter count, the base capacity is akin to that of a model at the tens-of-billions parameter scale.”
The details about BAAI’s struggles to train and run large models are truly fascinating:
“The file size of the trillion-parameter large model copied from the hard disk is about 20TB, and more than 500 A100s are needed for inference (running the model). Therefore, after the Wudao team copied the files back to Beijing from Shandong (where the Sunway Oceanlight supercomputer was located, in Qingdao), they could not afford to run the model, so they could only open the files to industry. Several companies copied the files, but ‘one guesses that they weren’t able to use them after copying either.’”
More concrete compute limits: At a Wudao internal meeting at the end of 2021, Tang Jie laid out his goals to train a 100-billion parameter model. The article recounts: “They calculated the cost and found that to complete these goals, 1,000 cards needed to run continuously for two months without errors, and the training cost would be extremely high. At that time, BAAI only had 480 A100 cards, and gave 400 of them to Tang Jie's team.”
They tried using chips made by Chinese companies, such as Huawei’s Ascend 910: “During this period, Tang Jie’s team adapted various cards on the market, and found that it was impossible for 2,000 910A cards to train a converged 100 billion parameter large model in a short time…In the end, Tang Jie rented 1,000 cards from the supercomputer center in Jinan…reconstructed the operator from the bottom layer, invested more than 20 people in training for 8 months, and finally trained a 100-billion parameter large model in July 2022: GLM-130B was born.” Per the article, at the beginning, the efficiency of Huawei’s 910 was only 18% of Nvidia’s A100, though Tang Jie’s team reportedly upped the efficiency to 40% after some modifications.
On BAAI’s impact as a cultivating ground for China’s large model ecosystem:
After the obligatory allusion to BAAI as the Whampoa Military Academy for China’s large models, Chen concludes, “It is worth noting that BAAI’s Wudao not only gave birth to the first batch of large-scale model companies in China, but also influenced a group of post-90s AI masters and doctoral students... Among the teams for Wudao 1.0 and 2.0, more than 85% of the members are young students born in the 90s.”
People who spun off from BAAI to start ventures based on large models: June 2021 - Zhiwu Lu established Sophon Engine; November 2021 - Minlie Huang founded “Lingxin Intelligence”; March 2022 - Fanchao Qi started Shenyan Technology; and lastly, August 2022 - Zhiyuan Liu founded “ModelBest.” A previous issue covered ModelBest’s efforts to make it more cost-effective for small businesses, students, and government departments to use large models (ChinAI #199).
On how ChatGPT changed everything:
Initially: “In 2022, China's AI fully entered a capital winter. After the establishment of companies based on large-scale models, they all went out to raise funds with confidence, but none of the investors were willing to pay.”
After ChatGPT came out: “Large models became popular all of a sudden. The big model companies that had gone unnoticed before, such as Zhipu, ModelBest, Lingxin, Zhizi, Shenyan... have also become the rising stars of Chinese capital (circles). Sophon Engine originally couldn’t raise money. After ChatGPT came out, its angel round was valued at 100 million.”
***I rarely do calls for more paid subscriptions, since ChinAI is a passion project and it serves as an invaluable platform for my research. However, in recent weeks, I’ve noticed a decline in paid subscriptions from our peak of 200 (which was already a very small subset of the now approaching 16,000 free subscribers). So: If you find that the type of content ChinAI puts out doesn’t exist anywhere else on the Internet, then support it through a paid subscription (link).
FULL TRANSLATION: A Brief History of Large Models in Wudaokou
Thank you for reading and engaging.
*In the next issue, ChinAI links will be back in full force, I promise.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Any suggestions or feedback? Let me know at email@example.com or on Twitter at @jjding99