ChinAI #175: AI Frameworks Development in China
Another issue, another white paper
Greetings from a world where…
white papers are always welcome
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: AI Frameworks Development White Paper
Context: Last month, the China Academy of Information and Communications Technology (CAICT), a think tank under China’s Ministry of Industry and Information Technology, published a white paper (original Chinese) on the development of “AI Frameworks” — standard interfaces, libraries, and toolkits for designing, training, and verifying AI algorithms. Deeming these frameworks essential components of “AI infrastructure,” the white paper analyzes global trends in AI frameworks, with particular attention to developments in China.
Key Takeaways: The paper identifies four stages in the evolution of AI frameworks:
Budding stage (early 2000s): tools are not customized for developing neural network models and do not support GPU-based computing, and APIs are extremely complex. Developers have to do things like coding backpropagation from scratch.
Growth stage (2012-2014): After the AlexNet breakthrough, Caffe, Chainer, and Theano emerge to help developers build complex deep neural network models and scale training on multiple GPUs.
Stability stage (2015-2019): Google open-sources TensorFlow, and Facebook releases PyTorch; in China, Baidu leads by releasing its PaddlePaddle deep learning framework in 2016.
Deepening stage (2020-): Advances in large-scale models like GPT-3 place higher requirements on AI frameworks, which now have to make better use of computing power and adjust to ethical/governance issues related to trustworthy AI. The white paper gives examples of next-generation AI frameworks in China: Huawei’s MindSpore and Megvii’s MegEngine, both launched in 2020.
*Note: Google doc translation has more discussion about AI frameworks and trustworthy AI
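To give a sense of what "coding backpropagation from scratch" in the budding stage involved, here is a minimal sketch in plain Python. It is my own illustration, not taken from the white paper: the tiny one-hidden-layer network, the squared-error loss, and every function name (init_params, forward, backward, sgd_step) are assumptions chosen for brevity. The point is that each gradient has to be derived and written out by hand, which is the work that later frameworks automate.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Random weights for a hypothetical n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """Sigmoid hidden layer followed by a single sigmoid output unit."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def loss(params, x, target):
    """Squared error on one example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backward(params, x, target):
    """Hand-derived gradients of the loss, applying the chain rule layer by layer."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    # Output layer: dL/dz = (y - t) * y * (1 - y) for a sigmoid unit.
    delta_out = (output - target) * output * (1.0 - output)
    grad_w2 = [delta_out * h for h in hidden]
    grad_b2 = delta_out
    # Hidden layer: push the output error back through w2 and the sigmoid.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w2, hidden)]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = list(delta_hidden)
    return grad_w1, grad_b1, grad_w2, grad_b2


def sgd_step(params, grads, lr=0.1):
    """One plain gradient-descent update, returning new parameters."""
    w1, b1, w2, b2 = params
    gw1, gb1, gw2, gb2 = grads
    new_w1 = [[w - lr * g for w, g in zip(row, grow)]
              for row, grow in zip(w1, gw1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_w2 = [w - lr * g for w, g in zip(w2, gw2)]
    new_b2 = b2 - lr * gb2
    return new_w1, new_b1, new_w2, new_b2


if __name__ == "__main__":
    params = init_params(n_in=2, n_hidden=3)
    x, target = [0.5, -1.0], 1.0
    before = loss(params, x, target)
    params = sgd_step(params, backward(params, x, target))
    print("loss before:", before, "loss after:", loss(params, x, target))
```

Changing the architecture or the loss here means re-deriving backward by hand. In a modern framework the developer writes only something like the forward pass and the framework computes the gradients.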
Diversification under a duopoly: The global AI framework landscape is dominated by a duopoly of Google’s TensorFlow and Facebook’s PyTorch. Notably, TensorFlow dominates among industry users, whereas PyTorch is much stronger among academic users. CAICT cites data from Papers With Code to support the academic side of this point: papers based on PyTorch account for 59% of all 2021 papers based on AI frameworks, well ahead of second-ranked TensorFlow (12%).
Table 1 below gives Github statistics on commits, forks, stars, and contributors for mainstream AI frameworks.
Two Chinese firms’ AI frameworks — Huawei’s MindSpore and Baidu’s PaddlePaddle — have attracted attention from developers in China and abroad
PaddlePaddle has gained more traction internationally (see Table 1 Github figures), but MindSpore “is the most active, most paid attention to, and most used framework in the Chinese community, and it is the leader of China's open source ecosystem.”
Indicators based on Gitee, the largest open source code hosting platform in China, give some support for that point (see Table 2 below). One note of caution on Gitee-based figures: MegEngine, a well-known AI framework by Megvii, has not yet been published on Gitee (only a mirror exists). For a great backstory on Gitee vs. Github, see this Rest of World article by Meaghan Tobin.
Bigger picture: Here’s how the White Paper frames the stakes: “AI frameworks will become the operating systems of the smart economy era. In the current Internet era, the operating system is the core pivot point of the IT industry. It establishes the connection between hardware and application software, and controls the entire ecology of digital devices. Through deep binding with general-purpose computing chips, it forms the two stable technical system patterns of Windows+Intel as well as Android/iOS+ARM. In the era of intelligent economy, AI frameworks play the role of the operating system in the AI technology ecosystem, and they are an important carrier of AI academic innovation and industrial commercialization, helping AI move from theory to practice and quickly enter the era of scenario-based applications. In general, the combination of "AI framework + computing power chip" determines the main technical route of AI industrial application to a certain extent.”
If you want statistics on Chinese tech in the form of patent data, publication counts, and R&D expenses, I can recommend a lot of different places. But if you want the good stuff in the form of Github commits, stars, and forks, stick around here. My plan is to finish the second half of the white paper next week!
TRANSLATION TO-DATE: CAICT AI Frameworks Development White Paper
ChinAI Links (Four to Forward)
Should-read: The Qiu files
In a longform article for Maclean’s, Justin Ling unravels a complicated tale: “Xiangguo Qiu would seem an unlikely character in a tale of international intrigue. A mild-mannered scientist who won accolades for her work fighting the deadly Ebola virus inside Canada’s most secure laboratory, her career was cut short in July 2019, when she and her husband were escorted out of her Winnipeg lab by the RCMP. Since then, she has become a central figure in a major political battle in Ottawa and the star of international conspiracy theories. She has been accused of selling state secrets, contributing to a clandestine Chinese bioweapons program, and even of helping to create COVID-19.
The story of Xiangguo Qiu is still shrouded in mystery, but former colleagues have told Maclean’s her case has more to do with tensions and warring priorities inside the lab than with anything more nefarious.”
Should-read: Hongqiao Liu’s Shuang Tan newsletter
Published by Hongqiao Liu, the Shuang Tan newsletter (the name is shorthand for China’s dual-carbon goals) tracks the world’s largest greenhouse gas emitter going net-zero. The linked article analyzes China’s recent coal policies and also goes over China’s Statistical Bulletin and Climate Bulletin. Hongqiao was previously an investigative journalist at Southern Metropolis Daily and Caixin Media, two Chinese outlets recognized for their independent reporting.
Should-read: U.S. Moving to Confront China on Trade, Industrial Policy
Reporting by the Wall Street Journal team of Yuka Hayashi, Lingling Wei, and Alex Leary: “The Biden administration is preparing to confront China on its industrial subsidies and seek ways to protect America’s edge in new technologies, hardening U.S. economic policy toward the nation’s chief global rival. U.S. efforts to be rolled out in coming months could include a new investigation into Beijing’s support for sectors it considers strategic, using Section 301 of the Trade Act, according to people familiar with policy discussions…While the people didn’t cite potential targeted sectors, China has identified semiconductors, artificial intelligence, 5G wireless and electric vehicles as areas where it seeks global leadership.”
Comment: Treating AI the same as semiconductors doesn’t make much sense. The “run faster” approach is always going to be more effective than the “play prevent defense” approach for domains like AI.
Should-read: Jade Legacy
I have really enjoyed the third book by Fonda Lee in her sci-fi/fantasy trilogy about rival clans fighting for honor and jade.
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is a postdoctoral fellow at Stanford's Center for International Security and Cooperation, sponsored by Stanford's Institute for Human-Centered Artificial Intelligence.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99