Greetings from a world where…
we all make sure to drag our favorite newsletters from the promotions tab to the primary tab, right?
…As always, the searchable archive of all past issues is here. Please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: A Brief History of Large Models in Wudaokou
Context: If you’ve ever been an international student in Beijing, the Wudaokou area probably conjures up wonderful feelings. Please, take me back to those Thursday nights playing beer pong at Sugar Shack. For The Washington Post’s neighborhood guide, here’s Yifan Zhang on this magical neighborhood:
To some people, Wudaokou, or WDK, which means “where five roads meet,” is where poor students and coders hang out. For others, it’s not only quintessential modern Beijing, it’s also the Center of the Universe (a nickname for WDK). You can study, fundraise, create a unicorn start-up company and go out to have fun, all within a square mile or two.
In this week’s feature translation, which takes us back to 2018, Wudaokou is where Chinese AI scholars gather to discuss the paradigm shift in language models marked by Google’s BERT. The neighborhood also becomes home to the Beijing Academy of Artificial Intelligence (BAAI), which plays a key role in the history of China’s large language model development. This longform Leiphone article (link to original Chinese) is written by Caixian Chen, who also co-authored an awesome history of the Chinese University of Hong Kong as a cornerstone of China’s computer vision scene (ChinAI #201).
Key Passages: We begin with BERT:
On that day, October 11 (2018), an ordinary Thursday, Zhiyuan Liu opened up the arXiv page as usual to browse the latest work in the field of artificial intelligence (AI) uploaded by scholars from all over the world. Most of the time, the quality of papers on arXiv is uneven, and Zhiyuan Liu only browsed them roughly to get a general sense; but on this day, he was profoundly drawn to a paper posted by Google’s language team.
“Originally, I just clicked in and glanced at it, but the more I looked at it, the more fascinated and surprised I became. After turning off the computer, I still couldn’t regain my senses for a long time, and was overwhelmed by the thoughts in it.” Sure enough, he soon discovered that this paper had also attracted widespread attention from other AI scholars in China. Professors and students from schools such as Tsinghua University, Peking University, Renmin University of China, and Fudan University were enthusiastically discussing the work.
Everyone had a vague feeling: “This may be another technological paradigm revolution in the field of AI.”
This work is the famous BERT paper, cited more than 70,000 times on Google Scholar: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.”
A fun naming detail about two different Chinese teams that both wanted to name their models ERNIE (to compete with their Sesame Street rival, BERT):
Zhiyuan Liu was at Tsinghua University’s Natural Language Processing Lab. After BERT’s release, he cooperated with researchers from Huawei’s Noah’s Ark Lab and submitted a pre-trained language model named ERNIE to the top NLP academic conference, ACL, in 2019.
At the same time, Baidu’s NLP team was also working on a pre-trained language model that it named ERNIE. Since Baidu posted to arXiv first, Zhiyuan Liu and his collaborators changed their model’s name, and Baidu has continued to use the ERNIE name for its large language models (e.g., the ErnieBot model I covered last week).
The reaction after OpenAI released GPT-3:
From the article: “On the one hand, everyone was excited about GPT-3; on the other hand, they felt a huge gap in their hearts. Before this, most Chinese scholars had felt good about themselves, believing that the level of papers published by domestic teams was comparable to that of American universities; after GPT-3 came out, they realized that there was still such a big gap between themselves and the top international level.”
Some scholars, including Zhiyuan Liu, decided to shift their entire research direction to large language models. This aligned with the establishment of BAAI, headquartered in (guess where?!) Wudaokou, which became a cornerstone for gathering outstanding AI researchers in the area.
After eventually reaching a consensus to work on large models, BAAI still needed to decide on a name for the project.
“Tsinghua Professor Tang Jie suggested that the name should have something to do with Wudaokou, because everyone was based there and had deep feelings for the neighborhood, so everyone thought of a few names together…After a brainstorm, Ruihua Song from Renmin University proposed calling it ‘Enlightenment’ (wudao), which sounds like ‘五道 [the neighborhood name]’ (wudao), and everyone agreed.”
Great details on computing power limitations:
Chen reports: “Since large models require large computing power, BAAI also began to invest heavily in computing power and other resources from October (2020). At the beginning, BAAI planned to use its existing scientific research funds to purchase 300P (petaflops?) of compute. Mayor Chen Jining decided to strongly support the effort and allocated money from special funds to purchase another 700P, bringing the total to 1000P. However, the process of going through approvals and purchasing computing power dragged on for more than a year, so Wudao mainly relied on rented computing power in its start-up phase.”
The Beijing Mayor comes across pretty well in this article. After BAAI relayed its plan to develop large-scale language models: “Mayor Chen said excitedly: ‘This (large-scale model) is the nuclear explosion point of artificial intelligence in the future. It will bring about the vigorous development of the entire production ecosystem.’ The Beijing Municipality decided to strongly support and approve special funds for BAAI to purchase computing power.”
A very important concluding point: BAAI did not restrict the research freedom of its affiliated scholars, almost all of whom also worked at other labs and universities.
For example, within one BAAI group, Hongxia Yang was leading Alibaba’s efforts to develop large language models, and Zhilin Yang, co-founder of Recurrent AI, was working on the PanGu model with Huawei. As the article concludes: “Wudao not only serves as a bridge between scholars, but also strengthens the cooperation between academia and industry.”
This week’s translation takes us up to the launch of BAAI’s Wudao 1.0 and Wudao 2.0 models, which I analyzed over two years ago in ChinAI #145. Next week’s translation starts with a section titled “The Eve of ChatGPT.”
First half of FULL TRANSLATION: A Brief History of Large Models in Wudaokou
Thank you for reading and engaging.
I didn’t read much else this week, but I’ll catch up next issue!
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99