ChinAI #242: The Long Road to Speech AI (part 2)
JHU in the 1990s to large language models in the 2020s
Greetings from a world where…
sisig is now my favorite food (Hawks had a bye week)
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: The Road to Speech AI (part 2)
Context: Last week, we began reading about Johns Hopkins University’s Center for Language and Speech Processing (CLSP), and some of the Chinese AI researchers connected to the center. We heard stories about Peng Xu, who got his PhD at CLSP, worked at Google, and now works on large language models as a VP of Ant Group (Alibaba affiliate). We learned about Zhifei Li, who graduated from CLSP five years after Peng Xu, made important contributions to machine translation at Google, and then started a voice interaction start-up called Mobvoi.
Why do we keep doing these deep histories of China’s AI landscape? Well, let me describe four versions of conversations I sometimes have. Some of these are with real people and some are internal dialogues with myself:
1) Researcher: How do I improve this report on this particular aspect of China’s AI scene?
Me: Engage with at least three Chinese people who have written about this topic or who are engaged in this subfield. Engagement can be something as basic (and profound) as reading their writing.
2) Journalist: What does China think about this particular topic related to AI?
Me: Chinese people have diverse opinions. Learn the stories of at least three Chinese people who have expressed viewpoints about this particular topic.
3) AI company person: Hey, it might be really helpful to discuss some of these tough AI governance issues with our Chinese counterparts?
Me: Have you tried reading the bios of three Chinese peers? Some of them live and work just up the street from you, or even within your own company.
4) Policymaker: Will this policy be effective at ensuring the U.S.’s AI ecosystem remains more competitive than China’s?
Me: These two ecosystems are more interconnected and interdependent than meets the eye. If you are making drastic policy changes, it would probably help to read about at least three Chinese AI researchers who have traversed the two ecosystems. You could even learn their names. If you can pronounce Tchaikovsky, you can pronounce Kai Yu.
So that’s why we do these 20-pg deep history translations. Because there’s something missing, when the people in these conversations — the people researching China’s AI scene, the people reporting on China’s scene, and the people making policy related to China’s AI scene — don’t see the people of China’s AI scene.
Key Passages: Okay, where were we? Let’s start with Guoguo Chen, top scorer in the college entrance exam in Shaoxing, a city in Zhejiang province known for its cooking wine. After studying at Tsinghua for undergrad, he goes to CLSP in 2010 to study speech recognition.
At CLSP, he works with Daniel Povey, an expert known for developing the open-source speech recognition toolkit Kaldi, now the dominant framework for developing state-of-the-art speech recognition systems. According to the article, “While at JHU, Chen was also deeply involved in the work of Kaldi and contributed a lot of code to it.”
While interning at Google during his PhD, Chen continues to make important contributions to speech recognition, including the “Okay Google” hotword detection system. Before the use of such hotwords (e.g., “Hey Siri”), voice interaction methods required the microphone function to constantly remain on, which resulted in a lot of false recognitions. Chen proposed the hotword idea, and eventually all the mainstream voice assistants adopted this method.
After graduation, Chen and fellow CLSPer Xuchen Yao decided to launch a start-up called KITT.AI, which developed a conversational language engine. The article tells the rest: “And even today, for a PhD student to start a business right after graduation, risking it despite not having a green card, receiving investment from former Microsoft co-founder Paul Allen and the Amazon Alexa Fund, and being successfully acquired by a leading domestic company (Baidu) in less than three years — this is actually an unimaginable thing. But in 2017, this type of small probability event suddenly happened to Chen and Yao. ‘We are among a relatively lucky group of people who caught up with the first wave of AI.’ Chen said.”
How have Chinese CLSPers adapted to the large language model era? Let’s start with Zhifei Li’s Mobvoi:
From the article: “After working hard to integrate their technology with the WeChat interface, his product account was selected into Tencent's ‘Top Ten Public Accounts’; he also tested out an APP and successfully embedded the APP into Google Glass in 2014. After that, Mobvoi also launched the operating system Ticwear, released the smartwatch Ticwatch, and entered the hardware track... During this period, the scale of Mobvoi's team also continued to expand. By 2015, Mobvoi had become a tech company with more than 230 people focusing on voice search applications.”
As the article relates, Li tries to get funding for work on large language models from different city committees in 2020 and 2021, but funding doesn’t come until October 2022.
The piece concludes with another JHU alum: Yeyi Yun, who graduated in 2017. Her company MiniMax has made big waves on the large language model track. In fact, MiniMax performs pretty well on the SuperCLUE benchmark for Chinese large language models, which we’ve covered in past issues (translation of SuperCLUE May 2023 update).
FULL TRANSLATION: The Road to Speech AI: A Gathering of Heroes at Johns Hopkins University’s CLSP
ChinAI Links (Four to Forward)
Must-read: State of AI Safety in China
Concordia AI, a social enterprise in AI governance, has made great efforts to comprehensively map out the AI safety landscape in China in this 161-pg. report. Some of their key findings:
“China has developed powerful domestic governance tools that, while currently not used to mitigate frontier AI risks, could be employed that way in the future. Existing Chinese regulations have created an algorithm registry and safety/security reviews for certain AI functions, which could be adapted to more directly deal with frontier risks.”
“Technical research in China on AI safety has become more advanced in just the last year. Numerous Chinese labs are conducting research on AI safety, albeit with varying degrees of focus and sophistication. Chinese labs predominantly employ variants of reinforcement learning from human feedback (RLHF) techniques for specification research and have conducted internationally notable research on robustness. Some Chinese researchers have also developed safety evaluations for Chinese Large Language Models (LLMs), although they do not focus on dangerous capabilities.”
Should-read: old Quora thread comparing open source automated speech recognition software
I looked around a lot for a good read on the history of Kaldi. This Quora thread, from back in 2015, provides some good context on how JHU’s Kaldi eventually became the leader.
Should-read: International Governance of Civilian AI — A Jurisdictional Certification Approach
Published in August 2023, this GovAI report proposes an International AI Organization that would certify country jurisdictions for compliance with certain standards related to managing AI risks. I liked how the report’s authors (Robert F. Trager, Ben Harack, Anka Reuel, Allison Carnegie, Lennart Heim, Lewis Ho, et al.) adapted insights from existing international organizations, including the International Civilian Aviation Organization.
Should-read: China expected to attend UK summit on artificial intelligence next month
Reporting from China for FT, Qianer Liu and Nian Liu cite two Chinese government officials stating that China will send at least one representative to this summit, where the UK aims to set out a global governance approach to AI. Specifically, “the UK aims to use the summit to provide detail on the creation of an AI Safety Institute, which would bring together other countries to assess the national security implications of new technologies.”
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99