ChinAI #241: The Long Road to Speech AI
A history of Chinese AI researchers at Johns Hopkins University's Center for Language and Speech Processing
Greetings from a world where…
for the rest of the college football season, this status update will be devoted to tracking the Iowa Hawkeye offense’s march to mediocrity. Did you hear about that time we gained 2 total yards in the second half? Yeah, that happened this week.
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: The Road to Speech AI
Context: We’ve now transitioned from the “another issue, another white paper” phase of ChinAI into the “another issue, another deep history” phase. Thanks to Zachary Arnold for sharing this longform Leiphone article (link to original Chinese), recommended by the Emerging Technology Observatory's Scout tool.
Here’s Scout’s excellent summary: Johns Hopkins University's Center for Language and Speech Processing has played a critical role in advancing the development of artificial intelligence that can understand, translate, and produce speech. A number of Chinese scientists connected to the center have played influential roles in that process. This lengthy article, which begins in 1999 and ends in the present day, describes those Chinese scientists' journeys from the center to major companies involved in the development of speaking AI, including Google and Ant Group. Their work has contributed both to the voice assistants like Siri that became popular years ago and to the rise of the large language models gaining fame today.
Key Passages: We begin in 1999 with Peng Xu, who has just started a PhD at Johns Hopkins University’s Center for Language and Speech Processing (CLSP), after finishing a 3-year graduate program at the Chinese Academy of Sciences.
Xu chose CLSP for one main reason: to work with Frederick Jelinek, a leading expert on speech recognition. Jelinek had done groundbreaking work for IBM before coming to CLSP. At IBM in the 1970s, he boosted the company’s speech recognition rate from 70% to 90% and significantly increased the number of words recognized by its language models.
Xuedong Huang, current CTO of Zoom, told Leiphone: “IBM was the first to do voice work. If we tell it from a historical perspective, IBM internally applied the voice method to achieve machine translation and rewrote history. This also influenced the future Transformer (breakthrough).”
Jelinek was also a bit of a unique personality: “When there were more and more Chinese students (at CLSP), Jelinek also asked his secretary to put up an ‘English only’ sign in the office, and even paid for teachers to give English classes to Peng Xu and others.
When Jelinek first came to the United States, his dream was to study law, but he was worried that his Czech accent was too strong and his English pronunciation was slightly inferior, so he had no choice but to choose the Department of Electrical Engineering at MIT. The reason why he did this was because he was afraid that his students would suffer from language disadvantages and repeat his ‘same mistakes’.”
The CLSP → Google pipeline.
In his studies at CLSP, Peng Xu experimented with improving the effectiveness of random forest algorithms (a machine learning method gaining popularity at the time) for speech recognition. After completing his PhD, he wanted to leave the ivory tower, so he joined a young, up-and-coming company: Google.
From the article: “As Google continues to grow, it is also attracting more and more Chinese rising stars in AI to join. Among them, Jun Wu, also from JHU CLSP, is one of the earliest contributors to Google. Wu was three years older than Peng Xu and could be considered Peng Xu's senior peer. Since they were both (ethnic) Chinese, the two often had meals together in the lab…When Google came to Baltimore and recruited at JHU, Jun Wu personally went with the team as a member of the team.”
Another rising star who went from CLSP to Google was Zhifei Li, who graduated from CLSP five years after Peng Xu. Li eventually led the work on machine translation and speech recognition at Google, making important contributions such as Google’s mobile offline translation service.
In the second half, the article continues to trace the journeys of these Chinese AI researchers who had once gathered at CLSP. Here’s a preview:
Twenty-four years after that initial phone call where he was offered a PhD spot at JHU’s CLSP, Peng Xu is now VP of Ant Group and leads its efforts on large-scale foundation models.
After just two years at Google, Zhifei Li returned to China to launch Mobvoi, a start-up focused on voice interaction and generative AI, which submitted an IPO application to the Hong Kong stock exchange in summer 2023.
FULL TRANSLATION: The Road to Speech AI: A Gathering of Heroes at Johns Hopkins University’s CLSP
ChinAI Links (Four to Forward)
Should-read: US tackles loopholes in curbs on AI chip exports to China
Reporting for Reuters, Karen Freifeld provides an overview of the Biden administration’s new guidelines for AI chip exports, which aim to close some loopholes in last year’s controls.
Should-read: Nuclear Risk Reduction Centers - A Stable Channel in Unstable Times
In a Stanley Center publication, Rose Gottemoeller and Daniil Zhukov explore the history of nuclear risk reduction centers (NRRCs) as a useful channel for avoiding inadvertent nuclear escalation. They also make a case for the centers’ continued relevance today: “NRRCs could become an essential aspect of mutual confidence-building and dialogue facilitation among the P5 if the UK, France, and China could establish their own versions of the centers.”
Should-read: Foundation Model Transparency Index
A team at Stanford’s Human-Centered AI Institute has released a new index that rates the transparency of 10 major foundation models. The index integrates 100 indicators related to how models are constructed (e.g., information about training data and computational resources involved); the capabilities of the models themselves and their safety features; as well as how the models are used downstream.
Should-apply: Open Philanthropy ongoing hiring round
Open Philanthropy is hiring for several new roles on its global catastrophic risks team, including three positions in the AI Governance and Policy domain. Applications close November 9th.
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99