ChinAI #178: MT to Death

Qun Liu, Huawei's chief scientist of speech & language computing, on machine translation

Apr 04, 2022

Greetings from a world where…

t’was a blast to teach my first class on foreign policy decision-making

…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).

Feature Translation: Interview with Qun Liu on machine translation

Context: In January 2022, Qun Liu, a machine translation expert at Huawei’s AI research center, was elected as an ACL Fellow (ACL = the Association for Computational Linguistics, which runs the top conference in natural language processing). He’s mentored many of the young researchers who lead machine translation work in China, at both universities and large tech companies. He also happens to be a “Big V” (influential user with a lot of followers) on Weibo, where his handle is “MT to death,” possibly a confession of his enduring devotion to machine translation. Recently, 智源社区 (a WeChat public account for the Beijing Academy of Artificial Intelligence) sat down with Qun Liu for a wide-ranging chat.

Key Takeaways: Some fascinating stories about the history of MT in China

Liu recalls how, while working on his PhD at Peking University, his team competed in NIST’s machine translation evaluation in 2002. Their results were disappointing, and it was a wake-up call to transition from rule-based methods to statistical ones. Open source norms hadn’t permeated academia at the time, so they had to run through a bunch of different re-implementations to get a handle on these statistical methods.
After receiving his PhD, Liu started his own team at the Institute of Computing Technology (Chinese Academy of Sciences). Their team open-sourced the first Chinese word segmentation system, and placed 5th in the 2005 NIST evaluation. Over the next two years, his team published 3 papers at ACL; before then, Chinese research institutes had only published one paper at ACL ever.
His team focused on the syntax of the source language when translating between Chinese and English (two languages with large structural differences) — a nice linguistic detail that highlights how the L in NLP matters.

On joining Huawei’s AI research center (AKA Noah’s Ark Lab), which coincided with the rise of pre-trained language models:

Liu’s team focused on bottlenecks to the commercialization of pre-trained language models like BERT. They came up with TinyBERT, which greatly compresses the model size and speeds up inference (meaning it can run faster on mobile phones and other end-use devices). “This was our earliest breakthrough. At present, all of Huawei's product lines have basically applied TinyBERT technology,” he said.
TinyBERT’s impact extended to the research community. The article characterizes TinyBERT as a classic work in compressing pre-trained language models.

On the future of MT:

From the article, summarizing Liu’s view: “Although text translation has achieved great success, real-time speech translation or automatic simultaneous translation still faces great challenges. If text machine translation can currently meet the needs of most scenarios, real-time translation is still in its infancy. But the bigger the challenge, the more rewarding the research. He believes that compared with other research directions, real-time translation is a very interesting field to study. In addition, there are still many problems in the translation of texts at present, such as the translation of papers and novels. The biggest problem is the inconsistency of terminology. One of the solutions in this regard is to introduce symbolic reasoning, which can not only improve the comprehensibility of the model, but also holds promise for reducing translation consistency errors.”

Liu believes that researching dialogue systems is harder than machine translation because the problems involved are more unbounded. In practical applications, it’s still extremely difficult to customize a customer service system for business scenarios more complicated than booking tickets and hotel rooms. Plus, the system has to be adapted to protect safety, avoid negative and offensive language, and avoid discriminatory and biased content.

Just for fun: Liu’s Weibo posts often feature the following hashtag — #natural language understanding is too difficult (自然语言理解太难了)#. Most of the posts highlight misunderstandings related to language, showing that even humans struggle to comprehend the meaning behind language, not to mention machines.

Example post from the hashtag:

“广州学校要求学生会煲汤” (A school in Guangzhou requires students to be able to make soup)
“不进学生会就好了” (Just don’t join the student union then and you’re good)
Playing on the fact that the three-character phrase 学生会 can mean either “students able to” or “student union”
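
For readers who want to see the ambiguity mechanically, here is a minimal Python sketch that lists every way to split the headline into words. The function name `segmentations` and the four-entry `TOY_VOCAB` are hypothetical and hand-picked for this one example; this is not the segmentation system Liu's team released, just an illustration of why a dictionary alone cannot pick the intended reading.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into consecutive words found in `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            # Keep this word and recurse on the remainder of the string.
            for rest in segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


# Toy vocabulary with rough English glosses (an assumption, not a real lexicon).
TOY_VOCAB = {
    "学生": "students",
    "会": "be able to",
    "学生会": "student union",
    "煲汤": "make soup",
}

for split in segmentations("学生会煲汤", TOY_VOCAB):
    print(" / ".join(f"{w} ({TOY_VOCAB[w]})" for w in split))
```

Both splits are valid under this toy dictionary, so choosing between "students / be able to / make soup" and "student union / make soup" takes context, which is exactly what the joke exploits.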

FULL TRANSLATION: Interview with Qun Liu, a machine translation expert and chief scientist of speech and language computing at Huawei Noah’s Ark Lab

ChinAI Links (Four to Forward)

Should-read: The New Fire: War, Peace, and Democracy in the Age of AI

Really excited about this new MIT Press book on AI and geopolitics by Ben Buchanan, an assistant director at the White House OSTP (on leave from Georgetown), and Andrew Imbrie, a senior fellow at the Center for Security and Emerging Technology. They make a much-needed challenge to the common assumption that AI favors autocracies.

*Ben has graciously offered to mail a free copy to a newsletter reader. If you’re a paying subscriber of ChinAI and you would like a free copy, reply to this email. I’ll do a random lottery and forward your name and mailing info to Ben.

Should-read: Machine Decision is Not Final

Edited by Benjamin Bratton, Bogna Konior, and Anna Greenspan, Machine Decision is Not Final: China and the History and Future of AI is a really cool interdisciplinary collection on China’s engagement with AI:

Tracking the history of Chinese AI from the pre-Cultural Revolution to the post-Deng Xiaoping eras right up to contemporary debates surrounding facial recognition, the writers in this collection draw on a mixture of speculative thought experiments and cutting-edge use cases to offer singular views on topics including AI and Chinese philosophy, AI ethics and policymaking, the development of computational models in early Chinese cybernetics and the aesthetics of Sinofuturism.

*H/t to Matthijs Maas for sharing this with me

Should-read: 2022 Stanford AI Index Report

This report has become an essential annual touchstone for the state of AI. One important takeaway that undercuts the DECOUPLING narrative: “Despite rising geopolitical tensions, the United States and China had the greatest number of cross-country collaborations in AI publications from 2010 to 2021, increasing five times since 2010. The collaboration between the two countries produced 2.7 times more publications than between the United Kingdom and China—the second highest on the list.”

*Shoutout to Daniel Zhang, policy research manager at Stanford HAI, for his great work on the index. See his analysis and translation of Tencent’s explainable AI report in this past issue of ChinAI.

Should-read: Knowledge Base: China’s ‘Global Data Security Initiative’

Oftentimes we note a policy announcement but forget about it the day after. Chaeri Park, for DigiChina, tracks how Chinese officials have framed, pushed, and slowed down promoting the Global Data Security Initiative since it was initially announced (“seen by many as a response to the Clean Network program put forward by the Trump administration”).

Thank you for reading and engaging.

These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is a postdoctoral fellow at Stanford's Center for International Security and Cooperation, sponsored by Stanford's Institute for Human-Centered Artificial Intelligence.

Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).

Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99
