ChinAI #45: China's Natural Language Processing (NLP) Landscape

Looking Back and Peering Forward

Welcome to the ChinAI Newsletter!

** For a variety of reasons (health/burnout/need to actually work on my PhD), I'm planning to slow down the pace of the newsletter, transitioning to (sometimes) weekly translations — so space between issues may be one week, or two weeks, or even a month. Still happy to push out quality translation content from contributors on a consistent basis though.

These are Jeff Ding's (sometimes) weekly translations of writings on AI policy and strategy from Chinese thinkers. I'll also include general links to all things at the intersection of China and AI. Please share the subscription link if you think this stuff is cool. Here's an archive of all past issues. *Subscribers are welcome to share excerpts from these translations as long as my original translation is cited.

I'm a grad student at the University of Oxford where I'm based at the Center for the Governance of AI, Future of Humanity Institute.

A Conversation with Zhou Ming, Vice-Dean of Microsoft Research Asia, on the Development Trends in China’s NLP Scene

In the past I’ve highlighted the “language asymmetry” issue as a key roadblock to good policy research on ChinAI. Another key roadblock is the “technology abstraction” problem for AI policy research. For instance, the U.S. Department of Commerce’s proposed rules for export controls list 14 AI technologies, but these differ significantly in their level of abstraction, ranging from mathematical concepts to tangible products. The best policy research on AI should use the term “artificial intelligence” in an abstract sense as few times as possible - policy analysts should rigorously force themselves to specify what claims they are making about “AI” in terms of the domain and technological layer they are talking about.

This week, we look at China’s progress in one particular domain of AI, natural language processing, at the layer of both fundamental research and applications in products such as chatbots.

First, the big picture: Zhou Ming cites publications at top conferences in the NLP field (ACL, EMNLP, COLING, etc.) as evidence that Chinese researchers have ranked second in the world for the past five years, behind only the United States and well ahead of other countries. Key takeaways:

  • Zhou highlights four key MSRA achievements from 2018: 1) human-level performance in machine reading comprehension, 2) parity with professional human translators in Chinese-English news translation, 3) first-place rankings in grammar checking, 4) best performance in text-to-speech on various evaluation sets. This is another example of why, even within NLP, there’s too much abstraction in the policy conversation: NLP differs depending on the language in question. Some of these achievements are in Chinese-language NLP, which differs from English-language NLP, and both are significantly different from “low-resource” NLP tasks, such as work on languages like Inuit that have less training data and fewer corpora.

  • MSRA is not the only player: Baidu and iFlytek have each partnered with universities to organize separate large-scale Chinese machine reading comprehension evaluation datasets/metrics

  • One trend to keep an eye on is international AI conferences hosted by Chinese institutions, such as the Natural Language Processing and Chinese Computing Conference (NLPCC) of the China Computer Federation, which Zhou thinks will become China’s leading international NLP academic conference

  • MSRA as a key training ground for China’s broader NLP scene: “In the field of NLP, when Microsoft Research China (later renamed Microsoft Research Asia) was founded, China only had one ACL article…we developed a plan to cooperate with relevant schools to improve the research level of NLP through summer schools, joint laboratories, academic conferences, and various university cooperation projects. In the past 20 years, we have trained more than 500 interns, 20 doctoral students, and 20 post-docs in the NLP field. The large majority of these people have gone on to universities or other companies.”

FULL TRANSLATION: Dialogue with MSRA Vice Dean Zhou Ming: Looking back at the past and looking forward to the future, what are the development trends of NLP

This Week's ChinAI Links

Chinese phrase of the Week: 平均聊天轮数 (ping2jun1 liao2tian1 lun2shu4) - “conversations per session” - how many times the conversation goes back and forth between humans and chatbots. The article claims Xiaoice has an average of 23 conversations per session (most common chatbots only have two cycles).

We have a variety of exciting roles open for GovAI, all with deadlines in less than a month — one I want to highlight is the Governance of AI Fellowship, a 3-month opportunity to do research with my team at the Center for the Governance of AI. For an example of cool work we do, see this recent Lawfare piece by Remco Zwetsloot and Allan Dafoe on the structural risks of AI.

Fun Washpost profile of Zhang Jiaqian, who translates President Trump’s tweets into Chinese (an infinitely more difficult task than mine)

Greg Allen’s must-read CNAS report on understanding China’s AI strategy draws from his four separate trips to attend major diplomatic, military, and private-sector conferences focusing on AI. It cites the important work of translators of Chinese-language documents, such as Elsa Kania. Her recent translation of an article from Study Times is a fascinating one that examines whether political and ideological influences could shape or constrain the PLA's operationalization of AI.

A couple weeks ago we featured a translation of a Huxiu piece that analyzed China’s AI ecosystem through the lens of two “martial arts schools.” This MIT Tech Review piece by Karen Hao, which also built on the work of ChinAI subscriber Karson Elmgren, draws from that piece to create a really nice three-chart analysis. I think this model of knowledge creation is really cool and hope more people mine past issues of the newsletter and build on them.

Thank you for reading and engaging.

Shout out to everyone who is commenting on the translations - idea is to build up a community of people interested in this stuff. You can contact me at jeffrey.ding@magd.ox.ac.uk or on Twitter at @jjding99