ChinAI #247: XiaoIce, a Strange Species of Chatbot
The history behind the world's most popular social chatbot
Greetings from a world where…
newborn calves are not afraid of tigers [初生牛犊不怕虎]
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: XiaoIce Peerless
Context: For a long time now, I’ve been wanting to learn more about XiaoIce, the chatbot with 10 million active users in China. Back in August 2021, The Washington Post reported:
Xiaoice has grown so popular that she performs 14 human lifetimes’ worth of interactions each day, said Li Di, CEO of Xiaoice, which Microsoft spun off in 2020. She’s busiest from 11:30 p.m. to 1 a.m., when users unload their day’s experiences or grow emotional. Xiaoice has 10 million active users in China.
How did XiaoIce get so popular? How did Microsoft create such a public-facing technology product in China, and why did XiaoIce eventually go independent? What is the future for XiaoIce in a ChatGPT world? This week’s translation is a deep history of XiaoIce, published in AItechtalk [AI科技评论] and authored by Caixian Chen and Zibo Dong. At this point, I try to read everything that Chen writes. Previous issues translated her longform reports on the Chinese University of Hong Kong as a cornerstone of China’s computer vision scene (ChinAI #201) as well as a history of large models in the Wudaokou neighborhood of Beijing (ChinAI #232).
Key Takeaways: Up front, I think it’s important to underscore what XiaoIce has achieved to date.
Since the launch of Bing in 2009, Microsoft had been fighting for market share in China, investing heavily to compete with local competitors like Baidu. “Originally, due to the large differences between XiaoIce and Microsoft's other product forms, XiaoIce was a ‘strange species’ within Microsoft and did not have many fans,” Chen and Dong write. “But it wasn’t until Xiaoice appeared that everyone was dazzled by a surprise: ‘Microsoft finally made a product that people can use.’”
Before ChatGPT, XiaoIce was the conversational AI product that ranked highest globally by number of dialogue rounds. This metric tracks how long, on average, people are willing to engage in back-and-forth conversation with a chatbot.
XiaoIce’s origin story begins in December 2013, when Li Di (the product manager) teams up with Jing Kun (the technical expert).
Li Di’s previous position at Microsoft was with Bing Knows, a web Q&A product based on an encyclopedia structure. After doing market research and finding that Bing Knows could not compete with local Chinese products, Li Di came up with the idea of a conversational robot.
XiaoIce’s impressive rise was driven in part by some free publicity gained in a dispute with WeChat. The XiaoIce team bought WeChat accounts on Taobao and disguised XiaoIce as real humans, eventually infiltrating nearly 1.5 million WeChat group chats. After just three days, WeChat blocked it, which only fed more attention to XiaoIce.
The elephant in the room: how did XiaoIce miss the wave of large language models?
Around 2016, everyone wanted to join XiaoIce, thinking that it was at the technological frontier. The Google doc translation includes a fun anecdote about Ruihua Song, who was deciding between a job at Meituan and XiaoIce, and ended up choosing the latter because her intern said XiaoIce sounded cooler.
The XiaoIce team focused on cleaning data to overcome the chatbot’s shortcomings rather than investing in upgrades to the core technology and algorithms. From the article, one example that fleshes this point out: “In 2020, Xiaoice tried to sell conversation technology to Watsons (Hong Kong health care and beauty care chain store), but after using it, Watsons found that Xiaoice could only chat and could not answer customers' questions about beauty knowledge related to beauty and skin care products sold on Watsons shelves.”
Next week’s issue will translate the second half of this story, which covers XiaoIce’s future in a world of large language models. The first half of the translation already takes up 12 pages in Google Docs, so make sure to dig into the details of the Full Translation: XiaoIce Peerless
ChinAI Links (Four to Forward)
Should-read: Putting China’s Top LLMs to the Test
In ChinaTalk, anonymous contributor L-Squared and Irene Zhang tested three top Chinese large language models: Moonshot AI’s Kimi, Baidu’s ERNIE 4.0, and Zhipu AI’s ChatGLM2. It was very cool to see how all three models responded to their array of prompts, which covered tasks typical for an office worker as well as tests of trustworthiness and sensitivity to political content.
Should-read: Translation Snapshot: Chinese AI White Papers
The Center for Security and Emerging Technology recently released this snapshot of a group of translations of Chinese white papers on the AI industry. This collects seven of their original translations of lengthy reports by think tanks. This is a very important channel for people to get access to primary source documents in this field.
Should-read: Innovation Job Market Papers 2023
Matt Clancy, publisher of the What’s New Under the Sun substack, collected 43 innovation-related PhD job market papers from 2023. A few topics that caught my eye: knowledge spillovers of British migration to the U.S. (1870-1940); novel methods to track the gender gap in applied STEM fields like computer science and engineering.
Should-read: The Bitter Taste of ‘Not Too Sweet’
“Not too sweet” is a familiar refrain uttered by Asian Americans to describe desserts. By Jaya Saxena for Eater, this longform piece was the best thing I read this week.
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99