ChinAI #146: Prof. Dai Jinhua on Information Cocoons

Plus, a dip into knowledge-based video content on Bilibili

Greetings from a world where…

I buy my coffee and I go

Set my sights

On only what I need to know

…Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors). As always, the searchable archive of all past issues is here.

***Answers to last week’s WuDao Turing Test: The top answers were WuDao-generated. Thanks to everyone who guessed!

The Bad Intentions in Personalized Recommendations?

Context: Most readers have heard about China’s incredible boom in short videos (see, e.g., Douyin, the Chinese version of TikTok). Another trend, perhaps less recognized, is that platforms like Douyin and Bilibili are expanding to longer knowledge-based video content. This week’s translation, a Bilibili video (in Mandarin), features an interview with Dai Jinhua, a Peking University professor who teaches women’s studies, cultural studies, and film. She’s one of China’s “most influential academic and public intellectuals.” In this 7 min. video posted last week (now up to 225,000 views), she discusses her online shopping habits, quotes Greta Gerwig, and shares her concerns about big data and recommendation systems.

Full Translation:

Intro has some light-hearted bloopers, including Prof. Dai accidentally addressing another platform (instead of Bilibili). She also says: I've replied too seriously again, my style is not easygoing enough. Dai also introduces herself as a lifelong lover of film and expresses her excitement at sharing ideas about social and cultural issues with young friends on new media platforms like Bilibili.

Substantive interview content starts here:

QUESTION: Prof. Dai, are you shopping online?

DAI: I'm a midnight shopaholic.

QUESTION: Then when you see that the products, movies, music, or books recommended to you are all your favorite types, will you read those recommendations or will you deliberately avoid them?

DAI: I will look at the recommendations but they often backfire. I would suspect that they have bad intentions (laughs at herself).

QUESTION: Do we need to fear big data?

DAI: My feeling is that we don’t have to fear big data. What I’m talking about is that we don’t have to fear big data as a new potential social method.* As a method of fetching social information, it has been meaningful and effective to date. Then this technological advancement is applied to the collection of social information and the determination of social conditions.

I thought this was a very interesting development and step forward, but regarding us as individuals, this big data has to a large extent begun to control social life. Social information controls the channels through which we obtain information and establishes a connection between us and society.

Then when we allow our imaginations to unfold, the word I would probably use is vigilance (警惕). I hope everyone will be alert to all kinds of expressions that appear in the name of big data. I hope everyone will be wary of the emergence of big data as a means of forming societal surveillance. I hope everyone is wary of the limitations imposed by big data on our lives. That is, there’s that fashionable term that everyone uses called "information cocoons."

An info box pops up on screen: Information Cocoons [信息茧房] refers to the phenomenon in which people are habitually guided by their own interests in the areas of information they pay attention to, thus shackling their lives in a "cocoon" like a silkworm cocoon.

In fact, in my understanding, the information cocoon has at least two aspects that will create our problems. In one aspect, the so-called information cocoon is that we set our own limits within its range of choices because we try to pursue a kind of knowledge that we are familiar with. Then, our pursuit of knowledge is not to gain knowledge and development. We pursue knowledge for repetition, verification, and assurances of safety. At this time, we have formed an information cocoon by ourselves because we don’t want to cut into knowledge we don’t know. We don’t want to intersect with online information that makes us unhappy.

This is one aspect. At this time, I say that no matter what big data tells you, we still have to work hard to explore, to obtain knowledge. But on the other hand, what kind of information is delivered based on big data, like personalized customization, these so-called special projects that supposedly serve you. Yet in fact, they must inevitably delineate a boundary. They inevitably mark out a limit. This inevitably makes us willing to become like the Monkey King sleeping peacefully in a small position in the hands of the Buddha.

We sleep in that small position and think we possess the whole world. We think that the world is our comfortable rocking chair.

But the world does not become better because we live safely in the information cocoon, nor because we have constructed this information cocoon by ourselves. Everything in the world is happening. There are many catastrophic things happening. We need to know about and recognize them.

Besides, I have a judgment that may be a bit alarmist, but I want to share it with the young friends here. I think that human civilization has experienced this information revolution and then the whole world has undergone drastic changes, and then globalization via the Internet and Internet of Things has become the reality of our daily lives. When this becomes such a real structural existence that each of us really feels, in fact, our old knowledge is still outdated. We have no precedent to cite. We cannot use our old knowledge to explain what is happening around us.

At a time when we should be asking and pursuing questions, and exploring, this type of safe knowledge, this type of omnipotent knowledge, this type of comfortable generation of knowledge is actually a structure of self-hypnosis and self-suggestion, because no matter how much we want our safe life in a limited space, it is difficult for us to make it a reality.

In the end, we have to face a reality that is radically changing, challenging, and cruel. So in this sense I say be vigilant of the big data moniker. Be wary of the boundaries that big data defines for us. Let's ask questions together. Let's explore together. Let's meet challenges together and try to let us smash obstacles instead of allowing obstacles to smash us.

QUESTION: Do you have any anxiety about information overload [信息过载]?

DAI: As for information overload, I was anxious, but then I found out that I learn new knowledge relatively fast. For example, I understand the meaning behind the slang words my students use, and I think it’s not difficult to communicate with them in this language. That’s actually the easiest part.

What you are using information overload to describe: the critical part of that is not the information overload, the critical part is the constant rotation. The critical thing about such rapid changes is that everyone seeks to chase fashion. And every one of our questions is answered by searching for answers through search engines. So we don’t feel that there is another process in which we calm down and think about how we find answers through reading. We get the standard answer in a few seconds online, and we think that our question was answered.

So I think this is really related to our topic. The director of Lady Bird said that we must be bored to a certain extent before we can achieve something (translator note: possible reference to the fifth Greta Gerwig quote in this link).

At first, that sentence startled me.

I finally went to read a bit more before I understood that the so-called boredom she’s talking about is not the daily boredoms we’ve all experienced.* It’s the boredom in looking for an answer in our minds for an instant and thinking that we’ve arrived.

We don’t have the kind of process that lets thoughts sweep (掠过)* past our minds. Some questions are gradually formed in our hearts. I think this is the bigger problem. After we get the answer quickly, we think we have solved the problem. Actually the question is much bigger and more complicated than that answer.


*= uncertain about my translation in this area. For those interested in reading the original Chinese, here’s my attempted transcription of the video in this Google doc link.

ChinAI Links (Four to Forward)

Jeff should-read: After the Post–Cold War — The Future of Chinese History

Blurb from Duke University Press: “In After the Post–Cold War eminent Chinese cultural critic Dai Jinhua interrogates history, memory, and the future of China as a global economic power in relation to its socialist past, profoundly shaped by the Cold War. Drawing on Marxism, post-structuralism, psychoanalysis, and feminist theory, Dai examines recent Chinese films that erase the country’s socialist history to show how such erasure resignifies socialism’s past as failure and thus forecloses the imagining of a future beyond that of globalized capitalism. She outlines the tension between China’s embrace of the free market and a regime dependent on a socialist imprimatur. She also offers a genealogy of China’s transformation from a source of revolutionary power into a fountainhead of globalized modernity. This narrative, Dai contends, leaves little hope of moving from the capitalist degradation of the present into a radical future that might offer a more socially just world.”

Must-read: Standards Bearer? A Case Study of China’s Leadership in Autonomous Vehicle Standards

In an analysis for MacroPolo earlier this month, Matt Sheehan evaluates claims about growing Chinese influence in standard-setting organizations (SSOs) by “getting in the weeds of actual SSOs writing specific technical standards.” He drills in on a working group on test scenarios for autonomous vehicles — WG 9 in an ISO committee — which marks “the first time that China convened a WG on auto standards at the ISO.” This case study provides a nuanced take on a number of important issues, including a rejoinder to the more alarmist narratives of Chinese dominance in SSOs and a multidimensional view of the influence of the Chinese government bureaucracy in engaging with international SSOs.

Should-read: Thread filled with survey papers in machine learning and NLP field

Should-read: AI Innovation Zones in China

By Sofia Baruzzi for AmCham China: “On February 20, 2021, the Ministry of Industry and Information Technology (MIIT) issued a circular to support the creation of five new AI innovation zones. This will raise the total to eight, as three of such zones are presently set up in Shanghai (Pudong New Area), Shenzhen, and Jinan-Qingdao. The new AI innovation zones include Beijing, Tianjin (Binhai New District), Hangzhou, Guangzhou, and Chengdu. Each of them will be built to pursue a specific purpose, as explained directly by the MIIT’s circular.”

Article contains a nifty table that summarizes the key activities in each of the new zones.

Thank you for reading and engaging.

These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is a PhD candidate in International Relations at the University of Oxford and a Predoctoral Fellow at Stanford’s Center for International Security and Cooperation, sponsored by Stanford’s Institute for Human-Centered Artificial Intelligence.

Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).

Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99

ChinAI #145: Enlightenment via Large Language Models

Writing poetry w/ the WuDao 2.0, the World's Largest Language Model

Greetings from a world where…

we’re still all voting 5x a day for Shohei Ohtani to be in the MLB All-Star game, right?

WuDao Turing Test

I’ve had a few readers ask me to write about the Beijing Academy of Artificial Intelligence’s (BAAI) release of WuDao 2.0 (悟道 means to attain enlightenment), the latest Chinese version of GPT-3.

Alex Friedland, in CSET’s policy.ai newsletter, had a good summary of the English-language coverage to date:

Chinese Researchers Announce the Largest “Large Language Model” Yet: A new natural language processing (NLP) model announced last week by the state-funded Beijing Academy of Artificial Intelligence (BAAI) is the largest ever trained. Wu Dao 2.0 has 1.75 trillion parameters — dwarfing GPT-3’s 175 billion parameters and even the 1.6 trillion parameters of Google’s Switch Transformer — and while the relationship between parameters and sophistication is not one-to-one, it is generally a good indicator of a model’s power. In addition to its high parameter count, Wu Dao 2.0 does more than just NLP — it is a multimodal system trained on 4.9 TB of text and images, meaning it can perform image recognition and generation tasks in addition to the text processing and generation tasks of traditional NLP. While BAAI has yet to publish a paper elaborating on the performance of Wu Dao 2.0, a handful of released results showed impressive performance: The model achieved state-of-the-art results on nine common benchmarks, surpassing previous juggernauts such as OpenAI’s GPT-3 and CLIP and Microsoft’s Turing-NLG.

So, what else? Without the published paper or any examples of WuDao 2.0 output, there’s only so much we can learn. Let’s try anyways, using Chinese-language coverage of the release and examples from WuDao 1.0, a much smaller model (2.6 billion parameters) released three months earlier.

How we got here: In March 2021, BAAI released WuDao 1.0, which they deemed China’s first super-large-scale model system. Note: see ChinAI #141 for another Chinese GPT-3-esque model released in May from a Huawei-led team.

In an interview with AI科技评论(aitechtalk) about WuDao 1.0, Tsinghua Professor Jie Tang, who leads the WuDao team, previewed what was coming next: “We will also propose a hundred billion-level (parameter) model this year.” Three months later, enter WuDao 2.0, clocking in at 1.75 trillion parameters.
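To put that jump in perspective, here is a quick back-of-the-envelope sketch using only the parameter counts quoted in this issue. The dictionary and the `ratio` helper are illustrative names, not anything from BAAI, and parameter count is at best a rough proxy for capability.

```python
# Parameter counts as quoted in this issue (not independently verified).
PARAMS = {
    "GPT-2": 1.5e9,
    "WuDao 1.0": 2.6e9,
    "GPT-3": 175e9,
    "Switch Transformer": 1.6e12,
    "WuDao 2.0": 1.75e12,
}


def ratio(bigger: str, smaller: str) -> float:
    """Return how many times more parameters `bigger` has than `smaller`."""
    return PARAMS[bigger] / PARAMS[smaller]


print(f"WuDao 2.0 vs WuDao 1.0: {ratio('WuDao 2.0', 'WuDao 1.0'):.0f}x")  # 673x
print(f"GPT-3 vs GPT-2: {ratio('GPT-3', 'GPT-2'):.0f}x")                  # 117x
print(f"WuDao 2.0 vs GPT-3: {ratio('WuDao 2.0', 'GPT-3'):.0f}x")          # 10x
```

On these figures, the step from WuDao 1.0 to 2.0 is several times larger than the step from GPT-2 to GPT-3.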

*From my initial read of things, WuDao 1.0 is to WuDao 2.0 as GPT-2 is to GPT-3. Put simply, WuDao 1.0 introduced most of the new innovations in model training (e.g. FastMoE), and then WuDao 2.0 added many times more parameters and was trained on more data. Recall that GPT-2 was 1.5 billion parameters, which is about the size of WuDao 1.0.

This means learning more about WuDao 1.0 can help us understand its successor better. Here are some key points from the aitechtalk piece linked earlier:

  • The WuDao team emphasize the significance of cross-lingual language model pretraining. Here’s Professor Tang again: “This is very different from GPT-3. We are trying some new methods, such as fusing together pre-trained models of different languages. The fusion method is to use cross-lingual language models to connect the expert models of different languages together, so that the model can be gradually expanded.”

In that same piece, Zhilin Yang, a key member of the WuDao team and co-founder of Recurrent AI, outlined three other key achievements in WuDao 1.0. I’ve linked the corresponding arxiv papers.

  1. A more general-purpose language model (GLM). Applies one model to all NLP tasks rather than using one pre-trained language model for classifying text and another model for generating text.

  2. P-tuning: claims to be a better way to fine-tune GPT-like models for language understanding.

  3. Inverse prompting. The intuition: use the generated text to predict the prompt.
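The inverse prompting intuition can be sketched in a few lines. This is only a toy under loud assumptions: a smoothed word-count model stands in for the real language model, and `inverse_score` and `rerank` are hypothetical names, not the WuDao team's code. The idea it illustrates is just the one stated above: prefer the candidate text from which the original prompt is easiest to predict.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def inverse_score(prompt, candidate, alpha=0.1):
    """Toy log P(prompt | candidate) under a smoothed unigram model of the candidate."""
    cand_tokens = tokenize(candidate)
    prompt_tokens = tokenize(prompt)
    counts = Counter(cand_tokens)
    vocab = set(cand_tokens) | set(prompt_tokens)
    denom = len(cand_tokens) + alpha * len(vocab)
    return sum(math.log((counts[t] + alpha) / denom) for t in prompt_tokens)


def rerank(prompt, candidates):
    """Order candidates so the one that best 'predicts' the prompt comes first."""
    return sorted(candidates, key=lambda c: inverse_score(prompt, c), reverse=True)


prompt = "moon over the river"
candidates = [
    "the stock market closed higher today",
    "the moon rises over the quiet river at night",
]
best = rerank(prompt, candidates)[0]
print(best)  # the moon rises over the quiet river at night
```

In a real system the score would come from a trained language model rather than word counts, and it would typically be used to rank candidates during generation rather than after the fact.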

So, let’s look at some examples of WuDao 1.0 output from this WuDao Turing Test site, which I found on this Zhihu thread about the release of WuDao 2.0. Basically, it’s a platform that tests whether the average online user can distinguish between human-generated and WuDao-generated text AND images across a range of tasks, including poetry composition, Q&A, making drawings based on captions, etc.

*Remember, we don’t have any examples of WuDao 2.0 output yet, at least to the best of my knowledge, but we can expect it to have better performance than the examples below, just like GPT-3 significantly outperformed GPT-2.

With that qualification in mind, let’s read some enlightened poetry. Can you tell which one was written by a real human poet, and which one was generated by WuDao 1.0?

Here’s my attempt to not completely butcher the translations for both.

Same title and author for both: Reading《尉迟鄂公敬德》* ; Author: Bai Juyi

*I think “鄂公” is a reference to this Tang-era work by another author. Bai Juyi is a renowned Chinese poet of the Tang dynasty.

Top poem translation:

First-class thirty years ago,

To date, the temple remains at the east end.

Since ancient times, it is said of Master Wen,

Why go here to study the distressed and imprisoned.*

*楚囚 originally referred to a figure from the Chu State who was captured during the Spring and Autumn Period. Later, it became a general term that referred to imprisoned and distressed people.

Bottom poem translation:

Lying high and dwelling deep — seeing no one,

fame can be shaken off like dust.

All that’s left is a piece of Qing Shang music,*

under the moon before a breeze accompanying my old body.

*Qing Shang music (清商乐) a genre of music that emerged back in the Three Kingdoms period.

So, which one was written by Bai Juyi, and which one was written by WuDao? Reply or comment your guess, and I’ll share the answer next week.

Now, let’s turn to the Q&A section. Same question, two answers: one by a human, one by WuDao 1.0:

Here’s the question text: Two people like each other but know nothing can come from it, should they still stay together?

Top answer:

First off ~ I can’t understand it. You’re close but you’re also distant. And then there is guilt, regret, and then ~ tears and disturbance. Ultimately, you realize it, that actually life is just as you see it at first. The ideal is not necessarily to be an outstanding person, and to get happiness proportional to what you give is probably the only happiness. After all, people can't get the perfection that fills their lives, but despite their losses at least there is no place to feel sorry about yourself. Lastly, I’ll attach the text 《Starting with “A Man and Woman Losing Contact”》. Ruth Benedict, the author of this text, is an author I often quote when writing essays. Also, thank you for your question. Thanks to everyone who reads, likes, or responds.

Bottom answer:

I’ve seen this question on occasion. I’m a little bit at loss for words. Emotional people will be like moths, throwing oneself at the fire without asking about the results. Rational people will leave in time to find the right person. I’m the type that’s halfway in between, unable to find the words and only able to cry choked tears. How can I find an answer that I couldn’t express myself for years? Ugh. I don’t think even someone who could solve Goldbach’s conjecture would be able to figure out this answer.

Let me know which one you think is the human answer! If any readers want to practice their Chinese, significantly improve upon my efforts, and play around with WuDao Turing Test site, I’d love to include better examples in following issues.

ChinAI Links (Four to Forward)

Should-reread: China reportage recognized by Pulitzer

Go reread work by teams at BuzzFeed, The New York Times, The Wall Street Journal, and the Associated Press (international reporting and investigative reporting categories). Paul Mozur said it best:

Should-read: Behind the painstaking process of creating Chinese computer fonts

In MIT Tech Review, Stanford professor of Chinese history Tom Mullaney gives us an intricate view into how designers created digital bitmaps of Chinese characters, and all the attendant challenges.

Should-read: Artificial intelligence in China’s revolution in military affairs

For Journal of Strategic Studies, Elsa Kania examines the People’s Liberation Army’s strategic thinking about AI. She argues, “The PLA’s approach to leveraging emerging technologies is likely to differ from parallel American initiatives because of its distinct strategic culture, organisational characteristics, and operational requirements.” The paper builds on her meticulous analysis of the PLA’s approach to AI based on military textbooks and writings by researchers in the PLA Academy of Military Science.

Should-read: Attitudes Towards Science, Technology, and Surveillance in 49 Countries

Yiqin Fu has a new blog post that covers public opinion on science, technology, and surveillance across 49 countries. Relevant to last week’s ChinAI issue on cross-national difference in enthusiasm and optimism toward AI.


ChinAI #144: Artificial Challenged Intelligence [人工智障]

Plus, my first journal article: The Logic of Strategic Assets

Greetings from a world where…

we’re all voting 5x a day for Shohei Ohtani to be in the MLB All-Star game, right?

Reflections: Working Through Bob Work’s Thoughts on how Chinese People Think About AI

Back in April, an Atlantic Council event featured a conversation with Bob Work, former Deputy Secretary of Defense and Vice Chair of the National Security Commission on Artificial Intelligence. When discussing the U.S.’s strategic competition with China, here’s how he described the Chinese public’s views on AI (about 10 minutes into the video):

"If you go to China and talk about AI, and I haven't been to China myself, but everyone who goes talks about the optimism Chinese citizens have about an AI-enabled future. It’s not necessarily the case in the United States..I think we've been conditioned by our movies and our TVs and our books...[leading to] a more skeptical and possibly fearful view of an AI-enabled future.”

Setting aside the . . . interesting . . . method by which Work reached his conclusion, there is some evidence that Chinese people are exceptionally optimistic about AI. Based on survey data from over 142 countries, including 3700+ face-to-face interviews in China, the 2019 World Risk Poll concluded:

“Enthusiasm and optimism around the potential of AI in decision making runs highest in China, where only a small proportion of respondents believe that the development of intelligent machines or robots that can think and make decisions in the next twenty years will mostly cause harm (9%).”

Here’s the specific question and breakdown of the responses for the U.S. and China:

At first glance, there’s a clear “enthusiasm gap” between the American and Chinese publics. A deeper dive into the data, however, uncovers two caveats. How seriously one takes these caveats may depend on how much one actually cares about what people in China think about certain issues.

  1. First, as outlined in the World Risk Poll methodology appendix, Xinjiang and Tibet were excluded from the China sample. That’s about 5 percent of the population.

  2. Second, as highlighted in the table above, the China sample includes a very high number of “don’t know” responses, whereas the US sample includes a shockingly low number of them. As previous research has shown, cultural variation across countries re: willingness to respond “don’t know” can create problems for drawing strong conclusions from cross-national public opinion surveys.

Moreover, other survey results further question this supposed enthusiasm gap. One Gallup/Northeastern University poll, for instance, shows that Americans are extremely optimistic about AI: 77 percent of Americans are “mostly positive” or “very positive” about the impact AI will have on the way people work and live in the next 10 years.

Consider, as well, this cross-national survey on public perceptions of facial recognition in China, Germany, the UK, and the US, conducted by Kostka et al. It also depicted a more complicated picture of how the Chinese public thinks about AI compared to the rest of the world. Chinese support for facial recognition technology use by private enterprises (17%) was lower than the American figure (30%).

To conclude this mini-reflection, let me say a few things that I don’t know and a few things that I do know. Just like the 20% of Chinese respondents in the 2019 World Risk poll who answered “don’t know” to the question about AI, I don’t know whether the Chinese public is more optimistic about AI than the American public.

I do know that sweeping claims of an enthusiasm gap should be supported by careful assessments of evidence. I also know that if you have no idea how Chinese movies and books depict AI, you probably shouldn’t make comparative claims about how American movies and books have conditioned the American public to be more fearful of AI than the Chinese public.

I also know that there are ways to better understand how Chinese people think about AI — if you’re actually interested in the Chinese public’s views beyond leverage as a geopolitical football. Like let’s say — and this is purely a hypothetical — you’re someone who has never been to China and relies on the opinions of “everyone who goes” to form your views about the issue. It might be helpful to — and I’m just spitballin’ here, so forgive me if this idea is too crazy — read English-language translations of what Chinese people are writing about an AI-enabled future. And, who knows, that could even lead you to translations that complicate your notion of Chinese people’s unbridled enthusiasm for AI. Like this week’s feature translation . . .

Feature Translation: Artificial Challenged Intelligence [人工智障]

CONTEXT: AI in Chinese is four characters: 人工智能. The third character, 智, stands for wisdom. Published back in January, this week’s article (link in Mandarin) is titled “人工智能, 智障的智?” Basically, the title asks: “AI: Does the 智 character actually stand for 智障 (a phrase that means intellectual disability)?” The article runs through a bunch of examples of AI failures, making fun of AI’s capabilities.

It’s written by 当时我就震惊了, a humor blogger with 30 million+ followers on Weibo. I saw it on a WeChat link, where it had racked up 100k+ views and a lot of engagement (screenshot below):

One more piece of context: I’ve seen the phrase “人工智障,” which I translate as “Artificial Challenged Intelligence” (ACI), appearing more frequently in Chinese media recently. See, for example, this CCTV post titled: “We want AI, not ‘人工智障’ (Artificial Challenged Intelligence).” *Note: I struggled with settling on the best translation for 人工智障. I also considered “artificial unintelligence,” but I was concerned that this option falsely equates intellectual disability with stupidity.

KEY TAKEAWAYS:

  • Just like all human beings, Chinese people can have complex views about the complex effects of AI on society. Some people are enthusiastic in some contexts. Some people are fearful in others. And, sometimes, as was the case with this week’s feature translation, some people make fun of the limitations in our “AI-enabled future.”

  • There were even some memes making fun of facial recognition-enabled surveillance. From the article:

"I tell you, anyone who violates the law should not even think of escaping my eyes!"

This includes:

Advertisements ▽

(This looks like one of those bill-board sized displays meant to stop jay-walking. The image shows that the system has identified someone from a bus ad as a suspected law-breaker)

Here’s another poking fun at a school’s blacklist system:

"I announce that this stranger has been added to our school's blacklist and will never be allowed in!"

Image text: After the school installed facial recognition. The bottom right shows a dog labeled as “stranger.”

For many more memes, see FULL TRANSLATION: AI: Does the 智 character actually stand for 智障 (a phrase that means intellectual disability)

ChinAI Links (One to Open)

Last Friday, my first journal article — The Logic of Strategic Assets: From Oil to AI (co-authored with Allan Dafoe), was published in Security Studies. It’s available here open access for all. I did a quick Twitter thread on the article:

We try to answer a thorny question: How should national leaders identify “strategic” technologies? In a post for the Washington Post’s Monkey Cage blog, we applied some of the findings from the article to the Biden administration’s worries about China’s control of “strategic technologies.”

I’ve already packed a bunch in to this week’s issue, so we’ll save a detailed breakdown for another day. Please do read and share, and let me know what you think!


ChinAI #143: 2021 AI Company Rankings

Plus, a history of Russian machine translation

Greetings from a world where…

I'll be presenting on my dissertation at a CISAC seminar this Wednesday. RSVP here

Feature Translation: eNet & Ciweek Rankings

Context: In early May, eNet and Ciweek teamed up to rank the top Chinese AI companies across various verticals, such as facial recognition and speech recognition. For each company, they derived an overall score based on the company’s market, word-of-mouth praise, and technology, etc. Both eNet and China Internet Weekly (Ciweek) are influential IT business media portals — the 2021 AI Company Rankings had 22.5k views on my WeChat link when I looked at it this weekend. I think the rankings provide a useful panoramic view of China’s AI ecosystem:

eNet & Ciweek produce rankings for many IT industries. Two others I found intriguing: a 2020 Top 50 list of Chinese industrial software companies; a 2021 Top 50 list of Chinese big data middleware companies.

Key Takeaways: To survey recent trends, let’s use the 2019 version of the rankings as a point of comparison:

  • In some areas, the top companies have maintained their lead. For instance, the top two remained the same in security applications (Hikvision and Dahua) and drones (DJI and XAG).

  • In other domains where the technical roadmap is constantly shifting, there has been significant upheaval. For example, the top 5 AI chip companies in 2019 were: Cambricon, Allwinner Technology, Intellifusion, Horizon Robotics, and Baidu. Two years later, the top 5 are: HiSilicon (Huawei’s chip subsidiary), Horizon Robotics, Pingtouge (Alibaba’s chip subsidiary), Unisoc, and Vimicro.

Forgive a little boasting before we dig in. Three years ago (ChinAI #10), I highlighted Mininglamp (明略数据) as an up-and-coming AI company. Now it’s ranked #1 in knowledge graphs, followed by Baidu and Alibaba in that category.

eNet and Ciweek slice up the AI ecosystem into 7 layers: Cognition (technology); Perception (technology); Computing (technology); Infrastructure; Intelligent Terminals; Scenario Applications; Comprehensive. What follows are some rankings I found particularly interesting:

In the cognition layer, here’s which companies are best at getting computers to understand meaning from text:

In the perception layer, two things that caught my eye: i) hardware-facing companies like Dahua and Hikvision succeeding in facial recognition software; ii) “Social responsibility” is one of the factors on which companies were ranked…

Again, here are the full rankings (in Mandarin), which often extend to 20 to 50 companies. I’ll leave you with the top 10 in AI chips:

Jeff Jots: The Forgetting and Rediscovery of Soviet Machine Translation

Switching out this week’s Four to Forward with some notes on an article published in the summer 2020 issue of Critical Inquiry, written by Professor Michael D. Gordin. Gordin argues that our “sense of history” with respect to new neural-net-based translation programs is “not very deep,” perhaps only reaching as far back as the birth of Google Translate in 2006. But we can rewind further back to the story of Russian machine translation in the mid-1950s. Some fascinating discoveries:

  • In 1954, a Georgetown-IBM experiment produced a translation program that generated “almost perfect translations of specially constrained Russian sentences into English every few seconds.” This made a splash, attracting the attention of Soviet researchers. In 1956, a Soviet team at the Steklov Institute developed impressive French-Russian translations.

  • A funny line about American perceptions of Soviet machine translation capabilities: “Nevertheless, the leaders of American programs were still nervous about Soviet progress, ironically compounded by the fact that their Russian-language knowledge was limited and they were not always able to read the relevant publications.” Now, why does that sound so familiar?

  • After the news about the Soviet French-Russian translation, the NSF gave a substantial grant to the Georgetown MT program: “Money began pouring into programs across the United States and its allies, but to Georgetown more than any other American institution: ‘There exists no other group in the United States, or in England for that matter, which has been working on such a broad front.’ Although some of the collaborations included work on German, French, Chinese, and Japanese, the bulk of the research, unsurprisingly, concentrated on Russian. This was a race and a competition between two superpowers, their languages, and their computers.” (emphasis mine)

  • Here’s what Washington State linguist Erwin Reifler said about MT in 1960: “It is clear that the impact of MT on human culture and civilization will by far surpass that of the invention of book printing.”

One more point to entice you to read the whole article. One of the coolest recent papers in neural machine translation is work by a Google team that was able to do “zero-shot translation” — translate between language pairs that the model had never been trained on. The authors argue that this hints at “a universal interlingua representation in our models.” Now read what Gordin writes about how the Russian MT approach differed from the Western one:

  • “For practitioners, besides the differences in the languages, the most obvious contrast with Western research was a shift of emphasis from “direct” approaches—hard-coding a specific language pair, often in a single direction, as had been the case for Georgetown-IBM as well as the Kulagina French-to-Russian pilot program—in favor of what its most vigorous advocate, Igor Mel’čuk (born in 1932, only recently retired in Montreal, where he emigrated in 1977 after being fired for political dissidence), called interlingual methods. Instead of building an algorithm that would transfer morphological, syntactic, and semantic features on a one-to-one basis, thus needing to be redesigned for every new language, Mel’čuk insisted on developing a machine interlingua, the same for all the linguistic codes, into which each language would be translated into and then out of.”

ChinAI #142: Digitalized Public Governance: A Recoded Social Order

Greetings from a world where…

there’s always an Iowa connection in every issue

Feature Translation: Digitalized Public Governance

Context: This week’s feature translation comes from “Taihe Industry Observer” (钛禾产业观察), a source I’ve covered in detail before (ChinAI #91). Taihe brands itself as a “new-model think tank of national strategic core S&T industries.” This week’s article presents one line of thinking on digitalized public governance, as if it were just a rational, efficient process. This is obviously incomplete — the expansion of the surveillance state, for example, is not covered — but it’s also important because the effect of technology on China’s governance can be multidimensional.

Key Takeaways:

To emphasize the point that there must be a foundation of good governance for big data to be useful, the article compares China’s and India’s responses to the epidemic.

  • The author writes, “The prevention and control of the epidemic is also a big test for the government systems of countries around the world, and the Chinese government has produced the academic transcript of the top student: “One Health Code Goes Everywhere” [一个健康码走天下]. From this, the government can effectively learn the population flow in real time, predict the spread of the epidemic, and eliminate the route of transmission in time; at the same time, it can make the epidemic prevention work as humane as possible and minimize the impact on economic activities.”

  • It then asks: “Why didn’t India copy China’s homework?” As the piece points out, India did launch its own version of a contact tracing health app. In fact, as Reuters reports, this app “made India the world’s only democratic country to make the use of a contact tracing app mandatory for its citizens, according to Software Freedom Law Center.”

  • The reasons why India could not replicate the success of China’s health QR codes, according to Taihe, are twofold: 1. Inefficient collaboration among governments at different levels. It describes India as a place where “government decrees do not leave New Delhi.” 2. The diffusion of mobile Internet and digital networks has not advanced as far in India. There are still at least 400 million people in India who do not have smartphones, for instance.

  • This line of argument seems to align with recent nationalist propaganda that highlights China’s success in curbing coronavirus infections while highlighting the failings of India. As I note in comments in the full translation Google doc, the analysis is not very rigorous, and some of the points about lack of access to Internet could equally apply to China.

The vision of digital governance in China can be mundane and ordinary, not just “scary digital authoritarianism”

  • Here are three examples from the piece on digital tech and streamlining bureaucratic requirements: a) hospital forms and approvals in Rizhao (“The City Without Proof”); b) construction project approvals in Jinan, which originally involved “17 departments, preparing more than 500 materials, completing 74 approval items, filling out 81 sets of forms, with more than 1,900 form elements. After the reform, you only need to fill in 4 sets of forms and prepare 75 materials, and it only takes 8 working days to complete the process at the fastest;” c) a “digital village brain” in Chejia, in collaboration with Alibaba’s DingTalk, which provides regular notifications for the elderly to get haircuts.

  • The models for China’s digital reforms in government affairs are countries like the U.S. and South Korea. Taihe references President Obama establishing high-level steering groups on big data strategy and elevating data into a strategic resource as important as land, labor, and capital. It was funny reading that paragraph because I’ve seen so many English-language articles that reference China’s leadership in essentially the same way. The piece also cites South Korea as a country that ranks very highly on the United Nations e-government index, whereas “China's e-government development index ranks 45th, and the online service index ranks 12th, which is not yet in the forefront.”

  • I’ll leave you with this passage: “The legacy of China's ‘omnipotent government’ [全能型政府] is that it ‘oversees too much’ [管的太多], behind which invisible entanglements of power and interests have formed. Under the traditional system, the data of various departments have formed closed ‘information islands’ and ‘data chimneys.’ Therefore, the core problem to be solved by digital government is not at the technical level, but to improve and optimize governance efficiency, from an all-powerful government to a ‘limited government and service-oriented government.’ Digitization is not simply the transmission of information, but the logical reconstruction of public governance.”

***For a reference to Dubuque, Iowa as the world’s first smart city, as well as a table of various companies’ smart city projects, see FULL TRANSLATION: Digitalized Public Governance: A Recoded Social Order

ChinAI Links (Four to Forward)

Must-read: A Guardian longread on “China, the US and me”

I can’t remember reading a piece that spoke to my experience more deeply than this one. Angela Qian writes about losing her grandfather this past year and having to attend his funeral via WeChat video call. She writes about collecting her family’s oral histories to make sense of where she came from. She writes about her vague dreams of living in China for a few years, even though she knows it will probably never happen. She also writes passages like these:

Since my grandparents passed, their children have filled the WeChat group with messages to them. They address my grandparents’ spirits directly, sometimes with heartfelt messages of longing, sometimes with regret, sometimes diary entries with news, complaints and gossip. For the parts where my Chinese wasn’t up to it, there was in-app machine translation available, its innocent English, though fraught with errors, achieving a kind of poetry.

One day in September, late in the summer after my grandparents passed, my aunts wrote an ordinary series of messages. The English translations are full of optimistic mistakes. In one, I, the “little granddaughter”, have published a “book” – in reality, just an article – and in another, a grandchild has bought a “villa with a pool” – in reality, a small house in Arkansas. It had been a difficult year for our family, and even the cockroaches, one of my aunts wrote, were bullies.

Should-read: Open Source R&D Accelerates Digital Transformation, and China’s Open Source Market is Heating Up (in Mandarin)

Considered translating this piece by jiqizhineng for this week’s issue. What happens when closed borders, in the form of tech restrictions, meet “open source”? This piece provides an overview of GitHub (which does not have a local server or team in China) vs. Gitee (a Chinese GitHub alternative) vs. GitLab (a U.S.-based DevOps platform that has licensed its tech to an independent Chinese company).

Should-listen: The fightback against facial recognition

Had a good conversation with Cindy Yu on The Spectator’s “Chinese Whispers” podcast about resistance and public backlash to intrusive facial recognition applications in China. It was especially good to be joined by Jeremy Daum, whose China Law Translate blog is an essential resource. See, for example, his recent post on China’s draft security standards for facial recognition data.

Should-read: China’s Artificial Intelligence Industry Alliance

A CSET data brief by Ngor Luong and Zachary Arnold, which “provides a high-level assessment of the role of industry alliances in China’s AI strategy and closely examines one major group: the Artificial Intelligence Industry Alliance (AIIA) [中国人工智能产业发展联盟]. Through the AIIA, the Chinese government aims to foster collaboration among local governments, academic institutions, and companies. In some cases, the Chinese state uses the AIIA to ‘pick winners,’ choosing among favored companies in the AI industry to receive government subsidies. . . These conclusions draw on open-source data, collected and annotated by CSET, on the AIIA’s hundreds of members, including their websites, media coverage, and commercial databases.”

