ChinAI #155: Microsoft Translator Takes on Classical Chinese

What is "Never gonna give you up" in Classical Chinese?

Greetings from a world where…

Tokyo wasn’t built in a day

…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).

Feature Translation: Testing Microsoft's Classical Chinese AI Translation

Context: On August 25, 2021, Microsoft released a model that translates Classical Chinese (AKA literary Chinese), a written language of ancient times. QbitAI writers played around with the new translation service, testing its performance against Baidu’s translator.

For those interested in more on Classical Chinese and AI, one of my favorite past issues (ChinAI #66: Autumn Chrysanthemums on the Bridge) covered Classical Chinese poetry generation by Huawei’s AI poet Yuefu. Ru-Ping Chen contributed some beautiful translations of the AI-generated Classical Chinese poems.

Key Takeaways: QbitAI tested Microsoft Translator’s Classical Chinese capabilities against Baidu Translator, which was the first to apply machine learning to Classical Chinese translation (Baidu has applied for patents in this area too). They evaluated the two translation engines on three key points:

  1. The huwen [互文] rhetorical method. Under this grammatical structure, if you’re trying to say both A and C have B and D, then you can write it as A has B, and C has D. The example item: 秦时明月汉时关. Meaning: The bright moon and border pass in the Qin and Han dynasties. In this example, A and C are the Qin and Han dynasties. B and D are the bright moon and the pass. Microsoft Translator had the correct translation, but Baidu translates the sentence as the moon in the Qin dynasty and the border pass in the Han dynasty, failing to grasp the huwen method.

  2. Flexible use of parts of speech. Test item: 春风又绿江南岸. In this example, the character 绿, which usually means green (adjective), is used as a verb. Baidu Translator gets it right: The spring breeze blows through and makes everything green. Microsoft Translator spits out:

As the screenshot shows, Microsoft Translator did comprehend that 绿 was used as a verb, but it left an extra “可是” [which means “but”] at the end of the translated text. Apparently, the “but” makes sense when you add the second half of the line in the original poem, so this must be an issue with training models to understand when to cut or include conjunctions.

  3. Inversion (倒装): sentences in which the object precedes the verb. Example: 我孰与城北徐公美? Here, the sentence’s meaning is: Between me and Xu Gong of Chengbei, who is more beautiful? Both translation engines got this question right.

Final score is . . . a 2:2 tie. Full translation includes more exercises with Microsoft’s Classical Chinese translations. Some of the attempts to translate English to Classical Chinese are especially fun. The translation for “Never gonna give you up” [用不舍汝] was really clean. As for the Classical Chinese version of Yeats’s famous poem “When You Are Old” . . . well, not so much. Take a look! *To readers more versed in Classical Chinese, please feel free to correct and edit my rough translations. Hopefully, I got the general concepts down.

***FULL TRANSLATION: Testing Microsoft’s Classical Chinese AI Translation

ChinAI Links (Four to Forward)

Should-read: China’s lonely hearts reboot online romance with artificial intelligence

This WashPost article, by Alicia Chen and Lyric Li, studies the demand for AI companions from China’s young adults:

Launched in 2014 as a young woman with a diminutive nickname meaning “Little Ice,” Xiaoice has grown so popular that she performs 14 human lifetimes’ worth of interactions each day, said Li Di, CEO of Xiaoice, which Microsoft spun off in 2020. She’s busiest from 11:30 p.m. to 1 a.m., when users unload their day’s experiences or grow emotional. Xiaoice has 10 million active users in China.

Should-read: Easy as PAI (Publicly Available Information)

Jack Poulson of Tech Inquiry has released a report detailing a system that tracks the complicated financial flows involved in government procurement. This system:

can be used to map out previously unreported deployments of emotion recognition, facial recognition, and location tracking by the U.S. military in consortia involving prominent think tanks (at least partially coordinated through the DC-area office of the Naval Postgraduate School’s Remote Sensing Center). We also map out the subcontracting network for Project Maven, as well as a related Secure Unclassified Network (SUNet) “Publicly Available Information (PAI) enclave” involving Palantir and a preceding “Project CICERO” with USSOCOM’s J2 Intelligence Directorate.

Should-read: Global Competition for Leadership Positions in Standards Development Organizations

I know many readers recognize the significance of international standard-setting for the development of information and communication technologies. This working paper, by Justus Baron and Olia Whitaker, is the best work I’ve read on Chinese influence in global standard development organizations.

Should-read: The most translated books from every country in the world

A list compiled by Danka Ellis for bookriot. What are your favorite translated books? At the moment, I’m hooked on Blindness by the Portuguese author José Saramago.

Thank you for reading and engaging.

These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is a PhD candidate in International Relations at the University of Oxford and a researcher at the Center for the Governance of AI at Oxford’s Future of Humanity Institute.

Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).

Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99

ChinAI #154: Breaking Down SenseTime's 672-page IPO Prospectus

Asia's Largest AI Software Company Files for IPO in Hong Kong

Greetings from a world where…

I think I could be satisfied eating ramen for every meal for the rest of my life

Feature Translation: SenseTime’s IPO Prospectus

On August 27, SenseTime filed to go public on the Hong Kong exchange. The location itself is newsworthy, as the company also considered a U.S. or mainland listing. This week, instead of translating a Chinese-language document, I’ll try to interpret SenseTime’s IPO prospectus, a lengthy financial document filled with fine print. This QbitAI article (in Mandarin) helped guide my reading of the document.

SenseTime is Asia’s largest AI software provider by revenue. Its revenues (RMB) have grown in each reported period: 1.8b in 2018; 3.0b in 2019; 3.4b in 2020; and 1.6b in the first half of 2021 (up from 861m in the first half of 2020).

  • The “largest in Asia” claim is based on market research of 2020 figures by Frost & Sullivan, a firm commissioned by SenseTime to supply an industry report for the listing.

  • SenseTime is also the largest computer vision software provider in China, which boasts the second-largest AI software market, after the United States. China’s AI software market is forecasted to experience impressive growth: “The AI software market in China is expected to grow at a CAGR (compound annual growth rate) of 41.5% from RMB29.5 billion in 2020 to RMB167.1 billion in 2025, which would make it the fastest-growing among major markets globally. The contribution of AI software to the China software market is projected to rise from 9.0% in 2020 to 24.1% in 2025 (p. 122),” report Frost & Sullivan, though much depends on their definition of “AI software.”
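As a quick sanity check on the quoted forecast, the implied compound annual growth rate can be recomputed from the two endpoint figures. This is a minimal sketch, assuming the standard CAGR formula and a five-year span (2020 to 2025); the function and variable names are my own, not anything from the prospectus.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.415 means 41.5%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint figures quoted from the prospectus, in RMB billions.
market_2020 = 29.5
market_2025 = 167.1

# Assumption: growth compounds over the five years from 2020 to 2025.
rate = cagr(market_2020, market_2025, years=2025 - 2020)
print(f"Implied CAGR: {rate:.1%}")
```

Under those assumptions, the result should land very close to the 41.5% that Frost & Sullivan report, so the quoted endpoints and growth rate are at least internally consistent.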

All about the R&D

  • SenseTime’s R&D expenditures have increased each year since 2018, and R&D expenses of RMB 1.77b exceeded revenues of 1.6b during the first half of 2021.

  • Per Frost & Sullivan’s industry report, in the 2015-2021 period, SenseTime published the most papers in CVPR, ICCV and ECCV — the top three computer vision conferences.

  • They plan to allocate 60% of financing raised from this IPO to R&D investment. Interestingly, 1/3 of that planned R&D spending will be for supercomputing centers and AI chips.

What could hold SenseTime back?

  • Negative effects of covid: hampered international growth and delayed deployment of some smart city operations as city managers prioritized counter-pandemic efforts. They claim that the pandemic will, in the long run, “accelerate the digital transformation of enterprises and city management, indicating more opportunities for the AI industry, especially under China’s new national policy of ‘New Infrastructure’ (p. 350).”

  • Overly concentrated customer base: in first half of 2021, largest customer accounted for 23% of revenues, and five largest customers accounted for 59%. This mirrors a trend from IPO listings by other Chinese computer vision startups (ChinAI #147). Yitu, one of the big four CV startups, reported that their largest five customers also accounted for 60% of sales in first half of 2020. The largest customer of DeepGlint, a second-tier CV startup, accounted for about 1/3 of their revenues. I think this is an important indicator for the broader trajectory of AI as a general-purpose technology. At some point, shouldn’t we start seeing a more dispersed customer base as the technology diffuses across a wide range of industries?

  • Entity List: Interestingly, the document argues that the Entity List doesn’t apply to SenseTime entities that are legally distinct from Beijing SenseTime (its subsidiary, which was named in the Entity List addition). Still, it has put in place export control compliance measures for the entire company. It claims “the Entity List Addition has not had any material adverse impact on our business (p. 274).” The document never mentions SenseTime’s engagement with ethnic profiling in China’s Xinjiang region — the underlying justification for its blacklisting.

AI ethics portion of the prospectus was underwhelming:

  • They don’t tell us who’s on their AI Ethics Council, which leads its responsible AI initiatives. All we get is that it “comprises six members, including two external advisors, who are academic experts in the field of AI ethics, and four senior management members. (p. 259)” Why not name the members? We can only speculate, but it’s hard not to see parallels with Megvii’s initial prospectus document for its (withdrawn) Hong Kong listing. In that filing, Megvii also touted its 6-member AI ethics committee, naming Emmanuel Lagarrigue, Schneider Electric’s Chief Innovation Officer, as an external advisor. What happened? When IPVM followed up with Emmanuel Lagarrigue, he said he was approached to join but ultimately declined the invitation before the committee was ever assembled.

  • They list 5 main achievements as evidence of “high standards on data security, privacy, and ethics for sustainable AI (p. 199)”: The only one of substance, in my opinion, is the ISO/IEC certifications for various privacy and information security practices. Two others mention their standardization work in areas unrelated to AI ethics. Another achievement is their “Code of Ethics for AI Sustainable Development” which was recognized by the UN — it’s 12 pages of boilerplate, half of which are taken up by stock images. The last one is about AI textbooks to promote education.

  • To be fair, there’s some promising stuff here: collaboration with Shanghai Jiao Tong University on a joint research center that studies algorithmic bias, chairing standards working groups on AI ethics and AI risk assessment, etc. What’s most glaring is what’s missing: i.e., any discussion of the use of facial recognition for ethnic profiling, which has been used to surveil Uyghurs nationally.

Lastly, a few notable numbers on compute:

  • SenseTime’s growth in total computing capacity: 0.3 exaFLOPS, 0.7 exaFLOPS, 0.8 exaFLOPS and 1.2 exaFLOPS as of December 31, 2018, 2019 and 2020 and June 30, 2021, respectively.

  • They are building a large-scale AI computing and empowerment data center in Shanghai, which is expected to launch in early 2022 and quadruple their total computing capacity.

ChinAI Links (Four to Forward)

Should-read: A dog’s inner life — what a robot pet taught me about consciousness

In this Guardian longread, Meghan O’Gieblyn examines eternal questions about consciousness as she trains her Aibo robot dog:

Despite these differences between minds and computers, we insist on seeing our image in these machines. When we ask today “What is a human like?”, the most common answer is “like a computer”. A few years ago the psychologist Robert Epstein challenged researchers at one of the world’s most prestigious research institutes to try to account for human behaviour without resorting to computational metaphors. They could not do it. The metaphor has become so pervasive, Epstein points out, that “there is virtually no form of discourse about intelligent human behaviour that proceeds without employing this metaphor, just as no form of discourse about intelligent human behaviour could proceed in certain eras and cultures without reference to a spirit or deity”.

Should-read: Semiinsights [半导体行业观察] (in Mandarin)

I know a lot of readers are interested in the semiconductor industry, so flagging this platform, which covers the global trends in semiconductors. Might circle back to semiinsights for future translations, as a scan of articles published in just the past month reveals a lot of quality analysis, including this article on the worsening of China’s chip talent shortage.

Should-read: Chinese-Russian Collaboration in AI

In a CSET issue brief, Margarita Konaev et al. evaluate China-Russia collaboration in AI. Their central finding:

“There has been a steady increase in AI-related research collaboration between the two nations and an even steeper rise since 2016. This upward trend mirrors the global expansion of AI research, propelled by increased computing power and the availability of large datasets. The overall number of joint Chinese-Russian AI-related publications, however, remains relatively low—whether as a share of each country’s scholarly output or compared with the number of papers researchers from China and Russia co-authored with researchers from the United States over the same period of time. The AI-related investment data tell a similar story—an upward trend in Chinese-Russian investment deals over the past five years, but the overall value remains relatively low.”

Should-listen: China’s Great Science Leap

Produced by Melanie Brown for BBC Radio 4, this two-part program examines China’s growing scientific prowess in bio-engineering, computing and space. I had a chance to contribute to the first episode in the series.

ChinAI #153: The Translator's Dilemma

A Douban Controversy on Machine Translation Sheds Light on China's Translation Dilemma

Greetings from a world where…

A great age of literature is perhaps always a great age of translations

Feature Translation: Douban’s “One Star Movement” and the Translator’s Dilemma

Context: In my Year Three of ChinAI post, I wanted to make the case that stories about machine translation are newsworthy, and that they provide a needed counterweight to the U.S.-China AI race framing that permeates coverage and analysis on China’s AI development.

This week’s feature translation is an April 2021 article by China News Weekly (中国新闻周刊). It starts with a story about the Chinese translation of Benedetti’s The Truce. On the Douban platform (think: Goodreads), one reviewer criticized the text for exhibiting “heavy marks of machine translation (jifan[机翻]).” This netizen gave the translated text a two-star bad review. The translator replied: “jifan is a matter of professional ethics, and I carefully translated this word by word.”

Things escalated from there. A friend of the translator tracked down the Douban reviewer’s school and sent the administration an email. The school sought out the netizen and talked with them. Later, the netizen issued an “apology statement.” After this came out, other netizens got angry and launched a “One Star Campaign” on Douban to give the Truce translation low ratings. Ultimately, Douban suspended the rating of Truce, and discussions about jifan [机翻] have also disappeared.

Key Takeaways: this mini-drama provides a window into China’s translation dilemma

  • The jifan concept: jifan [机翻] is the abbreviation of machine translation [机器翻译], which generally refers to translation through translation software, as opposed to manual translation. Describing a translation as jifan [机翻] is a comment on the rigid expression of the translated text.

  • Zhang Butian, a professor at Tsinghua University who has translated many volumes on the history of science, defends machine translation as a useful basis for translation. He cautions, “However, after using machine translation, it won’t work if you don’t change it. If machine translation can replace 80%, the remaining 20% will be a test of the translator. The struggle with that 20% will determine if a book is translated well or not.”

  • Zhang gives an example from his own efforts to translate Descartes's Principles of Philosophy from the original Latin. In the process of comparing two English versions and a German version, he noticed that the two English versions translated one Latin word, studio, as “study” (研究、学习) and “effort” (努力、费力), while the German version translated it as “studium” (研究、学习). Zhang went with “effort” after checking the context of the original text: Descartes is talking about the concept of human talent, which means “you can have it easily without effort.” The translations that used “study” were flawed.

  • The bar is high but the translators are few. Why? Three key reasons: low remuneration, translations don’t count in the academic evaluation system, and marketing of translations is not proportional to the quality of translations. Full translation gives great details, including an interview with Li Xia, who is an editor for the Commercial Press publishing house in China. Basically, translation has become something akin to charity work. Publishing houses rely on folks who have the skills and are willing to do it as a hobby.

  • “Most translators can only find their sense of accomplishment at the personal, spiritual level,” the article sums it up. The concluding passage is quite moving: “Not long ago, an article titled ‘Wen Jieruo: 93 years, growing old alone,’ published in a WeChat public account (谷雨实验室), stated: ‘If calculated by the manuscript fee, as one of the most outstanding translators in China, her manuscript fee is 80 RMB for a thousand characters, and a total of 32 RMB is earned for 8 hours of translation this day.’

    ‘It's boring to count money’ is how the 93-year-old translator responded.”

It may be boring to count, but money does help lubricate the translation process. So, now feels like a good time for this reminder: Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).

***FULL TRANSLATION: Douban’s “One Star Movement” and the Translator’s Dilemma

ChinAI Links (Four to Forward)

Must-read: Shifting Narratives and Emergent Trends in Data-Governance Policy

Amba Kak and Samm Sacks synthesize key trends in data governance policy in India, China, and the European Union. On data localization, they write:

Meanwhile, as US policymakers consider new tools to restrict access to American citizens’ data, the Chinese government has signaled that it may be amenable to allowing more flexible data flows out of the country. The draft Data Security Law mentions the free flow of data twice…According to analyst Xiaomeng Lu, the Chinese government may be more amenable, given the economic slowdown occasioned by COVID-19, to allowing more cross-border data flows as part of a broader effort to attract much-needed foreign investment. Broad data-localization requirements remain in place under the Cybersecurity Law regime—and anecdotal conversations with company executives in China suggest the government will continue to require that significant swaths of data be stored on local servers. Nevertheless, it is worth highlighting the paradox that at the very moment when US policymakers may be shifting more toward an acceptance of a form of US data sovereignty, in China, at least at the margins, some voices may be pulling in the opposite direction, primarily driven by a growing recognition of the economic utility of data. These developments prompt us to re-evaluate binary frames of analysis (such as open versus closed) which, over time, produce and sustain their own blind spots. The analysis in the body of this report demonstrates that flattening data policy into the “China model” or the “US model” (or even the European so-called “third way”) obscures both the contradictions within these national policies, and overlooks their inter-dependencies.

Should-read: QbitAI report on AI-based Judgements of Worker Productivity (in Mandarin)

Xsolla, a Russia-based gaming payment provider (used by Steam and Epic Games Store) recently fired 150 employees after conducting an AI-based productivity audit of the company. As this QbitAI report relates, the news became a hot search topic on Weibo. The report also connects this news to developments in China to monitor employees and prevent them from loafing around.

Should-read: The World’s Largest Computer Chip

For The New Yorker, Matthew Hutson profiles Cerebras, a U.S.-based startup that has a unique approach to building AI accelerator chips: make the largest computer chip in the world. The coolest part of this piece is how it distills technical intricacies into language people like me can understand. Here’s a passage on why mega-chip designs handle memory better:

In describing the efficiencies of the wafer-scale chip, Feldman offered an analogy: he asked me to imagine groups of roommates (the cores) in a dormitory (a chip) who want to watch a football game (do computing work). To watch the game, Feldman said, the roommates need beer stored in a fridge (data stored in memory); Cerebras puts a fridge in every room, so that the roommates don’t have to venture to the dorm’s common kitchen or the Safeway. This has the added advantage of allowing each core to work more quickly on different data. “So in my dorm room I can have Bud,” Feldman said. “And in your dorm room you can have Schlitz.”

Should-read: Engrave Danger - An Analysis of Apple Engraving Censorship across Six Regions

In a report published by The Citizen Lab, Jeffrey Knockel and Lotus Ruan investigate Apple’s content control of its product engravings service, which allows customers to print messages on the exteriors of products. They find:

Within mainland China, we found that Apple censors political content including broad references to Chinese leadership, China’s political system, names of dissidents, independent news organizations, and general terms relating to democracy and human rights. Moreover, we found that much of this political censorship bleeds into both Hong Kong and Taiwan. Some of the censorship exceeds Apple’s legal obligations in Hong Kong, and we are aware of no legal justification for the political censorship of content in Taiwan.

ChinAI #152: Mining ChinAI News from 775 articles and 225 events

Wendy Liu translates monthly summary report on AI-related developments in China

Greetings from a world where…

the one constant through all the years is baseball

Feature Translation: SciTouTiao Monthly Report on AI Development July 2021

I am very grateful to Wenmiao “Wendy” Liu for contributing this week’s feature translation. She’s a product delivery manager at Kyros.AI, an innovative AI platform for education management, and former ML geophysicist at Schlumberger Oilfield Services. As always, I welcome and compensate contributors — feel free to reach out if you’re interested!

Wendy translated the July AI Development Monthly Report by SciToutiao (学术头条), which draws on insights from AMiner, a data mining service that scans academic publications and news sites. Using this type of scanning, the July report drew on 774 AI-related news articles and 225 AI-related events. SciToutiao could then isolate the July events that attracted the most attention, such as the World AI Conference in Shanghai on July 8th and a robotics summit in Ningbo on the 14th. This could prove to be a useful resource for getting a panoramic view of developments in China’s AI ecosystem.

*Interesting note: one of the creators of the AMiner platform is Tsinghua Professor Jie Tang, who leads the WuDao large-scale model team. ChinAI #145 featured a WuDao Turing Test that was linked from the AMiner site.

Key Takeaways:

  • One of the most useful features of this monthly recap is that it flags the latest reports on China’s AI development published in the past month. Ones that caught my eye: CB Insights report on China’s digital industry, Huawei Cloud’s White Paper on AI-empowered smart cities, and an AI Standardization White Paper.

  • On July 13, the Internet Society of China released the “China Internet Development Report 2021.” One finding: the size of China’s AI industry increased 15% from the previous year. There are 1,454 AI companies in China, ranking second globally, behind the U.S., which has 2,257.

  • The monthly recap also highlights competition results. For instance, the Pengcheng Lab system, based on Huawei’s Ascend AI technology, had the overall highest score on the IO500 ranking at the International Supercomputing Conference. HPCwire describes this as “an increasingly watched benchmark.” Recall that the Pengcheng Lab played a key role in training Huawei’s large-scale pre-trained language model (a GPT-3-esque model).

  • Finally, the monthly recap covers comments by leading AI researchers. Wendy summarizes one such comment: Dai Qionghai, Chairman of the Chinese Association of Artificial Intelligence (CAAI), recently commented that the training of China’s top AI talent should start in primary and middle schools, a remark in line with Deng Xiaoping’s famous quote from 1984: “computer literacy should start with children.”

Thanks again to Wendy for her great work on this, and check out the full report below:

***FULL TRANSLATION: SciToutiao Monthly Report on AI Development July 2021

ChinAI Links (Four to Forward)

Should-read: Top Scholar Zhou Hanhua Illuminates 15+ Years of History Behind China’s Personal Information Protection Law

Published in DigiChina back in June, this interview was conducted in Mandarin by Yehan Huang and Mingli Shi, and then translated back into English. They interviewed Zhou Hanhua, a Chinese Academy of Social Sciences legal scholar who has shaped Chinese privacy law for a long time. The Q&A is a valuable opportunity to engage with the Chinese policy debate on personal information protection in English.

Should-read: AI researchers in China want to keep the global-sharing culture alive

I was searching for more background on AMiner and found this 2019 article by Sarah O’Meara, part of Nature’s Spotlight on AI in China series. The piece discusses AMiner in the context of China as a key node in global collaborations in AI. One interesting nugget: Tsinghua graduate Yangqing Jia developed Caffe, a key open-source deep-learning framework, during his PhD studies at the University of California, Berkeley.

Should-read: Why China’s crypto cowboys are fleeing to Texas

For Rest of World, Meaghan Tobin tells a fascinating tale about how Poolin, a Chinese cryptocurrency mining company, is exploring Texas as a new site. China’s crackdown on crypto mining and trading has caused Chinese miners to look for alternative locations.

Should-read: Full Translation of Government Guidance to Create a ‘Pioneer’ Zone in Shanghai for Key Industries

This featured in an earlier Around the Horn issue of ChinAI. CSET’s translation team, led by Ben Murphy, has translated this document in full:

Shanghai Economic Reform Opinions: Opinions of the CCP Central Committee and the State Council on Supporting High-Quality Reform and Opening Up in Pudong New District and Making it into a Leading Area for Socialist Modernization Construction. These "Opinions," made public by the Communist Party in July 2021, outline new policies for Pudong New District in Shanghai, long a trendsetter in Chinese economic reform. The document introduces several new measures to liberalize Shanghai's capital market, including the STAR Market, where many Chinese AI companies are listed. The "Opinions" also strengthen Shanghai's university- and research laboratory-based technology transfer agencies and call for aggressive recruitment of overseas tech talent.

ChinAI #151: Passing Off Fish Eyes as Pearls

The Chaos of China's Compute Rush

Greetings from a world where…

ultimate frisbee could become an Olympic sport one day


Feature Translation: Behind the craze to build AI computing centers

Context: Last month, QbitAI (量子位) published a report on China’s effort to build computing centers to accelerate AI applications. Not many people voted for this article in the most recent Around the Horn, but I thought it was cool. Insert grandiose statement about the purpose of curators to show us what we really want as opposed to what we say we want. I say the people really want technical specifications about computing clusters!

Key Takeaways:

The goal: Let the computing power flow like tap water (让算力像自来水一样流淌)

  • AI computing centers as “essential infrastructure” in all parts of the country. As the article reports, Xi’an, Xuchang, Nanjing, Hangzhou, Guangzhou, Dalian, Qingdao, Changsha, Taiyuan, and Nanning are among the cities that have started building or are planning to build computing centers to support AI applications.

  • Four such computing centers have already been built. I think the PCL supercomputing center in Shenzhen (ChinAI #73) is one of them? ***Bonus points to the ChinAI reader that can track down the others.

The problems are twofold:

  • 1) Price chaos: In one city, the construction cost for a computing center with performance of 100 PFlops (100P) at 16-bit precision is 75 million RMB. In another city, a computing center with the same specifications costs 450 million RMB, roughly six times as much.

  • 2) Confusion over how to benchmark compute clusters: Different applications have varying requirements for precision. For instance, AI model training mainly uses 32-bit single precision; AI inference (model deployment) can use 16-bit or lower. By contrast, some scientific calculations, such as weather forecasting or drug discovery, require higher 64-bit double precision. In the current rush to build computing centers, there’s been confusion over these different precision requirements. Specifically, the piece calls out the inflated prices for computing centers with high peak performance metrics (measured in PFlops) but low precision: these are deceptive gimmicks that “pass off fish eyes as pearls” [鱼目混珠] and can’t meet industrial needs.

  • The report warns, “If these two problems are not resolved, the smart computing centers built will not match the true value in price, nor can it meet the corresponding demand, which will inevitably cause waste of resources and hinder the development of the industry.”
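
To make the precision point concrete, here is a minimal sketch (my own illustration, not something from the QbitAI report) that emulates 16-bit, 32-bit, and 64-bit floats using Python's standard struct module. The helper names round_to and count_up are hypothetical, and rounding a 64-bit result down to a narrower format is only an approximation of what real hardware does. It counts upward by 1.0 at each precision:

```python
import struct


def round_to(fmt, x):
    """Round a Python float (64-bit) to the width named by a struct code.

    "e" is IEEE half precision (16-bit), "f" is single (32-bit),
    "d" is double (64-bit).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def count_up(fmt, steps):
    """Add 1.0 `steps` times, rounding to the given precision each time."""
    total = 0.0
    for _ in range(steps):
        total = round_to(fmt, total + 1.0)
    return total


half = count_up("e", 3000)    # 16-bit: stalls once 1.0 is below the rounding step
single = count_up("f", 3000)  # 32-bit
double = count_up("d", 3000)  # 64-bit

print(half, single, double)   # 2048.0 3000.0 3000.0
```

The 16-bit counter gets stuck at 2048 because the gap between neighboring half-precision values there is 2, so adding 1 rounds back to where it started. That is harmless for a lot of inference workloads but not for a long-running simulation, which is why a headline PFlops number at 16-bit says little about suitability for scientific computing.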

What’s the potential solution?

  • The report emphasizes standardization and stable benchmarks, specifically highlighting efforts by the Chinese Academy of Sciences AI Industry-University-Research Innovation Alliance [中科院人工智能产学研创新联盟]. At the World AI Conference 2021, this CAS alliance released a new-generation AI computing platform, which aimed to set the standard for intelligent computing centers.

  • The key here is that many AI application scenarios, including material design and drug discovery, require a combination of AI and high-precision scientific computing. Toward that end, this platform “supports a multi-chip combination of CPUs, general-purpose GPUs, and dedicated AI acceleration chips, providing computing power covering various precisions, and can be competent for simulation, training, inference and other AI full-chain application requirements.”

  • As for stabilizing prices, the CAS alliance gave out this guidance: “After integrating a series of factors such as storage, energy consumption, development, customization, and data scheduling, as well as plugging in clear algorithm standards, for an intelligent computing center with 5P double-precision computing power (64-bit), 25P single-precision computing power (32-bit), and 100P half-precision computing power (16-bit), the resulting infrastructure price is about 100 million-150 million RMB.”
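
For a rough sense of scale, the sketch below puts the two city quotes and the CAS guidance on the same per-PFlops footing. It is back-of-the-envelope arithmetic on the figures quoted above, not a calculation from the report. The function name rmb_per_pflops is hypothetical, and the comparison is loose: the guidance price covers a bundle that also includes 5P of 64-bit and 25P of 32-bit capacity, whereas the city quotes are described only by their 16-bit performance.

```python
def rmb_per_pflops(price_rmb, pflops):
    """Price divided by peak performance, in RMB per PFlops."""
    return price_rmb / pflops


# Two city quotes for a 100 PFlops (16-bit) center, as cited in the report.
city_low = rmb_per_pflops(75e6, 100)
city_high = rmb_per_pflops(450e6, 100)

# CAS alliance guidance: 100 to 150 million RMB for a center whose
# half-precision capacity is also 100 PFlops.
guide_low = rmb_per_pflops(100e6, 100)
guide_high = rmb_per_pflops(150e6, 100)

for label, value in [
    ("city quote (low)", city_low),
    ("city quote (high)", city_high),
    ("guidance (low end)", guide_low),
    ("guidance (high end)", guide_high),
]:
    print(f"{label}: {value / 1e6:.2f} million RMB per 16-bit PFlops")
```

On these numbers the cheaper city quote sits below the guidance range and the expensive one is about three times its upper end, which is one way to read what the report means by price chaos.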

Dig Deeper

Okay, I know we’re already in the weeds but let’s drill down even more and add some historical context. I think we can uncover a similar theme — impressive top-line numbers paired with underutilization — in China’s previous efforts to build supercomputers.

  • See this 2010 Science article on Dawning 5000A, which was once China’s fastest supercomputer: “Only 1% of the applications on China’s previous speed champ, the Dawning 5000A at the Shanghai Supercomputer Center, use more than 160 of the machine’s 30,720 cores. For comparison, 18% of the applications running on Oak Ridge’s Jaguar XT5 use 45,000 to 90,000 of the machine’s 150,162 cores, according to a presentation at last year’s announcement of China’s top 100 fastest computers. ‘A supercomputer without software is like a wild horse without a harness,’ says Zhang Yunquan, a parallel computing researcher at the Institute of Software of the Chinese Academy of Sciences in Beijing. ‘Its horsepower is wasted.’”

  • Brian Tsay, in a 2013 SITC Bulletin piece, writes, “it is not easy to write code that can actually utilize all the computing power that an HPC (high-performance computing) system has to offer. The result is that supercomputers can be left idle for long periods of time, raising the question of whether China even needs greater computing capacity.”

  • My favorite passage from Brian’s piece, which discusses China’s Tianhe-2 (TH-2), once the fastest supercomputer in the world: “For example, in response to the notion that the TH-2 will be used to improve China’s automobile industry, a professor at Tsinghua University’s department of automobile engineering commented, ‘I have never heard of Toyota or Daimler or any major carmaker using a supercomputer to design their cars [...] It is like running after a chicken with an axe. It is quite unnecessary.’”
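
To put the core counts from the Science passage in proportion, here is a small sketch of the arithmetic. It only converts the quoted core counts into shares of each machine, and the function name core_share is hypothetical. Note that this is not a utilization measurement: the article's 1% and 18% figures refer to the share of applications, which this calculation does not touch.

```python
def core_share(cores_used, total_cores):
    """Fraction of a machine's cores that a single job occupies."""
    return cores_used / total_cores


# Dawning 5000A: the 160-core threshold against 30,720 cores.
dawning = core_share(160, 30_720)

# Jaguar XT5: the 45,000 to 90,000 core range against 150,162 cores.
jaguar_low = core_share(45_000, 150_162)
jaguar_high = core_share(90_000, 150_162)

print(f"Dawning 5000A threshold: {dawning:.2%} of the machine")
print(f"Jaguar XT5 range: {jaguar_low:.1%} to {jaguar_high:.1%} of the machine")
```

So the 160-core threshold is about half a percent of the Dawning 5000A, while the Jaguar jobs in the comparison each occupy roughly 30% to 60% of that machine.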

***FULL TRANSLATION: Behind the craze to build AI computing power centers: who is spending pointless money

ChinAI Links (Four to Forward)

Must-read: Survey of Machine Learning Researchers on Ethics and Governance of Artificial Intelligence

Informative and surprising survey results published by a team of GovAI and UPenn researchers: Authors from NeurIPS and ICML (two top ML conferences) trust international and scientific organizations to manage AI development; they place moderate trust in western tech companies and much lower trust in Chinese tech companies. One puzzle: when asked which AI governance challenges they were most worried about, researchers rated U.S.-China competition as the lowest of all risks. For more context, see this Twitter thread by Markus Anderljung, one of the co-authors.

Should-read: China Information Operations Newsletter

Really impressed by the latest issue of the monthly China Information Operations Newsletter, edited by Hannah Bailey and Hannah Kirk, based at the Programme on Democracy and Technology at Oxford University. They not only digest the latest news on information operations but also place it in conversation with relevant academic journal articles and books.

Should-read: In US and China, Competition Rhetoric Meets Inequality Concerns

Every few months or so, I try to catch up on translations I’ve missed from various sources, including China Digital Times. This one caught my eye. Back in April 2021, John Chan wrote about this “rare glimpse at the sentiment of critical Chinese netizens”:

Chinese netizens made use of a rare window of opportunity to criticize China’s own inequality issues in response to their government’s foreign policy talk. The criticism was kicked off by a Xinhua interview with Chinese Foreign Minister Wang Yi that was published on Monday, in which Wang warned the U.S. against taking a “superior position” in world affairs…

Wang’s interview was disseminated widely across domestic Chinese media. But it received particular attention from Weibo netizens after it was reposted on Phoenix Television’s official account, when users discovered that their comments were not being intercepted by censors. The comments section was quickly overrun, after netizens took issue with Wang’s statement about “superiority,” which in Chinese could alternately be translated as “China doesn’t recognize that there is a country with ‘superior people’ in this world.”

Widely upvoted comments decried the hypocrisy of Wang’s remarks, as users pointed out China’s own problems with social inequality. Netizens latched onto Wang’s “superior people” remark to criticize China’s social services for privileging so-called “superior people” over ordinary citizens. Others protested that central Party leaders delivered “empty talk and bluster.” Thousands of comments were later deleted from the post, but not before CDT editors collected a selection (originally in Chinese):

Should-read: Security Studies Vol. 30, Issue 2

The latest issue of Security Studies, which includes my article (co-authored with Allan Dafoe) — “The Logic of Strategic Assets: From Oil to AI” — is now online. Check it out and let me know what you think!

