ChinAI Newsletter #19: Is the Wild East of big data coming to an end? A turning point case in personal information protection

Welcome to the ChinAI Newsletter!

We've hit 1100+ subscribers: thanks all for the support, especially everyone sharing and commenting on translations!
These are Jeff Ding's weekly translations of writings on AI policy and strategy from Chinese thinkers. I'll also include general links to all things at the intersection of China and AI. Please share the subscription link if you think this stuff is cool. Here's an archive of all past issues. *Subscribers are welcome to share excerpts from these translations as long as my original translation is cited.

I'm a grad student at the University of Oxford where I'm the China lead for the Governance of AI Program, Future of Humanity Institute.

Major Case Cracking down on Privacy Violations by Big Data Companies

Gonna be honest - kinda mailed it in last week, but this week's a doozy so strap in. Last week, in what could be a turning point in China's big data/privacy scene, China's Shandong Province brought a major case on infringements of personal information by big data companies (including the well-known company Datatang 数据堂). I covered it in this Twitter thread, but the quick-and-dirty details are: 57 individuals arrested, 11 companies involved, all the result of a 1-year+ investigation.

The two translations this week cover this case in depth. The first is an excerpt that gives you an idea of why Datatang matters for understanding China's AI scene. The second is probably one of the coolest translations I've done: it collects reactions and insights on the Datatang case from data compliance officers and legal counsels at some of the most important players (including Huawei, Meituan-Dianping, and ZTE).

Why Datatang matters for China's AI scene:
- Big data companies like Datatang do a lot of the behind-the-scenes data collection and cleaning for tech giants that do AI research and development: in 2015, Datatang's business income with Baidu was 16.38 million RMB, accounting for 24% of Datatang's total revenue
- With the rise of the AI boom, AI revenue accounted for 3/4 of Datatang's 2017 total revenue; important nugget: nearly half of this AI revenue came from orders outside of China
- Datatang had its own crowdsourcing platform for data labeling which employed more than 500,000 part-time workers

Excerpt: Datatang is Suspected in a Criminal Case! Do big data companies still have good days ahead?

Digging deeper into the Datatang Case

This lengthy translation gets very technical, but trust me, it's kinda fun learning about the difference between IMEI/SN/UDID/PbD/PIA in data privacy tech speak.

Important questions discussed by the data protection experts providing commentary on the fall-out of the Datatang case:

  • How broadly do we define "personal information" given the increased capacity to derive someone's identity from large amounts of data? The Datatang case involved non-identity information (online browsing records, cookies, and device information, but NOT physical addresses or identity numbers). What are the criteria for determining whether data counts as "personal information"? Should this always be decided on a case-by-case basis?

  • To what extent is the Datatang case meaningful for interpreting and understanding China's existing standards/laws on personal information protection? A key tension you'll find throughout this translation is between the interpretation of personal information protection outlined in Article 253-1 of the Criminal Law and the one suggested in the appendix of a new Personal Information Protection Specification. Some of the commentators argue that the Datatang case shows that the latter's broader definition of sensitive personal information, which includes things like "web browsing history," is gaining support. Others argue that the case may not set a precedent beyond one local county's actions in Shandong Province.

  • What does this mean for China's AI industry? One commentator says, "Data assets are mines that may be detonated at any time." Others say this case is a clarion call for companies to do better auditing and due diligence on their data transactions.

One thing I'm trying to do with this newsletter is to showcase the breadth, depth, and diversity of Chinese thinking on these issues. You definitely should read the entire translation if you still subscribe to the belief that China's discussion of privacy issues is shallow.

Interesting Comments on the Datatang Incident from the DPO Community

This Week's ChinAI Links

Have been very impressed by The Information's reporting lately: check out these articles on the self-driving car map provider DeepMap trying to compete in China, Huawei's "Project Da Vinci" AI push, and a behind-the-scenes look at the Google-JD deal.

DigiChina's translation of the cybersecurity law, which has a clause on personal information at the very end, is here.

SupChina is a great resource for keeping up-to-date on the daily happenings in China.

Another goal of this newsletter is to challenge the unitary actor lens through which many view China's AI strategy/scene. In that spirit, I recommend this must-read article by Audrye Wong on how provinces shape China's foreign policy.


Thank you for reading and engaging.

Shout out to everyone who is commenting on the translations - idea is to build up a community of people interested in this stuff. You can contact me at or on Twitter at @jjding99