ChinAI #100: Re-igniting an age-old debate: Data vs. Algorithms

Plus, riffing on "The Industrialization of AI"

Greetings from a land where, on the one hand, it’s the 100th issue of ChinAI, but on the other hand, 100+ years after The Jungle shed light on the experience of Lithuanian immigrants in Chicago’s meatpacking districts, we still see horrific Covid outbreaks among meatpacking workers, many of whom are immigrants or refugees…

…as always, the archive of all past issues is here and please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).

Feature Translation: The Algorithms vs. Data Debate provokes deeper thinking about AI

CONTEXT: In my opinion, one of the most talented reporters on the AI beat (the lack of qualifiers in that statement is intentional) is 四月 for jiqizhineng. Their article from last week revisits a Jan 2020 Economist piece, which sparked some back-and-forth on Weibo among Chinese AI professors regarding the relative importance of data vs. algorithms as drivers of China’s success in AI.

KEY TAKEAWAYS — Three points: 1) What sparked the debate, 2) Where are we at with data vs. algorithms, 3) Where are we going?

  • 1. What sparked the debate? Recently, Yungang Bao, a university professor at the Chinese Academy of Sciences, shared an Economist piece about 莫比嗨客 (MBH), a representative data labeling company. On Weibo he wrote that “The Economist featured MBH in an article on the same high level as Sensetime and Megvii, even spilling more ink on MBH.” He then called for China’s “new infrastructure” policy (covered in ChinAI #91) to give more support to the data labeling industry, citing how MBH employs 300,000 people in poorer regions of China, and he compared Megvii/Sensetime to Apple and MBH to Foxconn.

  • Many netizens responded to Bao’s Weibo post, arguing that many types of data used in AI cannot simply be outsourced to companies like MBH for labeling (e.g. labeling network data requires a high level of expert knowledge). Zhou Zhihua, who leads a top AI lab at Nanjing University (ChinAI #5 covered some of his previous writings on Strong AI), also chimed in on the side of algorithms: “A powerful company must have something in terms of algorithms, but it is not like everyone can see it when the paper is published. Often, the algorithm applier does not want to be exposed, and in particular the algorithm scheme cannot be disclosed. So what you can see is only on the surface level.”

  • Bao’s response cites the Economist article again: “Many of the algorithms used contain little that is not available to any computer-science graduate student on Earth. Without China’s data-labelling infrastructure, which is without peer, they would be nowhere.”

  • 2. Where are we at with data vs. algorithms? The answer is always “it depends”…but we can do better than leaving it at that. Personally, I think it’s very difficult to make the case that China’s data labeling industry gives China a significant comparative advantage in AI. First of all, companies like MBH don’t just supply Sensetime/Megvii; I’d wager international firms make up a fair share of their customer base. Second, advances in unsupervised learning (even in fields like image recognition, where data labeling may be most salient) lessen the demand for labeled data. In many smart manufacturing systems, the constraint is not the amount of sensor data but rather the talent and skills needed to develop ML algorithms that make the most of that data.

  • 3. Where are we going? If algorithms are the bottleneck, is the solution, then, the mass production of AI algorithms? This process involves “algorithm factories,” which Xu Bing (co-founder of Sensetime) describes as “a factory where data is continuously refined in a furnace of computing power, and where batches of algorithm models are produced at a lower cost and are continuously brought into the market.” Sensetime’s “SenseParrots,” for instance, is a prototype of this “algorithm factory.” Megvii also recently open-sourced MegEngine, a deep learning framework meant to help spread the mass production of algorithms to university students, teachers, and AI developers in SMEs and traditional industries. The article’s conclusion: “AI technology must move towards industrialization.”

FULL TRANSLATION: Algorithms vs. data, which plays the decisive role? A North-South "debate" among big shots provokes deeper thinking about AI

Reflections: The Industrialization of AI

Let’s spitball a little bit about this idea of “The Industrialization of AI (IoAI).” First, it’s important to clarify what the IoAI is not. It’s neither the application of AI to industry nor the application of AI to industrialize industry.

  • Application of AI to industry = facial recognition as an improved product in the existing identity authentication industry.

  • Application of AI to industrialize industries = the application of machine quality inspection to make production lines more automated (ChinAI #58)

The Industrialization of AI, rather, refers to a transition in the methods of producing “AI.” In a January 2020 issue of Import AI, Jack Clark described IoAI as “what happens when AI goes from an artisanal, craftsperson-based profession to a repeatable, professional-based profession.” Two examples:

  • He notes, for instance, how AI software frameworks have evolved from tools built by random university students (e.g. Theano) to industry-developed systems (e.g. TensorFlow, PyTorch). Relatedly, this week’s feature translation describes Sensetime’s “SenseParrots” as something that has evolved from a technical framework to “an industrial-grade model production platform.”

  • When Jack first mentioned IoAI in an October 2018 issue, he stated that the “emergence of new large-scale benchmarks for applied AI applications represent further evidence for the current era being ‘the Industrialization of AI’.” At that time, “AI Benchmark,” which tests the performance of AI software on different smartphones, had just been released. NIST’s Facial Recognition Vendor Test is also in this vein.

This naturally leads us to ask: can we use the industrialization of the 19th-century American machine tool industry as a useful analogy for the industrialization of AI? Why not? It’s the 100th issue; let’s get wild. What was involved in the process of industrializing machine tools (the widespread adoption of milling machines and lathes that cut and shape metal, wood, or other materials)?

  • Vertical specialization: Before the 1820s the machine tool industry wasn’t really a separate industry. If you were making sewing machines, you would build your own tools to cut up metal to make a sewing machine. Do we have a separate “AI industry”?

  • Resource endowments: The American machine tool industry was very resource-intensive (this approach required a lot of wood and metal), and the U.S. had a more abundant supply of wood and metals than its European competitors, which some argue explains why the U.S. took better advantage of the industrialization of machine-making. Advances upstream, like high-speed steel, were crucial to advancing machine tool development. There are a lot of parallels to mull over: we know it’s more complicated than China simply having more “wood” than the U.S., but what are the upstream advances that will be crucial to IoAI (e.g. in cloud computing)?

  • Standardization: This connects to the benchmark stuff that Jack is talking about. Standardization was at the heart of machine-tool-enabled mass production: the idea that with more precise machine tools you could make standardized, interchangeable parts. Each manufacturer, however, had its own standards, so there was a need for a broader “standardization” that connected different firms, communities, markets, and states.

  • Technological convergence: This cluster of methods (using a sequential series of special-purpose machine tools to make stuff) could be applied to the manufacture of sewing machines, clocks, bicycles, automobiles, etc. This concept is at the heart of AI as a general-purpose technology.

More questions than answers, but maybe progress is made by asking better questions. Much of this is drawn from Rosenberg’s 1963 article in The Journal of Economic History on the American machine tool industry. If there’s one article that has transformed the way I think about technological development, it would be this one.

ChinAI (Four to Forward)

Should-read: GovAI Submission to European Commission’s Consultation on AI White Paper: a European approach to excellence and trust

Big shout-out to Stefan Torges, a GovAI fellow, who researched and drafted GovAI’s submission to the European Commission’s consultation on its AI White Paper. A key point is that excellence and trust can be mutually beneficial: “Trustworthy technology also contributes to the long-term competitiveness of the European AI sector. Accidents and misuse would risk undermining the trust necessary for this industry to flourish.” The paper goes on to outline concrete recommendations to improve the regulatory scope, the types of requirements that address particular failure modes of AI applications, and the flexibility of AI governance.

Should-read: Wrongfully Accused by an Algorithm

By Kashmir Hill for NYT, “In what may be the first known case of its kind, a faulty facial recognition match led to a Michigan man’s arrest for a crime he did not commit.” “This is not me,” Robert Julian-Borchak Williams told investigators. “You think all Black men look alike?”

Should-read: Technology Quarterly January 2020

That January 2020 Economist article that sparked all this discussion comes from their “Technology Quarterly” section. There are six other articles in that section, all well worth reading. Still the gold standard in terms of no-nonsense, concise, punchy writing.

Should-listen: My webinar on Tech Buzz China podcast

’Twas really fun to do a webinar/Q&A on Tech Buzz China, run by Rui Ma and Ying-Ying Lu. It’s a really efficient 30-min. distillation of my big-picture thinking about China’s AI landscape, structured around 10 points. I’m a little nervous in the beginning, but it gets more “listenable” as it goes on.

Thank you for reading and engaging.

These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is a PhD candidate in International Relations at the University of Oxford and a researcher at the Center for the Governance of AI at Oxford’s Future of Humanity Institute.

Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).

Any suggestions or feedback? Let me know at or on Twitter at @jjding99