Greetings from a world where…
jet lag hits hard
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: Transparency assessment of 15 Chinese large models
Context: On December 18, the Nandu Digital Economy Governance Research Center released its Generative AI User Risk Perception and Information Disclosure Transparency Report. I believe this is a center that is similar to the Nandu Personal Information Protection Research Center [隐私护卫队], a think tank affiliated with the influential Southern Metropolis Daily. This transparency assessment analyzed the privacy policies and user agreements of 15 large model products, including Tencent’s Yuanbao, Baidu’s ErnieBot, and SenseTime’s SenseChat.
Key Takeaways: The three AI products with the highest transparency scores are: Tencent Yuanbao (72 points), iFlytek’s SparkDesk (69 points), and Zhipu’s Qingyan (67 points); the three that rank the lowest are: Baichuan’s Baixiaoying (54 points), ModelBest’s Luca (51 points), and Metaso [秘塔] (43 points).
Image below shows the overall scores for all 15 products. These were scored on 20 dimensions, grouped under five themes: 1) personal information protection, 2) intellectual property rights, 3) content security, 4) protection of special groups, and 5) complaint feedback mechanisms.
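The report does not publish its exact weighting, but the basic shape of such an index — per-dimension scores grouped under themes and summed into an overall score — can be sketched as follows. All theme names, dimension counts per theme, and point values here are illustrative assumptions, not the report's actual rubric.

```python
# Illustrative sketch of a transparency-index aggregation.
# The report's real dimensions, weights, and scoring rules are not disclosed
# in this excerpt, so every number and name below is hypothetical.

THEMES = [
    "personal_information_protection",
    "intellectual_property_rights",
    "content_security",
    "protection_of_special_groups",
    "complaint_feedback_mechanisms",
]

def total_score(dimension_scores: dict[str, list[int]]) -> int:
    """Sum per-dimension scores across all themes into one overall score."""
    return sum(sum(scores) for scores in dimension_scores.values())

# Example: a product scoring 3-4 points on each of 4 dimensions per theme
# (20 dimensions total, matching the report's structure).
example = {theme: [4, 3, 4, 4] for theme in THEMES}
print(total_score(example))  # 75 under this illustrative rubric
```

Under equal weighting like this, a product's rank is just its dimension-score sum; the report may well weight themes differently.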
It appears that the general approach is inspired by Stanford’s Foundation Model Transparency Index.
Chinese large models scored poorly on two transparency dimensions: 1) the data used to train the model; 2) the right of users to refuse the use of their data for future AI training.
From the article: “None of the 15 domestic large models mentioned the specific source of large model training data in the policy agreement, let alone publicly disclosed which copyrighted data was used.” These firms don’t want to open themselves up to copyright disputes; in addition, they don’t want to divulge proprietary data to competitors.
In this test, only four large models (Tencent Yuanbao, ByteDance Doubao, HailuoAI, and Zhipu Qingyan) mentioned that users can opt out of having their data used to train future AI models.
What should we expect going forward? I’ll be interested in the role that organizations like Nandu’s research centers on personal information protection and digital governance will play in China’s AI governance in the future.
This report highlights how international large models handle mechanisms for withdrawing user data from AI training. For instance, Google’s Gemini allows users to turn off their activity record to prevent conversation content from being used for AI training; similarly, ChatGPT Plus users can prevent their conversation data from being used to further optimize the model.
Another cool nugget: The report highlights that the EU’s AI Act officially came into force in August 2024. It requires large model providers to declare whether copyrighted materials are used to train AI. The article states, “This also reflects the future direction of regulation.”
FULL TRANSLATION: Transparency assessment of 15 Chinese large models
ChinAI Links (Four to Forward)
Should-read: A Reading List On Artificial Intelligence and Interspecies Communication
This longreads list features an amazing Financial Times piece on the key technologies and ethical questions involved in interpreting and speaking to animals, from humpback whale songs to infrasonic elephant rumbles. Written by Persis Love, Irene de la Torre Arenas, Sam Learner, and Sam Joiner.
Should-read: China and the U.S. produce more impactful AI research when collaborating
Based on a dataset of over 5 million AI papers, New York University Abu Dhabi researchers find: “A matching experiment reveals that the two countries have always been more impactful when collaborating than when each works without the other.”
Should-read: ETO AI Governance and Regulatory Archive
CSET’s Emerging Technology Observatory has compiled a living collection of AI-relevant laws, regulations, standards, and other governance documents. This link includes the 20 documents relevant to China.
Random things I loved from 2024
The Will of the Many - fantasy novel by James Islington
Tokyo Vice season 2 - HBO TV series
The Ringer in Review - this is the website I read the most, and their brand-new website rounded up staff choices of their favorite pieces ever
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.