ChinAI #271: Key Chinese GenAI Security Standard Changelog
Saad Siddiqui and Shao Heng track the evolution of the TC260 standard on genAI security requirements
Greetings from a world where…
Grouse Mountain hikes > DC heat
…***Can you help ChinAI reach 200 paid subscriptions as we pass the 20k subscriber mark? Write it off as a business expense! As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Analysis: Changelog for TC260 standard — GenAI Security Requirements
***Saad Siddiqui is an AI policy researcher at the Safe AI Forum, which runs the International Dialogues on AI Safety, a Track 2 dialogue between leading AI scientists from both China and the West. He previously contributed a detailed analysis of changes in a Chinese expert draft law on AI governance (ChinAI #262). Ang Shao Heng is a recent graduate from Peking University who wrote his thesis on LLMs in China. He has also interned for IMDA, a government agency responsible for developing Singapore’s AI governance policies. What follows is their analysis (lightly edited by me).
Context: On 23 May 2024, the National Technical Committee 260 on Cybersecurity (TC260), under the Standardization Administration of China, published the “Basic Security Requirements for Generative Artificial Intelligence Service Public Consultation Draft” (hereinafter referred to as the “Draft National Standards”). The Draft National Standards aim to complement the Interim Measures for the Administration of Generative AI Services by detailing security requirements for models and their training data. A previous version had been published as a technical document by TC260, first in draft form in October 2023 and then in finalized form in February 2024. Technical guidance documents serve as preparatory material for national standards, outlining concrete requirements that industry can give feedback on.
It is important to note that this is a national recommended standard — its application is not mandatory but advisory — which means that companies choose whether to comply based on their needs. The excellent Geopolitechs newsletter, published by Patrick Zhang, Senior Public Policy Expert at ByteDance Management Research Institute, provides a good account of some of the differences between the different types of standards (as well as an English translation of the draft national standard).
“China's national standards system includes mandatory standards (强制性标准) and recommended standards (推荐性标准), both issued by the Standardization Administration of China (SAC). Mandatory standards usually start with "GB," while recommended standards use "GB/T."…
…In contrast, the TC260 technical documents are issued by the National Cybersecurity Standardization Technical Committee. They often serve as preparatory materials for national standards or as interim "quasi-standards" in the absence of a national standard, providing industry guidance. They do not use the "GB/T" prefix but are labelled with "TC260" or other related identifiers to distinguish them from national standards.”
Summary of Changes between Different Versions of the TC260 AI Security Standard from October 2023 to May 2024
The most recent draft standard includes many interesting planks related to GenAI:
Service providers are required to test at least 100 samples of processed data for safety and intellectual property infringements. (Pre-training and Fine-tuning data requirements, Article 8.3.2.a)
Service providers are required to undertake human sampling of at least 4,000 samples of pre-processed data for safety and intellectual property infringements. (Pre-training and Fine-tuning data requirements, Article 8.2.3.a)
Service providers must construct a database of keywords to be used to evaluate pre-training and fine-tuning data. This database should preferably include at least 10,000 terms and be updated once a week. (Pre-training and Fine-tuning data requirements, Article B.1.c)
Service providers must ensure that 30% of fine-tuning data annotations are security data annotations. (Data annotations requirements, Article 8.1.a)
Introduced rules for annotation safety, including the management of annotation records, the selection and testing of annotation staff, and the evaluation of annotated data. (Data annotations requirements, Article 6)
Introduced specific annotation labels and categories for text, image, audio, video, 3-D and time-based content. (Data annotations requirements, Annex B)
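The quantitative thresholds in the bullets above can be read as a simple compliance checklist. Here is a minimal, purely illustrative sketch in Python; the function name, arguments, and data model are our own assumptions, not anything the standard specifies:

```python
# Illustrative thresholds drawn from the draft standard's text.
MIN_TEST_SAMPLES = 100        # processed-data safety/IP testing (Art. 8.3.2.a)
MIN_HUMAN_SAMPLES = 4_000     # human sampling of pre-processed data (Art. 8.2.3.a)
MIN_KEYWORDS = 10_000         # preferred keyword database size (Art. B.1.c)
MIN_SECURITY_ANNOTATION_SHARE = 0.30  # security share of fine-tuning annotations (Art. 8.1.a)


def check_compliance(tested, human_sampled, keywords,
                     security_annotations, total_annotations):
    """Return pass/fail flags for the draft standard's quantitative thresholds.

    Hypothetical helper: the standard specifies numbers, not an API.
    """
    return {
        "processed_data_tested": tested >= MIN_TEST_SAMPLES,
        "human_sampling": human_sampled >= MIN_HUMAN_SAMPLES,
        "keyword_db": keywords >= MIN_KEYWORDS,
        "security_annotation_share": (
            total_annotations > 0
            and security_annotations / total_annotations
            >= MIN_SECURITY_ANNOTATION_SHARE
        ),
    }
```

A provider that tested 120 samples, human-sampled 4,500, maintains 12,000 keywords, and labelled 300 of 1,000 annotations as security annotations would pass all four checks under this sketch.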
Broadly, the changes we see across the different versions of this standard accord with a trend that other observers have also noted. There is a clear loosening of requirements, either for the sake of practicality (some requirements that have been removed seemed impossible to actually comply with) or to enable some pro-development outcomes (e.g., allowing the use of international open-weights/source models). These changes emerge from a process of trial, error, and revision in collaboration with AI service providers and other stakeholders across the party-state bureaucracy.
According to the preparation description [编制说明] released by TC260, the draft national standards, in particular, are written based on the principles of versatility [通用性], practicality [实用性], and conformity with existing regulations [符合性].
One key change in the draft national standard is that service providers are no longer required to ensure that the models they use are approved by the algorithm registry. As the Geopolitechs newsletter notes, this removes a possible barrier to the usage of international open-source models by Chinese AI service providers.
Other changes also appear to be aimed at loosening stipulations. Sections 8 and 9, which previously listed a range of ‘other’ requirements and specific security assessment processes that had to be followed, have mostly been moved into a new appendix labelled ‘For Information’. This move, alongside the removal of specific clauses covering compliance details and specifying who could conduct model evaluations, suggests a desire to ease compliance burdens.
The key changes from October 2023 to February 2024 included specific mentions of long-term risks from AI, which mirrors conversations that were taking place internationally (e.g., via the UK AI Safety Summit).
Specifically, these were the key changes that we observed (non-exhaustive):
4 - General mention of long-term risks added as part of the general principles section, but no direct mention as part of any of the security assessments or specific risks laid out in the appendix
Some onerous suggestions removed or softened
5.1 - Requirement for a data source blacklist dropped; instead, an additional layer of data source verification was added after data collection is complete and before model training begins
6.c and 6.d - Strict requirements for content to be reliable and accurate changed to requirements that AI service providers take technical measures to improve accuracy and reliability more generally
7.d - Specific longlist of required watermarking/AI-content labelling removed, perhaps because not all of these are required, or feasible, in every context
Stricter and more specific policing of users required
7.g.1 - Requirement added that keywords and classification models be used to screen information input by users. If a user inputs illegal or harmful information three times in a row or five times in a single day, or otherwise induces the generation of illegal or harmful information, the service should be suspended
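The suspension rule in 7.g.1 (three consecutive violations, or five in a single day) amounts to a small state machine per user. The following Python sketch is purely illustrative; the class and method names are our own, and the standard does not prescribe any implementation:

```python
from collections import defaultdict


class AbuseTracker:
    """Track per-user violations; flag suspension on 3 consecutive
    violations or 5 violations within one day (rule sketch, not the
    standard's own design)."""

    def __init__(self):
        self.consecutive = defaultdict(int)
        # user -> day -> violation count
        self.daily = defaultdict(lambda: defaultdict(int))

    def record_input(self, user, day, is_violation):
        """Record one user input; return True if service should be suspended."""
        if is_violation:
            self.consecutive[user] += 1
            self.daily[user][day] += 1
        else:
            self.consecutive[user] = 0  # a clean input breaks the streak
        return self.consecutive[user] >= 3 or self.daily[user][day] >= 5
```

Note that a clean input resets the consecutive counter but not the daily tally, so a user alternating violations with benign inputs would still trip the five-per-day threshold.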
See Saad and Shao’s full detailed changelog for the various versions of this standard: TC260 standard — GenAI Security Requirements
ChinAI Links (Four to Forward)
We’ve thrown a lot of text at you in this issue, so let’s keep the recommendations short and sweet:
2024 NPC Lecture: AI and China — History, Prospects, Challenges, Strategies and Legislation: David Cowhig’s translation blog featured Sun Ninghui’s lecture to the National People’s Congress, which advocated for China to expedite the introduction of an “Artificial Intelligence Law.” H/t to Don Clarke for sharing.
AI Risk Management Should Incorporate Both Safety and Security: I’ve often noted the distinction between safety and security when translating the Chinese term AI anquan. This working paper (I contributed a small section on nuclear safety and security distinctions) develops a framework that tackles this conceptual challenge. For a summary, see this thread by lead author Xiangyu Qi, a Princeton PhD student who works on LLM safety, security, and alignment.
China’s Military AI Roadblocks — PRC Perspectives on Technological Challenges to Intelligentized Warfare: For CSET, Sam Bresnick’s “comprehensive review of dozens of Chinese-language journal articles about AI and warfare reveals that Chinese defense experts claim that Beijing is facing several technological challenges that may hinder its ability to capitalize on the advantages provided by military AI.”
Microsoft Bing’s censorship in China is even “more extreme” than Chinese companies: Joanna Chiu reports, “Bing’s censorship rules in China are so stringent that even mentioning President Xi Jinping leads to a complete block of translation results, according to new research by the University of Toronto’s Citizen Lab that has been shared exclusively with Rest of World.”
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99