ChinAI #254: Tencent Res. Institute Tackles Value Alignment in Large Model Security & Ethics Research Report
Greetings from a world where…
I’m devouring Patrick Radden Keefe’s Say Nothing
…As always, the searchable archive of all past issues is here. Please please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors).
Feature Translation: Large Model Security and Ethics Research Report 2024
Context: For the past two weeks, we’ve been covering a 76-pg. research report authored by Tencent Research Institute (TRI), Tencent Zhuque lab, Tencent Hunyuan model team, Tsinghua Shenzhen International Graduate School, and Zhejiang University State Key Lab of Blockchain and Data Security. First, we’ll finish up the chapter on best practices for large model security, which features examples of what Tencent is doing in practice to protect its Hunyuan large language model. Then, we’ll look at how these researchers are thinking about value alignment in large models (Chapter 5 of the report).
Key Takeaways: In addition to red-blue security exercises, general vulnerability assessments are important for securing large models.
One interesting resource that this report flags is the list of ten critical vulnerabilities for large language models, published by the Open Worldwide Application Security Project (OWASP).
The report also stresses the importance of protecting source code, which includes monitoring abnormal operating behaviors of R&D personnel and illegitimate intrusions. The image below shows an alert message from Tencent’s “Worker Bee”[工蜂] code collaboration management tool: “Employee ____ downloaded a large number of projects in a short period of time, 5 programs downloaded in the span of 12 hours. Please pay attention to whether this behavior is needed for the project and avoid code-related information security incidents. Thanks for your understanding and support.”
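To make that alert concrete, here is a minimal sketch of the kind of rule that could generate such a message. This is not Tencent’s actual implementation; the threshold of 5 downloads, the 12-hour window, and the function names are hypothetical, chosen only to mirror the numbers quoted in the alert above.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds -- the real "Worker Bee" rules are not public.
DOWNLOAD_THRESHOLD = 5   # number of projects
WINDOW_HOURS = 12        # look-back window

def should_alert(download_times: list[datetime], now: datetime) -> bool:
    """Return True if an employee downloaded too many projects in the window."""
    window_start = now - timedelta(hours=WINDOW_HOURS)
    recent = [t for t in download_times if t >= window_start]
    return len(recent) >= DOWNLOAD_THRESHOLD

# Example: 5 downloads within the past 12 hours triggers the alert message.
now = datetime(2024, 2, 1, 18, 0)
downloads = [now - timedelta(hours=h) for h in (1, 3, 5, 8, 11)]
if should_alert(downloads, now):
    print("Alert: large number of project downloads in a short period; "
          "please confirm this is needed for the project.")
```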
There’s also an interesting passage about risks associated with external plug-ins for large language models. An anecdote featured in the report: “One time, when Tencent internally reviewed an external plug-in code, it discovered that this plug-in contained a remote code execution vulnerability. By constructing a special prompt to get the model to call the vulnerable plug-in, an attacker could gain control of the entire inference service, which could cause serious consequences such as making the inference service (inference service here means the portal where users engage with large models) become unavailable.”
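As a rough illustration of the kind of safeguard this anecdote points toward (the plugin names, argument schemas, and allow-list below are mine, not from Tencent’s codebase or the report), an inference service can refuse to execute a plugin call requested by the model unless it passes an allow-list and argument check, so a prompt crafted to invoke a vulnerable plugin never reaches it:

```python
# Minimal sketch: gate model-requested plugin calls before execution.
# Plugin names, schemas, and the allow-list are illustrative only.
ALLOWED_PLUGINS = {
    "weather_lookup": {"city": str},
    "unit_convert": {"value": float, "unit": str},
}

def validate_plugin_call(name: str, args: dict) -> bool:
    """Reject calls to unknown plugins or calls with unexpected arguments."""
    schema = ALLOWED_PLUGINS.get(name)
    if schema is None:
        return False
    if set(args) != set(schema):
        return False
    return all(isinstance(args[k], t) for k, t in schema.items())

# A prompt-injected request for an unapproved plugin is dropped instead of
# being executed inside the inference service.
assert validate_plugin_call("weather_lookup", {"city": "Shenzhen"})
assert not validate_plugin_call("shell_exec", {"cmd": "rm -rf /"})
```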
The report authors closely follow progress and trends in large model value alignment.
The report goes through the latest developments in AI safety in depth, including the White House’s Executive Order, the UK’s AI Safety Summit, and the EU’s AI Act. As one of the section titles states, “Large Model Safety and Alignment has become a Global Issue.”
To illustrate the level of detail, the authors describe OpenAI’s governance organization for AI safety: In addition to setting up technology ethics (review) committees, leading companies in the AI field are also trying to establish teams responsible for safety with more specific tasks. Taking OpenAI as an example, its internal safety and policy teams, such as the Safety Systems team, the Superalignment team, and the “Preparedness” team, are jointly responsible for the risk issues of cutting-edge models. Among them, OpenAI’s newly established “Preparedness” team specializes in evaluating the most advanced, yet-to-be-released AI models, rating them at four levels based on different types of perceived risks: “low”, “medium”, “high”, and “severe”. In accordance with the new safety guidelines released by OpenAI on December 18, OpenAI will only launch models rated “low” and “medium” to the public.
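That release rule is simple enough to state as a toy sketch. The function and variable names below are mine, not OpenAI’s; the code only restates the summary above (four levels, with only “low” and “medium” cleared for public launch):

```python
# Toy sketch of a release gate based on the four risk levels described above.
RISK_LEVELS = ["low", "medium", "high", "severe"]
DEPLOYABLE = {"low", "medium"}

def can_launch(model_risk: str) -> bool:
    """Only models rated 'low' or 'medium' are launched to the public."""
    assert model_risk in RISK_LEVELS
    return model_risk in DEPLOYABLE

print(can_launch("medium"))  # True
print(can_launch("high"))    # False
```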
FULL(ish) TRANSLATION of chapters 4 and 5: Large Model Security and Ethics Research Report 2024
ChinAI Links (Four to Forward)
Must-read: Why You’ve Never Been In A Plane Crash
For Asterisk, Kyra Dempsey investigates the determinants of safety in the airline industry. She weaves a convincing narrative that ties the “blameless postmortem” process to improvements in airline safety in the United States.
Should-read: What does the Party Stand to Gain from AI?
In China Media Project, Alex Colville investigates Zhongke Wenge, a company offering AI-based communication services (including propaganda) to clients including the Central Propaganda Department, the Ministry of Public Security, CCTV, Xinhua, and People’s Daily.
Should-read: OWASP Top 10 for large language model applications
Here’s where you can learn more about OWASP’s top 10 most critical vulnerabilities in LLM applications. The OWASP Foundation “works to improve the security of software through its community-led open source software projects, hundreds of chapters worldwide, tens of thousands of members, and by hosting local and global conferences.”
Should-apply: China + AI analyst for ChinaTalk
Very cool opportunity: ChinaTalk, a podcast and newsletter that covers U.S.-China relations and tech policy, is recruiting for analysts to conduct research on various aspects of China’s AI ecosystem.
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99
Nice piece on aviation safety. The way the U.S. military conducts aviation mishap investigations seems to offer a useful starting point for coming up with a model for national and even international "AI mishap" investigations.
The Safety Investigation Board (SIB) is entirely facts-based ("just the facts, ma'am," for those who get that reference), relies on privileged information/discussions, and does not assign culpability (at least in the sense of recommending disciplinary action). Its sole purpose is mishap prevention (i.e., don't let the same kind of mishap happen again). It's a true root-cause analysis.
The follow-on board, called the Accident Investigation Board (AIB), gathers evidence for possible claims, litigation, and disciplinary action against any service members involved in the mishap. It is publicly releasable.