ChinAI #257: Can Chinese companies keep up with Sora?
Insights from leaders of Chinese generative AI startup Shengshu-AI
Greetings from a world where…
March is mad
…***We’ve hit a bit of a lull in paid subscriptions lately, so please consider subscribing here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay support access for all AND compensation for awesome ChinAI contributors). As always, the searchable archive of all past issues is here.
Feature Translation: Can Chinese companies make Sora? This Tsinghua large model team gives hope
Context: Thanks for exercising your right to vote. In a close contest, the #6 option (link to original jiqizhixin article) won out: How are Chinese companies trying to keep up with Sora? This week we zoom in on Shengshu-AI (生数科技), a Beijing-based generative AI startup that has raised 14 million USD and boasts Jun Zhu (Tsinghua professor who co-directs Tsinghua’s AI lab) as chief scientist. There are hopes and expectations that Shengshu “will become the Chinese team closest to Sora.”
Key Takeaways: Sora came ahead of schedule, sparking worries about the growing gap between Chinese and Western AI models.
The article sets the scene: “At the end of 2023, many people predicted that the next year would be a year of rapid development for video generation. But unexpectedly, just after the Lunar New Year, OpenAI dropped a blockbuster — Sora, which can generate smooth and realistic 1-minute videos. Its emergence has made many researchers worry: Is the gap between Chinese and foreign AI technologies widening again?”
Zhu feels this gap in generative AI is not as wide as it was when ChatGPT came out. “It's just that everyone may be lagging behind a little in engineering technology now. The issue of video generation is also taken seriously in China, and the domestic foundation for image and video-related tasks is relatively good. Judging from the current results, the actual situation may be more optimistic than imagined.”
A core breakthrough that Sora builds on is the DiT method, which replaces the commonly used U-Net backbone network with a Transformer (thereby reproducing the scalability of large language models in visual tasks). One of the main reasons Zhu is confident is that his Tsinghua team has worked on a U-ViT model that’s very similar to Sora’s fundamental DiT innovation. In fact, this paper was submitted (and later accepted) to CVPR, the top computer vision conference, two months before DiT was published.
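To make the architectural swap concrete: the key move in DiT-style models (and U-ViT) is to cut the image (or its latent) into patches and flatten each patch into a token, so that a standard Transformer can process the sequence in place of a convolutional U-Net. Below is a minimal, illustrative sketch of just that patchify step in pure Python; the function name and the nested-list image representation are my own stand-ins, not code from either paper.

```python
def patchify(image, patch_size):
    """Split an H x W grid of values into flattened patch tokens.

    image: list of H rows, each a list of W values (a stand-in for an
    image latent). Returns a list of tokens, where each token is the
    row-major flattening of one patch_size x patch_size patch.
    """
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    tokens = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # Flatten one patch into a single token vector.
            token = [image[top + i][left + j]
                     for i in range(patch_size)
                     for j in range(patch_size)]
            tokens.append(token)
    return tokens

# A 4x4 "image" split into 2x2 patches yields 4 tokens of length 4 each.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
```

Once the image is a token sequence like this, scaling follows the familiar large-language-model recipe (more tokens, more layers, more data), which is the scalability property the article credits DiT with reproducing for visual tasks.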
If U-ViT and DiT were released at the same time, why did OpenAI get to Sora first?
Here’s Zhu on why Shengshu prioritized image generation tasks instead: “In fact, we were also doing video generation, it was just that our prioritization had to be based on computing power…It’s just at that time we mainly focused on the generation of short videos of a few seconds, not tens of seconds or one minute like OpenAI's Sora. There are many reasons for this, but one of the most important reasons is that the resources we have, relative (to OpenAI), are truly limited. However, from 2D images to video generation, many things are in the same vein, and many experiences (such as training experience with large-scale models) can be reused.”
Later on, Jiayu Tang, Shengshu’s CEO, estimates the development costs of Sora from a compute perspective: “the industry estimates that getting to a state of relatively sufficient resources will need to reach the level of 10,000 chips (Nvidia A series).” For context, see the rough calculation from ChinAI #255 that Bytedance (which has been one of the most aggressive in stockpiling chips) has 200,000 high-end chips such as the A100/A800/H800s. One caveat: since Shengshu has some experience with accelerating training of large-scale models, actual compute needs may be less.
It’s not just about compute. There are many technical challenges in going from very short videos to 1-minute ones. OpenAI has very rich training data accumulated from its prior Dall-E 3 model. Other factors cited include tacit knowledge and experience gained from engineering optimization and from using compute efficiently in large training runs.
Other interesting sections from this lengthy interview with Tang and Zhu include: how Shengshu took inspiration from OpenAI’s “when in doubt, scale it up” motto; why they believe in positioning themselves on the “native multi-modal” track; and why Zhu thinks Sora’s paper airplanes video is the most impressive.
See FULL TRANSLATION: Can Chinese companies make Sora? This Tsinghua large model team gives hope
ChinAI Links (Four to Forward)
Should-read: Resurfacing “Those who Work for AI” (translation of GQ China piece)
Great to hear from Liu Min, who wrote this longform piece on China’s data workshops back in 2018. She jumped in the comments of the translation doc to share another expertly woven article (link to original Chinese) about how to make catchy songs in an algorithmic era dominated by 15-second short videos.
Should-read: Behind the doors of a Chinese hacking company
For AP News, Dake Kang and Zen Soo report on private hacking contractors that steal data from other countries to sell to Chinese authorities:
Though the existence of these hacking contractors is an open secret in China, little was known about how they operate. But the leaked documents from a firm called I-Soon have pulled back the curtain, revealing a seedy, sprawling industry where corners are cut and rules are murky and poorly enforced in the quest to make money.
Should-read: How the Pentagon Learned to Use Targeted Ads to Find Its Targets—and Vladimir Putin
Can advertising data collected from mobile phones be used to track the movements of Vladimir Putin’s entourage? In Wired, Byron Tau, an investigative journalist, provides an excerpt adapted from his book on the “hidden alliance of tech and government” in creating a new American surveillance state.
Should-read: Schmidt Sciences second cohort of AI2050 early career fellows
Honored to be selected as one of these fellows for my project on AI governance and the role of international organizations in improving China’s safety standards in other high-risk technologies (e.g., chemical plants, nuclear power). This grant is supporting a 2-year post-doc position that will work closely with me on this project. More info in the job ad here — please share with your networks!
Thank you for reading and engaging.
These are Jeff Ding's (sometimes) weekly translations of Chinese-language musings on AI and related topics. Jeff is an Assistant Professor of Political Science at George Washington University.
Check out the archive of all past issues here & please subscribe here to support ChinAI under a Guardian/Wikipedia-style tipping model (everyone gets the same content but those who can pay for a subscription will support access for all).
Also! Listen to narrations of the ChinAI Newsletter in podcast format here.
Any suggestions or feedback? Let me know at chinainewsletter@gmail.com or on Twitter at @jjding99