Might be of interest. Education background of the DeepSeek team:
https://collegetowns.substack.com/p/where-did-the-deepseek-team-study
Thank you Jeff!! I love your newsletter; it contains info we don't see anywhere else and punctures the US big bros' hype.
You may enjoy my papers: https://curriculumredesign.org/papers/#Pubs-Technology
Does Present-Day GenAI Actually Reason? (2025)
This paper considers whether present-day GenAI can reason by 1) determining which types of cognitive processes and procedures (aka “modes of thinking”) are used in human reasoning, and then 2) determining which forms of human reasoning can be mimicked or reproduced by GenAI/LLMs. It finds that AI can, to a certain extent, achieve some “reasoning” at various levels of capability and with its now-familiar “jagged boundary” (which is quite an achievement), but not at the all-encompassing and overhyped level touted by the industry. And this is certainly NOT AGI either.
AI in 2025: A Combinatorial Explosion of Possibilities, but NOT AGI (2025)
This paper discusses the rapid evolution of Generative AI (GenAI) by 2025, highlighting its transition from theoretical to engineering-focused development. It examines various technological advancements, including specialized datasets, improved training methods, multimodal models, and AI agents. It argues that despite challenges, these innovations will lead to unpredictable, transformative capabilities, but not AGI.
The Frustrating Quest to Define AGI (2025)
This paper discusses whether Artificial General Intelligence (AGI) can ever be properly defined, reviewing the various approaches, assessing their validity, and proposing alternatives.
Regarding H20 vs H800: although 20 is a smaller number than 800, that doesn't necessarily make the H20 a lower-end chip than the H800. Both are modified from the H100/H200 series to comply with export controls on chips that exceed a certain compute speed AND a certain interconnect bandwidth, the H800 by reducing the interconnect and the H20 by reducing the compute.
( https://semianalysis.com/2025/01/31/deepseek-debates/#deepseek-and-high-flyer has a good plot of where different GPUs fall along those two dimensions.)
To get the most out of a GPU, it's important to match the rate at which data can be moved in and out to the rate at which the compute units can chew through it. If a changed algorithm requires fewer compute operations for the same amount of data, it can make better use of the higher interconnect bandwidth of an H20 without being hampered as much by the slower compute, whereas the same algorithm on an H800 would leave its fast compute idling while waiting for new data to arrive over the slower interconnect.
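A rough way to see this trade-off is a back-of-the-envelope roofline calculation: per step, you are limited by whichever of compute time and data-movement time is larger. The sketch below uses made-up spec and workload numbers (not real H20/H800 figures) purely to illustrate how cutting compute per byte moved can flip which chip comes out ahead.

```python
# Roofline-style sketch of the compute-vs-interconnect trade-off described above.
# All numbers are placeholders for illustration only, NOT real H20/H800 specs.

def step_time(flops_needed, bytes_moved, flops_per_s, bytes_per_s):
    """One step is bounded by the slower of compute and data movement."""
    compute_time = flops_needed / flops_per_s
    transfer_time = bytes_moved / bytes_per_s
    return max(compute_time, transfer_time), compute_time, transfer_time

# Hypothetical chips: one compute-heavy with a slow interconnect ("h800-like"),
# one compute-light with a fast interconnect ("h20-like").
chips = {
    "h800-like": {"flops_per_s": 800e12, "bytes_per_s": 400e9},
    "h20-like":  {"flops_per_s": 150e12, "bytes_per_s": 900e9},
}

# Same data volume before and after an algorithmic change that cuts compute per byte moved.
workloads = {
    "original algorithm": {"flops_needed": 1.0e12, "bytes_moved": 1e9},
    "leaner algorithm":   {"flops_needed": 2.5e11, "bytes_moved": 1e9},
}

for wname, w in workloads.items():
    for cname, c in chips.items():
        total, ct, tt = step_time(w["flops_needed"], w["bytes_moved"],
                                  c["flops_per_s"], c["bytes_per_s"])
        bound = "compute-bound" if ct > tt else "interconnect-bound"
        print(f"{wname:18s} on {cname:9s}: {total * 1e3:6.2f} ms per step ({bound})")
```

With these illustrative numbers, the "h800-like" chip wins on the original algorithm but is stuck interconnect-bound either way, while the "h20-like" chip goes from compute-bound and slow to fastest overall once the algorithm needs fewer operations per byte moved.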
I think the argument for faster-than-electricity diffusion of AI is that there is no "general-purpose electrical device": every application of electricity requires a purpose-built design, and there isn't even a single best version of each kind of electrical component, so a materials-science breakthrough for extremely small capacitors doesn't necessarily translate to extremely big ones (e.g., if the material used is too expensive to source in large quantities).
AI research, by contrast, originally did go down the route of purpose-built algorithms for every application, but it now increasingly focuses on a small number of very general methods that perform well across a wide range of tasks. (Pretty much every state-of-the-art model is a Transformer nowadays, and if it isn't, it's most likely a CNN.) Importantly, methodological breakthroughs tend to apply across the board, so once a task becomes possible to do with general-purpose models at all, it can ride the wave of subsequent improvements for free. And while training benefits from the latest, most efficient hardware, the resulting models can often be distilled down to run on older hardware, so these improvements can diffuse at the speed of a software upgrade rather than at the pace of hardware procurement cycles.
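For readers unfamiliar with the distillation step mentioned above, here is a toy sketch of the standard teacher-student recipe in PyTorch. The model sizes, temperature, and random-input training loop are made up for illustration; the point is only that the small "student" learns to match the big "teacher's" outputs, so it can later run where the teacher can't.

```python
# Toy knowledge-distillation sketch (illustrative sizes and hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))  # "big" model
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))    # "small" model
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature that softens the teacher's output distribution

for _ in range(100):  # toy loop on random inputs standing in for real data
    x = torch.randn(64, 32)
    with torch.no_grad():
        teacher_logits = teacher(x)          # teacher is frozen; only provides targets
    student_logits = student(x)
    # Train the student to match the teacher's softened output distribution.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```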
Thanks - totally agree with Point 4 on conflict of interest disclosure. It baffles me that these conflicts are not actively disclosed, whether it's Dario or Sam Altman speaking about export controls and more...