Prof. Randall Balestriero - LLMs without pretraining and SSL

Machine Learning Street Talk (MLST)

#Machine Learning Street Talk #Artificial Intelligence #Tech #Machine Learning


about a year ago 34:30

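Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it.

He also talks about how self-supervised learning (where models learn from the structure of the data itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods.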

Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data.
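The point about models that "seem accurate overall" can be made concrete with a little arithmetic. The sketch below is illustrative only: the region names, cell counts, and error values are invented for the example and are not taken from the episode or the paper. It shows how a small group of locations with large errors barely moves the overall average.

```python
from statistics import mean

# Hypothetical absolute errors per grid cell, grouped by region type.
# All numbers are made up for illustration, not results from the paper.
errors = {
    "ocean": [0.10] * 90,   # most cells, small error
    "land": [0.20] * 8,
    "island": [2.00] * 2,   # very few cells, large error
}


def overall_error(errors_by_group):
    """Mean error over every cell, ignoring which group it belongs to."""
    all_errors = [e for group in errors_by_group.values() for e in group]
    return mean(all_errors)


def per_group_error(errors_by_group):
    """Mean error computed separately for each group."""
    return {name: mean(values) for name, values in errors_by_group.items()}


overall = overall_error(errors)
by_group = per_group_error(errors)
worst_group = max(by_group, key=by_group.get)

print("overall mean error:", round(overall, 3))
print("per-group mean error:", by_group)
print("worst group:", worst_group)
```

With these made-up numbers the island cells are more than ten times worse than the overall average, which is why reporting a per-group (or worst-group) error alongside the global one is a common way to surface this kind of bias.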

SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

Goto https://tufalabs.ai/

***

TRANSCRIPT + SHOWNOTES:

https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0

TOC:

1. Model Training Efficiency and Scale

[00:00:00] 1.1 Training Stability of Large Models on Small Datasets

[00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison

[00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency

2. Learning Paradigms and Data Distribution

[00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues

[00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum

[00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning

[00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships

[00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges

3. Geographic Representation in ML Systems

[00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations

[00:28:10] 3.2 Mathematical Limitations and Model Improvements

[00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets

REFS:

[00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al.

https://openreview.net/forum?id=wYGBWOjq1Q

[00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero

https://arxiv.org/abs/2410.11985

[00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al.

https://arxiv.org/abs/2101.11038

[00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al.

https://arxiv.org/abs/2301.06627

[00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al.

https://openreview.net/forum?id=NhYAjAAdQT

[00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun

https://arxiv.org/abs/2105.04906

[00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al.

https://arxiv.org/abs/2502.06831

[00:33:45] Work on geographic bias in computer vision datasets, Mark Ibrahim et al.

https://arxiv.org/pdf/2304.12210

224 episodes

All episodes

Pushing compute to the limits of physics 1:23:32

1 day ago

Dr. Maxwell Ramstead grills Guillaume Verdon (AKA “Beff Jezos”) who's the founder of Thermodynamic computing startup Extropic. ***SPONSOR MESSAGE*** Google Gemini 2.5 Flash is a state-of-the-art language model in the Gemini app. Sign up at https://gemini.google.com *** Guillaume shares his unique path – from dreaming about space travel as a kid to becoming a physicist, then working on quantum computing at Google, to developing a radically new form of computing hardware for machine learning. He explains how he hit roadblocks with traditional physics and computing, leading him to start his company – building "thermodynamic computers." These are based on a new design for super-efficient chips that use the natural chaos of electrons (think noise and heat) to power AI tasks, which promises to speed up AND lower the costs of modern probabilistic techniques like sampling. He is driven by the pursuit of building computers that work more like your brain, which (by the way) runs on a banana and a glass of water! Guillaume talks about his alter ego, Beff Jezos, and the "Effective Accelerationism" (e/acc) movement that he initiated. Its objective is to speed up tech progress in order to “grow civilization” (as measured by energy use and innovation), rather than “slowing down out of fear”. Guillaume argues we need to embrace variance, exploration, and optimism to avoid getting stuck or outpaced by competitors like China. He and Maxwell discuss big ideas like merging humans with AI, decentralizing intelligence, and why boundless growth (with smart constraints) is “key to humanity's future”.…

The Fractured Entangled Representation Hypothesis (Kenneth Stanley, Akarsh Kumar) 2:16:22

17 days ago

Are the AI models you use today imposters? Please watch the intro video we did before this: https://www.youtube.com/watch?v=o1q6Hhz0MAg In this episode, hosts Dr. Tim Scarfe and Dr. Keith Duggar are joined by AI researcher Prof. Kenneth Stanley and MIT PhD student Akarsh Kumar to discuss their fascinating paper, "Questioning Representational Optimism in Deep Learning." Imagine you ask two people to draw a perfect skull. One is a brilliant artist who understands anatomy, the other is a machine that just traces the image. Both drawings look identical, but the artist understands what a skull is: they know where the mouth is, how the jaw works, and that it's symmetrical. The machine just has a tangled mess of lines that happens to form the right picture. An AI with an elegant representation has the building blocks to generate truly new ideas. The Path Is the Goal: As Kenneth Stanley puts it, "it matters not just where you get, but how you got there". Two students can ace a math test, but the one who truly understands the concepts, instead of just memorizing formulas, is the one who will go on to make new discoveries. The show is a mixture of 3 separate recordings we have done: the original Patreon warmup with Tim/Kenneth, the Tim/Keith "Steakhouse" recorded after the main interview, then the main interview with Kenneth/Akarsh/Keith/Tim. Feel free to skip around. We had to edit this in a rush as we are travelling next week but it's reasonably cleaned up. TOC: 00:00:00 Intro: Garbage vs. Amazing Representations 00:05:42 How Good Representations Form 00:11:14 Challenging the "Bitter Lesson" 00:18:04 AI Creativity & Representation Types 00:22:13 Steakhouse: Critiques & Alternatives 00:28:30 Steakhouse: Key Concepts & Goldilocks Zone 00:39:42 Steakhouse: A Sober View on AI Risk 00:43:46 Steakhouse: The Paradox of Open-Ended Search 00:47:58 Main Interview: Paper Intro & Core Concepts 00:56:44 Main Interview: Deception and Evolvability 01:36:30 Main Interview: Reinterpreting Evolution 01:56:16 Main Interview: Impostor Intelligence 02:11:15 Main Interview: Recommendations for AI Research REFS: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley https://arxiv.org/pdf/2505.11581 Kenneth O. Stanley, Joel Lehman Why Greatness Cannot Be Planned: The Myth of the Objective https://amzn.to/44xLaXK Original show with Kenneth from 4 years ago: https://www.youtube.com/watch?v=lhYGXYeMq_E Kenneth Stanley is SVP Open Endedness at Lila Sciences https://x.com/kenneth0stanley Akarsh Kumar (MIT) https://akarshkumar.com/ AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!) Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002 Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002 TRANSCRIPT: https://app.rescript.info/public/share/W_T7E1OC2Wj49ccqlIOOztg2MJWaaVbovTeyxcFEQdU…

The Fractured Entangled Representation Hypothesis (Intro) 15:45

17 days ago

What if today's incredible AI is just a brilliant "impostor"? This episode features host Dr. Tim Scarfe in conversation with guests Prof. Kenneth Stanley (ex-OpenAI), Dr. Keith Duggar (MIT), and Akarsh Kumar (MIT). While AI today produces amazing results on the surface, its internal understanding is a complete mess, described as "total spaghetti" [00:00:49]. This is because it's trained with a brute-force method (SGD) that’s like building a sandcastle: it looks right from a distance, but has no real structure holding it together [00:01:45]. To explain the difference, Keith Duggar shares a great analogy about his high school physics classes [00:03:18]. One class was about memorizing lots of formulas for specific situations (like the "impostor" AI). The other used calculus to derive the answers from a deeper understanding, which was much easier and more powerful. This is the core difference: one method memorizes, the other truly understands. The episode then introduces a different, more powerful way to build AI, based on Kenneth Stanley's old experiment, "Picbreeder" [00:04:45]. This method creates AI with a shockingly clean and intuitive internal model of the world. For example, it might develop a model of a skull where it understands the "mouth" as a separate component it can open and close, without ever being explicitly trained on that action [00:06:15]. This deep understanding emerges bottom-up, without massive datasets. The secret is to abandon a fixed goal and embrace "deception" [00:08:42]: the idea that the stepping stones to a great discovery often don't look anything like the final result. Instead of optimizing for a target, the AI is built through an open-ended process of exploring what's "interesting" [00:09:15]. This creates a more flexible and adaptable foundation, a bit like how evolvability wins out in nature [00:10:30]. The show concludes by arguing that this choice matters immensely. The "impostor" path may be hitting a wall, requiring insane amounts of money and energy for progress and failing to deliver true creativity or continual learning [00:13:00]. The ultimate message is a call to not put all our eggs in one basket [00:14:25]. We should explore these open-ended, creative paths to discover a more genuine form of intelligence, which may be found where we least expect it. REFS: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley https://arxiv.org/pdf/2505.11581 Kenneth O. Stanley, Joel Lehman Why Greatness Cannot Be Planned: The Myth of the Objective https://amzn.to/44xLaXK Original show with Kenneth from 4 years ago: https://www.youtube.com/watch?v=lhYGXYeMq_E Kenneth Stanley is SVP Open Endedness at Lila Sciences https://x.com/kenneth0stanley Akarsh Kumar (MIT) https://akarshkumar.com/ AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!) Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002 Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002 Tim's Code visualisation of FER based on Akarsh repo: https://github.com/ecsplendid/fer TRANSCRIPT: https://app.rescript.info/public/share/YKAZzZ6lwZkjTLRpVJreOOxGhLI8y4m3fAyU8NSavx0…

Three Red Lines We're About to Cross Toward AGI (Daniel Kokotajlo, Gary Marcus, Dan Hendrycks) 2:07:07

29 days ago

What if the most powerful technology in human history is being built by people who openly admit they don't trust each other? In this explosive 2-hour debate, three AI experts pull back the curtain on the shocking psychology driving the race to Artificial General Intelligence—and why the people building it might be the biggest threat of all. Kokotajlo predicts AGI by 2028 based on compute scaling trends. Marcus argues we haven't solved basic cognitive problems from his 2001 research. The stakes? If Kokotajlo is right and Marcus is wrong about safety progress, humanity may have already lost control. Sponsor messages: ======== Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard! https://tufalabs.ai/ ======== Guest Powerhouse Gary Marcus - Cognitive scientist, author of "Taming Silicon Valley," and AI's most prominent skeptic who's been warning about the same fundamental problems for 25 years (https://garymarcus.substack.com/) Daniel Kokotajlo - Former OpenAI insider turned whistleblower who reveals the disturbing rationalizations of AI lab leaders in his viral "AI 2027" scenario (https://ai-2027.com/) Dan Hendrycks - Director of the Center for AI Safety who created the benchmarks used to measure AI progress and argues we have only years, not decades, to prevent catastrophe (https://danhendrycks.com/) Transcript: http://app.rescript.info/public/share/tEcx4UkToi-2jwS1cN51CW70A4Eh6QulBRxDILoXOno TOC: Introduction: The AI Arms Race 00:00:04 - The Danger of Automated AI R&D 00:00:43 - The Rationalization: "If we don't, someone else will" 00:01:56 - Sponsor Reads (Tufa AI Labs & Google Gemini) 00:02:55 - Guest Introductions The Philosophical Stakes 00:04:13 - What is the Positive Vision for AGI? 
00:07:00 - The Abundance Scenario: Superintelligent Economy 00:09:06 - Differentiating AGI and Superintelligence (ASI) 00:11:41 - Sam Altman: "A Decade in a Month" 00:14:47 - Economic Inequality & The UBI Problem Policy and Red Lines 00:17:13 - The Pause Letter: Stopping vs. Delaying AI 00:20:03 - Defining Three Concrete Red Lines for AI Development 00:25:24 - Racing Towards Red Lines & The Myth of "Durable Advantage" 00:31:15 - Transparency and Public Perception 00:35:16 - The Rationalization Cascade: Why AI Labs Race to "Win" Forecasting AGI: Timelines and Methodologies 00:42:29 - The Case for Short Timelines (Median 2028) 00:47:00 - Scaling Limits: Compute, Data, and Money 00:49:36 - Forecasting Models: Bio-Anchors and Agentic Coding 00:53:15 - The 10^45 FLOP Thought Experiment The Great Debate: Cognitive Gaps vs. Scaling 00:58:41 - Gary Marcus's Counterpoint: The Unsolved Problems of Cognition 01:00:46 - Current AI Can't Play Chess Reliably 01:08:23 - Can Tools and Neurosymbolic AI Fill the Gaps? 01:16:13 - The Multi-Dimensional Nature of Intelligence 01:24:26 - The Benchmark Debate: Data Contamination and Reliability 01:31:15 - The Superhuman Coder Milestone Debate 01:37:45 - The Driverless Car Analogy The Alignment Problem 01:39:45 - Has Any Progress Been Made on Alignment? 01:42:43 - "Fairly Reasonably Scares the Sh*t Out of Me" 01:46:30 - Distinguishing Model vs. Process Alignment Scenarios and Conclusions 01:49:26 - Gary's Alternative Scenario: The Neurosymbolic Shift 01:53:35 - Will AI Become Jeff Dean? 
01:58:41 - Takeoff Speeds and Exceeding Human Intelligence 02:03:19 - Final Disagreements and Closing Remarks REFS: Gary Marcus (2001) - The Algebraic Mind https://mitpress.mit.edu/9780262632683/the-algebraic-mind/ 00:59:00 Gary Marcus & Ernest Davis (2019) - Rebooting AI https://www.penguinrandomhouse.com/books/566677/rebooting-ai-by-gary-marcus-and-ernest-davis/ 01:31:59 Gary Marcus (2024) - Taming SV https://www.hachettebookgroup.com/titles/gary-marcus/taming-silicon-valley/9781541704091/ 00:03:01…

How AI Learned to Talk and What It Means - Prof. Christopher Summerfield 1:08:28

5 weeks ago

We interview Professor Christopher Summerfield from Oxford University about his new book "These Strange New Minds: How AI Learned to Talk and What It Means". AI learned to understand the world just by reading text - something scientists thought was impossible. You don't need to see a cat to know what one is; you can learn everything from words alone. This is "the most astonishing scientific discovery of the 21st century." People are split: some refuse to call what AI does "thinking" even when it outperforms humans, while others believe if it acts intelligent, it is intelligent. Summerfield takes the middle ground - AI does something genuinely like human reasoning, but that doesn't make it human. Sponsor messages: ======== Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard! https://tufalabs.ai/ ======== Prof. Christopher Summerfield https://www.psy.ox.ac.uk/people/christopher-summerfield These Strange New Minds: How AI Learned to Talk and What It Means https://amzn.to/4e26BVa Table of Contents: Introduction & Setup 00:00:00 Superman 3 Metaphor - Humans Absorbed by Machines 00:02:01 Book Introduction & AI Debate Context 00:03:45 Sponsor Segments (Google Gemini, Tufa Labs) Philosophical Foundations 00:04:48 The Fractured AI Discourse 00:08:21 Ancient Roots: Aristotle vs Plato (Empiricism vs Rationalism) 00:10:14 Historical AI: Symbolic Logic and Its Limits The Language Revolution 00:12:11 ChatGPT as the Rubicon Moment 00:14:00 The Astonishing Discovery: Learning Reality from Words Alone 00:15:47 Equivalentists vs Exceptionalists Debate Cognitive Science Perspectives 00:19:12 Functionalism and the Duck Test 00:21:48 Brain-AI Similarities and Computational Principles 00:24:53 Reconciling Chomsky: Evolution vs Learning 00:28:15 Lamarckian AI vs Darwinian Human Learning The Reality of AI Capabilities 00:30:29 Anthropomorphism and the Clever Hans Effect 00:32:56 The Intentional Stance and Nature of Thinking 00:37:56 Three Major AI Worries: Agency, Personalization, Dynamics Societal Risks and Complex Systems 00:37:56 AI Agents and Flash Crash Scenarios 00:42:50 Removing Frictions: The Lawfare Example 00:46:15 Gradual Disempowerment Theory 00:49:18 The Faustian Pact of Technology Human Agency and Control 00:51:18 The Crisis of Authenticity 00:56:22 Psychology of Control vs Reward 01:00:21 Dopamine Hacking and Variable Reinforcement Future Directions 01:02:27 Evolution as Goal-less Optimization 01:03:31 Open-Endedness and Creative Evolution 01:06:46 Writing, Creativity, and AI-Generated Content 01:08:18 Closing Remarks REFS: Academic References (Abbreviated) Essential Books "These Strange New Minds" - C. Summerfield [00:02:01] - Main discussion topic "The Mind is Flat" - N. Chater [00:33:45] - Summerfield's favorite on cognitive illusions "AI: A Guide for Thinking Humans" - M. Mitchell [00:04:58] - Host's previous favorite "Principia Mathematica" - Russell & Whitehead [00:11:00] - Logic Theorist reference "Syntactic Structures" - N. Chomsky (1957) [00:13:30] - Generative grammar foundation "Why Greatness Cannot Be Planned" - Stanley & Lehman [01:04:00] - Open-ended evolution Key Papers & Studies "Gradual Disempowerment" - D. Duvenaud [00:46:45] - AI threat model "Counterfeit People" - D. Dennett (Atlantic) [00:52:45] - AI societal risks "Open-Endedness is Essential..." - DeepMind/Rocktäschel/Hughes [01:03:42] Heider & Simmel (1944) [00:30:45] - Agency attribution to shapes Whitehall Studies - M. Marmot [00:59:32] - Control and health outcomes "Clever Hans" - O. Pfungst (1911) [00:31:47] - Animal intelligence illusion Historical References…

"Blurring Reality" - Chai's Social AI Platform (SPONSORED) 50:59

8 weeks ago

"Blurring Reality" - Chai's Social AI Platform - sponsored This episode of MLST explores the groundbreaking work of Chai, a social AI platform that quietly built one of the world's largest AI companion ecosystems before ChatGPT's mainstream adoption. With over 10 million active users and just 13 engineers serving 2 trillion tokens per day, Chai discovered the massive appetite for AI companionship through serendipity while searching for product-market fit. CHAI sponsored this show *because they want to hire amazing engineers* -- CAREER OPPORTUNITIES AT CHAI Chai is actively hiring in Palo Alto with competitive compensation ($300K-$800K+ equity) for roles including AI Infrastructure Engineers, Software Engineers, Applied AI Researchers, and more. Fast-track qualification available for candidates with significant product launches, open source contributions, or entrepreneurial success. https://www.chai-research.com/jobs/ The conversation with founder William Beauchamp and engineers Tom Lu and Nischay Dhankhar covers Chai's innovative technical approaches including reinforcement learning from human feedback (RLHF), model blending techniques that combine smaller models to outperform larger ones, and their unique infrastructure challenges running exaflop-class compute. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers in Zurich and SF. 
Goto https://tufalabs.ai/ *** Key themes explored include: - The ethics of AI engagement optimization and attention hacking - Content moderation at scale with a lean engineering team - The shift from AI as utility tool to AI as social companion - How users form deep emotional bonds with artificial intelligence - The broader implications of AI becoming a social medium We also examine OpenAI's recent pivot toward companion AI with April's new GPT-4o, suggesting a fundamental shift in how we interact with artificial intelligence - from utility-focused tools to companion-like experiences that blur the lines between human and artificial intimacy. The episode also covers Chai's unconventional approach to hiring only top-tier engineers, their bootstrap funding strategy focused on user revenue over VC funding, and their rapid experimentation culture where one in five experiments succeed. TOC: 00:00:00 - Introduction: Steve Jobs' AI Vision & Chai's Scale 00:04:02 - Chapter 1: Simulators - The Birth of Social AI 00:13:34 - Chapter 2: Engineering at Chai - RLHF & Model Blending 00:21:49 - Chapter 3: Social Impact of GenAI - Ethics & Safety 00:33:55 - Chapter 4: The Lean Machine - 13 Engineers, Millions of Users 00:42:38 - Chapter 5: GPT-4o Becoming a Companion - OpenAI's Pivot 00:50:10 - Chapter 6: What Comes Next - The Future of AI Intimacy TRANSCRIPT: https://www.dropbox.com/scl/fi/yz2ewkzmwz9rbbturfbap/CHAI.pdf?rlkey=uuyk2nfhjzezucwdgntg5ubqb&dl=0…

Google AlphaEvolve - Discovering new science (exclusive interview) 1:13:58

10 weeks ago

Today GoogleDeepMind released AlphaEvolve: a Gemini coding agent for algorithm discovery. It beat the famous Strassen algorithm for matrix multiplication set 56 years ago. Google has been killing it recently. We had early access to the paper and interviewed the researchers behind the work. AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ Authors: Alexander Novikov*, Ngân Vũ*, Marvin Eisenberger*, Emilien Dupont*, Po-Sen Huang*, Adam Zsolt Wagner*, Sergey Shirobokov*, Borislav Kozlovskii*, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog* (* indicates equal contribution or special designation, if defined elsewhere) SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** AlphaEvolve works like a very smart, tireless programmer. It uses powerful AI language models (like Gemini) to generate ideas for computer code. Then, it uses an "evolutionary" process – like survival of the fittest for programs. It tries out many different program ideas, automatically tests how well they solve a problem, and then uses the best ones to inspire new, even better programs. Beyond this mathematical breakthrough, AlphaEvolve has already been used to improve real-world systems at Google, such as making their massive data centers run more efficiently and even speeding up the training of the AI models that power AlphaEvolve itself. The discussion also covers how humans work with AlphaEvolve, the challenges of making AI discover things, and the exciting future of AI helping scientists make new discoveries. 
In short, AlphaEvolve is a powerful new AI tool that can invent new algorithms and solve complex problems, showing how AI can be a creative partner in science and engineering. Guests: Matej Balog: https://x.com/matejbalog Alexander Novikov: https://x.com/SashaVNovikov REFS: MAP Elites [Jean-Baptiste Mouret, Jeff Clune] https://arxiv.org/abs/1504.04909 FunSearch [Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli & Alhussein Fawzi] https://www.nature.com/articles/s41586-023-06924-6 TOC: [00:00:00] Introduction: Alpha Evolve's Breakthroughs, DeepMind's Lineage, and Real-World Impact [00:12:06] Introducing AlphaEvolve: Concept, Evolutionary Algorithms, and Architecture [00:16:56] Search Challenges: The Halting Problem and Enabling Creative Leaps [00:23:20] Knowledge Augmentation: Self-Generated Data, Meta-Prompting, and Library Learning [00:29:08] Matrix Multiplication Breakthrough: From Strassen to AlphaEvolve's 48 Multiplications [00:39:11] Problem Representation: Direct Solutions, Constructors, and Search Algorithms [00:46:06] Developer Reflections: Surprising Outcomes and Superiority over Simple LLM Sampling [00:51:42] Algorithmic Improvement: Hill Climbing, Program Synthesis, and Intelligibility [01:00:24] Real-World Application: Complex Evaluations and Robotics [01:05:39] Role of LLMs & Future: Advanced Models, Recursive Self-Improvement, and Human-AI Collaboration [01:11:22] Resource Considerations: Compute Costs of AlphaEvolve This is a trial of posting videos on Spotify, thoughts? Email me or chat in our Discord…

Prof. Randall Balestriero - LLMs without pretraining and SSL 34:30

13 weeks ago

Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it. He also talks about how self-supervised learning (where models learn from data structure itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods. Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data.

How Machines Learn to Ignore the Noise (Kevin Ellis + Zenna Tavares) 1:16:55

15 weeks ago

Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data. They discuss two main ways AI can "think": one way is like following specific rules or steps (like a computer program), and the other is more intuitive, like guessing based on patterns (like modern AI often does). They found combining both methods works well for solving complex puzzles like ARC. A key idea is "compositionality" - building big ideas from small ones, like LEGOs. This is powerful but can also be overwhelming. Another important idea is "abstraction" - understanding things simply, without getting lost in details, and knowing there are different levels of understanding. Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT: https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0 Zenna Tavares: http://www.zenna.org/ Kevin Ellis: https://www.cs.cornell.edu/~ellisk/ TOC: 1. Compositionality and Learning Foundations [00:00:00] 1.1 Compositional Search and Learning Challenges [00:03:55] 1.2 Bayesian Learning and World Models [00:12:05] 1.3 Programming Languages and Compositionality Trade-offs [00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems 2. 
Neural-Symbolic Program Synthesis [00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming [00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture [00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference [00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality [00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems 3. Abstract Reasoning Systems [00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC [00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning [00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions [01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches [01:11:37] 3.5 Project MARA and Future Research Directions REFS: [00:00:25] DreamCoder, Kevin Ellis et al. https://arxiv.org/abs/2006.08381 [00:01:10] Mind Your Step, Ryan Liu et al. https://arxiv.org/abs/2410.21333 [00:06:05] Bayesian inference, Griffiths, T. L., Kemp, C., & Tenenbaum, J. B. https://psycnet.apa.org/record/2008-06911-003 [00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis https://arxiv.org/abs/2411.02272 [00:23:15] Neurosymbolic AI, Garcez, Artur d'Avila et al. https://arxiv.org/abs/2012.05876 [00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al. https://arxiv.org/abs/2411.02272 [00:38:35] ARC, François Chollet https://arxiv.org/abs/1911.01547 [00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares http://www.zenna.org/publications/autumn2022.pdf [00:42:50] MuZero, Julian Schrittwieser et al. http://arxiv.org/pdf/1911.08265 [00:43:20] VisualPredicator, Yichao Liang https://arxiv.org/abs/2410.23156 [00:48:55] Bayesian models of cognition, Joshua B. 
Tenenbaum https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/ [00:49:30] The Bitter Lesson, Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html [01:06:35] Program induction, Kevin Ellis, Wen-Ding Li https://arxiv.org/pdf/2411.02272 [01:06:50] DreamCoder (II), Kevin Ellis et al. https://arxiv.org/abs/2006.08381 [01:11:55] Project MARA, Zenna Tavares, Kevin Ellis https://www.basis.ai/blog/mara/…

Eiso Kant (CTO poolside) - Superhuman Coding Is Coming! 1:36:28

16 weeks ago

Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** Eiso Kant: https://x.com/eisokant https://poolside.ai/ TRANSCRIPT: https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0 TOC: 1. Foundation Models and AI Strategy [00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development [00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision [00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs 2. Reinforcement Learning and Model Economics [00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches [00:22:06] 2.2 Model Economics and Experimental Optimization 3. Enterprise AI Implementation [00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure [00:26:00] 3.2 Enterprise-First Business Model and Market Focus [00:27:05] 3.3 Foundation Models and AGI Development Approach [00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements 4. 
LLM Architecture and Performance [00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization [00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs [00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons [00:43:26] 4.4 Balancing Creativity and Determinism in AI Models [00:50:01] 4.5 AI-Assisted Software Development Evolution 5. AI Systems Engineering and Scalability [00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges [00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends [01:01:25] 5.3 Distributed Systems and Engineering Complexity [01:01:50] 5.4 GenAI Architecture and Scalability Patterns [01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation 6. AI Safety and Future Capabilities [01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches [01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems [01:16:27] 6.3 AI vs Human Capabilities in Software Development [01:33:45] 6.4 Enterprise Deployment and Security Architecture CORE REFS (see shownotes for URLs/more refs): [00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk) [00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (Seminal NLP technique) [00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement) [00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy) [00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model) [00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. 
(Key paper on LLM scaling) [00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)…

The Compendium - Connor Leahy and Gabriel Alfour 1:37:10

16 weeks ago

Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination," where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT + REFS + NOTES: https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0 https://www.thecompendium.ai/ https://en.wikipedia.org/wiki/Connor_Leahy https://www.conjecture.dev/about https://substack.com/@gabecc TOC: 1. AI Intelligence and Safety Fundamentals [00:00:00] 1.1 Understanding Intelligence and AI Capabilities [00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges [00:10:18] 1.3 Human vs Animal Intelligence Debate [00:18:00] 1.4 AI Regulation and Risk Assessment Approaches [00:26:14] 1.5 Competing AI Development Ideologies 2. Economic and Social Impact [00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios [00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics [00:37:40] 2.3 Ethical Frameworks and AI Governance Debates [00:40:52] 2.4 AI Alignment Evolution and Technical Challenges 3. Technical Governance Framework [00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness [00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models [00:57:35] 3.3 Limitations of Current Boundedness Approaches [00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions 4. 
Democratic Implementation and Coordination [00:59:20] 4.1 Governance Design and Measurement Challenges [01:00:10] 4.2 Democratic Institutions and Experimental Governance [01:14:10] 4.3 Political Engagement and AI Safety Advocacy [01:25:30] 4.4 Practical AI Safety Measures and International Coordination CORE REFS: [00:01:45] The Compendium (2023), Leahy et al. https://pdf.thecompendium.ai/the_compendium.pdf [00:06:50] Geoffrey Hinton Leaves Google, BBC News https://www.bbc.com/news/world-us-canada-65452940 [00:10:00] ARC-AGI, Chollet https://arcprize.org/arc-agi [00:13:25] A Brief History of Intelligence, Bennett https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343 [00:25:35] Statement on AI Risk, Center for AI Safety https://www.safe.ai/work/statement-on-ai-risk [00:26:15] Machines of Loving Grace, Amodei https://darioamodei.com/machines-of-loving-grace [00:26:35] The Techno-Optimist Manifesto, Andreessen https://a16z.com/the-techno-optimist-manifesto/ [00:31:55] Techno-Feudalism, Varoufakis https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270 [00:42:40] Introducing Superalignment, OpenAI https://openai.com/index/introducing-superalignment/ [00:47:20] Three Laws of Robotics, Asimov https://www.britannica.com/topic/Three-Laws-of-Robotics [00:50:00] Symbolic AI (GOFAI), Haugeland https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence [00:52:30] Intent Alignment, Christiano https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety [00:55:10] Large Language Model Alignment: A Survey, Jiang et al. http://arxiv.org/pdf/2309.15025 [00:55:40] Constitutional Checks and Balances, Bok https://plato.stanford.edu/entries/montesquieu/…

ARC Prize v2 Launch! (Francois Chollet and Mike Knoop) 54:15

17 weeks ago

We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge. https://arcprize.org/ SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT: https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0 TOC: 1. ARC v2 Core Design & Objectives [00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture [00:03:16] 1.2 Test-Time Optimization and AGI Assessment [00:06:24] 1.3 Human-AI Capability Analysis [00:13:02] 1.4 OpenAI o3 Initial Performance Results 2. ARC Technical Evolution [00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements [00:21:12] 2.2 Human Validation Methodology [00:26:05] 2.3 Task Design and Gaming Prevention [00:29:11] 2.4 Intelligence Measurement Framework 3. O3 Performance & Future Challenges [00:38:50] 3.1 O3 Comprehensive Performance Analysis [00:43:40] 3.2 System Limitations and Failure Modes [00:49:30] 3.3 Program Synthesis Applications [00:53:00] 3.4 Future Development Roadmap REFS: [00:00:15] On the Measure of Intelligence, François Chollet https://arxiv.org/abs/1911.01547 [00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop https://arcprize.org/ [00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team https://arcprize.org/blog/oai-o3-pub-breakthrough [00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al. 
https://arxiv.org/abs/2201.11903 [00:21:45] ARC-v2 benchmark tasks, Mike Knoop https://arcprize.org/blog/introducing-arc-agi-public-leaderboard [00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al. https://arxiv.org/html/2412.04604v2 [00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt https://arxiv.org/abs/2412.04604 [00:48:55] The Bitter Lesson, Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html [00:53:30] Decoding strategies in neural text generation, Sina Zarrieß https://www.mdpi.com/2078-2489/12/9/355/pdf…

Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman) 1:03:36

17 weeks ago

Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and flexibility of the network. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT + REFS: https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0 Mohamed Osman (Tufa Labs) https://x.com/MohamedOsmanML Jack Cole (Tufa Labs) https://x.com/MindsAI_Jack How and why deep learning for ARC paper: https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf TOC: 1. Abstract Reasoning Foundations [00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview [00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning [00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture [00:20:26] 1.4 Technical Implementation with Long T5 Model 2. ARC Solution Architectures [00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions [00:27:54] 2.2 Model Generalization and Function Generation Challenges [00:32:53] 2.3 Input Representation and VLM Limitations [00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration [00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches 3. 
Advanced Systems Integration [00:43:00] 3.1 DreamCoder Evolution and LLM Integration [00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs [00:54:15] 3.3 ARC v2 Development and Performance Scaling [00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations [01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution REFS: [00:01:32] Original ARC challenge paper, François Chollet https://arxiv.org/abs/1911.01547 [00:06:55] DreamCoder, Kevin Ellis et al. https://arxiv.org/abs/2006.08381 [00:12:50] Deep Learning with Python, François Chollet https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438 [00:13:35] Deep Learning with Python, François Chollet https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438 [00:13:35] Influence of pretraining data for reasoning, Laura Ruis https://arxiv.org/abs/2411.12580 [00:17:50] Latent Program Networks, Clement Bonnet https://arxiv.org/html/2411.08706v1 [00:20:50] T5, Colin Raffel et al. https://arxiv.org/abs/1910.10683 [00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al. https://arxiv.org/abs/2411.02272 [00:34:15] Six finger problem, Chen et al. https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf [00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B [00:40:10] ARC Prize 2024 Technical Report, François Chollet et al. 
https://arxiv.org/html/2412.04604v2 [00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis https://arxiv.org/html/2503.15540 [00:54:25] Abstraction and Reasoning Corpus, François Chollet https://github.com/fchollet/ARC-AGI [00:57:10] O3 breakthrough on ARC-AGI, OpenAI https://arcprize.org/ [00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell https://arxiv.org/abs/2305.07141 [01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W. http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf…

GSMSymbolic paper - Iman Mirzadeh (Apple) 1:11:23

18 weeks ago

Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation. SPONSOR MESSAGES: *** Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/ *** TRANSCRIPT + RESEARCH: https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0 TOC: 1. Intelligence vs Achievement in AI Systems [00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems [00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess [00:10:10] 1.3 Language Models and Distribution Learning Limitations [00:14:47] 1.4 Research Methodology and Theoretical Frameworks 2. Intelligence Measurement and Learning [00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning [00:29:00] 2.2 Intelligence Definition and Measurement Approaches [00:34:35] 2.3 Learning Capabilities and Agency in AI Systems [00:39:26] 2.4 Abstract Reasoning and Symbol Understanding 3. LLM Performance and Evaluation [00:47:15] 3.1 Scaling Laws and Fundamental Limitations [00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks [00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs [01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment REFS: [00:01:00] AlphaZero chess AI system, Silver et al. 
https://arxiv.org/abs/1712.01815 [00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184 [00:11:35] Cross-entropy loss in language modeling, Voita http://lena-voita.github.io/nlp_course/language_modeling.html [00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al. https://arxiv.org/abs/2410.05229 [00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn https://www.sciencedirect.com/science/article/pii/001002779090014B [00:28:55] Brain-to-body mass ratio scaling laws, Sutskever https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training [00:29:40] On the Measure of Intelligence, Chollet https://arxiv.org/abs/1911.01547 [00:33:30] On definition of intelligence, Gignac et al. https://www.sciencedirect.com/science/article/pii/S0160289624000266 [00:35:30] Defining intelligence, Wang https://cis.temple.edu/~wangp/papers.html [00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884 [00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475 [00:43:15] Chain-of-thought prompting, Wei et al. https://arxiv.org/abs/2201.11903 [00:47:20] Test-time scaling laws in machine learning, Brown https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058 [00:47:50] Scaling Laws for Neural Language Models, Kaplan et al. https://arxiv.org/abs/2001.08361 [00:55:15] Tensor product variable binding, Smolensky https://www.sciencedirect.com/science/article/abs/pii/000437029090007M [01:08:45] GSM-8K dataset, OpenAI https://huggingface.co/datasets/openai/gsm8k…

Reasoning, Robustness, and Human Feedback in AI - Max Bartolo (Cohere) 1:23:11

18 weeks ago