Content provided by Matt Turck. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Matt Turck or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

Chasing Real AGI: Inside ARC Prize 2025 with Chollet & Knoop

1:00:45
 

In this episode, we dive deep into the race toward true machine intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with François of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.

We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.

Ndea

Website - https://ndea.com

X/Twitter - https://x.com/ndea

ARC Prize

Website - https://arcprize.org

X/Twitter - https://x.com/arcprize

François Chollet

LinkedIn - https://www.linkedin.com/in/fchollet

X/Twitter - https://x.com/fchollet

Mike Knoop

X/Twitter - https://x.com/mikeknoop

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2

(02:07) What is ARC and how it differs from other AI benchmarks

(02:54) Why current models struggle with fluid intelligence

(03:52) Shift from static LLMs to test-time adaptation

(04:19) What ARC measures vs. traditional benchmarks

(07:52) Limitations of brute-force scaling in LLMs

(13:31) Defining intelligence: adaptation and efficiency

(16:19) How O3 achieved a massive leap in ARC performance

(20:35) Speculation on O3's architecture and test-time search

(22:48) Program synthesis: what it is and why it matters

(28:28) Combining LLMs with search and synthesis techniques

(34:57) The ARC Prize structure: efficiency track, private vs. public

(42:03) Open source as a requirement for progress

(44:59) What's new in ARC-AGI 2 and human benchmark testing

(48:14) Capabilities ARC-AGI 2 is designed to test

(49:21) When will ARC-AGI 2 be saturated? AGI timelines

(52:25) Founding of Ndea and why now

(54:19) Vision beyond AGI: a factory for scientific advancement

(56:40) What Ndea is building and why it's different from LLM labs

(58:32) Hiring and remote-first culture at Ndea

(59:52) Closing thoughts and the future of AI research

