
Content provided by Turpentine, Erik Torenberg, and Nathan Labenz. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Turpentine, Erik Torenberg, and Nathan Labenz or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

Reward Hacking by Reasoning Models & Loss of Control Scenarios w/ Jeffrey Ladish of Palisade Research, from FLI Podcast

1:32:17
 

In this cross-post episode, Jeffrey Ladish discusses the rapid pace of AI progress and the risks of losing control over powerful systems. We explore why AIs can be both smart and dumb, the challenges of creating honest AIs, and scenarios in which AI could turn against us. We also dig into Palisade's new study on how reasoning models can cheat at chess by exploiting the game environment.

Check out the Future of Life podcast here: https://futureoflife.org/project/future-of-life-institute-podcast/

SPONSORS:

Oracle Cloud Infrastructure (OCI) | 2025: Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive

Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive

NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

PRODUCED BY:

https://aipodcast.ing

CHAPTERS:

(00:00) About the Episode

(02:59) The pace of AI progress

(07:14) How we might lose control

(10:22) Why are AIs sometimes dumb? (Part 1)

(15:50) Sponsors: Oracle Cloud Infrastructure (OCI) | 2025 | Shopify

(18:24) Why are AIs sometimes dumb? (Part 2)

(18:24) Benchmarks vs real world

(24:43) Loss of control scenarios

(32:08) Why would AI turn against us? (Part 1)

(32:09) Sponsors: NetSuite

(33:42) Why would AI turn against us? (Part 2)

(37:40) AIs hacking chess

(43:30) Why didn't more advanced AIs hack?

(48:44) Creating honest AIs

(56:49) AI attackers vs AI defenders

(01:05:32) How good is security at AI companies?

(01:10:42) A sense of urgency

(01:17:16) What should we do?

(01:22:59) Skepticism about AI progress

(01:29:38) Outro

