Content provided by Daniel Filan. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Daniel Filan or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://ppacc.player.fm/legal.

38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

27:41
 
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question.

Patreon: https://www.patreon.com/axrpodcast

Ko-fi: https://ko-fi.com/axrpodcast

Transcript: https://axrp.net/episode/2025/01/20/episode-38_5-adria-garriga-alonso-detecting-ai-scheming.html

FAR.AI: https://far.ai/

FAR.AI on X (aka Twitter): https://x.com/farairesearch

FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch

The Alignment Workshop: https://www.alignment-workshop.com/

Topics we discuss, and timestamps:

01:04 - The Alignment Workshop

02:49 - How to detect scheming AIs

05:29 - Sokoban-solving networks taking time to think

12:18 - Model organisms of long-term planning

19:44 - How and why to study planning in networks

Links:

Adrià's website: https://agarri.ga/

An investigation of model-free planning: https://arxiv.org/abs/1901.03559

Model-Free Planning: https://tuphs28.github.io/projects/interpplanning/

Planning in a recurrent neural network that plays Sokoban: https://arxiv.org/abs/2407.15421

Episode art by Hamish Doodles: hamishdoodles.com

54 episodes
