Highlights: #217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress

40:54
 
AI models today have a 50% chance of successfully completing a task that would take an expert human one hour. Seven months ago, that number was roughly 30 minutes — and seven months before that, 15 minutes.
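
In other words, the "time horizon" of tasks that models can complete at a 50% success rate has been doubling roughly every seven months. As a minimal sketch of what that trend implies (the one-hour starting point and seven-month doubling time come from the figures above; the extrapolation itself is purely illustrative):

  DOUBLING_MONTHS = 7
  horizon_minutes = 60  # tasks taking an expert human ~1 hour today

  # Extrapolate the 50%-success time horizon forward, assuming the
  # doubling trend simply continues.
  for months_ahead in range(0, 29, 7):
      horizon = horizon_minutes * 2 ** (months_ahead / DOUBLING_MONTHS)
      print(f"+{months_ahead:2d} months: ~{horizon / 60:.1f} expert-hours")

If the trend holds, the horizon reaches roughly a full expert work-week (about 35 hours) within three years.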

These are substantial, multi-step tasks requiring sustained focus: building web applications, conducting machine learning research, or solving complex programming challenges.

Today’s guest, Beth Barnes, is CEO of METR (Model Evaluation & Threat Research) — the leading organisation measuring these capabilities.

These highlights are from episode #217 of The 80,000 Hours Podcast: Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress, and include:

  • Can we see AI scheming in the chain of thought? (00:00:34)
  • We have to test models' honesty even before they're used inside AI companies (00:05:48)
  • It's essential to thoroughly test relevant real-world tasks (00:10:13)
  • Recursively self-improving AI might even be here in two years — which is alarming (00:16:09)
  • Do we need external auditors doing AI safety tests, not just the companies themselves? (00:21:55)
  • A case against safety-focused people working at frontier AI companies (00:29:30)
  • Open-weighting models is often good, and Beth has changed her attitude about it (00:34:57)

These aren't necessarily the most important or even most entertaining parts of the interview — so if you enjoy this, we strongly recommend checking out the full episode!

And if you're finding these highlights episodes valuable, please let us know by emailing [email protected].

Highlights put together by Ben Cordell, Milo McGuire, and Dominic Armstrong
