Highlights: #214 – Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway
Most AI safety conversations centre on alignment: ensuring AI systems share our values and goals. But despite progress, we’re unlikely to know we’ve solved the problem before the arrival of human-level and superhuman systems in as little as three years.
So some — including Buck Shlegeris, CEO of Redwood Research — are developing a backup plan to safely deploy models we fear are actively scheming to harm us: so-called “AI control.” While this may sound mad, given the reluctance of AI companies to delay deploying anything they train, not developing such techniques is probably even crazier.
These highlights are from episode #214 of The 80,000 Hours Podcast: Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway, and include:
- What is AI control? (00:00:15)
- One way to catch AIs that are up to no good (00:07:00)
- What do we do once we catch a model trying to escape? (00:13:39)
- Team Human vs Team AI (00:18:24)
- If an AI escapes, is it likely to be able to beat humanity from there? (00:24:59)
- Is alignment still useful? (00:32:10)
- Could 10 safety-focused people in an AGI company do anything useful? (00:35:34)
These aren't necessarily the most important or even most entertaining parts of the interview — so if you enjoy this, we strongly recommend checking out the full episode!
And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org.
Highlights put together by Ben Cordell, Milo McGuire, and Dominic Armstrong