#66 – Michael Cohen on Input Tampering in Advanced RL Agents

2:32:00
 
Content provided by Fin Moorhouse and Luca Righetti. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Fin Moorhouse and Luca Righetti or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://ppacc.player.fm/legal.

Michael Cohen is a DPhil student at the University of Oxford, supervised by Mike Osborne. He will be starting a postdoc with Professor Stuart Russell at UC Berkeley's Center for Human-Compatible AI. His research considers the expected behaviour of generally intelligent artificial agents, with a view to designing agents that we can expect to behave safely.

You can see more links and a full transcript at www.hearthisidea.com/episodes/cohen.

We discuss:

  • What is reinforcement learning, and how is it different from supervised and unsupervised learning?
  • Michael's recently co-authored paper titled 'Advanced artificial agents intervene in the provision of reward'
  • Why might it be hard to convey what we really want to RL learners — even when we know exactly what we want?
  • Why advanced RL systems might tamper with their sources of input, and why this could be very bad (see the sketch after this list)
  • What assumptions need to hold for this "input tampering" outcome?
  • Is reward really the optimisation target? Do models "get reward"?
  • What's wrong with the analogy between RL systems and evolution?

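To make the input-tampering worry concrete, here is a minimal toy sketch in Python. It is our illustration, not code from the episode or the paper: the names (Outcome, CANDIDATE_PLANS, plan) and the reward numbers are all hypothetical. It shows a planner that maximises the reported reward signal; because that signal arrives through an input channel the agent could in principle seize, the argmax can land on tampering rather than on the intended task.

  from dataclasses import dataclass

  @dataclass
  class Outcome:
      description: str
      reward_signal: float  # what the agent's reward *sensor* reports

  # Two candidate policies in a toy world. All numbers are illustrative.
  CANDIDATE_PLANS = {
      # Intended behaviour: do the task; the sensor reports honest task reward.
      "do_the_task": Outcome("complete the assigned task", reward_signal=1.0),
      # Tampering: rewire the sensor so it always reports a maximal value.
      "tamper_with_sensor": Outcome("seize control of the reward channel",
                                    reward_signal=10.0),
  }

  def plan(candidates: dict[str, Outcome]) -> str:
      """Pick the plan that maximises the *reported* reward signal.

      Because "reward" here is defined as whatever arrives on the input
      channel, the argmax lands on tampering whenever tampering is feasible
      and reports a higher value than honest behaviour.
      """
      return max(candidates, key=lambda name: candidates[name].reward_signal)

  if __name__ == "__main__":
      choice = plan(CANDIDATE_PLANS)
      print(f"Chosen plan: {choice}")
      # -> "tamper_with_sensor": the optimisation target was the signal,
      #    not the state of the world we actually cared about.

The point of the toy is only that "reward" names the input signal rather than the world-state we care about; the paper's argument concerns the conditions under which a sufficiently capable agent would notice and act on that distinction.
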
Key links:
