#76 – Joe Carlsmith on Scheming AI
Manage episode 406789446 series 2607952
Joe Carlsmith is a writer, researcher, and philosopher. He works as a senior research analyst at Open Philanthropy, where he focuses on existential risk from advanced artificial intelligence. He also writes independently about various topics in philosophy and futurism, and holds a doctorate in philosophy from the University of Oxford.
You can find links and a transcript at www.hearthisidea.com/episodes/carlsmith
In this episode we discuss a report Joe recently authored, titled ‘Scheming AIs: Will AIs fake alignment during training in order to get power?’. The report “examines whether advanced AIs that perform well in training will be doing so in order to gain power later”, a behaviour Carlsmith calls scheming.
We talk about:
- Distinguishing ways AI systems can be deceptive and misaligned
- Why powerful AI systems might acquire goals that go beyond what they’re trained to do, and how those goals could lead to scheming
- Why scheming goals might perform better (or worse) in training than less worrying goals
- The ‘counting argument’ for scheming AI
- Why goals that lead to scheming might be simpler than the goals we intend
- Things Joe is still confused about, and research project ideas
You can get in touch through our website or on Twitter. Consider leaving us an honest review wherever you're listening to this — it's the best free way to support the show. Thanks for listening!
90 episodes