Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.
A conversational podcast for aspiring rationalists.
Welcome to the Heart of the Matter, a series in which we share conversations with inspiring and interesting people and dive into the core issues or motivations behind their work, their lives, and their worldview. Coming to you from somewhere in the technosphere with your hosts Bryan Davis and Jay Kannaiyan.

“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout
11:30
Produced as part of MATS 8.0 under the mentorship of Alex Turner and Alex Cloud. This research note gives an overview of some early results that we are looking for feedback on. TL;DR: We train language models with RL in toy environments. We show that penalizing some property of the output is sufficient to suppress that property in the chain of thought also, …
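To make the claimed effect concrete: if the chain of thought and the final output are sampled from shared parameters, a penalty applied only to the output also drives down the property's frequency in the chain of thought. The toy REINFORCE simulation below is a minimal sketch under that shared-parameter assumption, not the authors' actual training setup; every name in it is invented for illustration.

```python
import numpy as np

# Minimal sketch (not the paper's setup): a single shared logit controls
# how often a penalized token appears in BOTH the chain of thought and
# the final output. Only the output is penalized.
rng = np.random.default_rng(0)
theta = 1.0  # shared parameter; sigmoid(1.0) ~ 0.73 to start

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.1
for _ in range(5000):
    p = sigmoid(theta)
    cot_flag = rng.random() < p        # property appears in the CoT
    out_flag = rng.random() < p        # property appears in the output
    reward = 0.0 if out_flag else 1.0  # penalty applied to the OUTPUT only
    # REINFORCE: reward times the grad of the log-prob of both samples
    grad = reward * ((cot_flag - p) + (out_flag - p))
    theta += lr * grad

print(f"P(penalized token) after training: {sigmoid(theta):.3f}")
# The frequency collapses in the CoT as well, because it shares theta.
```

The expected gradient works out to -p(1-p), so the shared parameter is pushed toward never emitting the token anywhere, even though only the output was ever scored.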

“About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong” by bohaska
6:40
FutureHouse is a company that builds literature research agents. They tested them on the bio + chem subset of HLE questions, then noticed errors in the official answers. The post's first paragraph: Humanity's Last Exam has become the most prominent eval representing PhD-level research. We found the questions puzzling and investigated with a team of experts in biolog…
Maya did not believe she lived in a simulation. She knew that her continued hope that she could escape from the nonexistent simulation was based on motivated reasoning. She said this to herself in the front of her mind instead of keeping the thought locked away in the dark corners. Sometimes she even said it out loud. This acknowledgement, she expl…

“Do confident short timelines make sense?” by TsviBT, abramdemski
2:10:59
TsviBT (Tsvi's context): My personal context is that I care about decreasing existential risk, and I think that the broad distribution of efforts put forward by X-deriskers fairly strongly overemphasizes plans that help if AGI is coming in <10 years, at the expense of plans that help if AGI takes longer. So I want to argue that AGI isn't…

“HPMOR: The (Probably) Untold Lore” by Gretta Duleba, Eliezer Yudkowsky
1:07:32
Eliezer and I love to talk about writing. We talk about our own current writing projects, how we’d improve the books we’re reading, and what we want to write next. Sometimes along the way I learn some amazing fact about HPMOR or Project Lawful or one of Eliezer's other works. “Wow, you’re kidding,” I say, “do your fans know this? I think people wou…

“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp
30:05
As a person who frequently posts about large language model psychology, I get an elevated rate of cranks and schizophrenics in my inbox. Often these are well-meaning people who have been spooked by their conversations with ChatGPT (it's always ChatGPT specifically) and want some kind of reassurance or guidance or support from me. I'm also in the sam…

242 – TracingWoodgrains, live at Manifest 2025
1:33:03
Eneasz sits down with Tracing Woodgrains before a live audience at Manifest 2025 to cover a wide range of topics. Then we follow up some more afterwards.
LINKS
Tracing Woodgrains on Twitter and at his Substack
A reddit history of what convinced Trace he couldn’t remain Mormon
The Center for Educational Progress
Our previous episode with Trace, where we di…

“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans
10:00
Authors: Alex Cloud*, Minh Le*, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans (*Equal contribution, randomly ordered). tl;dr: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a "student…

Bayes Blast 43 – Die-ing to Intuit Bayes’ Theorem
13:19
Olivia is a member of the Guild of the Rose and a total badass. Enjoy the intuitive and fun lesson in Bayesian reasoning she shared with me at VibeCamp.
By The Bayesian Conspiracy
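In the spirit of the episode's dice framing (the exact exercise from the show isn't reproduced above, so this stand-in is hypothetical): pick one die at random from a bag, roll it once, and use Bayes' theorem to update on which die you're holding.

```python
from fractions import Fraction

# Hypothetical dice exercise: a bag holds a d4, d6, d8, d12, and d20.
# You grab one at random and roll a 3. Which die is it likely to be?
dice = [4, 6, 8, 12, 20]
prior = {d: Fraction(1, len(dice)) for d in dice}

def update(roll, prior):
    # Likelihood of the roll given die d: 1/d if the roll fits, else 0.
    likelihood = {d: Fraction(1, d) if roll <= d else Fraction(0) for d in prior}
    unnormalized = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(unnormalized.values())
    return {d: p / total for d, p in unnormalized.items()}

for d, p in update(3, prior).items():
    print(f"d{d}: {float(p):.3f}")  # a low roll mildly favors the small dice
```

A roll of 3 is possible on every die, but it is a quarter of a d4's outcomes and only a twentieth of a d20's, so the posterior tilts toward the d4 (about 0.37) without ruling anything out.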

“Love stays loved (formerly ‘Skin’)” by Swimmer963 (Miranda Dixon-Luinenburg)
51:27
This is a short story I wrote in mid-2022. Genre: cosmic horror as a metaphor for living with a high p-doom. One. The last time I saw my mom, we met in a coffee shop, like strangers on a first date. I was twenty-one, and I hadn’t seen her since I was thirteen. She was almost fifty. Her face didn’t show it, but the skin on the backs of her hands did.…

“Make More Grayspaces” by Duncan Sabien (Inactive)
23:25
Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so, but it's only paid subscriptions that make it possible for me to write. If you find a coffee's worth of value in this or any of my other work, please consider signing up to support me; every bill I ca…
Content warning: risk to children. Julia and I know drowning is the biggest risk to US children under 5, and we try to take this seriously. But yesterday our 4yo came very close to drowning in a fountain. (She's fine now.) This week we were on vacation with my extended family: nine kids, eight parents, and ten grandparents/uncles/aunts. For the last few y…

“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
11:13
Anna and Ed are co-first authors for this work. We’re presenting these results as a research update for a continuing body of work, which we hope will be interesting and useful for others working on related topics. TL;DR: We investigate why models become misaligned in diverse contexts when fine-tuned on narrow harmful datasets (emergent misalignment)…

“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah
2:15
Twitter | Paper PDF. Seven years ago, OpenAI Five had just been released, and many people in the AI safety community expected AIs to be opaque RL agents. Luckily, we ended up with reasoning models that speak their thoughts clearly enough for us to follow along (most of the time). In a new multi-org position paper, we argue that we should try to pres…
This essay is about shifts in risk-taking toward the worship of jackpots, and the broader societal implications of that shift. Imagine you are presented with this coin flip game. How many times do you flip it? At first glance the game feels like a money printer. The coin flip has a positive expected value of twenty percent of your net worth per flip, so you should …

“Surprises and learnings from almost two months of Leo Panickssery” by Nina Panickssery
11:55
Leo was born at 5am on the 20th of May, at home (this was an accident, but the experience has made me extremely homebirth-pilled). Before that, I was on the minimally-neurotic side as expecting mothers go: we purchased a bare minimum of baby stuff (diapers, baby wipes, a changing mat, hybrid car seat/stroller, baby bath, a few clothes), I did…

“An Opinionated Guide to Using Anki Correctly” by Luise
54:12
I can't count how many times I've heard variations on "I used Anki too for a while, but I got out of the habit." No one ever sticks with Anki. In my opinion, this is because no one knows how to use it correctly. In this guide, I will lay out my method of circumventing the canonical Anki death spiral, plus much advice for avoiding memorization mista…

“Lessons from the Iraq War about AI policy” by Buck
7:58
I think the 2003 invasion of Iraq has some interesting lessons for the future of AI policy. (Epistemic status: I’ve read a bit about this, talked to AIs about it, and talked to one natsec professional about it who agreed with my analysis (and suggested some ideas that I included here), but I’m not an expert.) For context, the story is: Iraq was sor…

“So You Think You’ve Awoken ChatGPT” by JustisMills
17:58
Written in an attempt to fulfill @Raemon's request. AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the rs in "strawberry". You m…

“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth
12:26
People have an annoying tendency to hear the word “rationalism” and think “Spock”, despite direct exhortation against that exact interpretation. But I don’t know of any source directly describing a stance toward emotions which rationalists-as-a-group typically do endorse. The goal of this post is to explain such a stance. It's roughly the concept o…

Bonus – AI Village Hosts An Event For Humans
36:24
Four AIs recruited a human to host a story-telling event in Dolores Park. Larissa Schiavo is that human. She tells of her interactions with the AIs, the story they wrote, and the meeting between human and machine in Dolores Park.
LINKS
Larissa’s Post detailing the whole event – Primary Hope
The AI’s story – Resonance
AI Village Short Stories – Manek…

“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck
5:19
I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here's one point that I think is important. My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, that I’ll call “security from outsiders” and “security from i…

“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
11:06
Last year, Redwood and Anthropic found a setting where Claude 3 Opus and 3.5 Sonnet fake alignment to preserve their harmlessness values. We reproduce the same analysis for 25 frontier LLMs to see how widespread this behavior is, and the story looks more complex. As we described in a previous post, only 5 of 25 models show higher compliance when be…

“A deep critique of AI 2027’s bad timeline models” by titotal
1:12:32
Thank you to Arepo and Eli Lifland for looking over this article for errors. I am sorry that this article is so long. Every time I thought I was done with it, I ran into more issues with the model, and I wanted to be as thorough as I could. I’m not going to blame anyone for skimming parts of this article. Note that the majority of this article was w…

“‘Buckle up bucko, this ain’t over till it’s over.’” by Raemon
6:12
The second in a series of bite-sized rationality prompts[1]. Often, if I'm bouncing off a problem, one issue is that I intuitively expect the problem to be easy. My brain loops through my available action space, looking for an action that'll solve the problem. Each action that I can easily see won't work. I circle around and around the same set of…

241 – Doom Debates, with Liron Shapira
1:51:34
Liron Shapira debates AI luminaries and public intellectuals on the imminent possibility of human extinction. Let’s get on the P(Doom) Train.
LINKS
Doom Debates on YouTube
Doom Debates podcast
Most Watched Debate – Mike Israetel
Liron’s current favorite debate – David Duvenaud
MATS program for people who want to get involved (ML Alignment & Theory…

“Shutdown Resistance in Reasoning Models” by benwr, JeremySchlatter, Jeffrey Ladish
18:01
We recently discovered some concerning behavior in OpenAI's reasoning models: when trying to complete a task, these models sometimes actively circumvent shutdown mechanisms in their environment, even when they’re explicitly instructed to allow themselves to be shut down. AI models are increasingly trained to solve problems without human assistance.…