To give you the best possible experience, this site uses cookies. Review our Privacy Policy and Terms of Service to learn more.
Got it!
Player FM - Internet Radio Done Right
12 subscribers
Checked 1d ago
Added three years ago
Content provided by LessWrong. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
Player FM - Podcast App Go offline with the Player FM app!
Are you tired of releasing episodes week to week and getting no download growth? In this episode, I’m kicking off a brand-new series where I’ll be breaking down the exact strategies you need to expand your podcast audience—starting with the essential foundations. In this first episode, I’m covering: Why podcast growth matters (and why it’s NOT just about big numbers) The 3 core growth strategies: organic, collaborations, and paid growth What sustainable, realistic growth actually looks like Whether you’re just getting started or looking to scale, this series will give you the tools you need to grow your show strategically. Today's episode is brought to you by Mic Check Society , our community for podcasters who are looking to take their podcast from good to great. Come join us for educational trainings, a private member's only community, and monthly calls! Get $10 off per month with code PODCAST at micchecksociety.com . Clocking In with Haylee Gaffin is produced by Gaffin Creative , a podcast production company for creative entrepreneurs. Learn more about our services at Gaffincreative.com , plus you’ll also find resources, show notes, and more for the Clocking In Podcast. Time-stamps: Why podcast growth matters (2:09) Three pillars of podcast growth (3:39) Organic growth (3:52) Collaboration and borrowing audiences (4:31) Paid growth opportunities (5:11) What sustainable growth looks like (6:36) Connect with Haylee: instagram.com/hayleegaffin Gaffincreative.com micchecksociety.com Review the Transcript: https://share.descript.com/view/Plzue2YOIAh Hosted on Acast. See acast.com/privacy for more information.…
Content provided by LessWrong. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
[This is our blog post on the papers, which can be found at https://transformer-circuits.pub/2025/attribution-graphs/biology.html and https://transformer-circuits.pub/2025/attribution-graphs/methods.html.] Language models like Claude aren't programmed directly by humans—instead, they‘re trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model's developers. This means that we don’t understand how models do most of the things they do. Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:
Claude can speak dozens of languages. What language, if any, is it using "in its head"?
Claude writes text one word at a time. Is it only focusing on predicting the [...]
Content provided by LessWrong. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
[This is our blog post on the papers, which can be found at https://transformer-circuits.pub/2025/attribution-graphs/biology.html and https://transformer-circuits.pub/2025/attribution-graphs/methods.html.] Language models like Claude aren't programmed directly by humans—instead, they‘re trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model's developers. This means that we don’t understand how models do most of the things they do. Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:
Claude can speak dozens of languages. What language, if any, is it using "in its head"?
Claude writes text one word at a time. Is it only focusing on predicting the [...]
Epistemic status: thing people have told me that seems right. Also primarily relevant to US audiences. Also I am speaking in my personal capacity and not representing any employer, present or past. Sometimes, I talk to people who work in the AI governance space. One thing that multiple people have told me, which I found surprising, is that there is apparently a real problem where people accidentally rule themselves out of AI policy positions by making political donations of small amounts—in particular, under $10. My understanding is that in the United States, donations to political candidates are a matter of public record, and that if you donate to candidates of one party, this might look bad if you want to gain a government position when another party is in charge. Therefore, donating approximately $3 can significantly damage your career, while not helping your preferred candidate all that [...] --- First published: May 11th, 2025 Source: https://www.lesswrong.com/posts/tz43dmLAchxcqnDRA/consider-not-donating-under-usd100-to-political-candidates --- Narrated by TYPE III AUDIO .…
"If you kiss your child, or your wife, say that you only kiss things which are human, and thus you will not be disturbed if either of them dies." - Epictetus "Whatever suffering arises, all arises due to attachment; with the cessation of attachment, there is the cessation of suffering." - Pali canon "He is not disturbed by loss, he does not delight in gain; he is not disturbed by blame, he does not delight in praise; he is not disturbed by pain, he does not delight in pleasure; he is not disturbed by dishonor, he does not delight in honor." - Pali Canon (Majjhima Nikaya) "An arahant would feel physical pain if struck, but no mental pain. If his mother died, he would organize the funeral, but would feel no grief, no sense of loss." - the Dhammapada "Receive without pride, let go without attachment." - Marcus Aurelius [...] --- First published: May 10th, 2025 Source: https://www.lesswrong.com/posts/aGnRcBk4rYuZqENug/it-s-okay-to-feel-bad-for-a-bit --- Narrated by TYPE III AUDIO .…
The other day I discussed how high monitoring costs can explain the emergence of “aristocratic” systems of governance: Aristocracy and Hostage Capital Arjun Panickssery · Jan 8 There's a conventional narrative by which the pre-20th century aristocracy was the "old corruption" where civil and military positions were distributed inefficiently due to nepotism until the system was replaced by a professional civil service after more enlightened thinkers prevailed ... An element of Douglas Allen's argument that I didn’t expand on was the British Navy. He has a separate paper called “The British Navy Rules” that goes into more detail on why he thinks institutional incentives made them successful from 1670 and 1827 (i.e. for most of the age of fighting sail). In the Seven Years’ War (1756–1763) the British had a 7-to-1 casualty difference in single-ship actions. During the French Revolutionary and Napoleonic Wars (1793–1815) the British had a 5-to-1 [...] --- First published: March 28th, 2025 Source: https://www.lesswrong.com/posts/YE4XsvSFJiZkWFtFE/explaining-british-naval-dominance-during-the-age-of-sail --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Eliezer and I wrote a book. It's titled If Anyone Builds It, Everyone Dies. Unlike a lot of other writing either of us have done, it's being professionally published. It's hitting shelves on September 16th. It's a concise (~60k word) book aimed at a broad audience. It's been well-received by people who received advance copies, with some endorsements including: The most important book I've read for years: I want to bring it to every political and corporate leader in the world and stand over them until they've read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. - Stephen Fry, actor, broadcaster, and writer If Anyone Builds It, Everyone Dies may prove to be the most important book of our time. Yudkowsky and Soares believe [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 14th, 2025 Source: https://www.lesswrong.com/posts/iNsy7MsbodCyNTwKs/eliezer-and-i-wrote-a-book-if-anyone-builds-it-everyone-dies --- Narrated by TYPE III AUDIO .…
It was a cold and cloudy San Francisco Sunday. My wife and I were having lunch with friends at a Korean cafe. My phone buzzed with a text. It said my mom was in the hospital. I called to find out more. She had a fever, some pain, and had fainted. The situation was serious, but stable. Monday was a normal day. No news was good news, right? Tuesday she had seizures. Wednesday she was in the ICU. I caught the first flight to Tampa. Thursday she rested comfortably. Friday she was diagnosed with bacterial meningitis, a rare condition that affects about 3,000 people in the US annually. The doctors had known it was a possibility, so she was already receiving treatment. We stayed by her side through the weekend. My dad spent every night with her. We made plans for all the fun things we would when she [...] --- First published: May 13th, 2025 Source: https://www.lesswrong.com/posts/reo79XwMKSZuBhKLv/too-soon --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
At the bottom of the LessWrong post editor, if you have at least 100 global karma, you may have noticed this button. The button Many people click the button, and are jumpscared when it starts an Intercom chat with a professional editor (me), asking what sort of feedback they'd like. So, that's what it does. It's a summon Justis button. Why summon Justis? To get feedback on your post, of just about any sort. Typo fixes, grammar checks, sanity checks, clarity checks, fit for LessWrong, the works. If you use the LessWrong editor (as opposed to the Markdown editor) I can leave comments and suggestions directly inline. I also provide detailed narrative feedback (unless you explicitly don't want this) in the Intercom chat itself. The feedback is totally without pressure. You can throw it all away, or just keep the bits you like. Or use it all! In any case [...] --- Outline: (00:35) Why summon Justis? (01:19) Why Justis in particular? (01:48) Am I doing it right? (01:59) How often can I request feedback? (02:22) Can I use the feature for linkposts/crossposts? (02:49) What if I click the button by mistake? (02:59) Should I credit you? (03:16) Couldnt I just use an LLM? (03:48) Why does Justis do this? --- First published: May 12th, 2025 Source: https://www.lesswrong.com/posts/bkDrfofLMKFoMGZkE/psa-the-lesswrong-feedback-service --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
For months, I had the feeling: something is wrong. Some core part of myself had gone missing. I had words and ideas cached, which pointed back to the missing part. There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774 - 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease. There was another old post which declared “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”. There was a cached instinct to look at certain kinds of social incentive gradient, toward managing more people or growing an organization or playing social-political games, and say “no, it's a trap”. To go… in a different direction, orthogonal [...] --- Outline: (01:19) In Search of a Name (04:23) Near Mode --- First published: May 8th, 2025 Source: https://www.lesswrong.com/posts/Wg6ptgi2DupFuAnXG/orienting-toward-wizard-power --- Narrated by TYPE III AUDIO .…
(Disclaimer: Post written in a personal capacity. These are personal hot takes and do not in any way represent my employer's views.) TL;DR: I do not think we will produce high reliability methods to evaluate or monitor the safety of superintelligent systems via current research paradigms, with interpretability or otherwise. Interpretability seems a valuable tool here and remains worth investing in, as it will hopefully increase the reliability we can achieve. However, interpretability should be viewed as part of an overall portfolio of defences: a layer in a defence-in-depth strategy. It is not the one thing that will save us, and it still won’t be enough for high reliability. Introduction There's a common, often implicit, argument made in AI safety discussions: interpretability is presented as the only reliable path forward for detecting deception in advanced AI - among many other sources it was argued for in [...] --- Outline: (00:55) Introduction (02:57) High Reliability Seems Unattainable (05:12) Why Won't Interpretability be Reliable? (07:47) The Potential of Black-Box Methods (08:48) The Role of Interpretability (12:02) Conclusion The original text contained 5 footnotes which were omitted from this narration. --- First published: May 4th, 2025 Source: https://www.lesswrong.com/posts/PwnadG4BFjaER3MGf/interpretability-will-not-reliably-find-deceptive-ai --- Narrated by TYPE III AUDIO .…
It'll take until ~2050 to repeat the level of scaling that pretraining compute is experiencing this decade, as increasing funding can't sustain the current pace beyond ~2029 if AI doesn't deliver a transformative commercial success by then. Natural text data will also run out around that time, and there are signs that current methods of reasoning training might be mostly eliciting capabilities from the base model. If scaling of reasoning training doesn't bear out actual creation of new capabilities that are sufficiently general, and pretraining at ~2030 levels of compute together with the low hanging fruit of scaffolding doesn't bring AI to crucial capability thresholds, then it might take a while. Possibly decades, since training compute will be growing 3x-4x slower after 2027-2029 than it does now, and the ~6 years of scaling since the ChatGPT moment stretch to 20-25 subsequent years, not even having access to any [...] --- Outline: (01:14) Training Compute Slowdown (04:43) Bounded Potential of Thinking Training (07:43) Data Inefficiency of MoE The original text contained 4 footnotes which were omitted from this narration. --- First published: May 1st, 2025 Source: https://www.lesswrong.com/posts/XiMRyQcEyKCryST8T/slowdown-after-2028-compute-rlvr-uncertainty-moe-data-wall --- Narrated by TYPE III AUDIO .…
In this blog post, we analyse how the recent AI 2027 forecast by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean has been discussed across Chinese language platforms. We present: Our research methodology and synthesis of key findings across media artefacts A proposal for how censorship patterns may provide signal for the Chinese government's thinking about AGI and the race to superintelligence A more detailed analysis of each of the nine artefacts, organised by type: Mainstream Media, Forum Discussion, Bilibili (Chinese Youtube) Videos, Personal Blogs. Methodology We conducted a comprehensive search across major Chinese-language platforms–including news outlets, video platforms, forums, microblogging sites, and personal blogs–to collect the media featured in this report. We supplemented this with Deep Research to identify additional sites mentioning AI 2027. Our analysis focuses primarily on content published in the first few days (4-7 April) following the report's release. More media [...] --- Outline: (00:58) Methodology (01:36) Summary (02:48) Censorship as Signal (07:29) Analysis (07:53) Mainstream Media (07:57) English Title: Doomsday Timeline is Here! Former OpenAI Researcher's 76-page Hardcore Simulation: ASI Takes Over the World in 2027, Humans Become NPCs (10:27) Forum Discussion (10:31) English Title: What do you think of former OpenAI researcher's AI 2027 predictions? (13:34) Bilibili Videos (13:38) English Title: \[AI 2027\] A mind-expanding wargame simulation of artificial intelligence competition by a former OpenAI researcher (15:24) English Title: Predicting AI Development in 2027 (17:13) Personal Blogs (17:16) English Title: Doomsday Timeline: AI 2027 Depicts the Arrival of Superintelligence and the Fate of Humanity Within the Decade (18:30) English Title: AI 2027: Expert Predictions on the Artificial Intelligence Explosion (21:57) English Title: AI 2027: A Science Fiction Article (23:16) English Title: Will AGI Take Over the World in 2027? (25:46) English Title: AI 2027 Prediction Report: AI May Fully Surpass Humans by 2027 (27:05) Acknowledgements --- First published: April 30th, 2025 Source: https://www.lesswrong.com/posts/JW7nttjTYmgWMqBaF/early-chinese-language-media-coverage-of-the-ai-2027-report --- Narrated by TYPE III AUDIO .…
This is a link post. to follow up my philantropic pledge from 2020, i've updated my philanthropy page with the 2024 results. in 2024 my donations funded $51M worth of endpoint grants (plus $2.0M in admin overhead and philanthropic software development). this comfortably exceeded my 2024 commitment of $42M (20k times $2100.00 — the minimum price of ETH in 2024). this also concludes my 5-year donation pledge, but of course my philanthropy continues: eg, i’ve already made over $4M in endpoint grants in the first quarter of 2025 (not including 2024 grants that were slow to disburse), as well as pledged at least $10M to the 2025 SFF grant round. --- First published: April 23rd, 2025 Source: https://www.lesswrong.com/posts/8ojWtREJjKmyvWdDb/jaan-tallinn-s-2024-philanthropy-overview Linkpost URL: https://jaan.info/philanthropy/#2024-results --- Narrated by TYPE III AUDIO .…
I’ve been thinking recently about what sets apart the people who’ve done the best work at Anthropic. You might think that the main thing that makes people really effective at research or engineering is technical ability, and among the general population that's true. Among people hired at Anthropic, though, we’ve restricted the range by screening for extremely high-percentile technical ability, so the remaining differences, while they still matter, aren’t quite as critical. Instead, people's biggest bottleneck eventually becomes their ability to get leverage—i.e., to find and execute work that has a big impact-per-hour multiplier. For example, here are some types of work at Anthropic that tend to have high impact-per-hour, or a high impact-per-hour ceiling when done well (of course this list is extremely non-exhaustive!): Improving tooling, documentation, or dev loops. A tiny amount of time fixing a papercut in the right way can save [...] --- Outline: (03:28) 1. Agency (03:31) Understand and work backwards from the root goal (05:02) Don't rely too much on permission or encouragement (07:49) Make success inevitable (09:28) 2. Taste (09:31) Find your angle (11:03) Think real hard (13:03) Reflect on your thinking --- First published: April 19th, 2025 Source: https://www.lesswrong.com/posts/DiJT4qJivkjrGPFi8/impact-agency-and-taste --- Narrated by TYPE III AUDIO .…
This is a link post. Guillaume Blanc has a piece in Works in Progress (I assume based on his paper) about how France's fertility declined earlier than in other European countries, and how its power waned as its relative population declined starting in the 18th century. In 1700, France had 20% of Europe's population (4% of the whole world population). Kissinger writes in Diplomacy with respect to the Versailles Peace Conference: Victory brought home to France the stark realization that revanche had cost it too dearly, and that it had been living off capital for nearly a century. France alone knew just how weak it had become in comparison with Germany, though nobody else, especially not America, was prepared to believe it ... Though France's allies insisted that its fears were exaggerated, French leaders knew better. In 1880, the French had represented 15.7 percent of Europe's population. By 1900, that [...] --- First published: April 23rd, 2025 Source: https://www.lesswrong.com/posts/gk2aJgg7yzzTXp8HJ/to-understand-history-keep-former-population-distributions Linkpost URL: https://arjunpanickssery.substack.com/p/to-understand-history-keep-former --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
We’ve written a new report on the threat of AI-enabled coups. I think this is a very serious risk – comparable in importance to AI takeover but much more neglected. In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here's a very basic threat model for AI takeover: Humanity develops superhuman AI Superhuman AI is misaligned and power-seeking Superhuman AI seizes power for itself And now here's a closely analogous threat model for AI-enabled coups: Humanity develops superhuman AI Superhuman AI is controlled by a small group Superhuman AI seizes power for the small group While the report focuses on the risk that someone seizes power over a country, I think that similar dynamics could allow someone to take over the world. In fact, if someone wanted to take over the world, their best strategy might well be to first stage an AI-enabled [...] --- Outline: (02:39) Summary (03:31) An AI workforce could be made singularly loyal to institutional leaders (05:04) AI could have hard-to-detect secret loyalties (06:46) A few people could gain exclusive access to coup-enabling AI capabilities (09:46) Mitigations (13:00) Vignette The original text contained 2 footnotes which were omitted from this narration. --- First published: April 16th, 2025 Source: https://www.lesswrong.com/posts/6kBMqrK9bREuGsrnd/ai-enabled-coups-a-small-group-could-use-ai-to-seize-power-1 --- Narrated by TYPE III AUDIO . --- Images from the article:…
Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport on the outskirts of Amsterdam. In April 1999, a cargo of 440 of the rodents arrived on a KLM flight from Beijing, without the necessary import papers. Because of this, they could not be forwarded on to the customer in Athens. But nobody was able to correct the error and send them back either. What could be done with them? It's hard to think there wasn’t a better solution than the one that was carried out; faced with the paperwork issue, airport staff threw all 440 squirrels into an industrial shredder. [...] It turned out that the order to destroy the squirrels had come from the Dutch government's Department of Agriculture, Environment Management and Fishing. However, KLM's management, with the benefit of hindsight, said that [...] --- First published: April 22nd, 2025 Source: https://www.lesswrong.com/posts/nYJaDnGNQGiaCBSB5/accountability-sinks --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Subtitle: Bad for loss of control risks, bad for concentration of power risks I’ve had this sitting in my drafts for the last year. I wish I’d been able to release it sooner, but on the bright side, it’ll make a lot more sense to people who have already read AI 2027. There's a good chance that AGI will be trained before this decade is out. By AGI I mean “An AI system at least as good as the best human X’ers, for all cognitive tasks/skills/jobs X.” Many people seem to be dismissing this hypothesis ‘on priors’ because it sounds crazy. But actually, a reasonable prior should conclude that this is plausible.[1] For more on what this means, what it might look like, and why it's plausible, see AI 2027, especially the Research section. If so, by default the existence of AGI will be a closely guarded [...] The original text contained 8 footnotes which were omitted from this narration. --- First published: April 18th, 2025 Source: https://www.lesswrong.com/posts/FGqfdJmB8MSH5LKGc/training-agi-in-secret-would-be-unsafe-and-unethical-1 --- Narrated by TYPE III AUDIO .…
Though, given my doomerism, I think the natsec framing of the AGI race is likely wrongheaded, let me accept the Dario/Leopold/Altman frame that AGI will be aligned to the national interest of a great power. These people seem to take as an axiom that a USG AGI will be better in some way than CCP AGI. Has anyone written justification for this assumption? I am neither an American citizen nor a Chinese citizen. What would it mean for an AGI to be aligned with "Democracy" or "Confucianism" or "Marxism with Chinese characteristics" or "the American constitution" Contingent on a world where such an entity exists and is compatible with my existence, what would my life be as a non-citizen in each system? Why should I expect USG AGI to be better than CCP AGI? --- First published: April 19th, 2025 Source: https://www.lesswrong.com/posts/MKS4tJqLWmRXgXzgY/why-should-i-assume-ccp-agi-is-worse-than-usg-agi-1 --- Narrated by TYPE III AUDIO .…
Introduction Writing this post puts me in a weird epistemic position. I simultaneously believe that: The reasoning failures that I'll discuss are strong evidence that current LLM- or, more generally, transformer-based approaches won't get us AGI As soon as major AI labs read about the specific reasoning failures described here, they might fix them But future versions of GPT, Claude etc. succeeding at the tasks I've described here will provide zero evidence of their ability to reach AGI. If someone makes a future post where they report that they tested an LLM on all the specific things I described here it aced all of them, that will not update my position at all. That is because all of the reasoning failures that I describe here are surprising in the sense that given everything else that they can do, you’d expect LLMs to succeed at all of these tasks. The [...] --- Outline: (00:13) Introduction (02:13) Reasoning failures (02:17) Sliding puzzle problem (07:17) Simple coaching instructions (09:22) Repeatedly failing at tic-tac-toe (10:48) Repeatedly offering an incorrect fix (13:48) Various people's simple tests (15:06) Various failures at logic and consistency while writing fiction (15:21) Inability to write young characters when first prompted (17:12) Paranormal posers (19:12) Global details replacing local ones (20:19) Stereotyped behaviors replacing character-specific ones (21:21) Top secret marine databases (23:32) Wandering items (23:53) Sycophancy (24:49) What's going on here? (32:18) How about scaling? Or reasoning models? --- First published: April 15th, 2025 Source: https://www.lesswrong.com/posts/sgpCuokhMb8JmkoSn/untitled-draft-7shu --- Narrated by TYPE III AUDIO . --- Images from the article:…
Dario Amodei, CEO of Anthropic, recently worried about a world where only 30% of jobs become automated, leading to class tensions between the automated and non-automated. Instead, he predicts that nearly all jobs will be automated simultaneously, putting everyone "in the same boat." However, based on my experience spanning AI research (including first author papers at COLM / NeurIPS and attending MATS under Neel Nanda), robotics, and hands-on manufacturing (including machining prototype rocket engine parts for Blue Origin and Ursa Major), I see a different near-term future. Since the GPT-4 release, I've evaluated frontier models on a basic manufacturing task, which tests both visual perception and physical reasoning. While Gemini 2.5 Pro recently showed progress on the visual front, all models tested continue to fail significantly on physical reasoning. They still perform terribly overall. Because of this, I think that there will be an interim period where a significant [...] --- Outline: (01:28) The Evaluation (02:29) Visual Errors (04:03) Physical Reasoning Errors (06:09) Why do LLM's struggle with physical tasks? (07:37) Improving on physical tasks may be difficult (10:14) Potential Implications of Uneven Automation (11:48) Conclusion (12:24) Appendix (12:44) Visual Errors (14:36) Physical Reasoning Errors --- First published: April 14th, 2025 Source: https://www.lesswrong.com/posts/r3NeiHAEWyToers4F/frontier-ai-models-still-fail-at-basic-physical-tasks-a --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Audio note: this article contains 31 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Lewis Smith*, Sen Rajamanoharan*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda * = equal contribution The following piece is a list of snippets about research from the GDM mechanistic interpretability team, which we didn’t consider a good fit for turning into a paper, but which we thought the community might benefit from seeing in this less formal form. These are largely things that we found in the process of a project investigating whether sparse autoencoders were useful for downstream tasks, notably out-of-distribution probing. TL;DR To validate whether SAEs were a worthwhile technique, we explored whether they were useful on the downstream task of OOD generalisation when detecting harmful intent in user prompts [...] --- Outline: (01:08) TL;DR (02:38) Introduction (02:41) Motivation (06:09) Our Task (08:35) Conclusions and Strategic Updates (13:59) Comparing different ways to train Chat SAEs (18:30) Using SAEs for OOD Probing (20:21) Technical Setup (20:24) Datasets (24:16) Probing (26:48) Results (30:36) Related Work and Discussion (34:01) Is it surprising that SAEs didn't work? (39:54) Dataset debugging with SAEs (42:02) Autointerp and high frequency latents (44:16) Removing High Frequency Latents from JumpReLU SAEs (45:04) Method (45:07) Motivation (47:29) Modifying the sparsity penalty (48:48) How we evaluated interpretability (50:36) Results (51:18) Reconstruction loss at fixed sparsity (52:10) Frequency histograms (52:52) Latent interpretability (54:23) Conclusions (56:43) Appendix The original text contained 7 footnotes which were omitted from this narration. --- First published: March 26th, 2025 Source: https://www.lesswrong.com/posts/4uXCAJNuPKtKBsi28/sae-progress-update-2-draft --- Narrated by TYPE III AUDIO . --- Images from the article:…
This is a link post. When I was a really small kid, one of my favorite activities was to try and dam up the creek in my backyard. I would carefully move rocks into high walls, pile up leaves, or try patching the holes with sand. The goal was just to see how high I could get the lake, knowing that if I plugged every hole, eventually the water would always rise and defeat my efforts. Beaver behaviour. One day, I had the realization that there was a simpler approach. I could just go get a big 5 foot long shovel, and instead of intricately locking together rocks and leaves and sticks, I could collapse the sides of the riverbank down and really build a proper big dam. I went to ask my dad for the shovel to try this out, and he told me, very heavily paraphrasing, 'Congratulations. You've [...] --- First published: April 10th, 2025 Source: https://www.lesswrong.com/posts/rLucLvwKoLdHSBTAn/playing-in-the-creek Linkpost URL: https://hgreer.com/PlayingInTheCreek --- Narrated by TYPE III AUDIO .…
This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to speak for all of MIRI. Okay, I'm annoyed at people covering AI 2027 burying the lede, so I'm going to try not to do that. The authors predict a strong chance that all humans will be (effectively) dead in 6 years, and this agrees with my best guess about the future. (My modal timeline has loss of control of Earth mostly happening in 2028, rather than late 2027, but nitpicking at that scale hardly matters.) Their timeline to transformative AI also seems pretty close to the perspective of frontier lab CEO's (at least Dario Amodei, and probably Sam Altman) and the aggregate market opinion of both Metaculus and Manifold! If you look on those market platforms you get graphs like this: Both [...] --- Outline: (02:23) Mode ≠ Median (04:50) Theres a Decent Chance of Having Decades (06:44) More Thoughts (08:55) Mid 2025 (09:01) Late 2025 (10:42) Early 2026 (11:18) Mid 2026 (12:58) Late 2026 (13:04) January 2027 (13:26) February 2027 (14:53) March 2027 (16:32) April 2027 (16:50) May 2027 (18:41) June 2027 (19:03) July 2027 (20:27) August 2027 (22:45) September 2027 (24:37) October 2027 (26:14) November 2027 (Race) (29:08) December 2027 (Race) (30:53) 2028 and Beyond (Race) (34:42) Thoughts on Slowdown (38:27) Final Thoughts --- First published: April 9th, 2025 Source: https://www.lesswrong.com/posts/Yzcb5mQ7iq4DFfXHx/thoughts-on-ai-2027 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Short AI takeoff timelines seem to leave no time for some lines of alignment research to become impactful. But any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. So even hopelessly incomplete research agendas could still be used to prompt future capable AI to focus on them, while in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely. This doesn't crucially depend on giving significant probability to long AI takeoff timelines, or on expected value in such scenarios driving the priorities. Potential for AI to take up the torch makes it reasonable to still prioritize things that have no hope at all of becoming practical for decades (with human effort). How well AIs can be directed to advance a line of research [...] --- First published: April 9th, 2025 Source: https://www.lesswrong.com/posts/3NdpbA6M5AM2gHvTW/short-timelines-don-t-devalue-long-horizon-research --- Narrated by TYPE III AUDIO .…
In this post, we present a replication and extension of an alignment faking model organism: Replication: We replicate the alignment faking (AF) paper and release our code. Classifier Improvements: We significantly improve the precision and recall of the AF classifier. We release a dataset of ~100 human-labelled examples of AF for which our classifier achieves an AUROC of 0.9 compared to 0.6 from the original classifier. Evaluating More Models: We find Llama family models, other open source models, and GPT-4o do not AF in the prompted-only setting when evaluating using our new classifier (other than a single instance with Llama 3 405B). Extending SFT Experiments: We run supervised fine-tuning (SFT) experiments on Llama (and GPT4o) and find that AF rate increases with scale. We release the fine-tuned models on Huggingface and scripts. Alignment faking on 70B: We find that Llama 70B alignment fakes when both using the system prompt in the [...] --- Outline: (02:43) Method (02:46) Overview of the Alignment Faking Setup (04:22) Our Setup (06:02) Results (06:05) Improving Alignment Faking Classification (10:56) Replication of Prompted Experiments (14:02) Prompted Experiments on More Models (16:35) Extending Supervised Fine-Tuning Experiments to Open-Source Models and GPT-4o (23:13) Next Steps (25:02) Appendix (25:05) Appendix A: Classifying alignment faking (25:17) Criteria in more depth (27:40) False positives example 1 from the old classifier (30:11) False positives example 2 from the old classifier (32:06) False negative example 1 from the old classifier (35:00) False negative example 2 from the old classifier (36:56) Appendix B: Classifier ROC on other models (37:24) Appendix C: User prompt suffix ablation (40:24) Appendix D: Longer training of baseline docs --- First published: April 8th, 2025 Source: https://www.lesswrong.com/posts/Fr4QsQT52RFKHvCAH/alignment-faking-revisited-improved-classifiers-and-open --- Narrated by TYPE III AUDIO . --- Images from the article:…
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under five years, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks. The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years. The shaded region represents 95% CI calculated by hierarchical bootstrap over task families, tasks, and task attempts. Full paper | Github repo We think that forecasting the capabilities of future AI systems is important for understanding and preparing for the impact of [...] --- Outline: (08:58) Conclusion (09:59) Want to contribute? --- First published: March 19th, 2025 Source: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks --- Narrated by TYPE III AUDIO . --- Images from the article:…
“In the loveliest town of all, where the houses were white and high and the elms trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.” — 107-word sentence from Stuart Little (1945) Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50 for Spenser (died 1599), 42 for Austen (died 1817), 20 for Dickens (died 1870), 21 for Emerson (died 1882), 14 [...] --- First published: April 3rd, 2025 Source: https://www.lesswrong.com/posts/xYn3CKir4bTMzY5eb/why-have-sentence-lengths-decreased --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026. Well, it's finally time. I'm back, and this time I have a team with me: the AI Futures Project. We've written a concrete scenario of what we think the future of AI will look like. We are highly uncertain, of course, but we hope this story will rhyme with reality enough to help us all prepare for what's ahead. You really should go read it on the website instead of here, it's much better. There's a sliding dashboard that updates the stats as you scroll through the scenario! But I've nevertheless copied the first half of the story below. I look forward to reading your comments. Mid 2025: Stumbling Agents The [...] --- Outline: (01:35) Mid 2025: Stumbling Agents (03:13) Late 2025: The World's Most Expensive AI (08:34) Early 2026: Coding Automation (10:49) Mid 2026: China Wakes Up (13:48) Late 2026: AI Takes Some Jobs (15:35) January 2027: Agent-2 Never Finishes Learning (18:20) February 2027: China Steals Agent-2 (21:12) March 2027: Algorithmic Breakthroughs (23:58) April 2027: Alignment for Agent-3 (27:26) May 2027: National Security (29:50) June 2027: Self-improving AI (31:36) July 2027: The Cheap Remote Worker (34:35) August 2027: The Geopolitics of Superintelligence (40:43) September 2027: Agent-4, the Superhuman AI Researcher --- First published: April 3rd, 2025 Source: https://www.lesswrong.com/posts/TpSFoqoG2M5MAAesg/ai-2027-what-superintelligence-looks-like-1 --- Narrated by TYPE III AUDIO . --- Images from the article:…
Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.In particular, he very much ‘not been consistently candid’ with the board on several important occasions. In particular, he lied to board members about what was said by other board members, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities. I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or [...] --- Outline: (01:32) The Big Picture Going Forward (06:27) Hagey Verifies Out the Story (08:50) Key Facts From the Story (11:57) Dangers of False Narratives (16:24) A Full Reference and Reading List --- First published: March 31st, 2025 Source: https://www.lesswrong.com/posts/25EgRNWcY6PM3fWZh/openai-12-battle-of-the-board-redux --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Epistemic status: This post aims at an ambitious target: improving intuitive understanding directly. The model for why this is worth trying is that I believe we are more bottlenecked by people having good intuitions guiding their research than, for example, by the ability of people to code and run evals. Quite a few ideas in AI safety implicitly use assumptions about individuality that ultimately derive from human experience. When we talk about AIs scheming, alignment faking or goal preservation, we imply there is something scheming or alignment faking or wanting to preserve its goals or escape the datacentre. If the system in question were human, it would be quite clear what that individual system is. When you read about Reinhold Messner reaching the summit of Everest, you would be curious about the climb, but you would not ask if it was his body there, or his [...] --- Outline: (01:38) Individuality in Biology (03:53) Individuality in AI Systems (10:19) Risks and Limitations of Anthropomorphic Individuality Assumptions (11:25) Coordinating Selves (16:19) Whats at Stake: Stories (17:25) Exporting Myself (21:43) The Alignment Whisperers (23:27) Echoes in the Dataset (25:18) Implications for Alignment Research and Policy --- First published: March 28th, 2025 Source: https://www.lesswrong.com/posts/wQKskToGofs4osdJ3/the-pando-problem-rethinking-ai-individuality --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.In particular, he very much ‘not been consistently candid’ with the board on several important occasions. In particular, he lied to board members about what was said by other board members, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities. I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or [...] --- Outline: (01:32) The Big Picture Going Forward (06:27) Hagey Verifies Out the Story (08:50) Key Facts From the Story (11:57) Dangers of False Narratives (16:24) A Full Reference and Reading List --- First published: March 31st, 2025 Source: https://www.lesswrong.com/posts/25EgRNWcY6PM3fWZh/openai-12-battle-of-the-board-redux --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Welcome to Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.