Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.

o3 breaks (some) records, but AI becomes pay-to-win
14:33
A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough. https://app.grayswan.ai/ai-explained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00…

o3 and o4-mini - they’re great, but easy to over-hype
14:24
Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning……

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed
20:09
This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening. https://www.emergentmind.com/ Chapters: 00:00 - Introduction 00:30 - Kling 2.0 01…

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...
23:51
The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well. Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_be…

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)
21:21
Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ … https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campai…

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI
13:47
Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power DeepSeek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more… AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: …

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)
12:58
Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean y…
GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple…

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
27:39
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone.…

AGI: (gets close), Humans: ‘Who Gets the Money?’
22:17
A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, Rand and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigan…

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
18:32
12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs DeepSeek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more. Deep Research: https://openai.com/inde…
o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with DeepSeek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably th…

Nothing Much Happens in AI, Then Everything Does All At Once
23:09
When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of DeepSeek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s L…

Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
13:11
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time DeepSeek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much mor…

OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward
23:41
Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announcement a …
o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, …

Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
13:40
The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be fo…
After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules o…

o1 Pro Mode – Full Analysis (plus o1 paper highlights)
16:43
Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. Weights and Biases’ Weave: wandb.me/ai_explained Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much m…

AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution
15:29
Calmest before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers. Assembly AI …

New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem
15:19
A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger... 80,0…

Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’
15:44
The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through p…

ChatGPT with Search, Altman Answers Anything and Simple Bench Out
15:20
The Google destroyer, the Perplexity crusher? Or just hype? ChatGPT with Search is here, and simultaneously Altman and co did an AMA on Reddit, covering GPT-5, Sora, SearchGPT and a lot more. Plus, the biggest news of them all: Simple Bench is out. ChatGPT with Search: https://openai.com/index/introducing-chatgpt-search/ Altman AMA (ask me anything…

The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think
22:34
A new state of the art LLM (at least for creative writing and basic reasoning) but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor? Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the…