Content provided by Philip - Host of AI Explained YT. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Philip - Host of AI Explained YT or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

o3 breaks (some) records, but AI becomes pay-to-win

14:33
 

A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.
https://app.grayswan.ai/ai-explained
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:33 - FictionLiveBench
01:37 - PHYBench
02:14 - SimpleBench
02:54 - Virology Capabilities Test
03:13 - Mathematics Performance
04:29 - Vision Benchmarks
05:43 - V* and how o3 works
06:44 - Revenue and costs for you
08:54 - Expensive RL and trade-offs
09:40 - How to spend the OOMs
13:27 - Gray Swan Arena
Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
PHYBench: https://arxiv.org/pdf/2504.16074
Virology Test: https://www.virologytest.ai/
How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
Visual puzzles: https://neulab.github.io/VisualPuzzles/
Fiction Bench: https://x.com/ficlive/status/1912863028141244850
GeoBench: https://geobench.org/
SimpleBench: https://simple-bench.com/
AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
USAMO: https://x.com/mbalunovic/status/1914398518896193747
NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
GPU Trade-offs: https://x.com/sama/status/1915098951067554030
RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
Model Size: https://x.com/slow_developer/status/1874554473256997201
Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
Papers on Patreon: https://arxiv.org/pdf/2502.01839
https://arxiv.org/pdf/2504.13837
Chollet Quote: https://x.com/fchollet/status/1912934762580447447
OpenSim: https://opensim.stanford.edu/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/


24 episodes

