Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benchmarked it on SimpleBench. Things aren’t slowing down. Plus the latest in humanoid robots: Helix leads the way, and Protoclone is the one that freaked me out. And reports of GPT 4.5 and DeepSeek R2.
GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming
https://x.com/GraySwanAI/status/1894084923260043282
Chapters:
00:00 - Introduction
01:25 - Claude 3.7 New Stats/Demos
05:22 - 128k Output
06:13 - Pokemon
06:58 - Just a tool?
09:54 - DeepSeek R2
10:20 - Claude 3.7 System Card/Paper Highlights
17:18 - Simple Record Score/Competition
20:37 - Grok 3 + Redteaming prizes
22:26 - Google Co-scientist
24:02 - Humanoid Robot Developments
3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet
vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959
Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09
System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025
System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf
Unfaithful CoT: https://arxiv.org/pdf/2305.04388
Original Constitution: https://www.anthropic.com/news/claudes-constitution
Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf
Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo
SimpleBench: https://simple-bench.com/
400M Weekly Users: https://x.com/bradlightcap/status/1892579908179882057
Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280
Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156s
DeepSeek R2 Reuters: https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/
Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/
Helix: https://www.figure.ai/news/helix
TechTrance: https://www.youtube.com/@TheTechTrance/videos