Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
Deep Research was unveiled 12 hours ago, and I’ve tested it thoroughly, including against DeepSeek R1 with search, Gemini Deep Research, and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impressions of the o3 model that powers it, and much more.
Deep Research: https://openai.com/index/introducing-deep-research/
https://www.youtube.com/watch?v=YkCDVn3_wiw
GAIA Bench: https://openreview.net/forum?id=fibxvahvs3
https://openreview.net/pdf?id=fibxvahvs3
CodeELO: https://arxiv.org/pdf/2501.01257
CamelCamel: https://uk.camelcamelcamel.com/
DeepSeek R1 with search: https://chat.deepseek.com/
https://arxiv.org/pdf/2501.12948
HaluBench: https://arxiv.org/pdf/2407.08488
Chapters:
00:00 - Introduction
01:06 - Powered by o3, Humanity’s Last Exam, GAIA
03:55 - Simple Tests
06:00 - Good News vs DeepSeek R1 and Gemini Deep Research
09:32 - Bad News on Hallucinations
14:14 - What Can’t it Browse?
14:42 - For Shopping?
16:40 - Final thoughts