How DeepSeek is Pushing the Boundaries of AI Development
This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its significant breakthrough in inference speed over other models. What’s next for AI (and open source)? From training strategies to real-world performance, here’s what you need to know.
Read a summary: https://arize.com/blog/how-deepseek-is-pushing-the-boundaries-of-ai-development/
Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
49 episodes
All episodes
Scalable Chain of Thoughts via Elastic Reasoning 28:54
Sleep-time Compute: Beyond Inference Scaling at Test-time 30:24
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection 27:19
AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam 26:11
AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs 30:23
How DeepSeek is Pushing the Boundaries of AI Development 29:54
Multiagent Finetuning: A Conversation with Researcher Yilun Du 30:03
Training Large Language Models to Reason in Continuous Latent Space 24:58
LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods 28:57
Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies 28:47
Agent-as-a-Judge: Evaluate Agents with Agents 24:54
Swarm: OpenAI's Experimental Approach to Multi-Agent Systems 46:46
The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs 3:31
Google's NotebookLM and the Future of AI-Generated Audio 43:28
Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning 26:54
Composable Interventions for Language Models 42:35
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges 39:05
Breaking Down Meta's Llama 3 Herd of Models 44:40
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines 33:57
RAFT: Adapting Language Model to Domain Specific RAG 44:01
LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic 44:00
Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment 48:07
Breaking Down EvalGen: Who Validates the Validators? 44:47
Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models 45:07
Demystifying Chronos: Learning the Language of Time Series 44:40