Episode 45: Your AI application is broken. Here’s what to do about it.
Too many teams are building AI applications without truly understanding why their models fail. Instead of jumping straight to LLM evaluations, dashboards, or vibe checks, how do you actually fix a broken AI app?
In this episode, Hugo speaks with Hamel Husain, longtime ML engineer, open-source contributor, and consultant, about why debugging generative AI systems starts with looking at your data.
In this episode, we dive into:
- Why “look at your data” is the best debugging advice no one follows.
- How spreadsheet-based error analysis can uncover failure modes faster than complex dashboards (a rough sketch of this workflow follows after this list).
- The role of synthetic data in bootstrapping evaluation.
- When to trust LLM judges, and when they're misleading (a second sketch below shows one way to sanity-check a judge).
- Why AI dashboards that track generic truthfulness, helpfulness, and conciseness scores are often a waste of time.
If you're building AI-powered applications, this episode will change how you approach debugging, iteration, and improving model performance in production.
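For a concrete sense of what spreadsheet-based error analysis can look like, here is a minimal sketch in Python. This is an illustration, not code from the episode: the trace fields and failure-mode labels are invented, and in a real application the traces would come from your own logs.

```python
import csv
from collections import Counter

# Hypothetical logged traces; in practice, pull these from your app's logs.
traces = [
    {"input": "What's your refund policy?", "output": "We offer refunds..."},
    {"input": "Cancel my subscription.", "output": "I can't help with that."},
]

# Step 1: dump traces to a spreadsheet with blank annotation columns.
with open("error_analysis.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["input", "output", "failure_mode", "notes"]
    )
    writer.writeheader()
    for trace in traces:
        writer.writerow({**trace, "failure_mode": "", "notes": ""})

# Step 2 (manual): read each trace in the spreadsheet and label its failure
# mode, e.g. "hallucinated policy" or "refused a valid request".

# Step 3: tally the labels so the most common failure mode gets fixed first.
with open("error_analysis.csv", newline="") as f:
    counts = Counter(
        row["failure_mode"] for row in csv.DictReader(f) if row["failure_mode"]
    )
print(counts.most_common())
```

The value is in step 2: reading real traces one at a time and letting failure categories emerge from the data, rather than scoring everything against pre-baked dimensions.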
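On LLM judges, one common sanity check (again a sketch with made-up data, not necessarily the episode's exact method) is to hand-label a sample of traces first and then measure judge/human agreement separately on passes and failures, since a judge can look accurate overall while missing most real failures:

```python
from collections import Counter

# Hypothetical human vs. judge verdicts on the same traces ("pass"/"fail").
# In practice, you'd hand-label a sample of real traces first.
human = ["pass", "pass", "fail", "fail", "fail", "pass"]
judge = ["pass", "pass", "pass", "fail", "pass", "pass"]

# Overall agreement can hide a judge that rarely catches real failures,
# so break agreement out by the human verdict.
by_class = Counter()
for h, j in zip(human, judge):
    by_class[(h, h == j)] += 1

for cls in ("pass", "fail"):
    correct = by_class[(cls, True)]
    total = correct + by_class[(cls, False)]
    print(f"{cls}: judge agrees on {correct}/{total}")
```

Here the judge agrees on 3/3 passes but only 1/3 failures: exactly the kind of misleading judge the episode warns about.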
LINKS
All episodes
Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters 45:40
Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy 38:08
Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference 41:17
Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs 44:49
Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them) 28:38
Episode 51: Why We Built an MCP Server and What Broke First 47:41
Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain 27:42
Episode 49: Why Data and AI Still Break at Scale (and What to Do About It) 1:21:45
Episode 48: HOW TO BENCHMARK AGI WITH GREG KAMRADT 1:04:25
Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis 1:19:12
Episode 46: Software Composition Is the New Vibe Coding 1:08:57
Episode 45: Your AI application is broken. Here’s what to do about it. 1:17:30
Episode 44: The Future of AI Coding Assistants: Who’s Really in Control? 1:34:11
Episode 43: Tales from 400+ LLM Deployments: Building Reliable AI Agents in Production 1:01:03
Episode 42: Learning, Teaching, and Building in the Age of AI 1:20:03