Best LLM Evaluation Podcasts (2025)

How Good Is It, Really? - A Guide to LLM Evaluation 7:45

8d ago

In the season finale of "All Things LLM," hosts Alex and Ben turn to one of the most important—and challenging—topics in AI: How do we objectively evaluate the quality and reliability of a language model? With so many models, benchmarks, and metrics, what actually counts as “good”? In this episode, you’ll discover: The evolution of LLM evaluation: …

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748 1:03:39

4d ago

Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose m…

#738: AWS News: Global Cross-Region Inference, Aurora Limitless and lots more. 25:01

2d ago

Simon and Jillian keep you up to date with all the latest releases and capabilities! By Amazon Web Services

From Prediction to Action - Autonomous Agents and LAMs 8:39

8d ago

What happens when AI not only understands the world, but acts in it? In this trailblazing episode of "All Things LLM," Alex and Ben chart the rise of next-generation AI: autonomous agents and Large Action Models (LAMs). Discover how LLMs are evolving from passive text generators to powerful doers—reshaping workflows, business automation, and the ve…

Prompt Injections and Data Poisoning - Securing Your LLM 8:38

8d ago

As LLMs power more business workflows, security risks grow. In this essential episode of "All Things LLM," hosts Alex and Ben break down the new wave of cybersecurity threats targeting language models—and what you can do to defend your AI infrastructure. What you’ll learn: The OWASP Top 10 for LLMs: Explore the most pressing LLM security risks and …

More Than Words - The Rise of Multimodal LLMs 7:42

8d ago

AI’s next great leap isn’t about bigger models—it’s about broader senses. In this season premiere of "All Things LLM," Alex and Ben explore the revolutionary world of multimodal large language models (LLMs)—the new frontier where AI can “see,” “hear,” and “understand” the world far beyond text. In this episode: Journey to Multimodality: Discover wh…

The Paradigm Shift & The Black Box: Reasoning Models and the Quest for Understanding 11:38

8d ago

In the grand finale of "All Things LLM," hosts Alex and Ben look ahead to the bleeding edge—and reflect on the ultimate question for AI: can we ever truly understand how these models think? Inside this episode: The rise of reasoning models: Discover why the next leap for AI isn’t just bigger models, but smarter thinking. Explore how OpenAI’s o1 and…

Hallucinations, Bias, and Black Boxes - The Biggest Challenges 9:04

8d ago

Powerful language models are reshaping the world, but serious challenges remain. In this revealing episode of "All Things LLM," hosts Alex and AI expert Ben tackle the core limitations and ethical risks facing all large language models—open or closed. This episode covers: Hallucinations: Why LLMs make up plausible-sounding but false or misleading a…

The Human Touch - How RLHF Aligns Models with Our Values 5:54

8d ago

How do we make AI not just smart, but safe and genuinely helpful? In this episode of "All Things LLM," Alex and Ben break down the vital process of alignment—transforming a powerful language model into a trustworthy assistant you can rely on. Inside this episode: What is RLHF? Discover Reinforcement Learning from Human Feedback—the multi-stage proc…

Open vs. Closed - The Great Model Debate 9:00

8d ago

Season 4 of "All Things LLM" kicks off with one of the most crucial debates in AI today: open-source vs. closed-source (proprietary) language models. Hosts Alex and Ben cut through the hype to explain what’s at stake for businesses, developers, and the entire AI ecosystem. In this episode, you’ll discover: The fundamentals: What truly sets open-sou…

Cooking with Terabytes - Training an LLM from Scratch 7:33

8d ago

Get an insider’s look behind the curtain of modern AI with this episode of "All Things LLM." Join hosts Alex and AI expert Ben as they reveal the colossal effort, expense, and ingenuity required to take a language model from “blank slate” to foundational intelligence. What you’ll learn: The massive scale of LLM training: how developers assemble and…

Making It Yours - Fine-Tuning, PEFT, and RAG 7:26

8d ago

Discover how generalist AIs become powerful specialists in this episode of "All Things LLM." Hosts Alex and AI expert Ben break down the next stage of the LLM lifecycle—customization—and unpack the practical techniques that transform foundation models into domain experts or business-ready assistants. Learn about: Fine-Tuning: Why it’s essential for…

The Magic of "Attention" - How LLMs Understand Context 5:39

8d ago

Unlock the key to modern AI with this deep-dive episode of "All Things LLM"! Hosts Alex and our resident AI expert Ben unpack the “self-attention mechanism”—the heart of every powerful Transformer model powering GPT, Llama, Gemini, and more. Discover: What “self-attention” actually means in the context of language models—and why it’s a game-changer…

The Art of the Ask - A Practical Guide to Prompt Engineering 6:11

8d ago

Unlock the full potential of large language models with this hands-on episode of "All Things LLM." Hosts Alex and AI expert Ben break down the essential (and rapidly evolving) discipline of prompt engineering—your steering wheel for directing AI toward more relevant, accurate, and actionable outputs. What you’ll learn: Prompt Engineering 101: Why c…

From Turing to Transformers - A Brief History of Language AI 6:52

8d ago

Curious how AI language models like ChatGPT burst into the mainstream? Welcome to "All Things LLM," where hosts Alex and AI expert Ben unravel the true origins and evolution of Large Language Models. In this episode, we journey through more than a century of discoveries that paved the way for today’s groundbreaking AI. Discover: The surprising root…

The Blueprint - Encoders, Decoders, and Model Families 5:42

8d ago

Unlock the mysteries of modern AI with "All Things LLM." In this episode, Alex and Ben break down the Transformer—the revolutionary engine powering today’s Large Language Models (LLMs) like GPT-4, Llama, and Gemini. If you’ve ever wondered how AI can both understand and generate text, this deep dive into Transformer architecture is your essential g…

"Hello, World!" - What Exactly Is an LLM? 6:06

8d ago

Step into the rapidly evolving world of artificial intelligence with the premiere of "All Things LLM." In this debut episode, hosts Alex and AI expert Ben break down one of today's most talked-about technologies: the Large Language Model (LLM). Are you curious about terms like ChatGPT, generative AI, or conversational AI—but not sure where to start…

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747 58:26

11d ago

Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which ex…

#737: Accelerate your GenAI innovation journey on AWS with Innovation Sandbox solution 31:50

9d ago

In this episode, we will dive deep into Innovation Sandbox on AWS, a new AWS solution offering that transforms the management of temporary sandbox environments, by offering a ready-made solution that enables customers to reduce sandbox setup time from weeks to hours while automating spend controls, security policies, and usage monitoring. Learn how…

Building an Immune System for AI Generated Software with Animesh Koratana - #746 1:05:11

17d ago

Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero to discuss his team’s approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and su…

#736: AWS News: New Amazon Bedrock APIs, New EC2 Instance Types and Lots More. 22:44

16d ago

Simon takes you through a big list of cool new things - something for everyone. By Amazon Web Services

Autoformalization and Verifiable Superintelligence with Christian Szegedy - #745 1:11:48

25d ago

In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process …

#735: The Frugal Architect w/ Werner Vogels: Zillow's Chief Architect on why cheap ≠ frugal 41:47

26d ago

Frugality wasn't something Craig Link learned on the job, it was passed down from his father, who would calculate the cost-benefit of driving for cheaper gas and meticulously track every tank's miles per gallon in a worn notebook tucked into the glove box. He would also pack sandwiches, toss them in a cooler, and store them in the back seat. These …

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744 1:10:20

1M ago

Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on …

#734: AWS News: OpenAI, Amazon Elastic VMware Service, and Lots More. 42:53

1M ago

A bumper crop of new and improved things for you to take advantage of. By Amazon Web Services

Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743 1:01:01

1M ago

Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-re…

#733: Amazon Connect - So Many Cool New Capabilities For You to Use! 33:05

1M ago

In this episode of the AWS Podcast, we explore the evolving world of contact centers and Amazon Connect. The discussion covers why contact centers remain critical to both business and public sector operations, and how they're transforming from traditional cost centers into valuable sources of business intelligence. Key highlights include Amazon Con…

Closing the Loop Between AI Training and Inference with Lin Qiao - #742 1:01:11

2M ago

In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing…

46 - Tom Davidson on AI-enabled Coups 2:05:26

2M ago

Could AI enable a small group to gain power over a large country, and lock in their power permanently? Often, people worried about catastrophic risks from AI have been concerned with misalignment risks. In this episode, Tom Davidson talks about a risk that could be comparably important: that of AI-enabled coups. Patreon: https://www.patreon.com/axr…

#732: How to gain Multi-Cluster Visibility across Kubernetes Clusters with the EKS Dashboard 24:53

2M ago

In this episode, we'll explore how the new Amazon EKS Dashboard solves key challenges in managing Kubernetes at scale across multiple AWS accounts and regions. We'll discuss how it provides centralized visibility into cluster health, versions, and costs - enabling teams to improve governance, streamline operations, and optimize their Kubernetes inf…

Context Engineering for Productive AI Agents with Filip Kozera - #741 46:01

2M ago

In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current…

#731: AWS News: Kiro, Amazon Bedrock AgentCore, and Lots More 31:43

2M ago

Simon and Jillian take you on a fast-paced update of all things new on AWS! By Amazon Web Services

Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740 1:13:02

2M ago

In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneo…

#730: The Frugal Architect w/ Werner Vogels: At Too Good To Go, Practical Engineering Keeps Food Out of the Bin 36:16

2M ago

In the fifth episode of "The Frugal Architect" podcast, Werner and co-host Simon Elisha welcome Morten Keldebaek (CTO) and Robert Hjertmann from Too Good To Go. Too Good To Go is the world's largest marketplace for surplus food, connecting consumers with restaurants, cafes, and grocery stores to rescue food that would otherwise go to waste. The dis…

Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739 1:13:02

2M ago

In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents—from the models and APIs to the critical orchestration layer that manages…

#729: AWS News: Aurora Storage Upgrades, DynamoDB Multi-Region Strong Consistency, and More 41:32

2M ago

There are over 60 new updates that your hosts Simon, Jillian and Shruthi take you through this week! By Amazon Web Services

Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738 1:00:29

3M ago

Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that incorpora…

#728: The Duck Talks Back - Using GENAI in Your Work 23:15

3M ago

Simon shares some tips, tricks, and experiences in how Builders can use GENAI in their work to get things done faster, and with better outcomes. Links: Amazon Q Developer CLI: https://github.com/aws/amazon-q-developer-cli Promptz: https://www.promptz.dev/ AWS MCP Servers: https://github.com/awslabs/mcp By Amazon Web Services

45 - Samuel Albanie on DeepMind's AGI Safety Approach 1:15:42

3M ago

In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axr…

#727: AWS News: AWS Shield Network Security Director, Amazon GuardDuty for EKS, and more 34:48

3M ago

Simon and Jillian take you through all the big security announcements from AWS re:Inforce plus a host of cool new features and price reductions! By Amazon Web Services

44 - Peter Salib on AI Rights for Human Safety 3:21:33

3M ago

In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity. Patreon: https://www.patreon.com/axrpodcast K…

Building the Internet of Agents with Vijoy Pandey - #737 56:13

3M ago

Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, pr…

#726: Single region, zero excuses: Mastering AWS resilience 45:33

3M ago

Dive into single-region resilience with AWS experts John Formento and Tarik Makota as they debunk common misconceptions and share practical strategies for building robust applications. Learn why multi-AZ isn't enough on its own, discover key AWS services for resilience, and get actionable tips for identifying and mitigating failure modes. Learn More…

Revolutionizing Computer Vision: OpenFilter with Andrew Smith 30:27

3M ago

Andrew Smith (GitHub: @asmith-plainsight) dives deep into OpenFilter, the open-source framework for building computer vision workflows. Andrew is the CTO of Plainsight, which is a leader in modern computer vision infrastructure. Tune in to find out how OpenFilter is simplifying and revolutionizing computer vision applications. Subscribe to Contribut…

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736 59:31

3M ago

Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for tradin…

#725: AWS News: FSx for Lustre introduces cost-saving storage tiers, MCP servers enhance AI development tools, and more 39:59

3M ago

Explore FSx for Lustre's new intelligent storage tiering that delivers cost savings and unlimited scalability for file storage in the cloud. Plus, discover how the new Model Context Protocol (MCP) servers are revolutionizing AI-assisted development across ECS, EKS, and serverless platforms with real-time contextual responses and automated resource …

43 - David Lindner on Myopic Optimization with Non-myopic Approval 1:40:59

3M ago

In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approa…

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735 56:45

4M ago

Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling riv…

#724: Accelerated computing: From fraud detection to AI innovation 41:25

4M ago

Join host Shruthi to discover how organizations use GPU-accelerated computing on AWS. Container Specialist Re Alvarez Parmar shows how Rivian optimizes GPU usage for autonomous vehicles with Amazon EKS. AWS Financial Services expert Sudhir Kalidindi explains real-time fraud detection processing 100B+ events annually. Learn architectural patterns an…

42 - Owain Evans on LLM Psychology 2:14:26

4M ago

Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well a…
