Why Agents Are Stupid & What We Can Do About It With Dan Jeffries - #713 The TWIML AI Podcast (formerly This Week In Machine Learning & Artificial Intelligence) podcast

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)


Artificial Intelligence Tech News Artificialintelligence Machinelearning Samcharrington Technology Thisweekinmachinelearning Sam Charrington Thetwimlaipocast Twimlaipodcast Tech News China TWIML Datascience Science


1,759 subscribers

Artificial Intelligence

Added seven years ago

Content provided by TWIML and Sam Charrington. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by TWIML and Sam Charrington or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.


Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713

about a year ago 1:08:49


Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI, to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other similar uses of LLMs, explore various use cases for them, and dig into ways to create smarter agentic systems. Dan shares his “big brain, little brain, tool brain” approach to tackling real-world challenges in agents, the trade-offs in leveraging general-purpose vs. task-specific models, and his take on LLM reasoning. We also cover the way he thinks about model selection for agents, along with the need for new tools and platforms for deploying them. Finally, Dan emphasizes the importance of open source in advancing AI, shares the new products they’re working on, and explores the future directions in the agentic era.

The complete show notes for this episode can be found at https://twimlai.com/go/713.


750 episodes


All episodes

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 1:01:25

6 days ago

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting and how it can improve multi-step tool use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies they’ve used, and Bespoke Labs’ open-source libraries like Curator. We also touch on the models MiniCheck for hallucination detection and MiniChart for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731.

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 1:07:27

13 days ago

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings: Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step tasks through reinforcement learning, and how that enables agents to more easily recover from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as with "vibe coding," the integration of tools through the Model Context Protocol (MCP), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous. The complete show notes for this episode can be found at https://twimlai.com/go/730.

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 56:18

20 days ago

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench, a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world cybersecurity analyst tasks. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729.

Generative Benchmarking with Kelly Hong - #728 54:17

26 days ago

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 1:34:06

5 weeks ago

In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727.

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 51:45

6 weeks ago

Today, we're joined by Maohao Shen, PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning, enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens (continue, reflect, and explore) to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 1:09:07

7 weeks ago

Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machine learning, including vision-language models and generative AI techniques, to improve perception, planning, and simulation for its self-driving vehicles. The conversation explores the evolution of Waymo’s research stack, their custom “Waymo Foundation Model,” and how they’re incorporating multimodal sensor data like lidar, radar, and camera into advanced AI systems. Drago also discusses how Waymo ensures safety at scale with rigorous validation frameworks, predictive world models, and realistic simulation environments. Finally, we touch on the challenges of generalization across cities, freeway driving, end-to-end learning vs. modular architectures, and the future of AV testing through ML-powered simulation. The complete show notes for this episode can be found at https://twimlai.com/go/725.

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 50:32

8 weeks ago

Today, we're joined by Julie Kallini, PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models (including inefficient compression rates for under-resourced languages) and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the creation of impossible language training datasets, and explore the bias of language model architectures towards natural language. The complete show notes for this episode can be found at https://twimlai.com/go/724.

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 58:38

9 weeks ago

Today, we're joined by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems, to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture which uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” versus “verbalized reasoning” (analogous to non-verbalized and verbalized thinking in humans), and discuss how the model searches in latent space to predict the next token and dynamically allocates more compute based on token difficulty. We also explore how the recurrent depth architecture simplifies LLMs, the parallels to diffusion models, the model's performance on reasoning tasks, the challenges of comparing models with varying compute budgets, and architectural advantages such as zero-shot adaptive exits and natural speculative decoding. The complete show notes for this episode can be found at https://twimlai.com/go/723.

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 42:11

10 weeks ago

Today, we're joined by Chengzu Li, PhD student at the University of Cambridge, to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments: maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design. The complete show notes for this episode can be found at https://twimlai.com/go/722.

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 49:29

11 weeks ago

Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “s1: Simple Test-Time Scaling.” We explore the motivations behind s1, as well as how it compares to OpenAI's o1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as s1’s data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We explore the novel "budget forcing" technique developed in the paper, which allows the model to think longer for harder problems and optimize test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of s1 and its future directions. The complete show notes for this episode can be found at https://twimlai.com/go/721.

Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 1:07:05

12 weeks ago

Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem, including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We then dig into the various ways Trainium2 is offered, including Trn2 instances, UltraServers, and UltraClusters, and access through managed services like AWS Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, support for Mixture of Experts (MoE) models, and what’s next for Trainium. The complete show notes for this episode can be found at https://twimlai.com/go/720.

π0: A Foundation Model for Robotics with Sergey Levine - #719 52:30

13 weeks ago

Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team’s new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research. The complete show notes for this episode can be found at https://twimlai.com/go/719.

AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 1:44:59

14 weeks ago

Today, we’re joined by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems: reasoning, acting, communicating, and adapting. We also examine the rise of agentic foundation models, the emergence of interface agents like Claude with Computer Use and OpenAI Operator, the shift from simple task chains to complex workflows, and the growing range of enterprise use cases. Victor shares insights into emerging design patterns for autonomous multi-agent systems, including graph and message-driven architectures, the advantages of the “actor model” pattern as implemented in Microsoft’s AutoGen, and guidance on how users should approach the “build vs. buy” decision when working with AI agent frameworks. We also address the challenges of evaluating end-to-end agent performance, the complexities of benchmarking agentic systems, and the implications of our reliance on LLMs as judges. Finally, we look ahead to the future of AI agents in 2025 and beyond, discuss emerging HCI challenges, their potential for impact on the workforce, and how they are poised to reshape fields like software engineering. The complete show notes for this episode can be found at https://twimlai.com/go/718.

Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 1:16:30

15 weeks ago

Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language model inference. We explore the challenges presented by LLM encoding and decoding (aka generation) and how these interact with various hardware constraints such as FLOPS, memory footprint, and memory bandwidth to limit key inference metrics such as time-to-first-token, tokens per second, and tokens per joule. We then dig into a variety of techniques that can be used to accelerate inference, such as KV compression, quantization, pruning, speculative decoding, and leveraging small language models (SLMs). We also discuss future directions for enabling on-device agentic experiences, such as parallel generation and software tools like Qualcomm AI Orchestrator. The complete show notes for this episode can be found at https://twimlai.com/go/717.
