Building More Efficient AI With vLLM ft. Nick Hill | Technically Speaking With Chris Wright podcast

Tech Business Red Hat Emerging Technologies Machine Learning Chris Wright Hybrid Clouds Data Development Security Linux Software Cloud Computing Enterprise Technology Artificial Intelligence Programming Coding Careers Technology

Content provided by Red Hat. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Red Hat or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

Building more efficient AI with vLLM ft. Nick Hill

1d ago 20:52

Explore what it takes to run massive language models efficiently with Red Hat's Senior Principal Software Engineer in AI Engineering, Nick Hill. In this episode, we go behind the headlines to uncover the systems-level engineering making AI practical, focusing on the pivotal challenge of inference optimization and the transformative power of the vLLM open-source project.

Nick Hill shares his experiences working in AI, including:
• The evolution of AI optimization, from early handcrafted systems like IBM Watson to the complex demands of today's generative AI.
• The critical role of open-source projects like vLLM in creating a common, efficient inference stack for diverse hardware platforms.
• Key innovations like PagedAttention that solve GPU memory fragmentation and manage the KV cache for scalable, high-throughput performance.
• How the open-source community is rapidly translating academic research into real-world, production-ready solutions for AI.

Join us to explore the infrastructure and optimization strategies making large-scale AI a reality. This conversation is essential for any technologist, engineer, or leader who wants to understand the how and why of AI performance. You’ll come away with a new appreciation for the clever, systems-level work required to build a truly scalable and open AI future.
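The PagedAttention idea mentioned in the episode description can be illustrated with a small Python sketch. This is not vLLM's implementation: the class names, the 16-token BLOCK_SIZE, and the allocation policy are hypothetical. What it shows is the core bookkeeping. Each request's KV cache is split into fixed-size blocks drawn from a shared pool, and a per-request block table maps logical block positions to physical blocks that need not be contiguous, so memory is claimed on demand instead of being reserved up front for the maximum sequence length.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_SIZE = 16  # tokens per KV-cache block; illustrative value, not a vLLM default


class BlockAllocator:
    """Hypothetical pool of fixed-size physical KV-cache blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.free: List[int] = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """Hypothetical per-request state: logical block i -> physical block id."""

    block_table: List[int] = field(default_factory=list)
    num_tokens: int = 0

    def append_token(self, allocator: BlockAllocator) -> None:
        # Grab a new block only when the current last block is full,
        # so at most BLOCK_SIZE - 1 slots per sequence go unused.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def slot(self, position: int) -> Tuple[int, int]:
        # Translate a token position into (physical block id, offset in block).
        return self.block_table[position // BLOCK_SIZE], position % BLOCK_SIZE

    def release_all(self, allocator: BlockAllocator) -> None:
        # Return every block to the shared pool when the request finishes.
        for block_id in self.block_table:
            allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
seq = Sequence()
for _ in range(20):  # 20 tokens fill one block and spill into a second
    seq.append_token(allocator)
print(seq.block_table, seq.slot(17))  # prints [7, 6] (6, 1)
```

Because every block has the same size, any freed block can be reused by any request, which is how paging sidesteps external fragmentation; the only waste is the partly filled last block of each sequence. A real serving engine also needs attention kernels that read keys and values through the block table, which this sketch leaves out.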

3 episodes
