LLM Evaluation
 
All Things LLM is your go-to podcast for demystifying Large Language Models! We break down their core concepts—like tokens, embeddings, and the self-attention that powers GPT-4 and Llama. Learn how LLMs are built, trained, and fine-tuned (SFT, RLHF, PEFT) on massive datasets. Discover real-world use cases in healthcare, finance, chatbots, code, RAG, and more. We explore the LLM ecosystem, covering open-source vs. closed models, LLMaaS, LangChain, and LLMOps tools. Plus, we tackle challenges— ...
 
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
 
AWS Podcast

Amazon Web Services

Weekly
 
The Official AWS Podcast is a podcast for developers and IT professionals looking for the latest news and trends in storage, security, infrastructure, serverless, and more. Join Simon Elisha and Hawn Nguyen-Loughren for regular updates, deep dives, launches, and interviews. Whether you’re training machine learning models, developing open source projects, or building cloud solutions, the Official AWS Podcast has something for you.
 
Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders. Hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator and thought leader. Technologies covered include machine learning, artificial intelligence, de ...
 
In the season finale of "All Things LLM," hosts Alex and Ben turn to one of the most important—and challenging—topics in AI: How do we objectively evaluate the quality and reliability of a language model? With so many models, benchmarks, and metrics, what actually counts as “good”? In this episode, you’ll discover: The evolution of LLM evaluation: …
 
Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose m…
 
What happens when AI not only understands the world, but acts in it? In this trailblazing episode of "All Things LLM," Alex and Ben chart the rise of next-generation AI: autonomous agents and Large Action Models (LAMs). Discover how LLMs are evolving from passive text generators to powerful doers—reshaping workflows, business automation, and the ve…
 
As LLMs power more business workflows, security risks grow. In this essential episode of "All Things LLM," hosts Alex and Ben break down the new wave of cybersecurity threats targeting language models—and what you can do to defend your AI infrastructure. What you’ll learn: The OWASP Top 10 for LLMs: Explore the most pressing LLM security risks and …
 
AI’s next great leap isn’t about bigger models—it’s about broader senses. In this season premiere of "All Things LLM," Alex and Ben explore the revolutionary world of multimodal large language models (LLMs)—the new frontier where AI can “see,” “hear,” and “understand” the world far beyond text. In this episode: Journey to Multimodality: Discover wh…
 
In the grand finale of "All Things LLM," hosts Alex and Ben look ahead to the bleeding edge—and reflect on the ultimate question for AI: can we ever truly understand how these models think? Inside this episode: The rise of reasoning models: Discover why the next leap for AI isn’t just bigger models, but smarter thinking. Explore how OpenAI’s o1 and…
 
Powerful language models are reshaping the world, but serious challenges remain. In this revealing episode of "All Things LLM," hosts Alex and AI expert Ben tackle the core limitations and ethical risks facing all large language models—open or closed. This episode covers: Hallucinations: Why LLMs make up plausible-sounding but false or misleading a…
 
How do we make AI not just smart, but safe and genuinely helpful? In this episode of "All Things LLM," Alex and Ben break down the vital process of alignment—transforming a powerful language model into a trustworthy assistant you can rely on. Inside this episode: What is RLHF? Discover Reinforcement Learning from Human Feedback—the multi-stage proc…
 
Season 4 of "All Things LLM" kicks off with one of the most crucial debates in AI today: open-source vs. closed-source (proprietary) language models. Hosts Alex and Ben cut through the hype to explain what’s at stake for businesses, developers, and the entire AI ecosystem. In this episode, you’ll discover: The fundamentals: What truly sets open-sou…
 
Get an insider’s look behind the curtain of modern AI with this episode of "All Things LLM." Join hosts Alex and AI expert Ben as they reveal the colossal effort, expense, and ingenuity required to take a language model from “blank slate” to foundational intelligence. What you’ll learn: The massive scale of LLM training: how developers assemble and…
 
Discover how generalist AIs become powerful specialists in this episode of "All Things LLM." Hosts Alex and AI expert Ben break down the next stage of the LLM lifecycle—customization—and unpack the practical techniques that transform foundation models into domain experts or business-ready assistants. Learn about: Fine-Tuning: Why it’s essential for…
 
Unlock the key to modern AI with this deep-dive episode of "All Things LLM"! Hosts Alex and our resident AI expert Ben unpack the “self-attention mechanism”—the heart of every powerful Transformer model powering GPT, Llama, Gemini, and more. Discover: What “self-attention” actually means in the context of language models—and why it’s a game-changer…
 
Unlock the full potential of large language models with this hands-on episode of "All Things LLM." Hosts Alex and AI expert Ben break down the essential (and rapidly evolving) discipline of prompt engineering—your steering wheel for directing AI toward more relevant, accurate, and actionable outputs. What you’ll learn: Prompt Engineering 101: Why c…
 
Curious how AI language models like ChatGPT burst into the mainstream? Welcome to "All Things LLM," where hosts Alex and AI expert Ben unravel the true origins and evolution of Large Language Models. In this episode, we journey through more than a century of discoveries that paved the way for today’s groundbreaking AI. Discover: The surprising root…
 
Unlock the mysteries of modern AI with "All Things LLM." In this episode, Alex and Ben break down the Transformer—the revolutionary engine powering today’s Large Language Models (LLMs) like GPT-4, Llama, and Gemini. If you’ve ever wondered how AI can both understand and generate text, this deep dive into Transformer architecture is your essential g…
 
Step into the rapidly evolving world of artificial intelligence with the premiere of "All Things LLM." In this debut episode, hosts Alex and AI expert Ben break down one of today's most talked-about technologies: the Large Language Model (LLM). Are you curious about terms like ChatGPT, generative AI, or conversational AI—but not sure where to start…
 
Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which ex…
 
In this episode, we will dive deep into Innovation Sandbox on AWS, a new AWS solution offering that transforms the management of temporary sandbox environments, by offering a ready-made solution that enables customers to reduce sandbox setup time from weeks to hours while automating spend controls, security policies, and usage monitoring. Learn how…
 
Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero, to discuss his team’s approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and su…
 
In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process …
 
Frugality wasn't something Craig Link learned on the job; it was passed down from his father, who would calculate the cost-benefit of driving for cheaper gas and meticulously track every tank's miles per gallon in a worn notebook tucked into the glove box. He would also pack sandwiches, toss them in a cooler, and store them in the back seat. These …
 
Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on …
 
Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-re…
 
In this episode of the AWS Podcast, we explore the evolving world of contact centers and Amazon Connect. The discussion covers why contact centers remain critical to both business and public sector operations, and how they're transforming from traditional cost centers into valuable sources of business intelligence. Key highlights include Amazon Con…
 
In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing…
 
Could AI enable a small group to gain power over a large country, and lock in their power permanently? Often, people worried about catastrophic risks from AI have been concerned with misalignment risks. In this episode, Tom Davidson talks about a risk that could be comparably important: that of AI-enabled coups. Patreon: https://www.patreon.com/axr…
 
In this episode, we'll explore how the new Amazon EKS Dashboard solves key challenges in managing Kubernetes at scale across multiple AWS accounts and regions. We'll discuss how it provides centralized visibility into cluster health, versions, and costs - enabling teams to improve governance, streamline operations, and optimize their Kubernetes inf…
 
In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current…
 
In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneo…
 
In the fifth episode of "The Frugal Architect" podcast, Werner and co-host Simon Elisha welcome Morten Keldebaek (CTO) and Robert Hjertmann from Too Good To Go. Too Good To Go is the world's largest marketplace for surplus food, connecting consumers with restaurants, cafes, and grocery stores to rescue food that would otherwise go to waste. The dis…
 
In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents—from the models and APIs to the critical orchestration layer that manages…
 
Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research, for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that incorpora…
 
Simon shares some tips, tricks, and experiences in how Builders can use GenAI in their work to get things done faster, and with better outcomes. Links: Amazon Q Developer CLI: https://github.com/aws/amazon-q-developer-cli; Promptz: https://www.promptz.dev/; AWS MCP Servers: https://github.com/awslabs/mcp. By Amazon Web Services
 
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axr…
 
In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity. Patreon: https://www.patreon.com/axrpodcast K…
 
Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco, to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, pr…
 
Dive into single-region resilience with AWS experts John Formento and Tarik Makota as they debunk common misconceptions and share practical strategies for building robust applications. Learn why multi-AZ isn't enough on its own, discover key AWS services for resilience, and get actionable tips for identifying and mitigating failure modes. Learn More…
 
Andrew Smith (GitHub: @asmith-plainsight) dives deep into OpenFilter, the open-source framework for building computer vision workflows. Andrew is the CTO of Plainsight, a leader in modern computer vision infrastructure. Tune in to find out how OpenFilter is simplifying and revolutionizing computer vision applications. Subscribe to Contribut…
 
Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for tradin…
 
Explore FSx for Lustre's new intelligent storage tiering that delivers cost savings and unlimited scalability for file storage in the cloud. Plus, discover how the new Model Context Protocol (MCP) servers are revolutionizing AI-assisted development across ECS, EKS, and serverless platforms with real-time contextual responses and automated resource …
 
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approa…
 
Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling riv…
 
Join host Shruthi to discover how organizations use GPU-accelerated computing on AWS. Container Specialist Re Alvarez Parmar shows how Rivian optimizes GPU usage for autonomous vehicles with Amazon EKS. AWS Financial Services expert Sudhir Kalidindi explains real-time fraud detection processing 100B+ events annually. Learn architectural patterns an…
 
Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions. In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well a…
 