The Challenge of AI Model Evaluations with Ankur Goyal (Software Engineering Daily podcast)


Jeffmeyerson Softwaredevelopment Softwareengineering News Tech News Software Daily


40 subscribers

Added seven years ago

Content provided by Software Engineering Daily. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Software Engineering Daily or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.



about a year ago 45:22


Evaluations are critical for assessing the quality, performance, and effectiveness of software during development. Common evaluation methods, including code reviews and automated testing, help identify bugs, ensure compliance with requirements, and measure software reliability.

However, evaluating LLMs presents unique challenges due to their complexity, versatility, and potential for unpredictable behavior.

Ankur Goyal is the CEO and Founder of Braintrust Data, which provides an end-to-end platform for AI application development and focuses on making LLM development robust and iterative. Ankur previously founded Impira, which was acquired by Figma, and he later ran the AI team at Figma. Ankur joins the show to talk about Braintrust and the unique challenges of developing evaluations in a non-deterministic context.

Sean Falconer has been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent, where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn.

Please click here to see the transcript of this episode.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com


2108 episodes


All episodes

MCP Security at Wiz with Rami McCarthy

2 days ago 56:07

Wiz is a cloud security platform that helps organizations identify and remediate risks across their cloud environments. The company’s platform scans layers of the cloud stack, including virtual machines, containers, and serverless configurations, to detect vulnerabilities and misconfigurations in context. The Model Context Protocol, or MCP, is emerging as a potential standard for connecting LLM applications to external data sources and tools. It has rapidly gained traction across the industry with broad backing from companies such as OpenAI, Microsoft, and Google. While the protocol offers great opportunities, it also introduces certain security risks. Rami McCarthy is a Principal Security Researcher at Wiz. He joins the podcast with Gregor Vand to talk about security research, AI and secrets leakage, MCP security, supply chain attacks, career advice, and more. Gregor Vand is a security-focused technologist, and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cybersecurity, cyber insurance and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

SED News: Data Land Grabs, Copyright Fights, and the Great AI Talent War

4 days ago 47:14

Welcome back to SED News, a podcast series from Software Engineering Daily where hosts Gregor Vand and Sean Falconer break down the latest stories in software engineering, Silicon Valley, and the wider tech industry. In this episode, Gregor and Sean dig into Meta’s legal battle over AI training data, discuss the strategic implications of Meta’s $14 billion stake in Scale AI, and examine how competition in the AI space is reshaping relationships between tech giants like Microsoft, OpenAI, and Google. They also highlight some of the most interesting stories from Hacker News, including a solar-powered iPhone turned OCR server, and a provocative case for why some AI agents should really just be SQL queries. Gregor Vand is a security-focused technologist, and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cybersecurity, cyber insurance and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk. Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

AI at Anaconda with Greg Jennings

9 days ago 49:47

Anaconda is a software company that's well known for its solutions for managing packages, environments, and security in large-scale data workflows. The company has played a major role in making Python-based data science more accessible, efficient, and scalable. Anaconda has also invested heavily in AI tool development. Greg Jennings is the VP of Engineering and AI at Anaconda. He joins the podcast with Kevin Ball to talk about the tooling ecosystem around AI app development, the Anaconda Toolbox, the rapidly evolving role of AI in engineering, and more. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

ByteDance’s Container Networking Stack with Chen Tang

11 days ago 47:57

ByteDance is a global technology company operating a wide range of content platforms around the world, and is best known for creating TikTok. The company operates at a massive scale, which naturally presents challenges in ensuring performance and stability across its data centers. It has over a million servers running containerized applications, and this required the company to find a networking solution that could handle high throughput while maintaining stability. eBPF is a technology for dynamically and safely reprogramming the Linux kernel. ByteDance leveraged eBPF to successfully implement a decentralized networking solution that improved efficiency, scalability, and performance. Chen Tang is an engineer at ByteDance, where he worked on redesigning the company's container networking stack using eBPF. In this episode, Chen joins the show with Kevin Ball to talk about eBPF, the problems it solves, and how it was used at ByteDance. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

WayForward Games with Tomm Hulett and Voldi Way

16 days ago 46:02

WayForward is a renowned video game studio that was founded in 1990. The company has developed games for publishers such as Capcom, Konami, and Nintendo and has released their games across major hardware platforms from the last 35 years. They are also the creators of the Shantae series of 2D platformers. WayForward recently developed the latest game in the storied Contra series, called Operation Galuga, which is a reimagining of the original Contra from 1987. Voldi Way is the founder and CEO of WayForward, and Tomm Hulett is a Director at WayForward. They join the show to talk about the history of their studio and developing Contra: Operation Galuga. Joe Nash is a developer, educator, and award-winning community builder, who has worked at companies including GitHub, Twilio, Unity, and PayPal. Joe got his start in software development by creating mods and running servers for Garry’s Mod, and game development remains his favorite way to experience and explore new technologies and concepts. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

CodeRabbit and RAG for Code Review with Harjot Gill

18 days ago 48:42

One of the most immediate and high-impact applications of LLMs has been in software development. The models can significantly accelerate code writing, but with that increased velocity comes a greater need for thoughtful, scalable approaches to code review. Integrating AI into the development workflow requires rethinking how to ensure quality, security, and maintainability at scale. CodeRabbit is a startup that brings generative AI into the code review process. It evaluates code quality and security directly within tools like GitHub and VS Code, acting as an AI reviewer that complements existing CI/CD pipelines. Harjot Gill is the founder and CEO of CodeRabbit. He joins the podcast with Kevin Ball to discuss CodeRabbit's architecture, its multi-model LLM strategy, how it tracks the reasoning trail of agents, managing context windows, lessons from bootstrapping the company, and much more. Full Disclosure: This episode is sponsored by CodeRabbit. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

Emulating Retro Games on Modern Consoles with Robin Lavallée and Bill Litshauer

23 days ago 1:01:34

Emulating retro games on modern consoles is a growing trend that allows players to experience classic titles with improved performance, enhanced resolution, and added features like save states and rewinding. However, this process raises many challenging technical questions related to hardware compatibility, performance optimization, rendering, and state management. Implicit Conversions is a company focused on emulating retro PlayStation games on modern consoles. Robin Lavallée is the CEO and Bill Litshauer is the COO at the company. They join the show to talk about the engineering that’s needed to emulate and enhance retro games. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

SED News: Corporate Spies, Postgres, and the Weird Life of Devs Right Now

25 days ago 44:38

Welcome back to SED News, a podcast series from Software Engineering Daily where hosts Gregor Vand and Sean Falconer break down the latest stories in software engineering, Silicon Valley, and the wider tech world. In this episode, Gregor and Sean unpack what’s going on with Deel and Rippling, explore why Databricks and Snowflake are making big bets on Postgres, and reflect on how AI-powered tools like Cursor are reshaping what it means to be a developer today. They also surface highlights from Hacker News, including Claude’s evolving system prompt and a surprising history of transit cards in Japan and Hong Kong. Gregor Vand is a security-focused technologist, and is the founder and CTO of Mailpass. Previously, Gregor was a CTO across cybersecurity, cyber insurance and general software engineering companies. He has been based in Asia Pacific for almost a decade and can be found via his profile at vand.hk. Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

TanStack and the Future of Frontend with Tanner Linsley

30 days ago 55:13

TanStack is an open-source collection of high-performance libraries for JavaScript and TypeScript applications, primarily focused on state management, data fetching, and table utilities. It includes popular libraries like TanStack Query, TanStack Table, and TanStack Router. These libraries emphasize declarative APIs, optimized performance, and developer-friendly features, and they are increasingly popular for modern frontend development. Tanner Linsley is the creator of TanStack and he joins the podcast with Nick Nisi to talk about the project, SSG, type safety, the TanStack Start full-stack React framework, and much more. Nick Nisi is a conference organizer, speaker, and developer focused on tools across the web ecosystem. He has organized and emceed several conferences and has led NebraskaJS for more than a decade. Nick currently works as a developer experience engineer at WorkOS. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…


Modern Distributed Applications with Stephan Ewen

5 weeks ago 41:20

A major challenge with creating distributed applications is achieving resilience, reliability, and fault tolerance. It can take considerable engineering time to address non-functional concerns like retries, state synchronization, and distributed coordination. Event-driven models aim to simplify these issues, but often introduce new difficulties in debugging and operations. Stephan Ewen is the Founder at Restate, which aims to simplify modern distributed applications. He is also the co-creator of Apache Flink, which is an open-source framework for unified stream-processing and batch-processing. Stephan joins the show with Sean Falconer to talk about distributed applications and his work with Restate. Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

Crew AI with João Moura

6 weeks ago 45:08

Agentic AI is seen as a key frontier in artificial intelligence, enabling systems to autonomously act, adapt in real-time, and solve complex, multi-step problems based on objectives and context. Unlike traditional rule-based or generative AI, which are limited to predefined or reactive tasks, agentic AI processes vast information, models uncertainty, and makes context-sensitive decisions, mimicking human-like problem-solving. Crew AI is a platform to build and deploy automated workflows using any LLM and cloud platform. The company has rapidly become one of the most prominent in the field of agentic AI. João Moura is the founder at Crew AI and he joins the show with Sean Falconer to talk about his company. Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

Chip Design in the AI Era with Thomas Andersen

6 weeks ago 50:33

Synopsys is a leading electronic design automation company specializing in silicon design and verification, as well as software integrity and security. Their tools are foundational to the creation of modern chips and embedded software, powering everything from smartphones to cars. Chip design is a deeply complex process, often taking months or years and requiring the coordination of thousands of engineers. Now, advances in AI are beginning to transform the field by reducing manual effort, accelerating timelines, and unlocking new design possibilities. Thomas Andersen is the Vice President of AI and Machine Learning at Synopsys, where he has spent over 15 years. He joins the show to talk with Kevin Ball about the evolving role of AI in hardware design, the challenges of training models on tacit, undocumented chip engineering knowledge, the emergence of domain-specific LLMs, and where this fast-moving field is going next. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

OpenTofu with Cory O’Daniel and Malcolm Matalka

7 weeks ago 48:58

OpenTofu is an open-source alternative to Terraform, designed for managing infrastructure as code. It enables users to define, provision, and manage their cloud and on-premises resources using a declarative configuration language. OpenTofu was created to ensure an open and community-driven approach to infrastructure tooling, and it emphasizes compatibility and extensibility for diverse deployment scenarios. Cory O’Daniel is the CEO of Massdriver and he's a founding member of OpenTofu. Malcolm Matalka is a Co-Founder at Terrateam and he’s also a founding member of OpenTofu. They join the podcast to talk about the OpenTofu project. Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from AI to quantum computing. Currently, Sean is an AI Entrepreneur in Residence at Confluent where he works on AI strategy and thought leadership. You can connect with Sean on LinkedIn. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…

Mojo and Building a CUDA Replacement with Chris Lattner

7 weeks ago 56:14

Python is the dominant language for AI and data science applications, but it lacks the performance and low-level control needed to fully leverage GPU hardware. As a result, developers often rely on NVIDIA’s CUDA framework, which adds complexity and fragments the development stack. Mojo is a new programming language designed to combine the simplicity of Python with the performance of C and the safety of Rust. It also aims to provide a vendor-independent approach to GPU programming. Mojo is being developed by Chris Lattner, a renowned systems engineer known for his seminal contributions to computer science, including LLVM, the Clang compiler, and the Swift programming language. Chris is the CEO and Co-Founder of Modular AI, the company behind Mojo. In this episode, he joins the show to discuss his engineering journey and his current work on AI infrastructure and the Mojo language. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Please click here to see the transcript of this episode. Sponsorship inquiries: sponsor@softwareengineeringdaily.com…
