From Chaos to Reliability with Gremlin CEO Kolton Andrus
In this episode, Kolton Andrus, founder and CEO of Gremlin, dives deep into chaos engineering and reliability testing. Kolton shares his journey from leading reliability efforts at Amazon and Netflix to founding Gremlin, an enterprise reliability platform. He and host José Quaresma discuss what it really takes to build resilient systems, the cultural shift required to prioritize reliability, and how Gremlin is working to reshape accountability in engineering teams. From testing dependencies to aligning incentives, this conversation is packed with real-world insights into scaling systems (and teams) that don't break under pressure.
---
Kolton Andrus is the CEO and founder of Gremlin. Previously, he focused on building and operating reliable systems at Netflix and Amazon. At both companies, he operated systems at scale, managed company-wide incidents, and helped build out their respective reliability programs and toolsets.
Host José Quaresma is the VP of Technical Engagement at Queue-it, working on the frontlines with some of the world’s biggest businesses on their busiest days, from Ticketmaster to Zalando to the U.K. Home Office. Each week, he’ll be joined by experts across industries, uncovering how major organizations design, build, and deploy systems that perform at scale.
This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo.
- (00:00) - Intro & Guest: Kolton Andrus
- (04:20) - Founding Gremlin (2016)
- (08:47) - Rewarding Invisible Reliability Work
- (12:27) - Proving Reliability’s Business Value
- (15:21) - Rethinking the “Chaos Engineering” Label
- (20:18) - Chaos Testing to Reliability Scores
- (24:25) - Spreading Reliability Culture Across Teams
- (28:50) - Safe, Incremental Failure Testing in Prod
- (33:30) - Load + Fault Testing for Peak Traffic
- (36:30) - AI’s Opportunities & Risks for Ops
- (39:30) - Defining Scalability as Elasticity
- (44:18) - Key Takeaways & Farewell
© Queue-it, 2025