Dagster's New Era: Modularizing Data Transformation in the Age of AI
Summary
In this episode of the Data Engineering Podcast, we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand the scope of software engineering. He delves into the current state of AI adoption, the importance of maintaining core data engineering principles, and the need for human oversight to leverage AI tools effectively. Nick also introduces Dagster's new components feature, designed to modularize and standardize data transformation processes, making it easier for teams to collaborate and integrate AI into their workflows. Join in to explore the future of data engineering, the potential for AI to abstract away complexity, and the importance of open standards in preventing walled gardens in the tech industry.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- This is a pharmaceutical ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today for a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.
- Your host is Tobias Macey and today I'm interviewing Nick Schrock about lowering the barrier to entry for data platform consumers
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving your summary of the impact that the tidal wave of AI has had on data platforms and data teams?
- For anyone who hasn't heard of Dagster, can you give a quick summary of the project?
- What are the notable changes in the Dagster project in the past year?
- What are the ecosystem pressures that have shaped the ways that you think about the features and trajectory of Dagster as a project/product/community?
- In your recent release you introduced "components", which is a substantial change in how you enable teams to collaborate on data problems. What was the motivating factor in that work and how does it change the ways that organizations engage with their data?
- tension between being flexible and extensible vs. opinionated and constrained
- increased dependency on orchestration with LLM use cases
- reducing the barrier to contribution for data platform/pipelines
- bringing application engineers into the mix
- challenges of meeting users/teams where they are (languages, platform investments, etc.)
- What are the most interesting, innovative, or unexpected ways that you have seen teams applying the Components pattern?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the latest iterations of Dagster?
- When is Dagster the wrong choice?
- What do you have planned for the future of Dagster?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Dagster+ Episode
- Dagster Components Slide Deck
- The Rise Of Medium Code
- Lakehouse Architecture
- Iceberg
- Dagster Components
- Pydantic Models
- Kubernetes
- Dagster Pipes
- Ruby on Rails
- dbt
- Sling
- Fivetran
- Temporal
- MCP (Model Context Protocol)