Content provided by Andreas Wittig and Michael Wittig, focusing on AWS Cloud. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Andreas Wittig and Michael Wittig or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ppacc.player.fm/legal.
#55 Serverless ETL with Athena and Airflow | Builder's Diary Vol. 2

50:33
 
Get insights into the day-to-day challenges of builders. In this issue, Peter Reitz from our partner tecRacer talks about how to build Serverless ETL pipelines with Athena and Airflow. Learn how to extract data stored on S3, transform and enrich it, convert it into a format optimized for data analytics, and upload the results to S3 for further processing.
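
As a rough illustration of the transform-and-convert step described above, the following Python sketch builds an Athena CREATE TABLE AS SELECT (CTAS) statement that writes query results to S3 as partitioned Parquet. It is not taken from the episode: the table names, column names, bucket path, and the helper `build_ctas_query` are all hypothetical, and the sketch only assembles the SQL text without calling AWS.

```python
def build_ctas_query(target_table, source_table, columns, partition_columns, external_location):
    """Assemble an Athena CTAS statement that stores results as partitioned Parquet.

    All arguments are illustrative placeholders, not values from the episode.
    Athena expects partition columns to come last in the SELECT list,
    so they are appended after the regular columns.
    """
    if not partition_columns:
        raise ValueError("at least one partition column is required for this sketch")

    select_list = ", ".join(list(columns) + list(partition_columns))
    partitions = ", ".join(f"'{name}'" for name in partition_columns)

    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{external_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        f") AS\n"
        f"SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Hypothetical example: convert raw order data into Parquet, partitioned by date.
query = build_ctas_query(
    target_table="analytics.orders_parquet",
    source_table="raw.orders_csv",
    columns=["order_id", "customer_id", "amount"],
    partition_columns=["order_date"],
    external_location="s3://example-bucket/orders_parquet/",
)
print(query)
```

In a pipeline, a statement like this could be submitted to Athena by an orchestrator such as Airflow, with each task running one query. Partitioning by a column that queries commonly filter on is one way to reduce the amount of data Athena has to scan.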

Would you like to join Peter's team to solve real-world problems with the help of data analytics and machine learning powered by AWS? tecRacer is hiring a Cloud Consultant focusing on Machine Learning and Data Analytics. Apply now!

Chapters

1. Intro (00:00:00)

2. Peter's career: from economics to cloud computing (00:01:30)

3. The typical data analytics project (00:03:48)

4. Serverless ETL pipeline (00:06:51)

5. What is Amazon Athena? (00:08:47)

6. How to build Serverless ETL pipelines with Athena? (00:11:17)

7. Partitioning data to increase Athena's efficiency (00:16:57)

8. Monitoring Athena and S3 costs (00:22:47)

9. Amazon EMR vs. Amazon Athena (00:24:50)

10. What is Apache Airflow? (00:27:40)

11. Development workflow for ETL pipelines (00:30:40)

12. Column-oriented data file format like Apache Parquet (00:32:45)

13. AWS Step Functions vs. Airflow (00:35:30)

14. What Peter likes most about data analytics (00:41:20)

15. Limitations of Serverless ETL (00:45:20)

16. tecRacer is hiring! (00:49:05)

17. Outro (00:50:03)

93 episodes
