#55 Serverless ETL with Athena and Airflow | Builder's Diary Vol. 2
Get insights into the day-to-day challenges of builders. In this issue, Peter Reitz from our partner tecRacer talks about how to build Serverless ETL pipelines with Athena and Airflow. Learn how to extract data stored on S3, transform and enrich it, convert it into a format optimized for data analytics, and upload the result to S3 for further processing.
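To make the extract, transform, convert, and upload steps more concrete, here is a minimal sketch of how such a step might be expressed as an Athena CTAS (CREATE TABLE AS SELECT) query that writes partitioned Parquet back to S3. This is not taken from the episode: the helper `build_ctas_query`, the table names, the column names, and the bucket path are all hypothetical, and the code only builds the SQL string. Submitting it (for instance from an Airflow task or through boto3's Athena client) is left out.

```python
def build_ctas_query(source_table, target_table, output_location,
                     columns, partition_column):
    """Build an Athena CTAS statement that writes partitioned Parquet to S3.

    All arguments are illustrative placeholders, not values from the episode.
    """
    # Athena expects the partition column(s) to come last in the SELECT list.
    select_list = ", ".join(list(columns) + [partition_column])
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"
        f"    external_location = '{output_location}',\n"
        f"    partitioned_by = ARRAY['{partition_column}']\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Hypothetical source and target names, used only for illustration.
query = build_ctas_query(
    source_table="raw_db.events",
    target_table="analytics_db.events_parquet",
    output_location="s3://example-bucket/curated/events/",
    columns=["event_id", "user_id", "amount"],
    partition_column="event_date",
)
print(query)
```

Writing the result as Parquet partitioned by a date-like column is one common way to reduce the amount of data Athena scans per query, which ties in with the partitioning and column-oriented format chapters listed below.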
Would you like to join Peter's team to solve real-world problems with the help of data analytics and machine learning powered by AWS? tecRacer is hiring a Cloud Consultant focusing on Machine Learning and Data Analytics. Apply now!
Chapters
1. Intro (00:00:00)
2. Peter's career: from economics to cloud computing (00:01:30)
3. The typical data analytics project (00:03:48)
4. Serverless ETL pipeline (00:06:51)
5. What is Amazon Athena? (00:08:47)
6. How to build Serverless ETL pipelines with Athena? (00:11:17)
7. Partitioning data to increase Athena's efficiency (00:16:57)
8. Monitoring Athena and S3 costs (00:22:47)
9. Amazon EMR vs. Amazon Athena (00:24:50)
10. What is Apache Airflow? (00:27:40)
11. Development workflow for ETL pipelines (00:30:40)
12. Column-oriented data file format like Apache Parquet (00:32:45)
13. AWS Step Functions vs. Airflow (00:35:30)
14. What Peter likes most about data analytics (00:41:20)
15. Limitations of Serverless ETL (00:45:20)
16. tecRacer is hiring! (00:49:05)
17. Outro (00:50:03)