
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning” // Part 2

In this second installment, David and Demetrios continue reviewing the Google paper on continuous training and automated pipelines. They dive deep into machine learning monitoring and into what continuous training actually entails. Some key highlights:

Automatically retraining and serving the models:
When to do it?
Outlier detection
Drift detection

Outlier detection:
What is it?
How to deal with it (a minimal sketch follows below)
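
As a rough illustration of the "how", here is a minimal outlier-detection sketch using scikit-learn's IsolationForest; the data, feature count, and contamination rate are illustrative assumptions, not anything from the episode.

# Minimal outlier-detection sketch (assumes scikit-learn and numpy are installed).
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical reference data: rows of numeric features seen during training.
reference = np.random.RandomState(0).normal(loc=0.0, scale=1.0, size=(1000, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(reference)

# Score a batch of incoming rows; -1 marks points the model considers outliers.
incoming = np.vstack([reference[:5], [[8.0, -7.5, 9.1]]])  # last row is an obvious outlier
labels = detector.predict(incoming)
print(labels)  # e.g. [ 1  1  1  1  1 -1 ]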
Drift detection
Individual features may start to drift. This could be a bug, or it could be perfectly normal behavior that indicates the world has changed, requiring the model to be retrained (a minimal detection sketch follows the list of example changes below).

Example changes:
shifts in people’s preferences
marketing campaigns
competitor moves
the weather
the news cycle
locations
time
devices (clients)
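
One common way to catch a drifting feature is a two-sample statistical test between a training-time window and a recent serving window. A minimal sketch with scipy's Kolmogorov-Smirnov test (the window sizes and significance threshold are illustrative assumptions):

# Minimal per-feature drift check (assumes scipy and numpy are installed).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_window = rng.normal(0.0, 1.0, size=5000)  # feature values at training time
serving_window = rng.normal(0.4, 1.0, size=5000)   # recent production values (mean shifted)

stat, p_value = ks_2samp(training_window, serving_window)
ALPHA = 0.01  # illustrative significance threshold
if p_value < ALPHA:
    print(f"feature drift suspected (KS={stat:.3f}, p={p_value:.2e}) - consider retraining")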

If the world you're working with is changing over time, model deployment should be treated as a continuous process. What this tells me is that you should keep the data scientists and engineers working on the model instead of immediately moving them to another project.

Deeper dive into concept drift
Feature/target distributions change over time (see the measurement sketch after the references below)

An overview of concept drift applications: “… data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often evolve over time, thus, models built for analyzing such data quickly become obsolete over time. In machine learning and data mining this phenomenon is referred to as concept drift.”
https://www.win.tue.nl/~mpechen/publications/pubs/CD_applications15.pdf
https://www-ai.cs.tu-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/concept-drift/tsymbal2004.pdf
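
To put a number on "distributions change", one widely used measure is the Population Stability Index between a reference sample and a current sample. A minimal sketch (the bin count and the 0.2 rule of thumb are conventional assumptions, not from the paper):

import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI over shared histogram bins; higher values mean a bigger distribution shift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) in empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
psi = population_stability_index(rng.normal(0, 1, 10_000), rng.normal(0.3, 1, 10_000))
print(psi)  # a common rule of thumb flags PSI > 0.2 as a significant shift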

Types of concept drift:
Sudden - the data-generating distribution changes abruptly
Gradual - the change accumulates slowly over time
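
A toy simulation of the two patterns, with a naive rolling-mean monitor (all numbers are illustrative):

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(10_000)

# Sudden drift: the data-generating mean jumps at one point in time.
sudden = rng.normal(0.0, 1.0, t.size) + np.where(t < 5_000, 0.0, 2.0)

# Gradual drift: the mean slides slowly over the whole stream.
gradual = rng.normal(0.0, 1.0, t.size) + t / t.size * 2.0

# A simple monitor: compare a rolling-window mean against the training-time mean.
window = 500
rolling_mean = np.convolve(sudden, np.ones(window) / window, mode="valid")
print(rolling_mean[0], rolling_mean[-1])  # ~0.0 before the jump, ~2.0 after it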
Google is, in a way, trying to address this concern: the world is changing, and you want your ML system to change with it so that it not only avoids degraded performance but also improves over time and adapts to its environment. This sort of robustness is necessary in certain domains.
Continuous delivery and automation of pipelines (data, training, prediction service) was built with this in mind: minimize the commit-to-deploy interval and maximize the velocity of software delivery, along with its supporting qualities - maintainability, extensibility, and testability.
Once the pipeline is ready, you can run it - and you can do so continuously. After the pipeline is deployed to the production environment, it is executed automatically and repeatedly to produce a trained model that is stored in a central model registry.
The pipeline should be able to run on a schedule or in response to triggers: events configured for your business domain, such as the arrival of new data or a drop in the production model's performance.
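
A sketch of what such trigger logic might look like; the thresholds, metric names, and the pipeline-submission call are all hypothetical:

# Hypothetical trigger logic for kicking off the training pipeline.
NEW_ROWS_THRESHOLD = 100_000   # retrain once enough fresh data has arrived
METRIC_DROP_THRESHOLD = 0.05   # retrain if prod accuracy falls this far below baseline

def should_retrain(new_rows: int, baseline_metric: float, live_metric: float) -> bool:
    if new_rows >= NEW_ROWS_THRESHOLD:
        return True
    if baseline_metric - live_metric >= METRIC_DROP_THRESHOLD:
        return True
    return False

if should_retrain(new_rows=120_000, baseline_metric=0.91, live_metric=0.90):
    print("triggering training pipeline run")  # stand-in for a real pipeline submission call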
The link between the model artifact and the pipeline is never severed. Which pipeline trained the model? What data was extracted and validated, and how was it prepared? What was the training configuration, and how was the model evaluated? Metrics are key here: lineage tracking!
Keeping a close tie between the dev/experiment pipeline and the continuous production pipeline helps avoid inconsistencies between the model artifacts produced by the pipeline and the models being served - inconsistencies that are hard to debug.
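
A sketch of the kind of lineage record a pipeline run might attach to each model artifact before pushing it to the registry; every field name here is hypothetical:

import json, time

# Hypothetical lineage record stored alongside the model in the registry.
lineage = {
    "model_artifact": "churn-model-2020-07-01.pkl",
    "pipeline": {"name": "churn-training-pipeline", "version": "1.4.2"},
    "data": {"source": "warehouse.events", "snapshot": "2020-06-30", "validation": "passed"},
    "training_config": {"algorithm": "gradient_boosting", "learning_rate": 0.1},
    "evaluation": {"auc": 0.87, "baseline_auc": 0.85},
    "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}
print(json.dumps(lineage, indent=2))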

Join our slack community: https://join.slack.com/t/mlops-community/shared_invite/zt-391hcpnl-aSwNf_X5RyYSh40MiRe9Lw
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Chris Sterry on LinkedIn: https://www.linkedin.com/in/chrissterry/
