Artwork

Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
Player FM - Podcast App
Go offline with the Player FM app!

Hands-on serving models using KFserving // Theofilos Papapanagiotou // Data Science Architect at Prosus // MLOps Meetup #40

57:45
 
Share
 

Manage episode 313294503 series 3241972
Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

MLOps community meetup #40! Last Wednesday, we talked to Theofilos Papapanagiotou, Data Science Architect at Prosus, about Hands-on Serving Models Using KFserving.
// Abstract:
We looked to some popular model formats like the SavedModel of Tensorflow, the Model Archiver of PyTorch, pickle&ONNX, to understand how the weights of the NN are saved there, the graph, and the signature concepts.
We discussed the relevant resources of the deployment stack of Istio (the Ingress gateway, the sidecar and the virtual service) and Knative (the service and revisions), as well as Kubeflow and KFServing. Then we got into the design details of KFServing, its custom resources, the controller and webhooks, the logging, and configuration.
We spent a large part in the monitoring stack, the metrics of the servable (memory footprint, latency, number of requests), as well as the model metrics like the graph, init/restore latencies, the optimizations, and the runtime metrics which end up to Prometheus. We looked at the inference payload and prediction logging to observe drifts and trigger the retraining of the pipeline.
Finally, a few words about the awesome community and the roadmap of the project on multi-model serving and inference routing graph.
// Bio:
Theo is a recovering Unix Engineer with 20 years of work experience in Telcos, on internet services, video delivery, and cybersecurity. He is also a university student for life; BSc in CS 1999, MSc in Data Coms 2008, and MSc in AI 2017.
Nowadays he calls himself an ML Engineer, as he expresses through this role his passion for System Engineering and Machine Learning.
His analytical thinking is driven by curiosity and hacker spirit. He has skills that span a variety of different areas: Statistics, Programming, Databases, Distributed Systems, and Visualization.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Theofilos on LinkedIn: https://linkedin.com/in/theofpa

  continue reading

450 episodes

Artwork
iconShare
 
Manage episode 313294503 series 3241972
Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

MLOps community meetup #40! Last Wednesday, we talked to Theofilos Papapanagiotou, Data Science Architect at Prosus, about Hands-on Serving Models Using KFserving.
// Abstract:
We looked to some popular model formats like the SavedModel of Tensorflow, the Model Archiver of PyTorch, pickle&ONNX, to understand how the weights of the NN are saved there, the graph, and the signature concepts.
We discussed the relevant resources of the deployment stack of Istio (the Ingress gateway, the sidecar and the virtual service) and Knative (the service and revisions), as well as Kubeflow and KFServing. Then we got into the design details of KFServing, its custom resources, the controller and webhooks, the logging, and configuration.
We spent a large part in the monitoring stack, the metrics of the servable (memory footprint, latency, number of requests), as well as the model metrics like the graph, init/restore latencies, the optimizations, and the runtime metrics which end up to Prometheus. We looked at the inference payload and prediction logging to observe drifts and trigger the retraining of the pipeline.
Finally, a few words about the awesome community and the roadmap of the project on multi-model serving and inference routing graph.
// Bio:
Theo is a recovering Unix Engineer with 20 years of work experience in Telcos, on internet services, video delivery, and cybersecurity. He is also a university student for life; BSc in CS 1999, MSc in Data Coms 2008, and MSc in AI 2017.
Nowadays he calls himself an ML Engineer, as he expresses through this role his passion for System Engineering and Machine Learning.
His analytical thinking is driven by curiosity and hacker spirit. He has skills that span a variety of different areas: Statistics, Programming, Databases, Distributed Systems, and Visualization.
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Theofilos on LinkedIn: https://linkedin.com/in/theofpa

  continue reading

450 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Quick Reference Guide

Copyright 2025 | Privacy Policy | Terms of Service | | Copyright
Listen to this show while you explore
Play