
Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

How Pinterest Powers Image Similarity // Shaji Chennan Kunnummel // System Design Reviews #1

57:36
 

In this Machine Learning System Design Review, Shaji Chennan Kunnummel walks us through the design of Pinterest’s near-real-time architecture for detecting similar images. We discuss their use of Kafka, Flink, RocksDB, and much more. Starting with the high-level requirements for the system, we cover Pinterest’s focus on debuggability and an easy transition from their batch processing system to stream processing. We then touch on the system interfaces and components involved, such as Manas (Pinterest’s custom search engine), and how it all ends up in their custom graph database, downstream Kafka streams, and Galaxy (Pinterest’s feature store). With Shaji’s expert knowledge of the system, we were able to do a deep dive into its architecture and some of its components.
// Experiences
15+ years of experience in software product development.
Led multiple teams in a highly agile, collaborative, and cross-functional environment.
Designed and implemented highly scalable, fault-tolerant, and optimized distributed systems that handle millions of requests per second. In-depth knowledge of object-oriented programming and design patterns in C++/Java/Python/Golang.
Designed and built complex data pipelines and microservices to train and serve machine learning models.
Built analytics pipelines for processing and mining high-volume data sets using Hadoop and MapReduce frameworks.
In-depth knowledge of distributed storage, consistency models, NoSQL data modeling, and cloud computing environments (AWS and Google Cloud).


440 episodes

