Content provided by Demetrios. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.
Cleanlab: Labeled Datasets that Correct Themselves Automatically // Curtis Northcutt // MLOps Coffee Sessions #105

1:06:10
 

MLOps Coffee Sessions #106 with Curtis Northcutt, CEO & Co-Founder of Cleanlab: "Cleanlab: Labeled Datasets that Correct Themselves Automatically", co-hosted by Vishnu Rachakonda.
// Abstract
Pioneered at MIT by 3 Ph.D. Co-Founders, Cleanlab is an open-source/SaaS company building the premier data-centric AI tools and workflows for (1) automatically correcting messy data and labels, (2) auto-tracking dataset quality over time, (3) automatically finding classes to merge and delete, (4) AutoML for data tasks, (5) obtaining and ranking high-quality annotations, and (6) training ML models with messy data.
Most of the prescriptive tasks (finding issues) can be done in one line of code with their open-source product: https://github.com/cleanlab/cleanlab.
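To give a feel for what "finding issues" means here, the following is a minimal, dependency-free sketch of the idea behind confident learning: flag an example when the model is confident in a class other than its given label. The function name `find_label_issues_sketch`, the toy `labels` and `pred_probs`, and the thresholding rule are simplified assumptions for illustration only, not Cleanlab's implementation. In the library itself, the corresponding entry point in the 2.x releases is, as far as we know, `cleanlab.filter.find_label_issues(labels=..., pred_probs=...)`.

```python
def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks wrong.

    Hypothetical, simplified stand-in for a confident-learning check;
    it is NOT the cleanlab implementation.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # among the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            if likely != y:
                issues.append(i)
    return issues


# Toy data (made up): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(find_label_issues_sketch(labels, pred_probs))
```

In practice the predicted probabilities would come from a model evaluated out-of-sample (for example via cross-validation), so that the model has not simply memorized the noisy labels it is being asked to audit.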
// Bio
Curtis Northcutt is the CEO and Co-Founder of Cleanlab, focused on making AI work reliably for people and their messy, real-world data by automatically fixing issues in any ML dataset. Curtis completed his Ph.D. in Computer Science at MIT, receiving the MIT Thesis Award, NSF Fellowship, and the Goldwater Scholarship. Prior to Cleanlab, Curtis worked at AI research groups including Google, Oculus, Amazon, Facebook, Microsoft, and NASA.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
https://github.com/cleanlab/cleanlab
https://cleanlab.ai/blog/cleanlab-history/
https://labelerrors.com/
https://l7.curtisnorthcutt.com/
https://nips.cc/Conferences/2021/ScheduleMultitrack?event=47102
https://www.youtube.com/watch?v=ieUOv1sQPlw
https://cleanlab.typeform.com/to/NLnU1XZF
CAMEO cheating detection system: https://arxiv.org/ftp/arxiv/papers/1508/1508.05699.pdf
The Cathedral & the Bazaar book: https://www.amazon.com/Cathedral-Bazaar-Musings-Accidental-Revolutionary/dp/0596001088
--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Curtis on LinkedIn: https://www.linkedin.com/in/cgnorthcutt/
Timestamps:
[00:00] Introduction to Curtis Northcutt
[00:30] Difference between MLOps and Data-Centric AI
[04:04] Realizing how data quality problems manifest in ML
[05:11] Computer vision problems
[06:54] War story that got Curtis into Data-Centric AI
[13:50] Overview of Curtis' vision
[14:45] PU Learning
[21:25] Consistency Rate and Flipping Rate
[25:25] One line of code
[29:48] Models make mistakes
[33:09] Cleanlab play with the environment
[36:30] How ML Engineers should approach the data quality problem
[42:42] Quantum computing
[46:39] Result of confident learning
[52:31] Utility for small data sets
[53:53] Cleanlab's huge success stories
[56:13] Rapid fire questions
[58:58] Cloudy and mystified space
[1:03:46] Cleanlab is hiring!
[1:05:06] Wrap up
