

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/