180: Reinforcement Learning | Programming Throwdown podcast


Content provided by Patrick Wheeler and Jason Gauci. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Patrick Wheeler and Jason Gauci or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ppacc.player.fm/legal.


6M ago 1:52:22


Intro topic: Grills

News/Links:

You can’t call yourself a senior until you’ve worked on a legacy project
- https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development
- https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
- https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

Patrick:
- The Player of Games (Iain M. Banks)
  - https://a.co/d/1ZpUhGl (non-affiliate)
Jason:
- Basic Roleplaying Universal Game Engine
  - https://amzn.to/3ES4p5i

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Patrick:
- Pokemon Sword and Shield
Jason:
- Features and Labels (https://fal.ai)

Topic: Reinforcement Learning

Three types of AI
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Online vs Offline RL
Optimization algorithms
- Value optimization
  - SARSA
  - Q-Learning
- Policy optimization
  - Policy Gradients
  - Actor-Critic
  - Proximal Policy Optimization
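
To make the value optimization entries concrete, here is a minimal tabular sketch of the SARSA and Q-Learning update rules. It is not taken from the episode: the state names, action names, learning rate `alpha`, and discount `gamma` are all illustrative assumptions.

```python
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

# Hypothetical toy setup: in state "s1", "right" is worth 1.0 and "left" 0.0.
actions = ["left", "right"]
q_sarsa = defaultdict(float)
q_qlearn = defaultdict(float)
for table in (q_sarsa, q_qlearn):
    table[("s1", "right")] = 1.0

# Same transition for both, but the agent happens to take "left" next.
sarsa_update(q_sarsa, "s0", "right", 0.0, "s1", "left")
q_learning_update(q_qlearn, "s0", "right", 0.0, "s1", actions)

print(q_sarsa[("s0", "right")])   # stays at 0.0: the next action was the poor one
print(q_qlearn[("s0", "right")])  # moves to about 0.09: uses the best next action
```

The only difference between the two functions is the bootstrap term, which is why SARSA learns the value of the policy it is following while Q-Learning learns the value of the greedy policy.
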
Value vs Policy Optimization
- Value optimization is more intuitive (Value loss)
- Policy optimization is less intuitive at first (policy gradients)
- Converting values to policies in deep learning is difficult
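
Policy gradients can feel less intuitive than a value loss, so a small example may help. The sketch below runs a REINFORCE-style update on a two-armed bandit with a softmax policy. The reward values, step count, learning rate, and seed are assumptions chosen for illustration, not details from the episode.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=500, lr=0.1, seed=0):
    """Adjust preferences directly, without learning any value estimates."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for k in range(len(prefs)):
            # Gradient of log pi(action) with respect to preference k.
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * reward * grad_log_pi
    return softmax(prefs)

# Hypothetical bandit: arm 0 always pays 0.0, arm 1 always pays 1.0.
final_probs = reinforce_bandit(rewards=[0.0, 1.0])
print(final_probs)
```

The policy shifts probability toward the rewarded arm without ever estimating how much each arm is worth.
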
Imitation Learning
- Supervised policy learning
- Often used to bootstrap reinforcement learning
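
As a rough illustration of supervised policy learning, the sketch below clones an expert by picking the most frequent demonstrated action in each state. The states, actions, and demonstrations are made up for the example, and a real system would typically fit a classifier rather than a lookup table.

```python
from collections import Counter, defaultdict

def behavior_clone(demonstrations):
    """Fit a tabular policy: the most common expert action per state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical expert demonstrations as (state, action) pairs.
demos = [
    ("low_battery", "recharge"),
    ("low_battery", "recharge"),
    ("low_battery", "search"),
    ("high_battery", "search"),
]
cloned_policy = behavior_clone(demos)
print(cloned_policy)
```

A policy like this could then serve as the starting point for reinforcement learning instead of starting from random behavior.
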
Policy Evaluation
- Propensity scoring versus model-based
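
The two evaluation styles can be compared on logged data. The sketch below assumes a context-free bandit log of (action, reward, logging propensity) tuples and a target policy given as action probabilities. All names and numbers are illustrative assumptions.

```python
from collections import defaultdict

def ips_estimate(logged, target_policy):
    """Propensity scoring: reweight logged rewards by target/logging probability."""
    total = 0.0
    for action, reward, propensity in logged:
        total += (target_policy[action] / propensity) * reward
    return total / len(logged)

def model_based_estimate(logged, target_policy):
    """Model-based: fit a reward model (here a per-action mean), then average it
    under the target policy."""
    sums, counts = defaultdict(float), defaultdict(int)
    for action, reward, _ in logged:
        sums[action] += reward
        counts[action] += 1
    reward_model = {a: sums[a] / counts[a] for a in counts}
    return sum(p * reward_model.get(a, 0.0) for a, p in target_policy.items())

# Hypothetical log collected by a uniform policy over two actions.
logged = [("a", 1.0, 0.5), ("b", 0.0, 0.5), ("a", 1.0, 0.5), ("b", 0.0, 0.5)]
target_policy = {"a": 0.9, "b": 0.1}

ips_value = ips_estimate(logged, target_policy)
model_value = model_based_estimate(logged, target_policy)
print(ips_value, model_value)
```

On this clean toy log both estimators agree. In practice propensity scoring tends to be noisy when the target policy differs a lot from the logging policy, while the model-based estimate is only as good as its reward model.
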
Challenges in training an RL model
- Two optimization loops
  - Collecting feedback vs updating the model
- Difficult optimization target
  - Policy evaluation
RLHF & GRPO
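
One piece of GRPO that is easy to show in isolation is the group-relative advantage: several responses are sampled for the same prompt, and each one is scored relative to the group's mean and spread rather than against a learned critic. The sketch below assumes the rewards are already available (for example from a reward model in an RLHF setup). The use of the population standard deviation and the `eps` constant are assumptions of this sketch.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four sampled responses to one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that beat the group average get a positive advantage and are reinforced, while below-average ones are pushed down.
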

★ Support this podcast on Patreon ★


