Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity
This study critiques Reinforcement Learning with Verifiable Rewards (RLVR), finding that it does not extend the reasoning capabilities of large language models beyond those of their base models, which suggests a need for improved training methods.
https://arxiv.org/abs/2504.13837
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
2169 episodes
All episodes
1 [QA] Reinforcement Learning for Reasoning in Large Language Models with One Training Example 9:15

1 Reinforcement Learning for Reasoning in Large Language Models with One Training Example 29:41

1 [QA] ReasonIR: Training Retrievers for Reasoning Tasks 8:27

1 ReasonIR: Training Retrievers for Reasoning Tasks 24:05

1 [QA] Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models 7:26

1 Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models 16:50

1 [QA] Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity 8:04

1 Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity 23:12

1 [QA] Learning Adaptive Parallel Reasoning with Language Models 7:42

1 Learning Adaptive Parallel Reasoning with Language Models 21:22

1 [QA] Boosting Generative Image Modeling via Joint Image-Feature Synthesis 7:48

1 Boosting Generative Image Modeling via Joint Image-Feature Synthesis 18:50

1 [QA] Step1X-Edit: A Practical Framework for General Image Editing 8:07

1 Step1X-Edit: A Practical Framework for General Image Editing 15:50

1 [QA] Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models 7:49

1 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models 24:06

1 [QA] Exploring How LLMs Capture and Represent Domain-Specific Knowledge 7:44

1 Exploring How LLMs Capture and Represent Domain-Specific Knowledge 19:09

1 [QA] I-Con: A Unifying Framework for Representation Learning 7:41

1 I-Con: A Unifying Framework for Representation Learning 16:31



1 [QA] LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities 8:09

1 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities 15:38

1 [QA] NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning 8:52

1 NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning 31:19

1 Activated LoRA: Fine-tuned LLMs for Intrinsics 18:55

1 [QA] COLORBENCH: Can VLMs See and Understand the Colorful World? 7:49

1 COLORBENCH: Can VLMs See and Understand the Colorful World? 20:40

1 [QA] ReTool: Reinforcement Learning for Strategic Tool Use in LLMs 8:33

1 ReTool: Reinforcement Learning for Strategic Tool Use in LLMs 14:57


1 [QA] How to Predict Best Pretraining Data with Small Experiments 8:16

1 How to Predict Best Pretraining Data with Small Experiments 20:22

1 [QA] Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability 7:18

1 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability 7:07

1 [QA] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training 7:39

1 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training 10:11

1 [QA] Steering CLIP's vision transformer with sparse autoencoders 8:11

1 Steering CLIP's vision transformer with sparse autoencoders 17:53

1 [QA] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning 7:54

1 Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning 7:09

1 [QA] Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model 7:38

1 Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model 16:13

1 [QA] Reasoning Models Can Be Effective Without Thinking 7:29

1 Reasoning Models Can Be Effective Without Thinking 20:05

1 [QA] A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce 8:27

1 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce 14:38

1 [QA] CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training 7:14

1 CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training 20:35

1 [QA] Position: The Most Expensive Part of an LLM should be its Training Data 7:16

1 Position: The Most Expensive Part of an LLM should be its Training Data 20:05

1 [QA] Activated LoRA: Fine-tuned LLMs for Intrinsics 8:16

1 [QA] Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning 7:58

1 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning 18:11




1 [QA] Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? 7:45

1 Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? 16:23



1 [QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory 7:56

1 Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory 15:48

1 [QA] Scaling Laws for Native Multimodal Models 7:14


1 [QA] OLMOTRACE: Tracing Language Model Outputs Back to Trillions of Training Tokens 7:16