[QA] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
This study examines Reinforcement Learning with Verifiable Rewards (RLVR) through the lens of token entropy patterns, finding that a minority of high-entropy tokens drives the gains in reasoning performance in Large Language Models.
https://arxiv.org/abs/2506.01939
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
2389 episodes
All episodes
[QA] Cascade: Token-Sharded Private LLM Inference 7:04
Cascade: Token-Sharded Private LLM Inference 35:03
[QA] Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 7:28
Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 10:15
[QA] Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 7:21
Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 34:06
[QA] Fast and Simplex: 2-Simplicial Attention in Triton 7:28
Fast and Simplex: 2-Simplicial Attention in Triton 17:55
[QA] Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 7:21
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 15:33
[QA] DABstep: Data Agent Benchmark for Multi-step Reasoning 7:54
DABstep: Data Agent Benchmark for Multi-step Reasoning 16:50
[QA] Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 8:16
Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 16:52
[QA] LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 8:19
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 14:25
[QA] Performance Prediction for Large Systems via Text-to-Text Regression 8:40
Performance Prediction for Large Systems via Text-to-Text Regression 20:32
[QA] From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 7:47
From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 20:44
[QA] OmniGen2: Exploration to Advanced Multimodal Generation 7:44
OmniGen2: Exploration to Advanced Multimodal Generation 32:16
[QA] OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 7:28
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 25:52
[QA] Potemkin Understanding in Large Language Models 8:04
Potemkin Understanding in Large Language Models 17:20
[QA] Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 7:49
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 18:35
[QA] MMSearch-R1: Incentivizing LMMs to Search 8:11
