Are Reasoning Models More Prone to Hallucination?
This paper investigates hallucination in large reasoning models, analyzing post-training effects, cognitive behaviors, and model uncertainty, revealing insights into their impact on factual accuracy.
https://arxiv.org/abs/2505.23646
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
2413 episodes
All episodes
[QA] AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs 7:37

AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs 19:47

[QA] Should We Still Pretrain Encoders with Masked Language Modeling? 8:09

Should We Still Pretrain Encoders with Masked Language Modeling? 16:52

[QA] Token Bottleneck: One Token to Remember Dynamics 7:30

Token Bottleneck: One Token to Remember Dynamics 16:06

[QA] A Systematic Analysis of Hybrid Linear Attention 7:55

A Systematic Analysis of Hybrid Linear Attention 15:40

[QA] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs 8:31

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs 15:32

[QA] Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving 8:09

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving 21:33

[QA] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful 7:03

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful 18:57

[QA] The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation 7:35

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation 23:36

[QA] Cascade: Token-Sharded Private LLM Inference 7:04

Cascade: Token-Sharded Private LLM Inference 35:03

[QA] Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 7:28

Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 10:15

[QA] Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 7:21

Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 34:06

[QA] Fast and Simplex: 2-Simplicial Attention in Triton 7:28

Fast and Simplex: 2-Simplicial Attention in Triton 17:55

[QA] Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 7:21

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 15:33

[QA] DABstep: Data Agent Benchmark for Multi-step Reasoning 7:54

DABstep: Data Agent Benchmark for Multi-step Reasoning 16:50

[QA] Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 8:16

Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 16:52

[QA] LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 8:19

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 14:25

[QA] Performance Prediction for Large Systems via Text-to-Text Regression 8:40

Performance Prediction for Large Systems via Text-to-Text Regression 20:32

[QA] From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 7:47

From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 20:44

[QA] OmniGen2: Exploration to Advanced Multimodal Generation 7:44

OmniGen2: Exploration to Advanced Multimodal Generation 32:16

[QA] OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 7:28

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 25:52

[QA] Potemkin Understanding in Large Language Models 8:04

Potemkin Understanding in Large Language Models 17:20

[QA] Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 7:49

Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 18:35

[QA] MMSearch-R1: Incentivizing LMMs to Search 8:11

[QA] Thought Anchors: Which LLM Reasoning Steps Matter? 7:51

Thought Anchors: Which LLM Reasoning Steps Matter? 15:41

[QA] Scaling Speculative Decoding with LOOKAHEAD REASONING 8:06

Scaling Speculative Decoding with LOOKAHEAD REASONING 22:49

[QA] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations 7:55

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations 16:59

[QA] Watermarking Autoregressive Image Generation 7:39

Watermarking Autoregressive Image Generation 27:33

[QA] Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights 6:43

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights 11:26

[QA] Flat Channels to Infinity in Neural Loss Landscapes 7:16

Flat Channels to Infinity in Neural Loss Landscapes 15:03

[QA] Approximating Language Model Training Data from Weights 7:34

Approximating Language Model Training Data from Weights 21:37

[QA] GenRecal: Generation after Recalibration from Large to Small Vision-Language Models 7:40

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models 17:19

[QA] ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs 8:30

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs 12:10

[QA] Sampling from Your Language Model One Byte at a Time 7:05

Sampling from Your Language Model One Byte at a Time 13:35

[QA] Don't throw the baby out with the bathwater: How and why deep learning for ARC 7:44

Don't throw the baby out with the bathwater: How and why deep learning for ARC 32:30

[QA] What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers 7:18

What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers 19:43

[QA] MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention 8:28

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention 25:05

[QA] Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation 8:10

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation 16:59

[QA] TreeRL: LLM Reinforcement Learning with On-Policy Tree Search 7:17

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search 19:00

[QA] Solving Inequality Proofs with Large Language Models 8:20

Solving Inequality Proofs with Large Language Models 23:49

[QA] Reinforcement Learning Teachers of Test Time Scaling 7:54

Reinforcement Learning Teachers of Test Time Scaling 22:37

[QA] Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers 7:05

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers 18:28

[QA] Spurious Rewards: Rethinking Training Signals in RLVR 7:41

Spurious Rewards: Rethinking Training Signals in RLVR 30:13

[QA] Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation 8:08

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation 24:30

[QA] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening 7:45

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening 16:56
