Are Reasoning Models More Prone To Hallucination? Arxiv Papers podcast

Arxiv Papers

[QA] AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs 7:37

4 days ago

AGENTSNET is a new benchmark for evaluating collaborative problem-solving, self-organization, and communication in multi-agent systems, revealing that the performance of large language models degrades as network size increases. https://arxiv.org/abs//2507.08616 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs 19:47

4 days ago

AGENTSNET is a new benchmark for evaluating collaborative problem-solving, self-organization, and communication in multi-agent systems, revealing that the performance of large language models degrades as network size increases. https://arxiv.org/abs//2507.08616

[QA] One Token to Fool LLM-as-a-Judge 7:56

4 days ago

Generative reward models that use LLMs to evaluate answer quality are vulnerable to superficial manipulations, highlighting the need for improved evaluation methods and a robust new model to enhance reliability. https://arxiv.org/abs//2507.08794

One Token to Fool LLM-as-a-Judge 17:55

4 days ago

Generative reward models that use LLMs to evaluate answer quality are vulnerable to superficial manipulations, highlighting the need for improved evaluation methods and a robust new model to enhance reliability. https://arxiv.org/abs//2507.08794

[QA] Should We Still Pretrain Encoders with Masked Language Modeling? 8:09

5 days ago

This paper compares Masked Language Modeling and Causal Language Modeling for text representation, finding MLM generally performs better, but CLM offers data efficiency and stability, suggesting a biphasic training strategy. https://arxiv.org/abs//2507.00994

Should We Still Pretrain Encoders with Masked Language Modeling? 16:52

5 days ago

This paper compares Masked Language Modeling and Causal Language Modeling for text representation, finding MLM generally performs better, but CLM offers data efficiency and stability, suggesting a biphasic training strategy. https://arxiv.org/abs//2507.00994

[QA] Token Bottleneck: One Token to Remember Dynamics 7:30

5 days ago

The paper presents Token Bottleneck (ToBo), a self-supervised learning method for compact visual representations, enhancing sequential scene understanding and demonstrating effectiveness in various tasks and real-world applications. https://arxiv.org/abs//2507.06543

Token Bottleneck: One Token to Remember Dynamics 16:06

5 days ago

The paper presents Token Bottleneck (ToBo), a self-supervised learning method for compact visual representations, enhancing sequential scene understanding and demonstrating effectiveness in various tasks and real-world applications. https://arxiv.org/abs//2507.06543

[QA] A Systematic Analysis of Hybrid Linear Attention 7:55

6 days ago

https://arxiv.org/abs//2507.06457

A Systematic Analysis of Hybrid Linear Attention 15:40

6 days ago

https://arxiv.org/abs//2507.06457

[QA] First Return, Entropy-Eliciting Explore 7:43

6 days ago

https://arxiv.org/abs//2507.07017

First Return, Entropy-Eliciting Explore 21:32

6 days ago

https://arxiv.org/abs//2507.07017

[QA] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs 8:31

6 days ago

Pretrained neural networks can adapt their architecture dynamically for different inputs, improving efficiency and performance by customizing layer usage without finetuning, as shown through Monte Carlo Tree Search optimization. https://arxiv.org/abs//2507.07996

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs 15:32

6 days ago

Pretrained neural networks can adapt their architecture dynamically for different inputs, improving efficiency and performance by customizing layer usage without finetuning, as shown through Monte Carlo Tree Search optimization. https://arxiv.org/abs//2507.07996

[QA] Scaling RL to Long Videos 8:19

6 days ago

https://arxiv.org/abs//2507.07966

Scaling RL to Long Videos 15:24

6 days ago

https://arxiv.org/abs//2507.07966

[QA] Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving 8:09

8 days ago

The paper proposes a decoupled framework for Automated Theorem Proving, enhancing reasoning and proving performance by using specialized models, achieving success on challenging mathematical problems. https://arxiv.org/abs//2507.06804

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving 21:33

8 days ago

The paper proposes a decoupled framework for Automated Theorem Proving, enhancing reasoning and proving performance by using specialized models, achieving success on challenging mathematical problems. https://arxiv.org/abs//2507.06804

[QA] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful 7:03

8 days ago

This paper challenges conventional wisdom on small batch sizes in language model training, demonstrating their stability, robustness, and efficiency, while providing guidelines for hyperparameter adjustments and batch size selection. https://arxiv.org/abs//2507.07101

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful 18:57

8 days ago

This paper challenges conventional wisdom on small batch sizes in language model training, demonstrating their stability, robustness, and efficiency, while providing guidelines for hyperparameter adjustments and batch size selection. https://arxiv.org/abs//2507.07101

[QA] The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation 7:35

8 days ago

This paper reviews memorization in Large Language Models, exploring its causes, detection methods, implications, and mitigation strategies, while highlighting the challenge of minimizing memorization without sacrificing model utility. https://arxiv.org/abs//2507.05578

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation 23:36

8 days ago

This paper reviews memorization in Large Language Models, exploring its causes, detection methods, implications, and mitigation strategies, while highlighting the challenge of minimizing memorization without sacrificing model utility. https://arxiv.org/abs//2507.05578

[QA] Differential Mamba 7:09

8 days ago

This paper introduces a novel differential mechanism for Mamba architecture, enhancing retrieval capabilities and performance while addressing attention overallocation issues found in sequence models like Transformers and RNNs. https://arxiv.org/abs//2507.06204

Differential Mamba 18:31

8 days ago

This paper introduces a novel differential mechanism for Mamba architecture, enhancing retrieval capabilities and performance while addressing attention overallocation issues found in sequence models like Transformers and RNNs. https://arxiv.org/abs//2507.06204

[QA] Cascade: Token-Sharded Private LLM Inference 7:04

10 days ago

The paper presents Cascade, a multi-party inference protocol that enhances performance and scalability while maintaining privacy for large language models, outperforming existing secure schemes. https://arxiv.org/abs//2507.05228

Cascade: Token-Sharded Private LLM Inference 35:03

10 days ago

The paper presents Cascade, a multi-party inference protocol that enhances performance and scalability while maintaining privacy for large language models, outperforming existing secure schemes. https://arxiv.org/abs//2507.05228

[QA] Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 7:28

10 days ago

Real-TabPFN enhances tabular data performance by continued pre-training on curated real-world datasets, outperforming models trained on broader datasets, achieving significant gains on 29 OpenML AutoML Benchmark datasets. https://arxiv.org/abs//2507.03971

Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data 10:15

10 days ago

Real-TabPFN enhances tabular data performance by continued pre-training on curated real-world datasets, outperforming models trained on broader datasets, achieving significant gains on 29 OpenML AutoML Benchmark datasets. https://arxiv.org/abs//2507.03971

[QA] Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 7:21

10 days ago

This study explores Large Language Models' strategic intelligence in competitive settings, revealing their reasoning abilities and distinct strategies in evolutionary Iterated Prisoner's Dilemma tournaments against traditional strategies. https://arxiv.org/abs//2507.02618

Strategic Intelligence in Large Language Models Evidence from evolutionary Game Theory. 34:06

10 days ago

This study explores Large Language Models' strategic intelligence in competitive settings, revealing their reasoning abilities and distinct strategies in evolutionary Iterated Prisoner's Dilemma tournaments against traditional strategies. https://arxiv.org/abs//2507.02618

[QA] Fast and Simplex: 2-Simplicial Attention in Triton 7:28

10 days ago

This paper explores the 2-simplicial Transformer, which enhances token efficiency over standard Transformers, improving performance on mathematics, coding, reasoning, and logic tasks within fixed token budgets. https://arxiv.org/abs//2507.02754

Fast and Simplex: 2-Simplicial Attention in Triton 17:55

10 days ago

This paper explores the 2-simplicial Transformer, which enhances token efficiency over standard Transformers, improving performance on mathematics, coding, reasoning, and logic tasks within fixed token budgets. https://arxiv.org/abs//2507.02754

[QA] Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 7:21

16 days ago

https://arxiv.org/abs//2507.00432

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning 15:33

16 days ago

https://arxiv.org/abs//2507.00432

[QA] DABstep: Data Agent Benchmark for Multi-step Reasoning 7:54

17 days ago

DABstep is a benchmark for evaluating AI agents on multi-step data analysis tasks, featuring 450 real-world challenges that test data processing and contextual reasoning capabilities. https://arxiv.org/abs//2506.23719

DABstep: Data Agent Benchmark for Multi-step Reasoning 16:50

17 days ago

DABstep is a benchmark for evaluating AI agents on multi-step data analysis tasks, featuring 450 real-world challenges that test data processing and contextual reasoning capabilities. https://arxiv.org/abs//2506.23719

[QA] Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 8:16

17 days ago

This paper explores the effectiveness of inference-time techniques in vision-language models, finding that generation-based methods enhance reasoning more than verification methods, while self-correction in RL models shows limited benefits. https://arxiv.org/abs//2506.17417

Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? 16:52

17 days ago

This paper explores the effectiveness of inference-time techniques in vision-language models, finding that generation-based methods enhance reasoning more than verification methods, while self-correction in RL models shows limited benefits. https://arxiv.org/abs//2506.17417

[QA] LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 8:19

18 days ago

LLaVA-Scissor introduces a training-free token compression method for video multimodal models, utilizing Semantic Connected Components for effective, non-redundant semantic coverage, outperforming existing methods in various benchmarks. https://arxiv.org/abs//2506.21862

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs 14:25

18 days ago

LLaVA-Scissor introduces a training-free token compression method for video multimodal models, utilizing Semantic Connected Components for effective, non-redundant semantic coverage, outperforming existing methods in various benchmarks. https://arxiv.org/abs//2506.21862

[QA] Performance Prediction for Large Systems via Text-to-Text Regression 8:40

18 days ago

https://arxiv.org/abs//2506.21718

Performance Prediction for Large Systems via Text-to-Text Regression 20:32

18 days ago

https://arxiv.org/abs//2506.21718

[QA] From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 7:47

18 days ago

This study explores how transformers can model rapid adaptation in learning, highlighting the role of episodic memory and caching in decision-making, paralleling cognitive processes in the brain. https://arxiv.org/abs//2506.19686

From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers 20:44

18 days ago

This study explores how transformers can model rapid adaptation in learning, highlighting the role of episodic memory and caching in decision-making, paralleling cognitive processes in the brain. https://arxiv.org/abs//2506.19686

[QA] OmniGen2: Exploration to Advanced Multimodal Generation 7:44

18 days ago

OmniGen2 is an open-source generative model for diverse tasks such as text-to-image generation and image editing, featuring distinct decoding pathways and achieving competitive results with a modest parameter count. https://arxiv.org/abs//2506.18871

OmniGen2: Exploration to Advanced Multimodal Generation 32:16

18 days ago

OmniGen2 is an open-source generative model for diverse tasks such as text-to-image generation and image editing, featuring distinct decoding pathways and achieving competitive results with a modest parameter count. https://arxiv.org/abs//2506.18871

[QA] OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 7:28

20 days ago

https://arxiv.org/abs//2506.20512

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling 25:52

20 days ago

https://arxiv.org/abs//2506.20512

[QA] Potemkin Understanding in Large Language Models 8:04

20 days ago

This paper introduces a framework to evaluate large language models, revealing that their benchmark success often reflects superficial understanding, with pervasive internal incoherence in concept representations. https://arxiv.org/abs//2506.21521

Potemkin Understanding in Large Language Models 17:20

20 days ago

This paper introduces a framework to evaluate large language models, revealing that their benchmark success often reflects superficial understanding, with pervasive internal incoherence in concept representations. https://arxiv.org/abs//2506.21521

[QA] Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 7:49

21 days ago

This study explores grokking in large language models during pretraining, revealing that training pathways evolve from random to structured and that generalization improves even after the loss has converged. https://arxiv.org/abs//2506.21551

Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test 18:35

21 days ago

This study explores grokking in large language models during pretraining, revealing that training pathways evolve from random to structured and that generalization improves even after the loss has converged. https://arxiv.org/abs//2506.21551

[QA] MMSearch-R1: Incentivizing LMMs to Search 8:11

21 days ago

MMSearch-R1 is a reinforcement learning framework for large multimodal models, enabling efficient, on-demand multi-turn search in real-world environments, outperforming existing methods while reducing search calls by over 30%. https://arxiv.org/abs//2506.20670

MMSearch-R1: Incentivizing LMMs to Search 18:50

21 days ago

MMSearch-R1 is a reinforcement learning framework for large multimodal models, enabling efficient, on-demand multi-turn search in real-world environments, outperforming existing methods while reducing search calls by over 30%. https://arxiv.org/abs//2506.20670

[QA] Thought Anchors: Which LLM Reasoning Steps Matter? 7:51

22 days ago

The paper explores sentence-level analysis of reasoning in large language models, presenting three methods to identify influential "thought anchors" that shape multi-step reasoning processes. An open-source tool is provided. https://arxiv.org/abs//2506.19143

Thought Anchors: Which LLM Reasoning Steps Matter? 15:41

22 days ago

The paper explores sentence-level analysis of reasoning in large language models, presenting three methods to identify influential "thought anchors" that shape multi-step reasoning processes. An open-source tool is provided. https://arxiv.org/abs//2506.19143

[QA] Scaling Speculative Decoding with LOOKAHEAD REASONING 8:06

23 days ago

LOOKAHEAD REASONING enhances token-level speculative decoding by introducing step-level parallelism, improving speedup from 1.4x to 2.1x while maintaining answer quality across various benchmarks. https://arxiv.org/abs//2506.19830

Scaling Speculative Decoding with LOOKAHEAD REASONING 22:49

23 days ago

LOOKAHEAD REASONING enhances token-level speculative decoding by introducing step-level parallelism, improving speedup from 1.4x to 2.1x while maintaining answer quality across various benchmarks. https://arxiv.org/abs//2506.19830

[QA] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

24 days ago · 7:55

This paper introduces Tar, a multimodal framework integrating visual understanding and generation through a shared semantic representation, enhancing efficiency and performance in cross-modal tasks. https://arxiv.org/abs//2506.18898 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

24 days ago · 16:59

This paper introduces Tar, a multimodal framework integrating visual understanding and generation through a shared semantic representation, enhancing efficiency and performance in cross-modal tasks. https://arxiv.org/abs//2506.18898 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Watermarking Autoregressive Image Generation

25 days ago · 7:39

https://arxiv.org/abs//2506.16349 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Watermarking Autoregressive Image Generation

25 days ago · 27:33

https://arxiv.org/abs//2506.16349 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

25 days ago · 6:43

DnD introduces a prompt-conditioned parameter generator for LLMs, enabling rapid task-specific customization without separate training, achieving significant performance gains and lower overhead compared to traditional methods. https://arxiv.org/abs//2506.16406 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

25 days ago · 11:26

DnD introduces a prompt-conditioned parameter generator for LLMs, enabling rapid task-specific customization without separate training, achieving significant performance gains and lower overhead compared to traditional methods. https://arxiv.org/abs//2506.16406 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Flat Channels to Infinity in Neural Loss Landscapes

27 days ago · 7:16

The paper characterizes special channels in neural network loss landscapes where slow loss decrease occurs, leading to gated linear units, enhancing understanding of gradient dynamics and optimization methods. https://arxiv.org/abs//2506.14951 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Flat Channels to Infinity in Neural Loss Landscapes

27 days ago · 15:03

The paper characterizes special channels in neural network loss landscapes where slow loss decrease occurs, leading to gated linear units, enhancing understanding of gradient dynamics and optimization methods. https://arxiv.org/abs//2506.14951 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Approximating Language Model Training Data from Weights

27 days ago · 7:34

The paper presents a method for approximating training data from model weights, improving performance significantly on classification tasks using a gradient-based approach to select relevant public documents. https://arxiv.org/abs//2506.15553 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Approximating Language Model Training Data from Weights

27 days ago · 21:37

The paper presents a method for approximating training data from model weights, improving performance significantly on classification tasks using a gradient-based approach to select relevant public documents. https://arxiv.org/abs//2506.15553 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

28 days ago · 7:40

GenRecal is a novel distillation framework for vision-language models that enhances knowledge transfer across diverse architectures, improving performance on resource-constrained devices while outperforming large-scale VLMs. https://arxiv.org/abs//2506.15681 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

28 days ago · 17:19

GenRecal is a novel distillation framework for vision-language models that enhances knowledge transfer across diverse architectures, improving performance on resource-constrained devices while outperforming large-scale VLMs. https://arxiv.org/abs//2506.15681 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

28 days ago · 8:30

https://arxiv.org/abs//2506.15211 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

28 days ago · 12:10

https://arxiv.org/abs//2506.15211 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Sampling from Your Language Model One Byte at a Time

30 days ago · 7:05

This paper presents a method to convert autoregressive language models with BPE tokenizers into character-level models, addressing tokenization issues and enabling model interoperability and improved performance through ensemble and proxy-tuning. https://arxiv.org/abs//2506.14123 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Sampling from Your Language Model One Byte at a Time

30 days ago · 13:35

This paper presents a method to convert autoregressive language models with BPE tokenizers into character-level models, addressing tokenization issues and enabling model interoperability and improved performance through ensemble and proxy-tuning. https://arxiv.org/abs//2506.14123 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Don't throw the baby out with the bathwater: How and why deep learning for ARC

30 days ago · 7:44

This paper demonstrates that deep learning, through on-the-fly training and innovative techniques, significantly enhances performance on the Abstraction and Reasoning Corpus, achieving state-of-the-art results. https://arxiv.org/abs//2506.14276 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Don't throw the baby out with the bathwater: How and why deep learning for ARC

30 days ago · 32:30

This paper demonstrates that deep learning, through on-the-fly training and innovative techniques, significantly enhances performance on the Abstraction and Reasoning Corpus, achieving state-of-the-art results. https://arxiv.org/abs//2506.14276 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

4 weeks ago · 7:18

This study explores abrupt learning in shallow Transformers, revealing a performance plateau characterized by repetition bias and representation collapse, with attention map learning as a critical bottleneck. https://arxiv.org/abs//2506.13688 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

4 weeks ago · 19:43

This study explores abrupt learning in shallow Transformers, revealing a performance plateau characterized by repetition bias and representation collapse, with attention map learning as a critical bottleneck. https://arxiv.org/abs//2506.13688 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

4 weeks ago · 8:28

https://arxiv.org/abs//2506.13585 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

4 weeks ago · 25:05

https://arxiv.org/abs//2506.13585 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

5 weeks ago · 8:10

We present a diffusion-based framework for aligned novel view image and geometry generation, utilizing warping, inpainting, and cross-modal attention distillation for enhanced synthesis and prediction accuracy. https://arxiv.org/abs//2506.11924 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

5 weeks ago · 16:59

We present a diffusion-based framework for aligned novel view image and geometry generation, utilizing warping, inpainting, and cross-modal attention distillation for enhanced synthesis and prediction accuracy. https://arxiv.org/abs//2506.11924 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

5 weeks ago · 7:17

TreeRL is a novel reinforcement learning framework that integrates on-policy tree search, improving exploration and efficiency in reasoning tasks, outperforming traditional methods in benchmarks. https://arxiv.org/abs//2506.11902 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

5 weeks ago · 19:00

TreeRL is a novel reinforcement learning framework that integrates on-policy tree search, improving exploration and efficiency in reasoning tasks, outperforming traditional methods in benchmarks. https://arxiv.org/abs//2506.11902 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Solving Inequality Proofs with Large Language Models

5 weeks ago · 8:20

The paper addresses challenges in inequality proving for LLMs, introducing the INEQMATH dataset and a novel evaluation framework, revealing significant gaps in reasoning accuracy among leading models. https://arxiv.org/abs//2506.07927 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Solving Inequality Proofs with Large Language Models

5 weeks ago · 23:49

The paper addresses challenges in inequality proving for LLMs, introducing the INEQMATH dataset and a novel evaluation framework, revealing significant gaps in reasoning accuracy among leading models. https://arxiv.org/abs//2506.07927 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Reinforcement Learning Teachers of Test Time Scaling

5 weeks ago · 7:54

The paper introduces Reinforcement-Learned Teachers (RLTs) that enhance distillation efficiency by providing detailed explanations, outperforming larger models in reasoning tasks without requiring extensive exploration. https://arxiv.org/abs//2506.08388 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Reinforcement Learning Teachers of Test Time Scaling

5 weeks ago · 22:37

The paper introduces Reinforcement-Learned Teachers (RLTs) that enhance distillation efficiency by providing detailed explanations, outperforming larger models in reasoning tasks without requiring extensive exploration. https://arxiv.org/abs//2506.08388 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

5 weeks ago · 7:05

This study explores out-of-context reasoning in large language models, linking generalization and hallucination to a single mechanism, and formalizes it as a synthetic factual recall task. https://arxiv.org/abs//2506.10887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

5 weeks ago · 18:28

This study explores out-of-context reasoning in large language models, linking generalization and hallucination to a single mechanism, and formalizes it as a synthetic factual recall task. https://arxiv.org/abs//2506.10887 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Spurious Rewards: Rethinking Training Signals in RLVR

5 weeks ago · 7:41

Reinforcement learning with verifiable rewards (RLVR) can substantially improve mathematical reasoning in Qwen2.5-Math even when trained with spurious rewards, but such spurious rewards may not benefit other models such as Llama3 or OLMo2. https://arxiv.org/abs//2506.10947 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Spurious Rewards: Rethinking Training Signals in RLVR

5 weeks ago · 30:13

Reinforcement learning with verifiable rewards (RLVR) can substantially improve mathematical reasoning in Qwen2.5-Math even when trained with spurious rewards, but such spurious rewards may not benefit other models such as Llama3 or OLMo2. https://arxiv.org/abs//2506.10947 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

5 weeks ago · 8:08

https://arxiv.org/abs//2506.09991 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

5 weeks ago · 24:30

https://arxiv.org/abs//2506.09991 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Reinforcement Pre-Training

5 weeks ago · 7:23

Reinforcement Pre-Training (RPT) enhances language models by using reinforcement learning for next-token prediction, improving accuracy and providing a strong foundation for further fine-tuning. https://arxiv.org/abs//2506.08007 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Reinforcement Pre-Training

5 weeks ago · 11:07

Reinforcement Pre-Training (RPT) enhances language models by using reinforcement learning for next-token prediction, improving accuracy and providing a strong foundation for further fine-tuning. https://arxiv.org/abs//2506.08007 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Corrector Sampling in Language Models

6 weeks ago · 7:31

https://arxiv.org/abs//2506.06215 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Corrector Sampling in Language Models

6 weeks ago · 19:02

https://arxiv.org/abs//2506.06215 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

[QA] Distillation Robustifies Unlearning

6 weeks ago · 7:05

The paper presents UNDO, a method that enhances unlearning in LLMs through distillation, achieving robust capability removal with reduced compute and data requirements compared to traditional retraining methods. https://arxiv.org/abs//2506.06278 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Distillation Robustifies Unlearning

6 weeks ago · 14:58

The paper presents UNDO, a method that enhances unlearning in LLMs through distillation, achieving robust capability removal with reduced compute and data requirements compared to traditional retraining methods. https://arxiv.org/abs//2506.06278 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Log-Linear Attention

6 weeks ago · 7:50

This paper introduces log-linear attention, enhancing linear attention's efficiency by using a logarithmically growing set of hidden states, improving sequence modeling while maintaining computational efficiency. https://arxiv.org/abs//2506.04761 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Log-Linear Attention

6 weeks ago · 21:59

This paper introduces log-linear attention, enhancing linear attention's efficiency by using a logarithmically growing set of hidden states, improving sequence modeling while maintaining computational efficiency. https://arxiv.org/abs//2506.04761 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening

6 weeks ago · 7:45

This paper critiques GRPO's bias in training language models for theorem proving and introduces the unlikeliness reward to enhance performance and sample diversity, achieving competitive results. https://arxiv.org/abs//2506.02355 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening

6 weeks ago · 16:56

This paper critiques GRPO's bias in training language models for theorem proving and introduces the unlikeliness reward to enhance performance and sample diversity, achieving competitive results. https://arxiv.org/abs//2506.02355 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…

[QA] Self-Challenging Language Model Agents

6 weeks ago · 7:26

The Self-Challenging framework enables agents to generate and train on high-quality tasks autonomously, achieving significant performance improvements using self-generated data in tool-use benchmarks. https://arxiv.org/abs//2506.01716 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers…
