[QA] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Dynamic Cheatsheet (DC) enhances language models with persistent memory, improving performance on various tasks by enabling test-time learning and efficient reuse of problem-solving insights without altering model parameters.
https://arxiv.org/abs/2504.07952
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
2303 episodes
All episodes
[QA] HYPERSTEER: Activation Steering at Scale with Hypernetworks 7:49
HYPERSTEER: Activation Steering at Scale with Hypernetworks 9:15
[QA] Accelerating Diffusion LLMs via Adaptive Parallel Decoding 8:08
Accelerating Diffusion LLMs via Adaptive Parallel Decoding 21:09
[QA] Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning 7:34
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning 16:44
[QA] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning 8:08
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning 23:02
[QA] ALPHAONE: Reasoning Models Thinking Slow and Fast at Test Time 7:21
ALPHAONE: Reasoning Models Thinking Slow and Fast at Test Time 17:12
[QA] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models 7:40