RAFT: Adapting Language Model to Domain Specific RAG
Manage episode 426158561 series 3448051
Adapting LLMs to specialized domains (e.g., recent news, enterprise private documents) is essential, and in this episode we discuss a paper that asks how to adapt pre-trained LLMs for RAG in those domains. SallyAnn DeLucia is joined by Sai Kolasani, a researcher at UC Berkeley’s RISE Lab (and an Arize AI intern), to talk about his work on RAFT: Adapting Language Model to Domain Specific RAG.
RAFT (Retrieval-Augmented Fine-Tuning) is a training recipe that improves an LLM’s ability to answer questions in an “open-book,” in-domain setting. Given a question and a set of retrieved documents, the model is trained to ignore documents that don’t help answer the question (aka distractor documents). This, coupled with RAFT’s chain-of-thought-style responses, improves the model’s ability to reason. In domain-specific RAG, RAFT consistently improves model performance across the PubMed, HotpotQA, and Gorilla datasets, offering a post-training recipe for adapting pre-trained LLMs to in-domain RAG.
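To make the recipe concrete, here is a minimal sketch of how a RAFT-style fine-tuning example might be assembled: the context mixes the golden (oracle) document with sampled distractors, and in a fraction of examples the oracle is withheld entirely so the model also learns to answer from memorized knowledge. The function name, parameters, and prompt format below are illustrative assumptions, not the paper's exact implementation.

```python
import random

def build_raft_example(question, oracle_doc, distractor_pool,
                       k_distractors=3, p_oracle=0.8, rng=None):
    """Assemble one RAFT-style training prompt (illustrative sketch).

    With probability p_oracle the oracle (golden) document is included in
    the context alongside sampled distractors; otherwise the context holds
    only distractors. The target answer (not shown here) would be a
    chain-of-thought response citing the relevant document.
    """
    rng = rng or random.Random()
    # Sample distractor documents that do not help answer the question.
    docs = rng.sample(distractor_pool, k_distractors)
    if rng.random() < p_oracle:
        docs.append(oracle_doc)
    rng.shuffle(docs)  # hide the oracle's position
    context = "\n\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer:"
```

The `p_oracle` knob is the key design choice: setting it below 1.0 means some training examples contain no useful document at all, which is what pushes the model to discriminate rather than blindly trust retrieved context.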
Read it on the blog: https://arize.com/blog/raft-adapting-language-model-to-domain-specific-rag/
Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.