#045 RAG As Two Things - Prompt Engineering and Search

1:02:44
 
John Berryman moved from aerospace engineering into search, and from there into ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot, drawn toward more math and ML at each step.

RAG Explained

"RAG is not a thing. RAG is two things." It breaks into:

  1. Search - finding relevant information
  2. Prompt engineering - presenting that information to the model

These should be treated as separate problems to optimize.
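
A minimal sketch of that separation, assuming the OpenAI Python SDK and a toy in-memory corpus: retrieval and prompt construction live in separate functions so each half can be evaluated and tuned on its own.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DOCS = [
    "Passage A: background material on the user's question.",
    "Passage B: a related but less relevant note.",
    "Passage C: an unrelated document.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """The search half: naive keyword overlap stands in for a real engine."""
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """The prompt-engineering half: present the evidence clearly to the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def rag_answer(query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(query, retrieve(query))}],
    )
    return resp.choices[0].message.content
```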

The Little Red Riding Hood Principle

When prompting LLMs, stay on the path of what models have seen in training. Use formats, structures, and patterns they recognize from their training data:

  • For code, use docstrings and proper formatting
  • For financial data, use SEC report structures
  • Use Markdown for structure; it is heavily represented in training data

Models respond better to familiar structures.
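
A small illustration of staying on the path, using only the standard library: frame code tasks as stubs with docstrings and present tabular context as a Markdown table, since both are formats models saw constantly in training. The example inputs are hypothetical.

```python
def code_prompt(name: str, description: str, params: dict[str, str]) -> str:
    """Frame a code task the way models saw it in training: a stub with a docstring."""
    args = ", ".join(params)
    arg_docs = "\n".join(f"        {p}: {desc}" for p, desc in params.items())
    return (
        f"def {name}({args}):\n"
        f'    """{description}\n\n'
        f"    Args:\n{arg_docs}\n"
        f'    """\n'
    )

def markdown_table(rows: list[dict[str, str]]) -> str:
    """Present tabular context as a Markdown table instead of an ad-hoc dump."""
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(row[h] for h in headers) + " |" for row in rows]
    return "\n".join(lines)

print(code_prompt("parse_date", "Parse an ISO-8601 date string.", {"raw": "the input string"}))
print(markdown_table([{"ticker": "ACME", "revenue": "$12M"}, {"ticker": "INIT", "revenue": "$4M"}]))
```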

Testing Prompts

Testing strategies:

  • Start with "vibe testing" - human evaluation of outputs
  • Develop systematic tests based on observed failure patterns
  • Use token probabilities to measure model confidence (see the sketch after this list)
  • For few-shot prompts, watch for diminishing returns as examples increase
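
One way to read token probabilities as a confidence signal, assuming the OpenAI Python SDK's `logprobs` option; the classification prompt and the idea of routing low-confidence cases to review are illustrative.

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_with_confidence(ticket: str) -> tuple[str, float]:
    """Ask for a one-token label and read its log-probability as confidence."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Is the following ticket urgent? Answer Yes or No.\n\n{ticket}"}],
        max_tokens=1,
        logprobs=True,
    )
    first = resp.choices[0].logprobs.content[0]   # logprob of the emitted token
    return first.token, math.exp(first.logprob)

label, confidence = classify_with_confidence("Payment page is down for all users.")
print(label, f"{confidence:.2f}")  # low confidence flags cases for human review
```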

Managing Token Limits

When designing prompts, divide content into:

  • Static elements (boilerplate, instructions)
  • Dynamic elements (user inputs, context)

Prioritize content by:

  1. Must-have information
  2. Nice-to-have information
  3. Optional if space allows

Even with larger context windows, efficiency remains important for cost and latency.
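
A sketch of that prioritization under a token budget, using a crude four-characters-per-token estimate in place of a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); swap in a real tokenizer if needed."""
    return max(1, len(text) // 4)

def assemble_prompt(static_instructions: str,
                    must_have: list[str],
                    nice_to_have: list[str],
                    optional: list[str],
                    budget: int = 4000) -> str:
    """Fill the window in priority order: static boilerplate first, then tiered context."""
    parts = [static_instructions]
    remaining = budget - estimate_tokens(static_instructions)
    for tier in (must_have, nice_to_have, optional):
        for chunk in tier:
            cost = estimate_tokens(chunk)
            if cost > remaining:
                return "\n\n".join(parts)   # stop once the budget is spent
            parts.append(chunk)
            remaining -= cost
    return "\n\n".join(parts)
```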

Completion vs. Chat Models

Chat models are winning despite initial concerns about their constraints:

  • Completion models allow more flexibility in document format
  • Chat models are more reliable and aligned with common use cases
  • Most applications now use chat models, even for completion-like tasks
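
The same task in both shapes, for illustration; most current APIs only accept the chat shape, so completion-like work usually gets wrapped into a single user message.

```python
# Completion-style: one free-form document the model continues.
completion_prompt = (
    "Summarize the following support ticket in one sentence.\n"
    "Ticket: The export button does nothing when I click it.\n"
    "Summary:"
)

# Chat-style: the same task expressed as role-tagged messages.
chat_messages = [
    {"role": "system", "content": "You summarize support tickets in one sentence."},
    {"role": "user", "content": "The export button does nothing when I click it."},
]
```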

Applications: Workflows vs. Assistants

Two main LLM application patterns:

  • Assistants: Human-in-the-loop interactions where users guide and correct
  • Workflows: Decomposed tasks where LLMs handle well-defined steps with safeguards
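
A compact contrast of the two patterns, assuming the OpenAI Python SDK; the ticket example and the length check are toy stand-ins for real safeguards.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def call_llm(messages: list[dict]) -> str:
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

def assistant_turn(history: list[dict], user_message: str) -> list[dict]:
    """Assistant pattern: one turn at a time; the human reads, corrects, and steers."""
    history = history + [{"role": "user", "content": user_message}]
    history.append({"role": "assistant", "content": call_llm(history)})
    return history

def ticket_workflow(ticket: str) -> dict:
    """Workflow pattern: fixed, well-defined steps with a safeguard before anything ships."""
    category = call_llm([{"role": "user", "content": f"Classify this ticket in one word: {ticket}"}])
    draft = call_llm([{"role": "user", "content": f"Draft a reply to this {category} ticket: {ticket}"}])
    if len(draft) > 1500:  # toy safeguard; real systems validate content, not just length
        raise ValueError("Draft failed checks; escalate to a human")
    return {"category": category, "reply": draft}
```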

Breaking Down Complex Problems

Two approaches:

  • Horizontal: Split into sequential steps with clear inputs/outputs
  • Vertical: Divide by case type, with specialized handling for each scenario

Example: For SOX compliance, break horizontally (understand control, find evidence, extract data, compile report) and vertically (different audit types).
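
A structural sketch of the SOX example with placeholder steps; in practice each function would be its own prompt, retrieval call, or tool.

```python
# Placeholder steps; each would be a separate prompt or tool call in a real system.
def understand_control(control: str) -> str: ...
def find_evidence(description: str, documents: list[str]) -> list[str]: ...
def extract_data(evidence: list[str]) -> dict: ...
def compile_report(fields: dict) -> str: ...

# Horizontal decomposition: sequential steps with clear inputs and outputs.
def run_audit(control: str, documents: list[str]) -> str:
    description = understand_control(control)
    evidence = find_evidence(description, documents)
    fields = extract_data(evidence)
    return compile_report(fields)

# Vertical decomposition: dispatch each audit type to specialized handling.
def audit_access_review(control: str, documents: list[str]) -> str: ...
def audit_change_management(control: str, documents: list[str]) -> str: ...

HANDLERS = {"access review": audit_access_review,
            "change management": audit_change_management}

def audit(audit_type: str, control: str, documents: list[str]) -> str:
    return HANDLERS.get(audit_type, run_audit)(control, documents)
```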

On Agents

Agents exist on a spectrum from assistants to workflows, characterized by:

  • Having some autonomy to make decisions
  • Using tools to interact with the environment
  • Usually requiring human oversight
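
A bounded tool-use loop along those lines, assuming the OpenAI tool-calling API; the `search_docs` tool is a stand-in, and the step cap plus the final hand-off are the oversight safeguards.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documents",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]},
    },
}]

def search_docs(query: str) -> str:
    return "no results (stub)"  # stand-in for a real search backend

def agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):              # bounded autonomy: cap the number of decisions
        resp = client.chat.completions.create(model="gpt-4o-mini",
                                              messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:              # the model decided it is done
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:         # the model chose a tool and its arguments
            args = json.loads(call.function.arguments)
            result = search_docs(**args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return "Step limit reached - hand off to a human"
```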

Best Practices

For building with LLMs:

  1. Start simple: API key + Jupyter notebook
  2. Build prototypes and iterate quickly
  3. Add evaluation as you scale
  4. Keep users in the loop until models prove reliability
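
What "start simple" can look like in a notebook, assuming the OpenAI Python SDK; the tickets and the assertion are toy stand-ins for real test cases that grow out of observed failures.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; run this in a notebook cell

def draft_reply(ticket: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Write a one-sentence reply to: {ticket}"}],
    )
    return resp.choices[0].message.content

# Steps 1-2: prototype and eyeball the outputs ("vibe testing").
for ticket in ["Export button does nothing.", "Please delete my account."]:
    print(ticket, "->", draft_reply(ticket))

# Step 3: as failure patterns appear, turn them into checks you can rerun.
assert "account" in draft_reply("Please delete my account.").lower()
```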

John Berryman:

Nicolay Gerold:

  • LinkedIn
  • X (Twitter)

    00:00 Introduction to RAG: Retrieval and Generation
    00:19 Optimizing Retrieval Systems
    01:11 Introducing John Berryman
    02:31 John's Journey from Search to Prompt Engineering
    04:05 Understanding RAG: Search and Prompt Engineering
    05:39 The Little Red Riding Hood Principle in Prompt Engineering
    14:14 Balancing Static and Dynamic Elements in Prompts
    25:52 Assistants vs. Workflows: Choosing the Right Approach
    30:15 Defining Agency in AI
    30:35 Spectrum of Assistance and Workflows
    34:35 Breaking Down Problems Horizontally and Vertically
    37:57 SOX Compliance Case Study
    40:56 Integrating LLMs into Existing Applications
    44:37 Favorite Tools and Missing Features
    46:37 Exploring Niche Technologies in AI
    52:52 Key Takeaways and Future Directions
