#046 Building a Search Database From First Principles

53:29
 

Modern search is broken. It is too many separate pieces glued together:

  • Vector databases for semantic search
  • Text engines for keywords
  • Rerankers to fix the results
  • LLMs to understand queries
  • Metadata filters for precision

Each piece works well alone.

Together, they often become a mess.

When you glue these systems together, you create:

  • Data Consistency Gaps: Your vector store knows about documents your text engine doesn't. Which one is right?
  • Timing Mismatches: New content appears in one system before another. Users see different results depending on which path their query takes.
  • Complexity Explosion: Integration points grow quadratically: n components means n(n-1)/2 pairwise connections. Three components means three connections; five means ten.
  • Performance Bottlenecks: Each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components, as the sketch after this list illustrates.
  • Brittle Chains: When one system fails, your entire search breaks. More pieces mean more breaking points.
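
To make the latency point concrete, here is a toy model of four sequential hops. The stage names and per-hop numbers are illustrative assumptions, not measurements of any real system:

```python
# Toy model: in a glued-together pipeline, each component is a separate
# sequential network call, so per-hop latencies add up end to end.
# Stage names and numbers are illustrative assumptions.
PIPELINE_MS = {
    "llm_query_rewrite": 150,  # query understanding
    "vector_db": 200,          # semantic candidates
    "text_engine": 200,        # keyword candidates
    "reranker": 250,           # merge + rescore
}

total_ms = sum(PIPELINE_MS.values())
print(f"end-to-end latency: {total_ms} ms")  # 800 ms vs ~200 ms for one hop
```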

I recently built a system with query-specific post-filters combined with a requirement to deliver a fixed number of results to the user.

Often, the query had to be re-run several times to collect enough results.

The consequences: unpredictable latency; high load on the backend, with some queries hammering the database 10+ times; and a relevance cliff, where results 1-6 looked great but the later ones were poor matches.
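
A minimal sketch of that retry pattern, assuming hypothetical `search` and `post_filter` callables standing in for the real system:

```python
from typing import Callable

def fetch_top_k(
    search: Callable[[str, int], list[dict]],  # returns up to `limit` candidates
    post_filter: Callable[[dict], bool],       # query-specific predicate
    query: str,
    k: int,
    max_rounds: int = 10,
) -> list[dict]:
    """Re-run the query with a growing limit until k results survive
    the post-filter. This is the anti-pattern described above: every
    round is another full trip to the database."""
    results: list[dict] = []
    limit = k
    for _ in range(max_rounds):
        candidates = search(query, limit)
        results = [doc for doc in candidates if post_filter(doc)]
        if len(results) >= k or len(candidates) < limit:
            break  # enough survivors, or the index is exhausted
        limit *= 2  # oversample harder and try again
    return results[:k]
```

Each loop iteration is another full query against the database, which is where the 10+ hits per request came from, and the doubling limit is why latency was unpredictable.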

Today on How AI Is Built, we are talking to Marek Galovic from TopK.

We talk about how they built a new search database with modern components, starting from one question: "How would search work if we built it today?"

Cloud storage is cheap. Compute is fast. Memory is plentiful.

One system that handles vectors, text, and filters together - not three systems duct-taped into one.

One pass handles everything:

Vector search + Text search + Filters → Single sorted result

Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency.

The goal is to do search in 5 lines of code.
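
As a sketch of that goal, here is what a single-pass hybrid query could look like. The client, module, and method names are hypothetical stand-ins, not TopK's actual SDK:

```python
# Hypothetical single-pass hybrid query: one request carries the vector,
# the keyword query, and the metadata filter, and the engine returns one
# sorted result list. All names here are illustrative assumptions.
from hypothetical_search import Client, embed  # assumed helpers

client = Client(collection="docs")
results = client.query(
    vector=embed("how do I rotate API keys?"),    # semantic signal
    text="rotate API keys",                       # lexical signal
    filter={"lang": "en", "status": "published"}, # metadata precision
    top_k=10,                                     # one sorted result list
)
for hit in results:
    print(hit.id, hit.score)
```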

Marek Galovic:

Nicolay Gerold:

00:00 Introduction to TopK and Snowflake Comparison

00:35 Architectural Patterns and Custom Formats

01:30 Query Execution Engine Explained

02:56 Distributed Systems and Rust

04:12 Query Execution Process

06:56 Custom File Formats for Search

11:45 Handling Distributed Queries

16:28 Consistency Models and Use Cases

26:47 Exploring Database Versioning and Snapshots

27:27 Performance Benchmarks: Rust vs. C/C++

29:02 Scaling and Latency in Large Datasets

29:39 GPU Acceleration and Use Cases

31:04 Optimizing Search Relevance and Hybrid Search

34:39 Advanced Search Features and Custom Scoring

38:43 Future Directions and Research in AI

47:11 Takeaways for Building AI Applications
