How Hybrid Retrieval Works: Vector + BM25 + Graph Fusion
A deep dive into Knol's adaptive retrieval engine: intent classification, Reciprocal Rank Fusion, and how we combine three search signals for sub-5ms results.
Three Retrieval Signals
Knol's retrieval engine doesn't rely on a single search signal. Instead, it fuses three complementary approaches:
**Vector Similarity**: Captures semantic meaning. If a user asks "What was John's favorite food?" and a past message mentioned "John loves sushi," vector similarity finds it even without exact keyword matching.
**BM25 Full-Text Search**: Captures keyword relevance. Essential for factual lookups ("What year did we switch to Postgres?") where exact terms matter more than semantic closeness.
**Knowledge Graph Traversal**: Captures relationship context. If you're asking about a user's preferences but you only know their ID, graph traversal can find connected entities (projects, team members, organizations) that provide crucial context.
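To make the division of labor concrete, here is a minimal sketch of the three signals behind one common interface. All names here are illustrative stand-ins, not Knol's actual API.

```python
# Hypothetical sketch of the three retrieval signals as a shared interface.
# Function and field names are illustrative, not Knol's real implementation.
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    rank: int  # 1-based position within one signal's result list

def vector_search(query: str) -> list[Hit]:
    """Semantic nearest neighbors, e.g. via an ANN index over embeddings."""
    ...

def bm25_search(query: str) -> list[Hit]:
    """Keyword relevance via a full-text inverted index."""
    ...

def graph_search(entity_id: str, max_hops: int = 2) -> list[Hit]:
    """Entities reachable within max_hops of a known node (e.g. a user ID)."""
    ...
```

Each signal returns a ranked list of hits; the fusion step later only needs the ranks, not the raw scores.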
Intent Classification
Before retrieval, Knol classifies the incoming query into one of several intents:
- Fact lookup (BM25 weighted heavily)
- Relationship query (graph weighted heavily)
- Preference inference (vector weighted heavily)
- Open-ended context (all three balanced)

This classification happens in real time using a lightweight classifier. It adds negligible latency and dramatically improves retrieval quality by matching the retrieval strategy to the query type.
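One simple way to act on the classified intent is a static table mapping each intent to per-signal weights. This is a hedged sketch: the intent labels mirror the list above, but the label strings and the specific numbers are illustrative assumptions, not Knol's actual values.

```python
# Hypothetical mapping from classified intent to per-signal fusion weights.
# Label strings and weight values are illustrative, not Knol's actual config.
INTENT_WEIGHTS = {
    "fact_lookup":          {"vector": 0.2, "bm25": 0.6, "graph": 0.2},
    "relationship_query":   {"vector": 0.2, "bm25": 0.2, "graph": 0.6},
    "preference_inference": {"vector": 0.6, "bm25": 0.2, "graph": 0.2},
    "open_ended":           {"vector": 1/3, "bm25": 1/3, "graph": 1/3},
}

def weights_for(intent: str) -> dict[str, float]:
    # Fall back to the balanced profile for unknown intents.
    return INTENT_WEIGHTS.get(intent, INTENT_WEIGHTS["open_ended"])
```

Keeping the mapping in a plain table makes the strategy easy to tune per deployment without touching the classifier itself.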
Reciprocal Rank Fusion (RRF)
Once you have results from three different retrieval methods, how do you combine them? Simple score averaging doesn't work because the scales differ: vector similarity scores fall in [0, 1], while BM25 scores can be arbitrarily large.
Knol uses Reciprocal Rank Fusion to combine results:
```
score = 1/(k + rank_in_vector_results) +
        1/(k + rank_in_bm25_results) +
        1/(k + rank_in_graph_results)
```

This method is robust to differences in score scale and lets each signal contribute equally regardless of its native range.
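The formula above can be sketched in a few lines of Python. This is a generic RRF implementation under the stated formula, not Knol's code; k = 60 is the commonly used default for the constant.

```python
def rrf_fuse(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per doc.

    result_lists maps a signal name ("vector", "bm25", "graph") to doc IDs
    in rank order. Ranks are 1-based; k dampens the influence of top ranks.
    """
    scores: dict[str, float] = {}
    for ranked in result_lists.values():
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of all three lists accumulates three reciprocal terms and outranks a document that tops only one list, which is exactly the behavior you want from fusion.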
Sub-5ms Performance
The entire retrieval pipeline — intent classification, three parallel searches, RRF scoring, and deduplication — completes in under 5ms on typical hardware. This is possible because:
1. PostgreSQL with pgvector does approximate nearest-neighbor search (HNSW), not exhaustive search
2. BM25 uses inverted indexes
3. Graph traversal is bounded to N hops
4. Rust implementations of the fusion logic eliminate Python overhead
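The "three parallel searches" step can be sketched with a thread pool. This is a toy illustration of the fan-out pattern, with placeholder search functions standing in for the real backends.

```python
# Toy sketch of fanning out the three searches concurrently before fusion.
# The search functions are placeholders, not Knol's real implementations.
from concurrent.futures import ThreadPoolExecutor

def vector_search(query): return ["a", "b", "c"]  # placeholder results
def bm25_search(query):   return ["b", "a", "d"]
def graph_search(query):  return ["b", "e"]

def retrieve(query: str) -> dict[str, list[str]]:
    searches = {
        "vector": vector_search,
        "bm25": bm25_search,
        "graph": graph_search,
    }
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in searches.items()}
        # Wall-clock time is roughly the slowest search, not the sum of all three.
        return {name: f.result() for name, f in futures.items()}
```

The returned dict of ranked lists is exactly the shape the RRF step consumes.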
The result: context assembly that feels instantaneous to the application.