BM25 vs Vector Search
Every RAG pipeline needs a retrieval step: given a user query, find the most relevant document chunks. There are two main approaches: keyword-based scoring (BM25) and semantic embeddings (vector search). This page explains how each works, when each wins, and why AgentLens uses both.
How Each Mechanism Works
- Keyword scoring (BM25): counts exact word matches, weighted by term rarity.
- Semantic embedding (vector search): compares meaning, not surface words.
An Analogy
BM25 is like a book index. You look up a keyword, and it tells you exactly which pages mention that word. Fast and precise, but it cannot find pages about the same concept when different words are used.

Vector search is like a librarian. You describe what you are looking for in plain language, and they bring you books that are about the right topic – even if the books never use your exact words.
Score Ranges
BM25: unbounded scale. Values depend on corpus size and query length.

Vector search: normalized to [0, 1] (AgentLens uses all-MiniLM-L6-v2 embeddings).
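One common way a [0, 1] similarity scale arises (a sketch; the exact normalization depends on the embedding model and vector store) is to shift cosine similarity, which natively lies in [-1, 1]:

```python
import math

def cosine(u, v):
    # Cosine similarity of two embedding vectors, in [-1, 1]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def normalized(u, v):
    # Shift and rescale into [0, 1]: identical -> 1.0, orthogonal -> 0.5, opposite -> 0.0
    return (cosine(u, v) + 1) / 2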
Live Comparison
The following comparison shows how BM25 and vector search score the same document chunk. The scores are from real AgentLens pipeline runs.

Example chunk: "The PromptAssembler uses a 4-layer token budget to order system prompt, retrieved context, conversation history, and user query within the model context window."

For a query about token budgeting, both retrievers match this chunk well: BM25 finds the exact "PromptAssembler" and "token" keywords, while vector search captures the semantic concept of token budgeting.
Why Hybrid Retrieval
Neither approach is strictly better. BM25 excels at exact-match queries – names, error codes, specific terms. Vector search excels at conceptual queries where the user does not know the exact terminology.
AgentLens uses hybrid retrieval: the ReAct agent has access to both vector_search and keyword_search tools, and decides which to call (or both) based on the query.
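In AgentLens the ReAct agent's LLM policy decides which tool to call, but the intuition behind that choice can be sketched as a simple heuristic. Everything below is hypothetical (`choose_tools` and its regexes are not part of AgentLens); it only illustrates which query shapes favor which tool:

```python
import re

def choose_tools(query: str) -> list[str]:
    # Hypothetical routing heuristic (the real agent decides via its LLM, not regexes)
    tools = []
    # CamelCase identifiers, error codes, or quoted phrases -> exact match wins
    if re.search(r'[A-Z][a-z]+[A-Z]|\b[A-Z]{2,}\d+\b|"', query):
        tools.append("keyword_search")
    # Conceptual phrasing -> semantic match wins
    if re.search(r"\b(how|why|what|explain|similar)\b", query, re.I):
        tools.append("vector_search")
    # Ambiguous queries: call both and fuse the results
    return tools or ["keyword_search", "vector_search"]
```

For example, `choose_tools("PromptAssembler token budget")` favors keyword search, while `choose_tools("how does context ordering work")` favors vector search.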
Hybrid Search with RRF
As of 2025-2026, hybrid search is the production default for enterprise RAG. Every major vector database (Pinecone, Qdrant, Weaviate, Elasticsearch) supports it natively. BM25 and vector search have complementary strengths, and combining them typically yields a 15-20% precision improvement over either approach alone.
RRF merges ranked results from BM25 and vector search without needing to normalize their incompatible score scales. It works on rank position, not raw scores.
# Reciprocal Rank Fusion (Cormack et al., SIGIR 2009)
# The industry-standard way to combine BM25 + Vector results
def reciprocal_rank_fusion(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# k=60 is the standard default (from the original paper)
# Documents appearing in BOTH lists get boosted
# No score normalization needed: pure rank-based fusion

Why RRF Over Score Normalization
BM25 scores are unbounded (0 to 20+) while cosine similarity is bounded (0 to 1). You cannot simply combine them. A BM25 score of 16.8 and a cosine score of 0.44 are incomparable. RRF solves this elegantly by ignoring raw scores entirely and fusing only rank positions. This is why it is the industry default: zero calibration needed.
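A small worked example shows the boosting effect. The document IDs and rankings are invented for illustration; the function is restated so the snippet is self-contained:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Same rank-based fusion as above: each appearance adds 1 / (k + rank)
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # ranks from unbounded BM25 scores
vector_ranking = ["doc_b", "doc_c", "doc_d"]  # ranks from cosine similarities

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# doc_b and doc_c appear in BOTH lists, so they rise above doc_a,
# even though doc_a was the top BM25 hit
```

Note that the raw BM25 and cosine scores never enter the computation: only rank positions do, which is why no calibration step is needed.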
Optimized RAG Retrieval Pipeline
Based on industry research, the retrieval pipeline that top-performing RAG systems converge on combines hybrid search with a reranking stage. One widely cited result: reranked contextual embeddings plus contextual BM25 reduced the top-20-chunk retrieval failure rate by 67% (from 5.7% to 1.9%). Hybrid search plus reranking is worth the investment.
Sources
- Cormack, Clarke, Buttcher: "Reciprocal Rank Fusion Outperforms Condorcet" (SIGIR 2009)
- Elastic: A Comprehensive Hybrid Search Guide
- Google Cloud: About Hybrid Search in Vertex AI