Semantic Search
The Semantic Search tool uses vector embeddings to find documents by meaning rather than exact words. It encodes the query into a dense vector and retrieves the closest document chunks from ChromaDB using cosine similarity.
How Vector Search Works
Vector search maps text into high-dimensional vectors where geometric proximity corresponds to semantic similarity. A query is encoded into the same vector space as the indexed documents, and a nearest-neighbor lookup returns the closest chunks. Documents about similar topics cluster together even when they share no exact words.
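The lookup above can be sketched with plain NumPy. The 4-dimensional vectors below are toy stand-ins for real embeddings; in production the query and documents would be encoded by the same embedding model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real 384-d embeddings.
doc_vecs = {
    "pets_article": np.array([0.9, 0.1, 0.0, 0.0]),
    "dog_training": np.array([0.8, 0.2, 0.1, 0.0]),
    "stock_report": np.array([0.0, 0.1, 0.9, 0.3]),
}

query = np.array([0.85, 0.15, 0.05, 0.0])  # encodes a "pets" question

# Nearest-neighbor lookup: rank documents by similarity to the query.
ranked = sorted(doc_vecs, key=lambda d: cosine_sim(query, doc_vecs[d]),
                reverse=True)
# The two pet-related documents rank above the financial one, even though
# the query shares no tokens with them.
```

Note that ranking works here purely on geometric proximity; no keyword overlap is consulted at any point.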
Embedding Model Comparison
AgentLens uses all-MiniLM-L6-v2 from Sentence Transformers to generate 384-dimensional embeddings. This model balances speed and quality: it encodes text in milliseconds while capturing semantic relationships between concepts.
Just as BM25's tokenizer determines what keywords are searchable, the embedding model determines what meanings are capturable. A model that was not trained on domain jargon will not embed specialized terms meaningfully. This is why domain-specific fine-tuning or contextual chunk enrichment (prepending context to chunks before embedding) dramatically improves retrieval quality.
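Contextual chunk enrichment itself is just string preparation before embedding. A minimal sketch, with a hypothetical helper name and fields (not an AgentLens API):

```python
def enrich_chunk(chunk: str, doc_title: str, section: str) -> str:
    """Prepend document-level context so the chunk is embedded with its
    setting rather than in isolation. Field names are illustrative."""
    return f"Document: {doc_title}\nSection: {section}\n\n{chunk}"

enriched = enrich_chunk(
    "The threshold defaults to 0.25.",
    doc_title="AgentLens Retrieval Guide",
    section="Vector Search Configuration",
)
# The enriched string, not the raw chunk, is what gets embedded and indexed,
# so an ambiguous sentence like "The threshold defaults to 0.25" carries
# its topic into the vector space.
```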
Vector Search Configuration
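As a concrete sketch, HNSW parameters for a Chroma collection are passed through collection metadata. The key names follow Chroma's `hnsw:*` convention; the values are common starting points, not tuned recommendations:

```python
# Hypothetical HNSW settings for a Chroma collection, supplied as collection
# metadata. Values are illustrative starting points.
hnsw_config = {
    "hnsw:space": "cosine",       # distance metric for the index
    "hnsw:construction_ef": 200,  # higher = better index quality, slower build
    "hnsw:M": 16,                 # bi-directional links per node
}
# e.g. client.create_collection("docs", metadata=hnsw_config)
```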
`ef_construction` improves index quality at the cost of build time; `M` controls the number of bi-directional links per node. Typical production values trade recall against memory use and build time.
Score Thresholds
The Retrieval Agent's system prompt embeds these vector score thresholds so it can reason about result quality:
| Score | Quality | Interpretation |
|---|---|---|
| 0.5+ | Strong | Near-exact semantic match |
| 0.35-0.5 | Good | Clearly relevant |
| 0.2-0.35 | Partial | Possibly relevant |
| 0.1-0.2 | Weak | Likely irrelevant |
| <0.1 | None | Random proximity |
Industry practice is to set `similarity_cutoff` to at least 0.25-0.3. LlamaIndex's `SimilarityPostprocessor(similarity_cutoff=0.75)` is a high-precision configuration; LangChain commonly uses `score_threshold: 0.25` (expressed as cosine distance, i.e. 1 - similarity).
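Applying a cutoff is a one-line filter over (id, similarity) pairs. A minimal sketch, with a hypothetical helper name:

```python
def filter_hits(hits, similarity_cutoff=0.25):
    """Keep only results at or above the cutoff.
    `hits` is a list of (doc_id, cosine_similarity) pairs."""
    return [(doc, score) for doc, score in hits if score >= similarity_cutoff]

hits = [("guide", 0.62), ("faq", 0.31), ("changelog", 0.12)]
kept = filter_hits(hits)  # drops "changelog", which is below the cutoff
```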
Production Checklist
| Setting | Recommended | Impact |
|---|---|---|
| Vector threshold | similarity_cutoff=0.25 | Filters out noise when a query has no truly relevant documents |
| Reranker | BGE Reranker or cross-encoder | 67% reduction in retrieval failure (Anthropic benchmark) |
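A reranker re-scores the query against each retrieved candidate. The sketch below abstracts the model behind a `score_fn` callable; in practice that callable would wrap a cross-encoder (e.g. a sentence-transformers `CrossEncoder`), but here a toy word-overlap scorer stands in so the example is self-contained:

```python
def rerank(query, docs, score_fn, top_k=3):
    """Re-order retrieved candidates by a (query, doc) relevance score.
    `score_fn` stands in for a cross-encoder; any callable works."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_k]

# Toy scorer: fraction of query words appearing in the doc. A real system
# would call a cross-encoder model here instead.
def overlap_score(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

docs = ["reset your password in settings",
        "quarterly revenue grew 8 percent",
        "password rules and reset policy"]
top = rerank("how do I reset my password", docs, overlap_score, top_k=2)
# The two password-related documents survive; the financial one is dropped.
```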
Interactive Comparison
See the BM25 vs Vector Search page for an interactive demo comparing keyword and semantic scoring side by side.
Sources
- Anthropic: Contextual Retrieval (Sep 2024)
- LlamaIndex: SimilarityPostprocessor documentation
- Superlinked VectorHub: Optimizing RAG with Hybrid Search & Reranking