Semantic Search
The Semantic Search tool uses vector embeddings to find documents by meaning rather than exact words. It encodes the query into a dense vector and retrieves the closest document chunks from ChromaDB using cosine similarity.
How Vector Search Works
Vector search maps text into high-dimensional vectors where geometric proximity corresponds to semantic similarity. A query is encoded into the same vector space as the indexed documents, and a nearest-neighbor lookup returns the closest chunks. Documents about similar topics cluster together even when they share no exact words.
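The lookup above can be sketched with plain NumPy. The 4-dimensional vectors below are toy stand-ins for real embeddings; in production the query and documents would be encoded by the same embedding model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real 384-d embeddings.
doc_vecs = {
    "pets_article": np.array([0.9, 0.1, 0.0, 0.0]),
    "dog_training": np.array([0.8, 0.2, 0.1, 0.0]),
    "stock_report": np.array([0.0, 0.1, 0.9, 0.3]),
}

query = np.array([0.85, 0.15, 0.05, 0.0])  # encodes a "pets" question

# Nearest-neighbor lookup: rank documents by similarity to the query.
ranked = sorted(doc_vecs, key=lambda d: cosine_sim(query, doc_vecs[d]),
                reverse=True)
# The two pet-related documents rank above the financial one, even though
# the query shares no tokens with them.
```

Note that ranking works here purely on geometric proximity; no keyword overlap is consulted at any point.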
Embedding Model Comparison
AgentLens uses all-MiniLM-L6-v2 from Sentence Transformers to generate 384-dimensional embeddings. This model balances speed and quality: it encodes text in milliseconds while capturing semantic relationships between concepts.
Just as BM25's tokenizer determines what keywords are searchable, the embedding model determines what meanings are capturable. A model that was not trained on domain jargon will not embed specialized terms meaningfully. This is why domain-specific fine-tuning or contextual chunk enrichment (prepending context to chunks before embedding) dramatically improves retrieval quality.
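Contextual chunk enrichment itself is just string preparation before embedding. A minimal sketch, with a hypothetical helper name and fields (not an AgentLens API):

```python
def enrich_chunk(chunk: str, doc_title: str, section: str) -> str:
    """Prepend document-level context so the chunk is embedded with its
    setting rather than in isolation. Field names are illustrative."""
    return f"Document: {doc_title}\nSection: {section}\n\n{chunk}"

enriched = enrich_chunk(
    "The threshold defaults to 0.25.",
    doc_title="AgentLens Retrieval Guide",
    section="Vector Search Configuration",
)
# The enriched string, not the raw chunk, is what gets embedded and indexed,
# so an ambiguous sentence like "The threshold defaults to 0.25" carries
# its topic into the vector space.
```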
Vector Search Configuration
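As a concrete sketch, HNSW parameters for a Chroma collection are passed through collection metadata. The key names follow Chroma's `hnsw:*` convention; the values are common starting points, not tuned recommendations:

```python
# Hypothetical HNSW settings for a Chroma collection, supplied as collection
# metadata. Values are illustrative starting points.
hnsw_config = {
    "hnsw:space": "cosine",       # distance metric for the index
    "hnsw:construction_ef": 200,  # higher = better index quality, slower build
    "hnsw:M": 16,                 # bi-directional links per node
}
# e.g. client.create_collection("docs", metadata=hnsw_config)
```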
`ef_construction` improves index quality at the cost of build time; `M` controls the number of bi-directional links per node. Typical production values trade recall against memory use and build time.
Score Thresholds
The Retrieval Agent's system prompt embeds these vector score thresholds so it can reason about result quality:
| Score | Quality | Interpretation |
|---|---|---|
| 0.5+ | Strong | Near-exact semantic match |
| 0.35-0.5 | Good | Clearly relevant |
| 0.2-0.35 | Partial | Possibly relevant |
| 0.1-0.2 | Weak | Likely irrelevant |
| <0.1 | None | Random proximity |
Industry practice is to set `similarity_cutoff` to at least 0.25-0.3. LlamaIndex's `SimilarityPostprocessor(similarity_cutoff=0.75)` is a high-precision configuration; LangChain commonly uses `score_threshold: 0.25` (expressed as cosine distance, i.e. 1 - similarity).
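Applying a cutoff is a one-line filter over (id, similarity) pairs. A minimal sketch, with a hypothetical helper name:

```python
def filter_hits(hits, similarity_cutoff=0.25):
    """Keep only results at or above the cutoff.
    `hits` is a list of (doc_id, cosine_similarity) pairs."""
    return [(doc, score) for doc, score in hits if score >= similarity_cutoff]

hits = [("guide", 0.62), ("faq", 0.31), ("changelog", 0.12)]
kept = filter_hits(hits)  # drops "changelog", which is below the cutoff
```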
Production Checklist
| Setting | Recommended | Impact |
|---|---|---|
| Vector threshold | similarity_cutoff=0.25 | Filters out noise when a query has no truly relevant documents |
| Reranker | BGE Reranker or cross-encoder | 67% reduction in retrieval failure (Anthropic benchmark) |
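A reranker re-scores the query against each retrieved candidate. The sketch below abstracts the model behind a `score_fn` callable; in practice that callable would wrap a cross-encoder (e.g. a sentence-transformers `CrossEncoder`), but here a toy word-overlap scorer stands in so the example is self-contained:

```python
def rerank(query, docs, score_fn, top_k=3):
    """Re-order retrieved candidates by a (query, doc) relevance score.
    `score_fn` stands in for a cross-encoder; any callable works."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)[:top_k]

# Toy scorer: fraction of query words appearing in the doc. A real system
# would call a cross-encoder model here instead.
def overlap_score(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

docs = ["reset your password in settings",
        "quarterly revenue grew 8 percent",
        "password rules and reset policy"]
top = rerank("how do I reset my password", docs, overlap_score, top_k=2)
# The two password-related documents survive; the financial one is dropped.
```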
Interactive Comparison
See the BM25 vs Vector Search page for an interactive demo comparing keyword and semantic scoring side by side.
Sources
- Anthropic: Contextual Retrieval (Sep 2024)
- LlamaIndex: SimilarityPostprocessor documentation
- Superlinked VectorHub: Optimizing RAG with Hybrid Search & Reranking