Retrieval Agent

The Retrieval Agent implements a ReAct (Reasoning + Acting) loop to find documents relevant to the user's query. It reasons about what information is needed, selects a tool, observes the result, and decides whether to search again or finalize.

Default Model

The Retrieval Agent runs on nemotron-3-nano:30b (a 30B-parameter model from NVIDIA) by default. This model supports both tool use and chain-of-thought reasoning, which makes it well suited to the Thought–Action–Observation cycle. In the AgentLens Live tab, ReAct events appear with cyan headers.

ReAct Loop

Each iteration follows the Thought–Action–Observation cycle. The agent outputs structured text that is parsed into these fields:

  • Thought – the agent reasons about what to search for next
  • Action – it names a tool to call (vector_search or keyword_search)
  • Action Input – JSON parameters for the tool call
  • Observation – formatted tool results with chunk counts and quality labels, returned to the agent after each tool call
  • Final Answer – a summary of findings (terminates the loop)
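
As an illustration of how such structured text could be parsed, here is a minimal sketch. It assumes each field appears as a "Label: value" entry starting on its own line; that layout, the ReActStep class, and the parse_react_step function are hypothetical and are not taken from the agent's actual parser.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReActStep:
    """One parsed model turn (hypothetical container, not the real type)."""
    thought: Optional[str] = None
    action: Optional[str] = None
    action_input: Optional[dict] = None
    final_answer: Optional[str] = None


# Assumption: every field starts a line as "Label: value".
# "Action Input" is listed before "Action" so the longer label wins.
_FIELD_RE = re.compile(
    r"^(Thought|Action Input|Action|Final Answer):[ \t]*(.*?)"
    r"(?=^(?:Thought|Action Input|Action|Observation|Final Answer):|\Z)",
    re.DOTALL | re.MULTILINE,
)


def parse_react_step(text: str) -> ReActStep:
    """Split raw model output into the fields listed above."""
    step = ReActStep()
    for label, value in _FIELD_RE.findall(text):
        value = value.strip()
        if label == "Thought":
            step.thought = value
        elif label == "Action":
            step.action = value
        elif label == "Action Input":
            # Raises json.JSONDecodeError if the model emitted invalid JSON.
            step.action_input = json.loads(value)
        else:
            step.final_answer = value
    return step


raw = (
    "Thought: The query names a specific error code.\n"
    "Action: keyword_search\n"
    'Action Input: {"query": "E4012", "top_k": 3}\n'
)
step = parse_react_step(raw)
print(step.action, step.action_input)
```

A real parser would likely also need to handle malformed output, for example a missing Action Input or invalid JSON; this sketch simply lets the JSON error propagate.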

The loop runs for at least one and at most five iterations. The agent stops early once it judges that it has enough relevant material to answer the query, and the system prompt instructs it to prefer 1–2 tool calls when possible.
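
The bounded loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the agent's implementation: next_step stands in for "call the model and parse its output", tools maps tool names to callables, and returning None when the iteration budget runs out is a choice made for this sketch (the source does not say what happens in that case).

```python
from typing import Callable, Dict, List, Optional

MAX_ITERATIONS = 5  # upper bound stated in the text


def run_react_loop(
    query: str,
    next_step: Callable[[List[str]], dict],  # hypothetical: model call + parsing
    tools: Dict[str, Callable[..., str]],    # hypothetical: tool name -> callable
    max_iterations: int = MAX_ITERATIONS,
) -> Optional[str]:
    """Run Thought/Action/Observation rounds until a Final Answer appears."""
    transcript = [f"Query: {query}"]
    for _ in range(max_iterations):
        step = next_step(transcript)
        if step.get("final_answer") is not None:
            return step["final_answer"]  # Final Answer terminates the loop

        tool = tools.get(step.get("action"))
        if tool is None:
            observation = f"Unknown tool: {step.get('action')}"
        else:
            observation = tool(**(step.get("action_input") or {}))
        # The observation is appended so the next round can see it.
        transcript.append(f"Observation: {observation}")
    return None  # assumption: budget exhausted without a Final Answer


# Usage with a scripted stand-in for the model and a fake tool.
def fake_vector_search(query: str, top_k: int = 5) -> str:
    return f"{top_k} chunks for {query!r}"

script = iter([
    {"action": "vector_search", "action_input": {"query": "cache eviction"}},
    {"final_answer": "Found relevant chunks on cache eviction."},
])
print(run_react_loop("How does eviction work?",
                     lambda transcript: next(script),
                     {"vector_search": fake_vector_search}))
```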

Available Tools

The agent has access to two retrieval tools:

  • vector_search – semantic similarity search over document chunks, best for conceptual or meaning-based queries
  • keyword_search – exact term matching using BM25, best for specific names, codes, IDs, or technical terms

Both tools accept a query string and an optional top_k (default 5). The agent decides which tool to call based on LLM reasoning – there is no hardcoded routing logic. See BM25 vs Vector Search for a comparison of retrieval strategies.
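
To make the shared tool interface concrete, the sketch below validates a tool call and applies the default top_k of 5. The JSON key name "query", the normalize_tool_call helper, and the error handling are assumptions for illustration; only the two tool names, the top_k parameter, and its default come from the text above.

```python
import json

TOOLS = ("vector_search", "keyword_search")  # the two tools named above
DEFAULT_TOP_K = 5                            # default stated above


def normalize_tool_call(action: str, action_input: str) -> dict:
    """Validate an Action / Action Input pair and fill in defaults.

    Hypothetical helper: the key name "query" is an assumption.
    """
    if action not in TOOLS:
        raise ValueError(f"unknown tool: {action!r}")

    params = json.loads(action_input)
    query = params.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")

    top_k = params.get("top_k", DEFAULT_TOP_K)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")

    return {"tool": action, "query": query, "top_k": top_k}


print(normalize_tool_call("vector_search", '{"query": "how does caching work"}'))
print(normalize_tool_call("keyword_search", '{"query": "E4012", "top_k": 3}'))
```

The helper only checks the call; choosing between the two tools stays with the model, in line with the absence of hardcoded routing.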

Quality Labels

Observations include quality labels derived from raw retrieval scores. The agent's system prompt embeds these thresholds so it can reason about result quality:

  • BM25: strong (15+), good (8–15), partial (3–8), weak (1–3), none (0)
  • Vector: strong (0.5+), good (0.35–0.5), marginal (0.2–0.35), noise (<0.2)
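
The thresholds above translate into a simple score-to-label mapping. The sketch below is one possible reading, with two assumptions the source does not settle: each range includes its lower bound (so a BM25 score of exactly 8 is "good"), and BM25 scores between 0 and 1 fall under "none". The function names are hypothetical.

```python
def bm25_label(score: float) -> str:
    """Map a raw BM25 score to a quality label.

    Assumptions: lower bounds are inclusive, and scores below 1 are
    labelled "none" (the source lists only 0 for that label).
    """
    if score >= 15:
        return "strong"
    if score >= 8:
        return "good"
    if score >= 3:
        return "partial"
    if score >= 1:
        return "weak"
    return "none"


def vector_label(score: float) -> str:
    """Map a raw vector similarity score to a quality label.

    Assumption: lower bounds are inclusive.
    """
    if score >= 0.5:
        return "strong"
    if score >= 0.35:
        return "good"
    if score >= 0.2:
        return "marginal"
    return "noise"


for s in (17.2, 9.0, 2.5, 0.0):
    print("bm25", s, bm25_label(s))
for s in (0.62, 0.42, 0.25, 0.1):
    print("vector", s, vector_label(s))
```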

Chunk Tracking

Retrieved chunks are tracked by ID (format: filename#idx). The agent deduplicates chunks by a hash of their text content, keeping the highest-scoring copy of each. On retry rounds, the orchestrator maintains a rejection set (MD5 hashes) of chunks that were rejected during earlier grading, which prevents discarded material from being fetched again.
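
A minimal sketch of this bookkeeping is shown below. The chunk dictionary shape (id, text, score), the helper names, and the use of MD5 for deduplication as well as for the rejection set are assumptions; the source specifies MD5 only for the rejection set.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def content_hash(text: str) -> str:
    """MD5 hex digest of the chunk text (used here for both purposes)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def merge_chunks(chunks: Iterable[dict],
                 rejected_hashes: Set[str] = frozenset()) -> List[dict]:
    """Drop rejected chunks and keep the highest-scoring copy of each text.

    Each chunk is assumed to look like
    {"id": "filename#idx", "text": "...", "score": 0.42}.
    """
    best: Dict[str, dict] = {}
    for chunk in chunks:
        key = content_hash(chunk["text"])
        if key in rejected_hashes:
            continue  # graded out in an earlier round
        current = best.get(key)
        if current is None or chunk["score"] > current["score"]:
            best[key] = chunk
    return list(best.values())


retrieved = [
    {"id": "guide.md#0", "text": "Eviction uses LRU.", "score": 0.41},
    {"id": "guide.md#0", "text": "Eviction uses LRU.", "score": 0.58},
    {"id": "faq.md#2", "text": "Unrelated billing note.", "score": 0.30},
]
rejected = {content_hash("Unrelated billing note.")}
for chunk in merge_chunks(retrieved, rejected):
    print(chunk["id"], chunk["score"])
```

Hashing the text rather than the ID means the same passage retrieved under two different IDs still collapses to a single entry.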