Agents Overview

AgentLens uses a two-agent architecture built from scratch – no LangGraph, LlamaIndex, or other frameworks. The pipeline coordinates a Retrieval Agent, a Grader, and a Quality Judge through a structured feedback loop managed by the TwoAgentOrchestrator.

Pipeline Flow

When a query arrives, a heuristic classifier first decides whether retrieval is needed. Greetings and short non-questions bypass the pipeline and go straight to a direct LLM call. For retrieval queries, the orchestrator runs:

  1. Retrieval Agent runs a ReAct loop (1–5 tool calls) to find relevant documents
  2. Grader scores each retrieved chunk on a 1–5 relevance scale and filters out irrelevant ones
  3. Quality Judge evaluates the filtered chunks, generates an answer, and issues ACCEPT or RETRY
  4. On RETRY, the Retrieval Agent runs again with the Judge's targeted feedback
  5. If the Judge produces no extractable answer, Fallback generates a final answer via PromptAssembler

Default Models

Each pipeline role runs on its own model, chosen to balance reasoning quality with inference speed. The defaults scale up in size as the pipeline progresses from retrieval to final judgment:

| Role | Default Model | Params | Why |
| --- | --- | --- | --- |
| ReAct Agent | nemotron-3-nano:30b | 30B | Fast tool-use planning with thinking support |
| Grader | qwen3-next:80b | 80B | Accurate chunk relevance scoring |
| Judge | gpt-oss:120b | 117B | Strong answer generation + verdict reasoning |
| Fallback | gemini-3-flash-preview | Cloud | Direct LLM answer when judge defers |

Models can be overridden per-request via the UI dropdowns or per-deployment via environment variables. See Models Overview for the full override priority chain.

LLM Call Budget

The common case uses 2–3 LLM calls (one retrieval round, the grader, and a judge accept). The worst case is 7 calls: two retrieval rounds, two grading passes, two judge evaluations, and a final fallback call. This is fewer than the 4–8 calls used in the prior single-agent design.
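The arithmetic behind those figures can be tallied like this. The sketch treats the whole ReAct loop as a single call per round, which reproduces the 7-call worst case; the per-round breakdown is inferred from the numbers in this section rather than taken from the implementation.

```python
def call_budget(rounds: int, fallback_used: bool) -> int:
    """Count LLM calls: each round costs one retrieval call, one grading
    pass, and one judge evaluation; the fallback adds one more call."""
    per_round = 3  # retrieval + grader + judge
    return rounds * per_round + (1 if fallback_used else 0)
```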

Fail-Open Design

If any agent produces unparseable output, the system defaults to accepting with a low confidence score of 0.1 rather than failing the query. This ensures users always get an answer, even when a model misbehaves.
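A sketch of that fail-open parse. The verdict format and the 0.9 confidence for a clean parse are assumptions; only the 0.1 fail-open default comes from this doc.

```python
FAIL_OPEN_CONFIDENCE = 0.1  # low-confidence accept when output is unparseable

def parse_verdict(raw: str) -> tuple[str, float]:
    """Return (verdict, confidence), failing open on malformed model output."""
    line = raw.strip().upper()
    for verdict in ("ACCEPT", "RETRY"):
        if line.startswith(verdict):
            return verdict, 0.9  # assumed confidence for a clean parse
    # Unparseable output: accept with low confidence instead of failing the query
    return "ACCEPT", FAIL_OPEN_CONFIDENCE
```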