Agents Overview
AgentLens uses a two-agent architecture built from scratch, with no LangGraph, LlamaIndex, or other frameworks. The TwoAgentOrchestrator manages the pipeline's structured feedback loop, coordinating a Retrieval Agent, a Grader, and a Quality Judge.
Pipeline Flow
When a query arrives, a heuristic classifier first decides whether retrieval is needed. Greetings and short non-questions bypass the pipeline and go straight to a single direct LLM call. For retrieval queries, the orchestrator runs the following loop (sketched in code after the list):
1. The Retrieval Agent runs a ReAct loop (1–5 tool calls) to find relevant documents.
2. The Grader scores each retrieved chunk on a 1–5 relevance scale and filters out irrelevant ones.
3. The Quality Judge evaluates the filtered chunks, generates an answer, and issues ACCEPT or RETRY.
4. On RETRY, the Retrieval Agent runs again with the Judge's targeted feedback.
5. If the Judge produces no extractable answer, the Fallback model generates a final answer via the PromptAssembler.
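A condensed sketch of that control flow is below. All names here are illustrative assumptions, not the actual AgentLens API: `classifier`, `retrieval_agent.run`, `grader.filter`, `judge.evaluate`, the `Verdict` shape, and the relevance cutoff are stand-ins chosen to mirror the steps above.

```python
from dataclasses import dataclass

# Illustrative shapes only; the real AgentLens types may differ.
@dataclass
class Verdict:
    decision: str            # "ACCEPT" or "RETRY"
    answer: str | None       # extractable answer, if any
    feedback: str | None     # targeted retrieval hints on RETRY

MAX_ROUNDS = 2               # one initial round plus one RETRY

def answer_query(query, classifier, retrieval_agent, grader, judge, fallback):
    if not classifier.needs_retrieval(query):    # heuristic: greetings, short non-questions
        return fallback.direct_answer(query)     # single direct LLM call, no retrieval

    feedback, relevant = None, []
    for _ in range(MAX_ROUNDS):
        chunks = retrieval_agent.run(query, feedback)   # ReAct loop, 1-5 tool calls
        relevant = grader.filter(chunks)                # keeps chunks scored relevant (1-5 scale)
        verdict = judge.evaluate(query, relevant)       # generates answer, issues ACCEPT/RETRY
        if verdict.decision == "ACCEPT" and verdict.answer:
            return verdict.answer
        feedback = verdict.feedback                     # fed back into the next retrieval round

    # Judge produced no extractable answer: hand off to the fallback model,
    # which builds its prompt via the PromptAssembler.
    return fallback.answer_from_chunks(query, relevant)
```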
Default Models
Each pipeline role runs on its own model, chosen to balance reasoning quality with inference speed. The defaults scale up in size as the pipeline progresses from retrieval to final judgment:
| Role | Default Model | Params | Why |
|---|---|---|---|
| ReAct Agent | nemotron-3-nano:30b | 30B | Fast tool-use planning with thinking support |
| Grader | qwen3-next:80b | 80B | Accurate chunk relevance scoring |
| Judge | gpt-oss:120b | 117B | Strong answer generation + verdict reasoning |
| Fallback | gemini-3-flash-preview | Cloud | Direct LLM answer when judge defers |
Models can be overridden per request via the UI dropdowns or per deployment via environment variables. See Models Overview for the full override priority chain.
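As a sketch, the priority chain could be resolved as below. The defaults come from the table above; the env var naming scheme (`AGENTLENS_<ROLE>_MODEL`) is an assumption for illustration, and the actual variable names are documented in Models Overview.

```python
import os

# Defaults from the table above.
DEFAULT_MODELS = {
    "react": "nemotron-3-nano:30b",
    "grader": "qwen3-next:80b",
    "judge": "gpt-oss:120b",
    "fallback": "gemini-3-flash-preview",
}

def resolve_model(role: str, request_override: str | None = None) -> str:
    """Priority: per-request override > deployment env var > built-in default."""
    if request_override:                                  # UI dropdown selection
        return request_override
    env_value = os.getenv(f"AGENTLENS_{role.upper()}_MODEL")  # illustrative name
    if env_value:                                         # per-deployment override
        return env_value
    return DEFAULT_MODELS[role]
```

With no overrides set, `resolve_model("judge")` returns the table default, `gpt-oss:120b`.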
LLM Call Budget
The common case uses 2–3 LLM calls (one retrieval round, plus the Grader and an accepting Judge verdict). The worst case is 7 calls: two full rounds of retrieval, grading, and judging (six calls) plus one fallback call. This is fewer than the 4–8 calls used in the prior single-agent design.
Fail-Open Design
If any agent produces unparseable output, the system defaults to accepting with a low confidence score of 0.1 rather than failing the query. This ensures users always get an answer, even when a model misbehaves.
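A minimal sketch of that fail-open path, assuming agents emit JSON verdicts; `parse_verdict` and the `Verdict` shape are illustrative, not the actual AgentLens parser:

```python
import json
from dataclasses import dataclass

@dataclass
class Verdict:
    decision: str        # "ACCEPT" or "RETRY"
    confidence: float

def parse_verdict(raw_output: str) -> Verdict:
    try:
        data = json.loads(raw_output)
        return Verdict(decision=data["decision"], confidence=float(data["confidence"]))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fail open: unparseable output becomes an ACCEPT with confidence 0.1,
        # so the user still gets an answer instead of an error.
        return Verdict(decision="ACCEPT", confidence=0.1)
```

The accepting default means a malformed model response degrades into a low-confidence answer rather than a failed query.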