Models Overview

AgentLens supports any model served through the Ollama API. Each agent role – ReAct, Grader, Judge, and Fallback – can run on a different model, letting you balance speed, quality, and cost per stage.

Per-Role Model Assignment

Each pipeline role gets its own model picker in the UI. The defaults assign mid-range models that balance reasoning quality with inference speed:

| Role        | Default Model           | Size  | Purpose                                      |
|-------------|-------------------------|-------|----------------------------------------------|
| ReAct Agent | nemotron-3-nano:30b     | 30B   | Tool-use retrieval planning (1–5 iterations) |
| Grader      | qwen3-next:80b          | 80B   | Chunk relevance scoring (1–5 scale)          |
| Judge       | gpt-oss:120b            | 117B  | Answer generation + ACCEPT/RETRY verdict     |
| Fallback    | gemini-3-flash-preview  | Cloud | Direct LLM answer when judge defers          |

Thinking Toggle

Chain-of-thought reasoning can be toggled per role. When enabled, the model's thinking process is captured in the Thinking layer and visible in the Trace view. By default, thinking is enabled for ReAct, Grader, and Judge, and disabled for Fallback.
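The per-role defaults can be sketched as a small lookup that feeds each request. This is an illustrative sketch, not AgentLens source: the `THINKING_DEFAULTS` dict and `build_chat_payload` helper are hypothetical names, and the payload shape assumes the toggle maps onto the `think` field of Ollama's chat API.

```python
# Default thinking flags per role, mirroring the text above.
THINKING_DEFAULTS = {"agent": True, "grader": True, "judge": True, "fallback": False}

def build_chat_payload(role: str, model: str, prompt: str) -> dict:
    """Assemble an Ollama-style chat request for one pipeline role."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # When True, the model's reasoning is returned separately and
        # can be captured in the Thinking layer for the Trace view.
        "think": THINKING_DEFAULTS[role],
        "stream": False,
    }
```

A per-role override from the UI would simply replace the dict lookup before the payload is built.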

Override Priority

Model selection follows a 3-level priority chain. Each level overrides the one below it:

  1. Per-request override – set via UI dropdowns, sent in the query request body
  2. Environment variable – LLM_MODEL_AGENT, LLM_MODEL_GRADER, LLM_MODEL_JUDGE, LLM_MODEL_FALLBACK
  3. Hardcoded default – the values in the table above

This means you can set baseline models in your environment and override them per-query from the UI without restarting services.

Cloud API

When deployed, AgentLens connects to the Ollama Cloud API rather than a local Ollama instance. The cloud API exposes 30+ models across multiple families and handles inference remotely, so the deployment host needs no GPU. See the Ollama Cloud API page for the full model catalog and tier breakdown.
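Switching between a local instance and the cloud endpoint can be reduced to picking a base URL at startup. A hypothetical sketch: the `DEPLOYED` and `OLLAMA_CLOUD_URL` variable names and the fallback URL are illustrative assumptions, not documented AgentLens configuration.

```python
import os

def ollama_base_url() -> str:
    """Choose the Ollama endpoint: cloud when deployed, local otherwise."""
    if os.environ.get("DEPLOYED") == "1":
        # Remote inference: no GPU needed on the deployment host.
        return os.environ.get("OLLAMA_CLOUD_URL", "https://ollama.com")
    # Default local Ollama port.
    return "http://localhost:11434"
```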