Models Overview
AgentLens supports any model served through the Ollama API. Each agent role – ReAct, Grader, Judge, and Fallback – can run on a different model, letting you balance speed, quality, and cost per stage.
Per-Role Model Assignment
Each pipeline role gets its own model picker in the UI. The defaults assign mid-range models that balance reasoning quality with inference speed:
| Role | Default Model | Size | Purpose |
|---|---|---|---|
| ReAct Agent | nemotron-3-nano:30b | 30B | Tool-use retrieval planning (1–5 iterations) |
| Grader | qwen3-next:80b | 80B | Chunk relevance scoring (1–5 scale) |
| Judge | gpt-oss:120b | 117B | Answer generation + ACCEPT/RETRY verdict |
| Fallback | gemini-3-flash-preview | Cloud | Direct LLM answer when judge defers |
Thinking Toggle
Chain-of-thought reasoning can be toggled per role. When enabled, the model's thinking process is captured in the Thinking layer and visible in the Trace view. By default, thinking is enabled for ReAct, Grader, and Judge, and disabled for Fallback.
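As an illustrative sketch of how the per-role toggle could map onto Ollama's chat endpoint, which accepts a `think` flag in recent releases (the helper name, role keys, and defaults dictionary below are assumptions for illustration, not AgentLens's actual code):

```python
# Hypothetical sketch of per-role thinking defaults and payload assembly.
# Role keys and THINKING_DEFAULTS mirror the behavior described above.
THINKING_DEFAULTS = {
    "react": True,
    "grader": True,
    "judge": True,
    "fallback": False,
}

def build_chat_payload(role: str, model: str, prompt: str) -> dict:
    """Assemble an Ollama /api/chat request body, enabling chain-of-thought
    capture only for roles where thinking is on by default."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": THINKING_DEFAULTS.get(role, False),  # Ollama's thinking flag
        "stream": False,
    }

payload = build_chat_payload("fallback", "gemini-3-flash-preview", "Hello")
```

When `think` is enabled, Ollama returns the model's reasoning separately from the final answer, which is what allows it to be captured as a distinct Thinking layer.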
Override Priority
Model selection follows a 3-level priority chain. Each level overrides the one below it:
- Per-request override – set via UI dropdowns, sent in the query request body
- Environment variable – `LLM_MODEL_AGENT`, `LLM_MODEL_GRADER`, `LLM_MODEL_JUDGE`, `LLM_MODEL_FALLBACK`
- Hardcoded default – the values in the table above
This means you can set baseline models in your environment and override them per-query from the UI without restarting services.
Cloud API
When deployed, AgentLens connects to the Ollama Cloud API rather than a local Ollama instance. The cloud API exposes 30+ models across multiple families and handles inference remotely, so the deployment host needs no GPU. See the Ollama Cloud API page for the full model catalog and tier breakdown.
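A minimal sketch of how that host selection might look (the `AGENTLENS_DEPLOYED` variable and the function are assumptions for illustration; the local default port 11434 and the ollama.com host come from Ollama's documentation):

```python
import os

def ollama_base_url() -> str:
    """Pick the Ollama endpoint: the cloud API when deployed,
    otherwise a local instance on Ollama's default port."""
    if os.environ.get("AGENTLENS_DEPLOYED") == "1":
        return "https://ollama.com"      # Ollama Cloud API host
    return "http://localhost:11434"      # default local Ollama port
```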