Models Overview
AgentLens supports any model served through the Ollama API. Each agent role – ReAct, Grader, Judge, and Fallback – can run on a different model, letting you balance speed, quality, and cost per stage.
Per-Role Model Assignment
Each pipeline role gets its own model picker in the UI. The defaults assign mid-range models that balance reasoning quality with inference speed:
| Role | Default Model | Size | Purpose |
|---|---|---|---|
| ReAct Agent | nemotron-3-nano:30b | 30B | Tool-use retrieval planning (1–5 iterations) |
| Grader | qwen3-next:80b | 80B | Chunk relevance scoring (1–5 scale) |
| Judge | gpt-oss:120b | 117B | Answer generation + ACCEPT/RETRY verdict |
| Fallback | gemini-3-flash-preview | Cloud | Direct LLM answer when judge defers |
Thinking Toggle
Chain-of-thought reasoning can be toggled per role. When enabled, the model's thinking process is captured in the Thinking layer and visible in the Trace view. By default, thinking is enabled for ReAct, Grader, and Judge, and disabled for Fallback.
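As an illustrative sketch of how the per-role toggle could map onto Ollama's chat endpoint, which accepts a `think` flag in recent releases (the helper name, role keys, and defaults dictionary below are assumptions for illustration, not AgentLens's actual code):

```python
# Hypothetical sketch of per-role thinking defaults and payload assembly.
# Role keys and THINKING_DEFAULTS mirror the behavior described above.
THINKING_DEFAULTS = {
    "react": True,
    "grader": True,
    "judge": True,
    "fallback": False,
}

def build_chat_payload(role: str, model: str, prompt: str) -> dict:
    """Assemble an Ollama /api/chat request body, enabling chain-of-thought
    capture only for roles where thinking is on by default."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": THINKING_DEFAULTS.get(role, False),  # Ollama's thinking flag
        "stream": False,
    }

payload = build_chat_payload("fallback", "gemini-3-flash-preview", "Hello")
```

When `think` is enabled, Ollama returns the model's reasoning separately from the final answer, which is what allows it to be captured as a distinct Thinking layer.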
Override Priority
Model selection follows a 3-level priority chain. Each level overrides the one below it:
- Per-request override – set via UI dropdowns, sent in the query request body
- Environment variable – `LLM_MODEL_AGENT`, `LLM_MODEL_GRADER`, `LLM_MODEL_JUDGE`, `LLM_MODEL_FALLBACK`
- Hardcoded default – the values in the table above
This means you can set baseline models in your environment and override them per-query from the UI without restarting services.
Cloud API
When deployed, AgentLens connects to the Ollama Cloud API rather than a local Ollama instance. The cloud API exposes 30+ models across multiple families and handles inference remotely, so the deployment host needs no GPU. See the Ollama Cloud API page for the full model catalog and tier breakdown.
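A minimal sketch of how that host selection might look (the `AGENTLENS_DEPLOYED` variable and the function are assumptions for illustration; the local default port 11434 and the ollama.com host come from Ollama's documentation):

```python
import os

def ollama_base_url() -> str:
    """Pick the Ollama endpoint: the cloud API when deployed,
    otherwise a local instance on Ollama's default port."""
    if os.environ.get("AGENTLENS_DEPLOYED") == "1":
        return "https://ollama.com"      # Ollama Cloud API host
    return "http://localhost:11434"      # default local Ollama port
```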