Per-Role Agent Models

AgentLens lets you assign a different LLM to each agent role. This lets you optimize cost and latency: for example, use a fast, small model for retrieval and a more capable model for quality judgment.

Role Dropdowns

The settings bar provides four model dropdowns, one for each role:

ReAct (R) – the Retrieval Agent's reasoning model (default: ministral-3:3b)
Grader (G) – the chunk relevance scorer (default: gemma3:4b)
Judge (J) – the Quality Judge's evaluation model (default: devstral-small-2:24b)
Fallback (F) – the model used when fallback triggers (default: gemini-3-flash-preview)

Each role also has a color-coded chip (R/G/J/F) displayed below the query input, matching the role colors from the Live tab.

Configuration

Per-role models are configured via environment variables:

LLM_MODEL_AGENT – ReAct Agent model
LLM_MODEL_GRADER – Grader model
LLM_MODEL_JUDGE – Quality Judge model
LLM_MODEL_FALLBACK – Fallback model

Override priority: per-request (UI dropdown) > environment variable > hardcoded default. Because per-request values take precedence, a model selected in the UI overrides the environment configuration without restarting services.
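
The override order above can be sketched in a few lines of Python. This is a minimal illustration, not AgentLens's actual code: the function name `resolve_model`, the role keys, and the `request_overrides` argument are hypothetical. Only the environment variable names and the default models come from this page.

```python
import os

# Hardcoded defaults, copied from the Role Dropdowns list.
DEFAULT_MODELS = {
    "agent": "ministral-3:3b",
    "grader": "gemma3:4b",
    "judge": "devstral-small-2:24b",
    "fallback": "gemini-3-flash-preview",
}

# Environment variable for each role, as listed under Configuration.
ENV_VARS = {
    "agent": "LLM_MODEL_AGENT",
    "grader": "LLM_MODEL_GRADER",
    "judge": "LLM_MODEL_JUDGE",
    "fallback": "LLM_MODEL_FALLBACK",
}


def resolve_model(role, request_overrides=None, env=None):
    """Return the model for a role (hypothetical helper).

    Precedence: per-request value > environment variable > hardcoded default.
    Empty strings are treated as "not set" (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    overrides = request_overrides or {}
    return overrides.get(role) or env.get(ENV_VARS[role]) or DEFAULT_MODELS[role]
```

For example, with `LLM_MODEL_JUDGE` set in the environment, `resolve_model("judge")` returns that value, while `resolve_model("judge", {"judge": "gemma3:4b"})` returns the per-request choice instead.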

Practical Configurations

A common configuration pairs a fast 3B model for ReAct (where multiple tool calls make latency compound) with a larger 24B model for the Judge (where a single high-quality evaluation matters most). The Grader typically uses a mid-size model since it only produces numeric scores.
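
To see why ReAct latency compounds, consider a rough back-of-the-envelope model. The sketch below assumes the ReAct model is called once per tool step and the Judge is called once per query. The helper name and the per-call latencies are illustrative placeholders, not measurements from AgentLens.

```python
def estimate_total_latency(react_calls, react_seconds_per_call, judge_seconds):
    """Rough estimate: ReAct cost scales with the number of calls,
    while the Judge contributes a single evaluation (assumed)."""
    return react_calls * react_seconds_per_call + judge_seconds


# Illustrative numbers only: 5 ReAct calls, Judge fixed at 4.0 s.
fast_react = estimate_total_latency(5, 0.5, 4.0)  # small, fast ReAct model
slow_react = estimate_total_latency(5, 2.0, 4.0)  # larger, slower ReAct model
```

With these placeholder values, the per-call difference of 1.5 s is multiplied by the five ReAct calls, so the totals differ by 7.5 s. The same per-call difference on the Judge would add only 1.5 s.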

Model Comparison

By running the same query with different per-role assignments, you can compare how model choice affects answer quality, retrieval strategy, and total latency. The Live tab's token metrics make these comparisons quantitative.