Per Role Agent
AgentLens lets you assign a different LLM to each agent role. This enables cost and latency optimization – use a fast, small model for retrieval and a more capable model for quality judgment.
Role Dropdowns
The settings bar provides four model dropdowns, one for each role:
- ReAct (R) – the Retrieval Agent's reasoning model (default: ministral-3:3b)
- Grader (G) – the chunk relevance scorer (default: gemma3:4b)
- Judge (J) – the Quality Judge's evaluation model (default: devstral-small-2:24b)
- Fallback (F) – the model used when fallback triggers (default: gemini-3-flash-preview)
Each role also has a color-coded chip (R/G/J/F) displayed below the query input, matching the role colors from the Live tab.
Configuration
Per-role models are configured via environment variables:
- LLM_MODEL_AGENT – ReAct Agent model
- LLM_MODEL_GRADER – Grader model
- LLM_MODEL_JUDGE – Quality Judge model
- LLM_MODEL_FALLBACK – Fallback model
Override priority: per-request (UI dropdown) > environment variable > hardcoded default. This means UI selections override config without restarting services.
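The override priority can be sketched as a simple lookup chain. This is a minimal illustration, not AgentLens's actual code: the function name `resolve_model`, the role keys, and the `request_overrides` dictionary are assumptions made for the example. Only the environment variable names and default models come from this page.

```python
import os

# Defaults as listed for the role dropdowns on this page.
DEFAULT_MODELS = {
    "agent": "ministral-3:3b",
    "grader": "gemma3:4b",
    "judge": "devstral-small-2:24b",
    "fallback": "gemini-3-flash-preview",
}

# Environment variable consulted for each role.
ENV_VARS = {
    "agent": "LLM_MODEL_AGENT",
    "grader": "LLM_MODEL_GRADER",
    "judge": "LLM_MODEL_JUDGE",
    "fallback": "LLM_MODEL_FALLBACK",
}


def resolve_model(role, request_overrides=None, env=None):
    """Pick the model for a role: per-request > environment > default.

    `request_overrides` is a hypothetical dict standing in for the
    UI dropdown selections sent with a single request.
    """
    env = os.environ if env is None else env
    overrides = request_overrides or {}
    # Empty strings are treated as "not set" at every level.
    return overrides.get(role) or env.get(ENV_VARS[role]) or DEFAULT_MODELS[role]
```

Calling `resolve_model("judge", {"judge": "gemma3:4b"})` would return the per-request choice even when LLM_MODEL_JUDGE is set, which mirrors how a dropdown selection takes effect without a restart.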
Practical Configurations
A common configuration pairs a fast 3B model for ReAct (where multiple tool calls make latency compound) with a larger 24B model for the Judge (where a single high-quality evaluation matters most). The Grader typically uses a mid-size model since it only produces numeric scores.
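To see why ReAct latency compounds while Judge latency does not, a rough back-of-the-envelope model may help. The sketch below assumes one Grader pass and one Judge pass per query, and every number in it is a made-up placeholder, not an AgentLens measurement.

```python
def estimate_latency(react_calls, react_s, grader_s, judge_s):
    """Rough total latency in seconds for one query.

    Assumes the ReAct model is invoked once per tool call, while the
    Grader and Judge each run once (a simplification for illustration).
    """
    return react_calls * react_s + grader_s + judge_s


# Hypothetical per-call latencies, chosen only to show the effect.
fast_react = estimate_latency(react_calls=5, react_s=0.5, grader_s=1.0, judge_s=3.0)
slow_react = estimate_latency(react_calls=5, react_s=2.0, grader_s=1.0, judge_s=3.0)

print(fast_react)  # 6.5
print(slow_react)  # 14.0
```

Under these assumed numbers, the ReAct model's per-call cost is multiplied by the number of tool calls, so a slower ReAct model dominates the total, while the Judge's cost is paid once regardless.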
Model Comparison
By running the same query with different per-role assignments, you can compare how model choice affects answer quality, retrieval strategy, and total latency. The Live tab's token metrics make these comparisons quantitative.