Token Metrics
AgentLens tracks token usage and timing at every level of the pipeline – per LLM call, per agent role, and across the entire pipeline. This makes it possible to compare model costs, identify bottlenecks, and optimize per-role model assignments.
Per-Call Metrics
Every LLM call to the Ollama API returns:
- `eval_count` – number of tokens generated (completion tokens)
- `eval_duration` – time spent generating tokens (nanoseconds)
- `prompt_eval_count` – number of prompt tokens processed
- `prompt_eval_duration` – time spent processing the prompt (nanoseconds)
These are captured for every call: each ReAct iteration, each Grader invocation, the Judge call, and the Fallback call (if triggered).
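As a minimal sketch, the per-call fields above can be turned into derived metrics like throughput. The field names match the Ollama API response; the helper name and output layout are illustrative, not AgentLens's actual code:

```python
# Derive per-call token metrics from an Ollama /api/generate response body.
# Durations are reported in nanoseconds, so convert before computing rates.
NS_PER_SEC = 1_000_000_000

def call_metrics(resp: dict) -> dict:
    """Summarize one LLM call's token usage and throughput."""
    eval_ns = resp.get("eval_duration", 0)
    return {
        "completion_tokens": resp.get("eval_count", 0),
        "prompt_tokens": resp.get("prompt_eval_count", 0),
        "generation_ms": eval_ns / 1e6,
        # Guard against zero duration (e.g. a cached or empty generation).
        "tokens_per_sec": (resp.get("eval_count", 0) * NS_PER_SEC / eval_ns)
                          if eval_ns else 0.0,
    }

# Example response fragment (values illustrative):
resp = {"eval_count": 120, "eval_duration": 2_000_000_000,
        "prompt_eval_count": 300, "prompt_eval_duration": 150_000_000}
m = call_metrics(resp)
# 120 tokens over 2 s of generation → 60 tokens/sec
```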
Per-Role Aggregation
The orchestrator aggregates metrics by role across all rounds:
- Agent tokens – sum of all ReAct iterations across all rounds (up to 10 calls in the worst case: 5 iterations × 2 rounds)
- Grader tokens – sum of grader calls (1-2, one per round)
- Judge tokens – sum of judge calls (1-2, one per round)
- Fallback tokens – single fallback call (0 or 1)
Each role's total tokens and timing are available in the pipeline response.
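The aggregation step can be sketched as a fold over per-call records. The role names mirror the pipeline stages above; the tuple layout and function name are assumptions for illustration:

```python
# Aggregate per-call metrics by role, as the orchestrator might across rounds.
from collections import defaultdict

def aggregate_by_role(calls):
    """calls: iterable of (role, eval_count, eval_duration_ns) tuples."""
    totals = defaultdict(lambda: {"tokens": 0, "duration_ns": 0, "calls": 0})
    for role, tokens, dur_ns in calls:
        t = totals[role]
        t["tokens"] += tokens
        t["duration_ns"] += dur_ns
        t["calls"] += 1
    return dict(totals)

calls = [
    ("agent", 80, 1_500_000_000),   # ReAct iteration 1
    ("agent", 60, 1_000_000_000),   # ReAct iteration 2
    ("grader", 20, 400_000_000),    # one grader call this round
    ("judge", 30, 900_000_000),     # one judge call this round
]
roles = aggregate_by_role(calls)
# roles["agent"] → 140 tokens across 2 calls
```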
Pipeline Metrics
The `PipelineMetrics` object in the response includes:
- `tokens_generated` – total completion tokens across all roles
- `tokens_per_sec` – overall throughput (total tokens / total time)
- `fallback_used` – whether the Fallback path was triggered
- `fallback_ms` – Fallback LLM call duration (if used)
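Rolling role totals up into pipeline-level metrics might look like the following. The field names follow the list above, but the dataclass shape and `summarize` helper are illustrative, not the actual response schema:

```python
# Roll per-role totals up into pipeline-level metrics.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PipelineMetrics:
    tokens_generated: int
    tokens_per_sec: float
    fallback_used: bool
    fallback_ms: Optional[float] = None

def summarize(role_totals: dict, fallback_ms: Optional[float] = None) -> PipelineMetrics:
    tokens = sum(r["tokens"] for r in role_totals.values())
    dur_ns = sum(r["duration_ns"] for r in role_totals.values())
    return PipelineMetrics(
        tokens_generated=tokens,
        # Convert nanoseconds to seconds for the throughput rate.
        tokens_per_sec=tokens * 1e9 / dur_ns if dur_ns else 0.0,
        fallback_used=fallback_ms is not None,
        fallback_ms=fallback_ms,
    )

totals = {"agent": {"tokens": 140, "duration_ns": 2_000_000_000},
          "judge": {"tokens": 60, "duration_ns": 2_000_000_000}}
pm = summarize(totals)
# 200 tokens over 4 s → 50.0 tokens/sec, no fallback triggered
```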
Live Tab Display
Token metrics appear in two places in the Live tab:
- Stage headers – each completed stage (ReAct, Grader, Judge, Fallback) shows its token count and timing inline with the role-colored header
- Done banner – the final Done event displays total tokens generated and overall tokens-per-second rate for the entire pipeline
This makes it easy to spot which stage dominates token usage. A common finding: the Judge (running a 24B model) generates fewer tokens than the ReAct Agent (running a 3B model with multiple iterations) but takes longer per token due to model size.
Model Comparison
Token metrics enable quantitative model comparison. Run the same query with different per-role model assignments and compare:
- Total tokens generated (cost proxy)
- Tokens per second (throughput)
- Per-stage timing (latency breakdown)
- Answer quality vs. token cost tradeoff
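A comparison across runs reduces to tabulating each run's pipeline metrics side by side. A minimal sketch, where the run names and numbers are purely illustrative:

```python
# Tabulate pipeline metrics for the same query under different
# per-role model assignments.
def compare(runs: dict) -> str:
    header = f"{'run':<12}{'tokens':>8}{'tok/s':>8}{'total_ms':>10}"
    rows = [header]
    for name, r in runs.items():
        rows.append(f"{name:<12}{r['tokens']:>8}"
                    f"{r['tokens_per_sec']:>8.1f}{r['total_ms']:>10.0f}")
    return "\n".join(rows)

runs = {
    "all-3b":    {"tokens": 420, "tokens_per_sec": 85.0, "total_ms": 4900},
    "24b-judge": {"tokens": 380, "tokens_per_sec": 42.0, "total_ms": 9000},
}
table = compare(runs)
print(table)
```

A table like this makes the tradeoff concrete: the larger-judge assignment may generate fewer tokens yet cost more wall-clock time, which is exactly the latency/quality tradeoff the last bullet refers to.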