Quality Judge
The Quality Judge evaluates the pre-filtered chunks from the Grader in a single LLM call. It checks relevance, groundedness, and completeness, then generates an answer and issues a verdict.
The Judge runs on gpt-oss:120b (117B, OpenAI) by default – the largest model in the pipeline. A bigger model at the judgment stage gives stronger reasoning for answer generation and verdict decisions. In the AgentLens Live tab, Judge events appear with green headers.
Verdict Protocol
The Judge outputs a structured response. The format differs based on the verdict:
On ACCEPT (answer is satisfactory):
- VERDICT: ACCEPT
- CONFIDENCE – a score between 0.0 and 1.0
- ANSWER – detailed answer citing sources as [filename]
- ASSESSMENT – one sentence about groundedness
On RETRY (more retrieval needed):
- VERDICT: RETRY
- FEEDBACK – specific guidance for the Retrieval Agent on what to search for differently
On ACCEPT, the Judge's answer becomes the final response. On RETRY, the feedback is forwarded to the Retrieval Agent for a targeted second round.
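The two response shapes and their routing can be sketched as plain data types. This is a minimal illustration, not the pipeline's actual code: the class names `AcceptVerdict` and `RetryVerdict`, the `dispatch` function, and the destination labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class AcceptVerdict:
    confidence: float  # score between 0.0 and 1.0
    answer: str        # detailed answer citing sources as [filename]
    assessment: str    # one sentence about groundedness


@dataclass
class RetryVerdict:
    feedback: str      # guidance for the Retrieval Agent


JudgeVerdict = Union[AcceptVerdict, RetryVerdict]


def dispatch(verdict: JudgeVerdict) -> Tuple[str, str]:
    """Return (destination, payload) for a parsed Judge verdict.

    The destination labels are hypothetical names used only in this sketch.
    """
    if isinstance(verdict, AcceptVerdict):
        # ACCEPT: the Judge's answer becomes the final response.
        return ("final_response", verdict.answer)
    # RETRY: the feedback goes to the Retrieval Agent for a second round.
    return ("retrieval_agent", verdict.feedback)
```

Modelling the two verdicts as separate types keeps ACCEPT-only fields (confidence, answer, assessment) from being read on a RETRY.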
Retry Budget
The TwoAgentOrchestrator allows a maximum of 2 rounds, so the Judge can issue at most one RETRY. On the final round, the Judge receives a FINAL_ROUND_INSTRUCTION that forces it to respond with ACCEPT and provide the best possible answer using available chunks, even if evidence is incomplete.
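The round budget can be sketched as a bounded loop. This is an illustration under stated assumptions rather than the TwoAgentOrchestrator's real implementation: `run_rounds`, the `retrieve` and `judge` callables, and the placeholder text of `FINAL_ROUND_INSTRUCTION` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 2
# Placeholder text; the real instruction wording is not shown in this document.
FINAL_ROUND_INSTRUCTION = "<final-round instruction>"

Retrieve = Callable[[str, Optional[str]], List[str]]
Judge = Callable[[str, List[str], Optional[str]], Tuple[str, str]]


def run_rounds(query: str, retrieve: Retrieve, judge: Judge,
               max_rounds: int = MAX_ROUNDS) -> Tuple[Optional[str], int]:
    """Run retrieval and judgment until ACCEPT or the budget is spent.

    Returns (answer, rounds_used). The answer is None if no round accepted.
    """
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        chunks = retrieve(query, feedback)
        # Only the last round carries the instruction that forces ACCEPT.
        extra = FINAL_ROUND_INSTRUCTION if round_no == max_rounds else None
        verdict, payload = judge(query, chunks, extra)
        if verdict == "ACCEPT":
            return payload, round_no
        feedback = payload  # RETRY: payload is guidance for the next search
    # Assumption of this sketch: a Judge that ignores the final-round
    # instruction leaves the caller with no answer.
    return None, max_rounds
```

With a budget of two, the loop body runs at most twice, and feedback from the first round is the only feedback the Retrieval Agent ever sees.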
Fallback Trigger
If the Judge produces no extractable answer (the ANSWER field is empty or unparseable), the pipeline moves to the Fallback stage. This is distinct from RETRY – fallback triggers when the Judge accepts but has nothing to return.
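The trigger condition amounts to a small predicate. The function name `needs_fallback` is hypothetical, and treating a whitespace-only answer as empty is an assumption of this sketch.

```python
from typing import Optional


def needs_fallback(verdict: str, answer: Optional[str]) -> bool:
    """True when the Judge accepted but no usable answer was extracted."""
    # Assumption: an answer made only of whitespace counts as empty.
    has_answer = answer is not None and answer.strip() != ""
    return verdict == "ACCEPT" and not has_answer
```

A RETRY verdict never satisfies this check, which is why the fallback path and the retry path stay separate.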
Fail-Open Parsing
If a field in the Judge's output cannot be parsed, the parser substitutes a default:
- Unparseable VERDICT defaults to ACCEPT
- Unparseable CONFIDENCE defaults to 0.1
- Unparseable ANSWER defaults to None (triggers fallback)
- Unparseable FEEDBACK defaults to "Search for more specific information"
Because every field has a default, a malformed response from a misbehaving model cannot stall the pipeline.
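A fail-open parser with these defaults might look like the sketch below. It assumes each field appears as `NAME: value` at the start of a line, which this document does not confirm. The names `parse_judge_output` and `_field` are hypothetical, and treating an out-of-range confidence as unparseable is also an assumption.

```python
import re
from typing import Dict, Optional

DEFAULT_CONFIDENCE = 0.1
DEFAULT_FEEDBACK = "Search for more specific information"
_LABELS = "VERDICT|CONFIDENCE|ANSWER|ASSESSMENT|FEEDBACK"


def _field(text: str, name: str) -> Optional[str]:
    """Return the text after 'NAME:' up to the next field label, or None."""
    pattern = rf"^{name}:[ \t]*(.*?)(?=^(?:{_LABELS}):|\Z)"
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL)
    if not match:
        return None
    value = match.group(1).strip()
    return value or None


def parse_judge_output(text: str) -> Dict[str, object]:
    """Parse the Judge's response, substituting a default for any bad field."""
    raw_verdict = (_field(text, "VERDICT") or "").upper()
    # Anything other than a clean ACCEPT or RETRY fails open to ACCEPT.
    verdict = raw_verdict if raw_verdict in ("ACCEPT", "RETRY") else "ACCEPT"

    try:
        confidence = float(_field(text, "CONFIDENCE") or "")
        # Assumption: values outside 0.0..1.0 are treated as unparseable.
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence out of range")
    except ValueError:
        confidence = DEFAULT_CONFIDENCE

    return {
        "verdict": verdict,
        "confidence": confidence,
        "answer": _field(text, "ANSWER"),  # None triggers the fallback
        "feedback": _field(text, "FEEDBACK") or DEFAULT_FEEDBACK,
    }
```

Under these assumptions, completely unstructured output parses to an ACCEPT with confidence 0.1 and no answer, which hands control to the Fallback stage instead of raising an error.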