Docs / Agents / Fallback

Fallback

The Fallback stage activates when the Quality Judge produces no extractable answer. Instead of returning an empty response, the pipeline uses the PromptAssembler to build a prompt and makes a separate LLM call to generate a final answer from whatever context has been gathered.
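A minimal sketch of this branch, using hypothetical names (`JudgeResult`, `finalize`, `fallback_llm`) that are assumptions rather than the pipeline's real interfaces:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class JudgeResult:
    answer: Optional[str]  # None when no answer could be extracted

def finalize(judge: JudgeResult, prompt: str,
             fallback_llm: Callable[[str], str]) -> tuple[str, bool]:
    """Return (final_answer, fallback_used)."""
    if judge.answer is not None:
        return judge.answer, False  # Judge answered: no fallback needed
    # No extractable answer: generate one directly from the
    # assembled prompt with a separate LLM call.
    return fallback_llm(prompt), True
```

The second element of the tuple corresponds to the `fallback_used` flag the pipeline response exposes.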

Default Model

The Fallback runs on gemini-3-flash-preview (Google, cloud-hosted) by default. Unlike the other roles, which use local Ollama models, the Fallback uses a cloud model for fast, reliable direct generation when the local pipeline cannot produce an answer. In the AgentLens Live tab, Fallback events appear with pink headers.

PromptAssembler

The Fallback uses a 4-layer token-budgeted PromptAssembler that constructs a prompt from:

  1. System instructions (with or without document context)
  2. Retrieved document chunks (if any passed the Grader filter)
  3. Conversation history (added newest-to-oldest until the token budget is exhausted)
  4. The current question

Token budgeting ensures the prompt stays within the model's context window.
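The layering above can be sketched as follows. This is a simplification, not the actual PromptAssembler: a whitespace split stands in for a real tokenizer, and the function signature is an assumption.

```python
def assemble_prompt(system: str, chunks: list[str], history: list[str],
                    question: str, budget: int) -> str:
    count = lambda s: len(s.split())  # stand-in for a real token counter
    # Layers 1, 2, and 4 are included up front; layer 3 (history)
    # absorbs whatever budget remains.
    fixed = [system] + chunks + [question]
    remaining = budget - sum(count(p) for p in fixed)
    kept = []
    for turn in reversed(history):  # walk newest-to-oldest
        cost = count(turn)
        if cost > remaining:
            break  # budget exhausted: drop older turns
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order in the prompt
    return "\n\n".join([system] + chunks + kept + [question])
```

Because history is walked newest-to-oldest, the most recent turns survive when the budget is tight, while the oldest turns are dropped first.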

When Fallback Triggers

Fallback triggers when the Judge's ANSWER field is empty or unparseable (answer = None). This can happen when:

  • The Judge's LLM output is malformed and the answer cannot be extracted
  • The retrieved context is too sparse for the Judge to formulate an answer
  • The Grader removed all chunks (synthetic ACCEPT with confidence 0.1)

Note that on the final round (round 2), the Judge receives a FINAL_ROUND_INSTRUCTION forcing it to ACCEPT and provide an answer. Fallback only triggers if that forced answer still cannot be parsed.
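The trigger condition amounts to an extraction check on the Judge's raw output. A sketch, assuming the ANSWER field is a labelled line in the LLM text (the exact format is an assumption):

```python
import re
from typing import Optional

def extract_answer(judge_output: str) -> Optional[str]:
    """Return the ANSWER field's text, or None if missing or empty."""
    m = re.search(r"^ANSWER:\s*(.*)$", judge_output, re.MULTILINE)
    if m is None:
        return None  # malformed output: field could not be found
    text = m.group(1).strip()
    return text or None  # an empty field also triggers fallback
```

`extract_answer` returning `None`, for any of the reasons listed above, is what routes the request into the Fallback stage.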

Live Tab Visibility

The Fallback stage is only visible when triggered – successful pipelines that end with a Judge ACCEPT skip it entirely. The pipeline response marks fallback_used: true and tracks fallback-specific token metrics and timing separately.
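The fallback-specific fields on the response might look like the following; every field name other than `fallback_used` is illustrative, not confirmed by the source:

```python
from dataclasses import dataclass

@dataclass
class PipelineResponse:
    answer: str
    fallback_used: bool = False        # true only when Fallback ran
    fallback_tokens: int = 0           # tokens billed to the fallback call
    fallback_duration_ms: float = 0.0  # fallback timing, tracked separately

# Illustrative values only.
resp = PipelineResponse(answer="...", fallback_used=True,
                        fallback_tokens=812, fallback_duration_ms=1430.0)
```

Keeping the fallback metrics in separate fields lets dashboards distinguish normal Judge-produced answers from fallback generations at a glance.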