Query Classification

Before entering the multi-agent pipeline, every query passes through a heuristic classifier. The classifier decides whether the query needs document retrieval or can be answered directly by the LLM, avoiding unnecessary search overhead for simple messages.

Classification Logic

The QueryClassifier uses pattern-based heuristics (no ML model) to categorize queries:

  • Greetings – messages like "hello", "hi there", "good morning" are detected and routed to direct LLM mode
  • Short non-questions – very brief inputs that lack question indicators skip retrieval
  • Retrieval queries – everything else enters the full multi-agent pipeline

The classifier runs in microseconds since it uses string matching and simple rules rather than an LLM call.
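The heuristics above can be sketched roughly as follows. This is an illustrative stand-in, not the real QueryClassifier: the greeting set, question-word list, and length threshold are all assumptions.

```python
# Hypothetical sketch of the pattern-based classification heuristics.
# The actual QueryClassifier's patterns and thresholds may differ.
GREETINGS = {"hello", "hi", "hi there", "hey", "good morning", "good evening"}
QUESTION_WORDS = {"what", "why", "how", "who", "when", "where", "which", "is", "are", "can", "does"}

def classify(query: str) -> str:
    text = query.strip().lower().rstrip("!.")
    words = text.split()
    if text in GREETINGS:
        return "greeting"            # routed to direct LLM mode
    looks_like_question = text.endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)
    if len(words) <= 3 and not looks_like_question:
        return "short_non_question"  # skips retrieval
    return "retrieval"               # full multi-agent pipeline
```

Because this is pure string matching over small sets, there is no model inference or network call on the classification path.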

Direct LLM Path

When the classifier determines retrieval is unnecessary, the query bypasses the Retrieval Agent, Grader, and Judge entirely. Instead:

  1. The PromptAssembler builds a prompt with no retrieved chunks
  2. A single LLM call generates the response using the Fallback model
  3. The response is returned with auto_skip_retrieval: true and empty sources

This path uses 1 LLM call instead of the usual 2-7, significantly reducing latency for conversational messages.
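The three steps above might look like the following sketch. The helper names, the prompt format, and the response shape beyond the documented `auto_skip_retrieval` and `sources` fields are assumptions standing in for the real PromptAssembler and LLM client.

```python
def assemble_prompt(query: str, chunks: list[str]) -> str:
    # Stand-in for PromptAssembler: with no chunks, the prompt is just the query.
    context = "\n".join(chunks)
    return f"{context}\n\n{query}".strip()

def call_llm(model: str, prompt: str) -> str:
    # Stand-in for the real LLM client; returns a canned string here.
    return f"[{model}] response to: {prompt}"

def answer_directly(query: str) -> dict:
    prompt = assemble_prompt(query, chunks=[])          # 1. prompt with no retrieved chunks
    answer = call_llm(model="fallback", prompt=prompt)  # 2. single call to the Fallback model
    return {                                            # 3. documented response fields
        "answer": answer,
        "auto_skip_retrieval": True,
        "sources": [],
    }
```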

Manual Override

The classification can be overridden via the skip_retrieval flag in the request body. Setting it to true forces direct LLM mode regardless of the classifier's decision. This is useful for testing the LLM's knowledge without document context.
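A request body using the override might look like this. Only the `skip_retrieval` flag is documented above; the surrounding field names are illustrative assumptions.

```python
import json

# Hypothetical request body: only skip_retrieval is the documented flag;
# the "query" field name is an assumption.
body = {
    "query": "What is the capital of France?",
    "skip_retrieval": True,  # force direct LLM mode, bypassing the classifier
}
payload = json.dumps(body)
```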

Live Tab Behavior

When a query skips retrieval, the Live tab shows a classified status event with the classification reason, then jumps directly to the Done banner. The ReAct, Grader, Judge, and Fallback stages are not rendered since they were never executed.