# Prompt Assembly
Every agent in the pipeline receives a purpose-built prompt. The prompt is the specification layer: it defines what the agent knows, how it reasons, and what format it outputs. AgentLens defines four distinct prompts, one per agent role.
## Per-Agent Prompt Design
| Agent | Prompt design |
| --- | --- |
| ReAct | 5-section system prompt (role, tools, format, rules, thresholds) |
| Grader | 1-5 relevance scale, chunk-level scoring |
| Judge | ACCEPT/RETRY structured output, forced-accept on final round |
| Fallback | 4-layer token-budgeted assembly (System, Documents, History, Question) |
## ReAct Agent Prompt
The ReAct agent receives a 5-section system prompt built by `build_agent_system_prompt()` (sketched after the table below):
| Section | Purpose |
| --- | --- |
| Role + Tools | Defines the agent as a research assistant with access to `vector_search`, `keyword_search`, and `document_lookup` |
| Output Format | Enforces the Thought / Action / Final Answer structure so the ReAct loop can parse each step |
| Rules | Maximum 5 tool calls per query, must cite sources, stop when confident |
| How Observations Work | Tool results return 150-character previews, so the agent knows to use `document_lookup` for full content |
| When to Stop | Embeds the BM25 and vector quality thresholds so the agent can self-evaluate retrieval results |
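As a rough illustration of how these sections fit together, here is a minimal sketch of the builder, assuming it simply concatenates the five sections. Only the section names and rules come from the table above; the prompt wording, the function signature, and the threshold values are illustrative placeholders, not the actual prompt text.

```python
def build_agent_system_prompt(bm25_threshold: float = 0.5,
                              vector_threshold: float = 0.35) -> str:
    """Assemble the ReAct agent's 5-section system prompt.

    Sketch only: section wording and threshold defaults are
    illustrative, not the shipped prompt text.
    """
    sections = [
        # 1. Role + Tools
        "You are a research assistant with access to three tools: "
        "vector_search, keyword_search, and document_lookup.",
        # 2. Output Format
        "Answer using the ReAct structure:\n"
        "Thought: <your reasoning>\n"
        "Action: <tool name and input>\n"
        "Final Answer: <answer with citations>",
        # 3. Rules
        "Use at most 5 tool calls per query, cite your sources, "
        "and stop as soon as you are confident.",
        # 4. How Observations Work
        "Tool results are 150-character previews. Call document_lookup "
        "when you need a document's full content.",
        # 5. When to Stop
        f"Treat retrieval as good enough when BM25 >= {bm25_threshold} "
        f"or vector similarity >= {vector_threshold}.",
    ]
    return "\n\n".join(sections)
```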
The prompt uses a compact 3-message format (system, user context, user query) optimized for 3B-class models. On RETRY rounds, the Judge's feedback is injected into the user message so the agent can adjust its search strategy.
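A sketch of that 3-message assembly, reusing the builder above. The message roles and the RETRY injection come from the text; `build_messages`, its parameters, and the wrapper wording are hypothetical.

```python
def build_messages(context: str, query: str,
                   judge_feedback: str | None = None) -> list[dict]:
    """Build the compact (system, user context, user query) triple."""
    user_query = query
    if judge_feedback:
        # RETRY round: inject the Judge's feedback into the user message
        # so the agent can adjust its search strategy.
        user_query = (f"Previous attempt was rejected. Judge feedback: "
                      f"{judge_feedback}\n\nQuestion: {query}")
    return [
        {"role": "system", "content": build_agent_system_prompt()},
        {"role": "user", "content": context},     # user context message
        {"role": "user", "content": user_query},  # user query message
    ]
```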
## Grader Prompt
The Grader scores each retrieved chunk on a 1-5 relevance scale:
| Score | Meaning |
| --- | --- |
| 1 | Completely irrelevant, no connection to the query |
| 2 | Marginally related, mentions the topic but no useful information |
| 3 | Somewhat relevant, contains partial information |
| 4 | Relevant, contains information that helps answer the query |
| 5 | Highly relevant, directly answers the query |
The key design decision: the Grader scores whether a chunk contains relevant information, not whether it is about the topic. This distinction matters for chunks that mention a concept in passing versus chunks that actually explain it. The output format is a bare `N: S` pair (chunk number: score), one per line, for reliable parsing (see the sketch below).
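A minimal parser for that format might look like the following; the regex and the skip-on-mismatch behavior are assumptions for illustration.

```python
import re

def parse_grader_output(raw: str) -> dict[int, int]:
    """Parse bare 'N: S' lines into {chunk_number: score}."""
    scores: dict[int, int] = {}
    for line in raw.splitlines():
        match = re.match(r"^\s*(\d+)\s*:\s*([1-5])\s*$", line)
        if match:
            scores[int(match.group(1))] = int(match.group(2))
        # Non-matching lines are skipped rather than treated as errors,
        # since small models occasionally emit stray prose.
    return scores
```

For example, `parse_grader_output("1: 5\n2: 2\n3: 4")` yields `{1: 5, 2: 2, 3: 4}`, and stray prose lines are silently ignored rather than failing the whole grade.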
## Judge Prompt
The Quality Judge returns a structured output with five fields:
| Field | Contents |
| --- | --- |
| VERDICT | ACCEPT or RETRY |
| CONFIDENCE | 0.0 to 1.0 score |
| ANSWER | The final answer text (may be empty on RETRY) |
| ASSESSMENT | Reasoning about answer quality |
| FEEDBACK | Instructions for the ReAct agent on what to search next (on RETRY) |
On the final round, the prompt includes a forced-accept instruction: the Judge must return ACCEPT regardless of quality, so the pipeline always produces an answer. If the Judge's output is unparseable, the system fails open, defaulting to ACCEPT with confidence 0.5.
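A sketch of how the parsing, forced-accept, and fail-open behavior might be implemented. The field names and defaults match the description above; the helper name, signature, and line-based parsing are illustrative.

```python
def parse_judge_output(raw: str, final_round: bool = False) -> dict:
    """Extract the Judge's five fields, failing open to ACCEPT."""
    result = {"verdict": "ACCEPT", "confidence": 0.5, "answer": "",
              "assessment": "", "feedback": ""}
    fields = {"VERDICT": "verdict", "CONFIDENCE": "confidence",
              "ANSWER": "answer", "ASSESSMENT": "assessment",
              "FEEDBACK": "feedback"}
    parsed_verdict = False
    for line in raw.splitlines():
        label, _, value = line.partition(":")
        key = fields.get(label.strip().upper())
        if key is None:
            continue
        value = value.strip()
        if key == "confidence":
            try:
                result[key] = float(value)
            except ValueError:
                pass  # keep the 0.5 default
        else:
            result[key] = value
            if key == "verdict" and value in ("ACCEPT", "RETRY"):
                parsed_verdict = True
    if not parsed_verdict:
        # Unparseable output: fail open with ACCEPT at confidence 0.5.
        result["verdict"], result["confidence"] = "ACCEPT", 0.5
    if final_round:
        # Forced accept on the final round so the pipeline always answers.
        result["verdict"] = "ACCEPT"
    return result
```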
## 4-Layer Token-Budgeted Assembly
The Fallback and Direct LLM paths use the `PromptAssembler`, which builds prompts from four ordered layers:

| Layer | Contents |
| --- | --- |
| System instructions | Base system prompt, either the with-docs or no-docs variant depending on whether chunks are available |
| Retrieved documents | Chunked context from the Retrieval Agent, ordered by relevance score |
| Conversation history | Prior turns, added newest-to-oldest until the token budget is exhausted |
| Current question | The user's query, always included in full |
Token budgeting (3000 tokens, counted with tiktoken's `cl100k_base` encoding) ensures the assembled prompt fits within the model's context window. If the retrieved documents exceed the budget, lower-scored chunks are dropped first. The assembler returns metadata: per-layer token counts, included and dropped chunks, and the number of history turns included.
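A simplified sketch of that assembly: `PromptAssembler` is the name from the docs, but this standalone function, the plain `\n\n` joins (rather than the `[System]` / `[Instructions]` / `[Context]` separators shown in the Trace tab), and the metadata keys are illustrative.

```python
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def assemble(system: str, chunks: list[tuple[float, str]],
             history: list[str], question: str,
             budget: int = 3000) -> tuple[str, dict]:
    """Build the 4-layer prompt, dropping low-scored chunks and the
    oldest history turns first when the token budget runs out."""
    # Layers 1 and 4 are always included in full.
    remaining = budget - count_tokens(system) - count_tokens(question)

    # Layer 2: retrieved documents, highest relevance score first.
    kept_chunks, dropped = [], 0
    for score, chunk in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(chunk)
        if cost <= remaining:
            kept_chunks.append(chunk)
            remaining -= cost
        else:
            dropped += 1  # lower-scored chunks are dropped first

    # Layer 3: conversation history, newest-to-oldest until exhausted.
    kept_history: list[str] = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > remaining:
            break
        kept_history.insert(0, turn)  # restore chronological order
        remaining -= cost

    prompt = "\n\n".join([system, *kept_chunks, *kept_history, question])
    meta = {
        "tokens": {"system": count_tokens(system),
                   "documents": sum(count_tokens(c) for c in kept_chunks),
                   "history": sum(count_tokens(t) for t in kept_history),
                   "question": count_tokens(question)},
        "chunks_included": len(kept_chunks),
        "chunks_dropped": dropped,
        "history_turns_included": len(kept_history),
    }
    return prompt, meta
```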
## Trace Tab Visibility
Every agent's assembled prompt is visible in the Trace tab's Prompt layer. For ReAct, Grader, and Judge stages, this shows the system prompt and injected context. For Fallback stages, the [System] / [Instructions] / [Context] separators show exactly how the 4-layer assembly was constructed and what each layer contributed.