# Prompt Assembly
Every agent in the pipeline receives a purpose-built prompt. The prompt is the specification layer: it defines what the agent knows, how it reasons, and what format it outputs. AgentLens defines four distinct prompts, one per agent role.
## Per-Agent Prompt Design
| Agent | Prompt design |
| --- | --- |
| ReAct | 5-section system prompt (role, tools, format, rules, thresholds) |
| Grader | 1-5 relevance scale, chunk-level scoring |
| Judge | ACCEPT/RETRY structured output, forced-accept on final round |
| Fallback | 4-layer token-budgeted assembly (System, Documents, History, Question) |
## ReAct Agent Prompt
The ReAct agent receives a 5-section system prompt built by `build_agent_system_prompt()` (sketched after the table below):
| Section | Purpose |
| --- | --- |
| Role + Tools | Defines the agent as a research assistant with access to `vector_search`, `keyword_search`, and `document_lookup` |
| Output Format | Enforces the Thought / Action / Final Answer structure so the ReAct loop can parse each step |
| Rules | Maximum 5 tool calls per query, must cite sources, stop when confident |
| How Observations Work | Tool results return 150-character previews, so the agent knows to use `document_lookup` for full content |
| When to Stop | Embeds the BM25 and vector quality thresholds so the agent can self-evaluate retrieval results |
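As a rough illustration of how these sections fit together, here is a minimal sketch of the builder, assuming it simply concatenates the five sections. Only the section names and rules come from the table above; the prompt wording, the function signature, and the threshold values are illustrative placeholders, not the actual prompt text.

```python
def build_agent_system_prompt(bm25_threshold: float = 0.5,
                              vector_threshold: float = 0.35) -> str:
    """Assemble the ReAct agent's 5-section system prompt.

    Sketch only: section wording and threshold defaults are
    illustrative, not the shipped prompt text.
    """
    sections = [
        # 1. Role + Tools
        "You are a research assistant with access to three tools: "
        "vector_search, keyword_search, and document_lookup.",
        # 2. Output Format
        "Answer using the ReAct structure:\n"
        "Thought: <your reasoning>\n"
        "Action: <tool name and input>\n"
        "Final Answer: <answer with citations>",
        # 3. Rules
        "Use at most 5 tool calls per query, cite your sources, "
        "and stop as soon as you are confident.",
        # 4. How Observations Work
        "Tool results are 150-character previews. Call document_lookup "
        "when you need a document's full content.",
        # 5. When to Stop
        f"Treat retrieval as good enough when BM25 >= {bm25_threshold} "
        f"or vector similarity >= {vector_threshold}.",
    ]
    return "\n\n".join(sections)
```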
The prompt uses a compact 3-message format (system, user context, user query) optimized for 3B-class models. On RETRY rounds, the Judge's feedback is injected into the user message so the agent can adjust its search strategy.
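A sketch of that 3-message assembly, reusing the builder above. The message roles and the RETRY injection come from the text; `build_messages`, its parameters, and the wrapper wording are hypothetical.

```python
def build_messages(context: str, query: str,
                   judge_feedback: str | None = None) -> list[dict]:
    """Build the compact (system, user context, user query) triple."""
    user_query = query
    if judge_feedback:
        # RETRY round: inject the Judge's feedback into the user message
        # so the agent can adjust its search strategy.
        user_query = (f"Previous attempt was rejected. Judge feedback: "
                      f"{judge_feedback}\n\nQuestion: {query}")
    return [
        {"role": "system", "content": build_agent_system_prompt()},
        {"role": "user", "content": context},     # user context message
        {"role": "user", "content": user_query},  # user query message
    ]
```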
## Grader Prompt
The Grader scores each retrieved chunk on a 1-5 relevance scale:
| Score | Meaning |
| --- | --- |
| 1 | Completely irrelevant, no connection to the query |
| 2 | Marginally related, mentions the topic but no useful information |
| 3 | Somewhat relevant, contains partial information |
| 4 | Relevant, contains information that helps answer the query |
| 5 | Highly relevant, directly answers the query |
The key design decision: the Grader scores whether a chunk contains relevant information, not whether it is about the topic. This distinction matters for chunks that mention a concept in passing versus chunks that actually explain it. The output format is a bare `N: S` pair (chunk number: score), one per line, for reliable parsing (see the sketch below).
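A minimal parser for that format might look like the following; the regex and the skip-on-mismatch behavior are assumptions for illustration.

```python
import re

def parse_grader_output(raw: str) -> dict[int, int]:
    """Parse bare 'N: S' lines into {chunk_number: score}."""
    scores: dict[int, int] = {}
    for line in raw.splitlines():
        match = re.match(r"^\s*(\d+)\s*:\s*([1-5])\s*$", line)
        if match:
            scores[int(match.group(1))] = int(match.group(2))
        # Non-matching lines are skipped rather than treated as errors,
        # since small models occasionally emit stray prose.
    return scores
```

For example, `parse_grader_output("1: 5\n2: 2\n3: 4")` yields `{1: 5, 2: 2, 3: 4}`, and stray prose lines are silently ignored rather than failing the whole grade.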
## Judge Prompt
The Quality Judge returns a structured output with five fields:
| Field | Contents |
| --- | --- |
| VERDICT | ACCEPT or RETRY |
| CONFIDENCE | 0.0 to 1.0 score |
| ANSWER | The final answer text (may be empty on RETRY) |
| ASSESSMENT | Reasoning about answer quality |
| FEEDBACK | Instructions for the ReAct agent on what to search next (on RETRY) |
On the final round, the prompt includes a forced-accept instruction: the Judge must return ACCEPT regardless of quality, so the pipeline always produces an answer. If the Judge's output is unparseable, the system fails open, defaulting to ACCEPT with confidence 0.5.
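A sketch of how the parsing, forced-accept, and fail-open behavior might be implemented. The field names and defaults match the description above; the helper name, signature, and line-based parsing are illustrative.

```python
def parse_judge_output(raw: str, final_round: bool = False) -> dict:
    """Extract the Judge's five fields, failing open to ACCEPT."""
    result = {"verdict": "ACCEPT", "confidence": 0.5, "answer": "",
              "assessment": "", "feedback": ""}
    fields = {"VERDICT": "verdict", "CONFIDENCE": "confidence",
              "ANSWER": "answer", "ASSESSMENT": "assessment",
              "FEEDBACK": "feedback"}
    parsed_verdict = False
    for line in raw.splitlines():
        label, _, value = line.partition(":")
        key = fields.get(label.strip().upper())
        if key is None:
            continue
        value = value.strip()
        if key == "confidence":
            try:
                result[key] = float(value)
            except ValueError:
                pass  # keep the 0.5 default
        else:
            result[key] = value
            if key == "verdict" and value in ("ACCEPT", "RETRY"):
                parsed_verdict = True
    if not parsed_verdict:
        # Unparseable output: fail open with ACCEPT at confidence 0.5.
        result["verdict"], result["confidence"] = "ACCEPT", 0.5
    if final_round:
        # Forced accept on the final round so the pipeline always answers.
        result["verdict"] = "ACCEPT"
    return result
```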
## 4-Layer Token-Budgeted Assembly
The Fallback and Direct LLM paths use the `PromptAssembler`, which builds prompts from four ordered layers:

| Layer | Contents |
| --- | --- |
| System instructions | Base system prompt, either the with-docs or no-docs variant depending on whether chunks are available |
| Retrieved documents | Chunked context from the Retrieval Agent, ordered by relevance score |
| Conversation history | Prior turns, added newest-to-oldest until the token budget is exhausted |
| Current question | The user's query, always included in full |
Token budgeting (3000 tokens, counted with tiktoken's `cl100k_base` encoding) ensures the assembled prompt fits within the model's context window. If the retrieved documents exceed the budget, lower-scored chunks are dropped first. The assembler returns metadata: per-layer token counts, included and dropped chunks, and the number of history turns included.
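A simplified sketch of that assembly: `PromptAssembler` is the name from the docs, but this standalone function, the plain `\n\n` joins (rather than the `[System]` / `[Instructions]` / `[Context]` separators shown in the Trace tab), and the metadata keys are illustrative.

```python
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def assemble(system: str, chunks: list[tuple[float, str]],
             history: list[str], question: str,
             budget: int = 3000) -> tuple[str, dict]:
    """Build the 4-layer prompt, dropping low-scored chunks and the
    oldest history turns first when the token budget runs out."""
    # Layers 1 and 4 are always included in full.
    remaining = budget - count_tokens(system) - count_tokens(question)

    # Layer 2: retrieved documents, highest relevance score first.
    kept_chunks, dropped = [], 0
    for score, chunk in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(chunk)
        if cost <= remaining:
            kept_chunks.append(chunk)
            remaining -= cost
        else:
            dropped += 1  # lower-scored chunks are dropped first

    # Layer 3: conversation history, newest-to-oldest until exhausted.
    kept_history: list[str] = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > remaining:
            break
        kept_history.insert(0, turn)  # restore chronological order
        remaining -= cost

    prompt = "\n\n".join([system, *kept_chunks, *kept_history, question])
    meta = {
        "tokens": {"system": count_tokens(system),
                   "documents": sum(count_tokens(c) for c in kept_chunks),
                   "history": sum(count_tokens(t) for t in kept_history),
                   "question": count_tokens(question)},
        "chunks_included": len(kept_chunks),
        "chunks_dropped": dropped,
        "history_turns_included": len(kept_history),
    }
    return prompt, meta
```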
## Trace Tab Visibility
Every agent's assembled prompt is visible in the Trace tab's Prompt layer. For ReAct, Grader, and Judge stages, this shows the system prompt and injected context. For Fallback stages, the [System] / [Instructions] / [Context] separators show exactly how the 4-layer assembly was constructed and what each layer contributed.