Concepts

AgentLens is a streaming pipeline debugger for multi-agent RAG systems. It streams every stage of a query – retrieval, grading, judging, and fallback – so you can see exactly how an answer is assembled. It is built from scratch, with no LangGraph, LlamaIndex, or other agent frameworks.

RAG

Retrieval-Augmented Generation grounds LLM responses in your own documents. Instead of relying on the model's training data alone, the system retrieves relevant chunks from a document corpus and includes them in the prompt. This reduces hallucination and keeps answers traceable to source material.
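The idea can be sketched in a few lines: retrieve the best-matching chunks, then place them in the prompt ahead of the question. This is a minimal illustration, not AgentLens code – the keyword-overlap scoring here is a stand-in for real vector search.

```python
# Minimal RAG sketch: retrieve chunks, then ground the prompt in them.
# Keyword overlap is a stand-in for vector similarity.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus chunks by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Include retrieved chunks in the prompt so the answer is traceable."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"

corpus = [
    "AgentLens streams every pipeline stage to the UI.",
    "BM25 is a keyword-based ranking function.",
    "ChromaDB stores document embeddings for vector search.",
]
query = "Which system stores document embeddings"
prompt = build_prompt(query, retrieve(query, corpus))
```

Because the answer must come from the numbered sources, a wrong claim can be traced back to the chunk (or the absence of one) that produced it.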

Multi-Agent Architecture

AgentLens uses two cooperating agents. The Retrieval Agent runs a ReAct reasoning loop to find documents using vector search and keyword search tools. The Quality Judge evaluates the retrieved chunks, generates an answer, and can send the Retrieval Agent back for another round with targeted feedback. Between the two agents, a separate Grader scores each retrieved chunk for relevance. This feedback loop produces higher-quality answers with fewer hallucinations.

Microservice Architecture

AgentLens runs as four FastAPI microservices plus a vector database:

  • Ingestion (port 8001) – document parsing, chunking, and embedding
  • Retrieval (port 8002) – vector search (ChromaDB) and keyword search (BM25)
  • Query (port 8003) – multi-agent orchestrator, ReAct agent, Quality Judge
  • API Gateway (port 8080) – request routing, model listing, config
  • ChromaDB (port 8000) – vector database for document embeddings

Query Classification

Before entering the multi-agent pipeline, every query passes through a heuristic classifier. Greetings and short non-questions skip retrieval entirely and go straight to a direct LLM call. This avoids unnecessary search overhead for conversational messages.
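A classifier of this kind can be very small. The sketch below shows the general shape; the specific greeting list and word-count cutoff are assumptions, not the rules AgentLens actually uses.

```python
# Heuristic query classifier sketch: decide whether a message needs
# retrieval or can go straight to a direct LLM call.

GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def needs_retrieval(query: str) -> bool:
    q = query.strip().lower().rstrip("!.?")
    if q in GREETINGS:
        return False                       # conversational: direct LLM call
    if len(q.split()) < 3 and "?" not in query:
        return False                       # short non-question: skip search
    return True                            # real question: run the pipeline
```

Because it is pure string matching, the check adds effectively zero latency, which is the point: the expensive multi-agent path only runs when it can help.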

Pipeline Observability

Every LLM call in the pipeline is captured across six observation layers: Prompt, Thinking, Raw Response, Parsed Output, Execution, and Observation. The Live tab streams the outer three in real time as the pipeline runs. The Trace tab adds the inner three so you can inspect every detail after the query completes.
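A trace record for one LLM call might carry the six layers like this. The field names mirror the layer names above, but the data shapes and the exact live/trace split are assumptions for illustration.

```python
# Sketch of a per-call trace record holding the six observation layers.
from dataclasses import dataclass, field

LIVE_LAYERS = ("prompt", "thinking", "raw_response")        # assumed live split
TRACE_LAYERS = ("parsed_output", "execution", "observation")

@dataclass
class LLMCallTrace:
    stage: str                       # e.g. "grader" or "judge" (hypothetical)
    prompt: str = ""
    thinking: str = ""
    raw_response: str = ""
    parsed_output: dict = field(default_factory=dict)
    execution: dict = field(default_factory=dict)  # timings, token counts
    observation: str = ""

    def live_view(self) -> dict:
        """Subset streamed while the pipeline runs; the rest waits for the Trace tab."""
        return {k: getattr(self, k) for k in LIVE_LAYERS}
```

Splitting the record this way lets the UI stream the cheap, text-only layers immediately and defer the structured layers until the query completes.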