Concepts

AgentLens is a streaming pipeline debugger for multi-agent RAG systems. It streams every stage of a query – retrieval, grading, judging, and fallback – so you can see exactly how an answer is assembled. It is built from scratch, with no LangGraph, LlamaIndex, or other agent frameworks.

RAG

Retrieval-Augmented Generation grounds LLM responses in your own documents. Instead of relying on the model's training data alone, the system retrieves relevant chunks from a document corpus and includes them in the prompt. This reduces hallucination and keeps answers traceable to source material.
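The idea can be sketched in a few lines: retrieve the best-matching chunks, then place them in the prompt ahead of the question. This is a minimal illustration, not AgentLens code – the keyword-overlap scoring here is a stand-in for real vector search.

```python
# Minimal RAG sketch: retrieve chunks, then ground the prompt in them.
# Keyword overlap is a stand-in for vector similarity.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus chunks by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Include retrieved chunks in the prompt so the answer is traceable."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"

corpus = [
    "AgentLens streams every pipeline stage to the UI.",
    "BM25 is a keyword-based ranking function.",
    "ChromaDB stores document embeddings for vector search.",
]
query = "Which system stores document embeddings"
prompt = build_prompt(query, retrieve(query, corpus))
```

Because the answer must come from the numbered sources, a wrong claim can be traced back to the chunk (or the absence of one) that produced it.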

Multi-Agent Architecture

AgentLens uses two cooperating agents. The Retrieval Agent runs a ReAct reasoning loop to find documents using vector search and keyword search tools. The Quality Judge evaluates the retrieved chunks, generates an answer, and can send the Retrieval Agent back for another round with targeted feedback. Between the two agents, a separate Grader scores each retrieved chunk for relevance. This feedback loop produces higher-quality answers with fewer hallucinations.

Microservice Architecture

AgentLens runs as four FastAPI microservices plus a vector database:

  • Ingestion (port 8001) – document parsing, chunking, and embedding
  • Retrieval (port 8002) – vector search (ChromaDB) and keyword search (BM25)
  • Query (port 8003) – multi-agent orchestrator, ReAct agent, Quality Judge
  • API Gateway (port 8080) – request routing, model listing, config
  • ChromaDB (port 8000) – vector database for document embeddings

Query Classification

Before entering the multi-agent pipeline, every query passes through a heuristic classifier. Greetings and short non-questions skip retrieval entirely and go straight to a direct LLM call. This avoids unnecessary search overhead for conversational messages.
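A classifier of this kind can be very small. The sketch below shows the general shape; the specific greeting list and word-count cutoff are assumptions, not the rules AgentLens actually uses.

```python
# Heuristic query classifier sketch: decide whether a message needs
# retrieval or can go straight to a direct LLM call.

GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "bye"}

def needs_retrieval(query: str) -> bool:
    q = query.strip().lower().rstrip("!.?")
    if q in GREETINGS:
        return False                       # conversational: direct LLM call
    if len(q.split()) < 3 and "?" not in query:
        return False                       # short non-question: skip search
    return True                            # real question: run the pipeline
```

Because it is pure string matching, the check adds effectively zero latency, which is the point: the expensive multi-agent path only runs when it can help.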

Pipeline Observability

Every LLM call in the pipeline is captured across six observation layers: Prompt, Thinking, Raw Response, Parsed Output, Execution, and Observation. The Live tab streams the outer three in real time as the pipeline runs. The Trace tab adds the inner three so you can inspect every detail after the query completes.
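A trace record for one LLM call might carry the six layers like this. The field names mirror the layer names above, but the data shapes and the exact live/trace split are assumptions for illustration.

```python
# Sketch of a per-call trace record holding the six observation layers.
from dataclasses import dataclass, field

LIVE_LAYERS = ("prompt", "thinking", "raw_response")        # assumed live split
TRACE_LAYERS = ("parsed_output", "execution", "observation")

@dataclass
class LLMCallTrace:
    stage: str                       # e.g. "grader" or "judge" (hypothetical)
    prompt: str = ""
    thinking: str = ""
    raw_response: str = ""
    parsed_output: dict = field(default_factory=dict)
    execution: dict = field(default_factory=dict)  # timings, token counts
    observation: str = ""

    def live_view(self) -> dict:
        """Subset streamed while the pipeline runs; the rest waits for the Trace tab."""
        return {k: getattr(self, k) for k in LIVE_LAYERS}
```

Splitting the record this way lets the UI stream the cheap, text-only layers immediately and defer the structured layers until the query completes.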