Multi-Agent Document Intelligence

See inside your AI pipeline. AgentLens lets you watch your agents reason, grade, and judge every LLM call, from thinking to response.

Upload your documents. Ask anything. Get precise answers powered by a team of AI agents working across your knowledge base.

Try AgentLens
AgentLens pipeline console
Retrieval Agent

ReAct loop with hybrid vector + BM25 retrieval

An autonomous agent reasons step-by-step through a ReAct loop – choosing between semantic vector search and BM25 keyword search on each iteration, merging and deduplicating chunks until it has enough evidence to answer.

Learn more →
Retrieval Agent
Grader + Judge

CRAG-style grading with a quality judge

A retrieval grader scores every chunk on a 1-5 relevance scale, filtering noise before a second LLM – the Quality Judge – evaluates the answer for groundedness and completeness. If the verdict is RETRY, targeted feedback steers the next retrieval round.

Learn more →
Grader + Judge
Pipeline Lens

Prompt, Thinking, and Raw Response on every call

Open any pipeline stage in the Trace view and inspect three layers: the exact prompt sent to the LLM, the chain-of-thought reasoning (when thinking is enabled), and the raw text returned. Live tab streams execution in real time; Trace tab lets you replay it call by call.

Learn more →
Pipeline Lens

Choose any model

Pick the right model for the job

Each pipeline role – ReAct agent, Grader, Judge, Fallback – gets its own model picker. Mix fast models for retrieval with stronger ones for evaluation, and swap them without restarting.

Models overview
Ask a question about your documents
ReAct Agentnemotron-3-nano:30b
ministral-3:3b
gemma3:4btools
ministral-3:8b
gemma3:12btools
gpt-oss:20btoolsthinking
devstral-small-2:24btools
gemma3:27btools
nemotron-3-nano:30btoolsthinking
qwen3-next:80btoolsthinking
gpt-oss:120btoolsthinking

Thinking models built in

Toggle chain-of-thought reasoning per role. See the model think through retrieval decisions, grading logic, and quality evaluation – all visible in the Trace view.

Thinking Layer
ROLETHINK
ReAct
nemotron-3-nano:30b
Grader
qwen3-next:80b
Judge
gpt-oss:120b
Fallback
gemini-3-flash-preview
thinking The user asks about retrieval strategies. I should use vector_search first for semantic relevance...

From 3B to 1T parameters

Lightweight 3B‑4B models for fast iteration, mid-range defaults like nemotron‑3‑nano:30b, qwen3‑next:80b, and gpt‑oss:120b for production, and frontier models up to glm‑5 and kimi‑k2.5 at 1T parameters. 30+ models from Ollama Cloud, all through the same pipeline.

Browse library
MODELSIZE
ministral-3:3b
Mistral
2.0 GB
gemma3:4b
Google
3.3 GB
nemotron-3-nano:30b
NVIDIA
18 GB
qwen3-next:80b
Alibaba
47 GB
gpt-oss:120b
OpenAI
71 GB
glm-5
Z.ai
704 GB
kimi-k2.5
Moonshot
1042 GB

Built by Adityo Nugroho

Designed and built end-to-end retrieval-augmented generation systems – from microservice architecture and infrastructure-as-code to multi-agent orchestration with autonomous reasoning, quality evaluation, and retry feedback loops. A 4-service pipeline with ReAct agents, CRAG-style grading, hybrid retrieval (vector + BM25), token-budgeted prompt assembly, and real-time streaming observability. Deployed to production on AWS with Terraform, Docker, and Nginx reverse proxy. No frameworks, no LangChain, no LlamaIndex – 260+ tests.

Ollama
ChromaDB
BM25
Vector Search
Python
FastAPI
Next.js
Docker
Terraform
AWS

Try AgentLens.

Open AgentLens