Ollama Cloud API

Ollama is the inference backend for AgentLens. It serves open-source LLMs through a simple HTTP API, handling model loading, quantization, and GPU acceleration transparently. The model library ranges from lightweight 3B models around 2 GB on disk to frontier models over 1 TB.

API Endpoints

AgentLens uses three Ollama API endpoints:

  • POST /api/chat – chat completion with model, messages, options, and think flag
  • GET /api/tags – list available models (used for health checks and model dropdowns)
  • POST /api/show – fetch model metadata (capabilities, context length, parameter size)

All requests include an Authorization: Bearer <token> header when an OLLAMA_API_KEY is configured.
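The request shape above can be assembled with plain HTTP. A minimal sketch of building a POST /api/chat request, assuming Ollama's standard body fields (model, messages, stream, think, options) and the optional Bearer token described above; the helper name build_chat_request is illustrative, not part of AgentLens:

```python
import json
import os


def build_chat_request(base_url, model, messages, think=False, options=None):
    """Assemble the URL, headers, and JSON body for POST /api/chat.

    Adds an Authorization: Bearer header only when OLLAMA_API_KEY is set,
    mirroring the behavior described above.
    """
    url = f"{base_url.rstrip('/')}/api/chat"
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("OLLAMA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": messages,
        "stream": False,  # one complete response instead of a token stream
        "think": think,
    }
    if options:
        body["options"] = options  # e.g. {"temperature": 0.2}
    return url, headers, json.dumps(body)


url, headers, payload = build_chat_request(
    "http://localhost:11434",
    "gpt-oss:120b",
    [{"role": "user", "content": "Hello"}],
)
```

The resulting tuple can be sent with any HTTP client (urllib.request, requests, httpx).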

Local Development

During local development, Ollama runs on your machine at localhost:11434. Pull any model from the Ollama library and it becomes available to all agent roles immediately.
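Health checks and model dropdowns both read GET /api/tags, whose body lists installed models under a "models" key. A small parsing sketch, assuming that response shape; the function name installed_models is illustrative:

```python
import json


def installed_models(tags_json: str) -> list[str]:
    """Extract model names from a GET /api/tags response body."""
    payload = json.loads(tags_json)
    return sorted(m["name"] for m in payload.get("models", []))


# Example body in the shape /api/tags returns:
sample = '{"models": [{"name": "gpt-oss:20b"}, {"name": "gemma3:4b"}]}'
print(installed_models(sample))  # ['gemma3:4b', 'gpt-oss:20b']
```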

Cloud API

In production, AgentLens connects to the Ollama Cloud API at api.ollama.com using a Bearer token. The cloud API provides access to 30+ models without requiring a local GPU. The model list is fetched dynamically and cached for one hour. A POST /models/refresh endpoint clears the cache and re-fetches on demand.
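The one-hour cache with an on-demand refresh can be sketched as a small TTL wrapper. This is an assumption-laden illustration, not the gateway's actual code: the class name, the injected fetch callable, and the 3600-second default are all stand-ins for whatever the gateway really does.

```python
import time


class ModelListCache:
    """Cache a fetched model list for ttl seconds; refresh() clears it
    and re-fetches, the way POST /models/refresh is described above."""

    def __init__(self, fetch, ttl=3600.0):
        self._fetch = fetch        # callable returning the model list
        self._ttl = ttl
        self._models = None
        self._fetched_at = 0.0

    def get(self):
        expired = time.monotonic() - self._fetched_at > self._ttl
        if self._models is None or expired:
            self._models = self._fetch()
            self._fetched_at = time.monotonic()
        return self._models

    def refresh(self):
        self._models = None        # drop the cached copy
        return self.get()          # re-fetch immediately


calls = []
cache = ModelListCache(lambda: calls.append(1) or ["qwen3-next:80b"], ttl=3600)
cache.get()       # first call fetches
cache.get()       # served from cache, no second fetch
cache.refresh()   # forces a re-fetch
```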

Model Tiers

The API Gateway groups models into three tiers for the UI dropdown, sorted by parameter count within each tier:

  • Fast (3B–21B) – 7 models: ministral-3:3b, gemma3:4b, ministral-3:8b, rnj-1:8b, ministral-3:14b, gpt-oss:20b, plus gemini-3-flash-preview (Fallback default, cloud-hosted)
  • Balanced (12B–117B) – 7 models including the three pipeline defaults: nemotron-3-nano:30b (ReAct), qwen3-next:80b (Grader), gpt-oss:120b (Judge)
  • Quality (123B–1.0T) – 18 frontier models: devstral-2 (123B), minimax-m2 series (230B), qwen3-vl (235B), glm-4 series (357B), qwen3.5 (397B), qwen3-coder (480B), deepseek-v3 and cogito (671B), mistral-large-3 (675B), glm-5 (756B), kimi-k2 series (1.0T)
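The in-tier sort by parameter count can be expressed as a short function. A sketch only: the tier assignments are taken as given (the ranges above overlap, so membership is not derivable from size alone), and the helper name and data shape are assumptions rather than the gateway's code.

```python
def sort_tiers(tiers: dict[str, list[tuple[str, float]]]) -> dict[str, list[str]]:
    """Sort each tier's models by parameter count (in billions), ascending,
    as the API Gateway does for the UI dropdown."""
    return {
        tier: [name for name, _size in sorted(models, key=lambda m: m[1])]
        for tier, models in tiers.items()
    }


tiers = {
    "Fast": [("gpt-oss:20b", 20), ("gemma3:4b", 4), ("ministral-3:8b", 8)],
    "Quality": [("kimi-k2", 1000), ("devstral-2", 123)],
}
result = sort_tiers(tiers)
print(result["Fast"])  # ['gemma3:4b', 'ministral-3:8b', 'gpt-oss:20b']
```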

See the full catalog with file sizes, vendors, and capability tags on the Models Library page.

Model Families

The library includes models from 10 vendors: NVIDIA (Nemotron), Alibaba (Qwen), OpenAI (GPT-OSS), Z.ai (GLM), Moonshot (Kimi), Mistral (Ministral, Devstral, Mistral Large), Google (Gemma), MiniMax, DeepSeek, and Cogito. Each model includes capability tags – tools, thinking, and vision – shown in the model picker dropdown.
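Filtering the catalog by capability tag, as the model picker does, can be sketched as follows; the catalog literal and function name are hypothetical examples, not AgentLens internals:

```python
def with_capability(models: dict[str, set[str]], tag: str) -> list[str]:
    """Return the names of models carrying a capability tag, sorted for display."""
    return sorted(name for name, tags in models.items() if tag in tags)


catalog = {
    "qwen3-vl": {"tools", "thinking", "vision"},
    "gpt-oss:120b": {"tools", "thinking"},
    "gemma3:4b": {"vision"},
}
print(with_capability(catalog, "vision"))  # ['gemma3:4b', 'qwen3-vl']
```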