Inside Docker cagent: A Technical Walkthrough
TL;DR
cagent keeps agents, models, tools, and RAG in one YAML file, then runs that same setup in TUI, --exec, API, or OCI sharing without rewriting glue code. You lose some improv freedom, but you get repeatable behavior that is easier to ship and debug.
Guide
Run these two commands back to back on Docker Desktop 4.49+:
cagent run
cagent run --exec "summarize this repo"
The first opens a TUI. The second prints to stdout and exits. Before you run the second one, predict: does the agent loop change, or only the interface? Only the surface changes. One YAML config, one Go runtime loop, multiple entrypoints. That constraint lets you move from local prompting to scripted CI automation without rewriting agent definitions. It also means you cannot do anything the schema does not anticipate.
TUI and exec share one runtime loop
Start cagent run in a terminal. You get an interactive TUI with streaming output, tool approval prompts, and session history. Kill it. Run cagent run --exec "summarize this repo". The same tools fire, the same model calls happen, the same session database (~/.cagent/session.db) records the exchange. Only the rendering differs -- TUI streams to a terminal widget, exec dumps to stdout.
The run command branches on mode after startup, not before. It loads config, restores or creates a session, and enters the same tool-calling loop regardless of whether a human is watching. Session continuity survives interface switches because the store is a shared SQLite file, not an in-memory buffer tied to a process.
Debug the loop once. If tool calls behave differently between TUI and exec, the bug is in rendering, not in orchestration. Need a third surface -- HTTP API, webhook handler, cron job -- add a transport layer, not a second runtime.
This holds as long as runtime customization fits the schema fields (toolsets, permissions, sub_agents). The moment you need behavior the schema cannot express, you are outside cagent's design.
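The shape of that shared loop can be sketched in a few lines. This is an illustrative model, not cagent's Go source: the function names and schema are invented, but it shows why session continuity survives interface switches when the store is a shared SQLite file rather than process memory.

```python
# Minimal sketch (not cagent's actual code): one loop, two surfaces.
# Session state lives in SQLite, so any entrypoint can resume it.
import sqlite3

def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT)")
    return db

def run_loop(db, session_id, prompt, model, render):
    """Same orchestration regardless of who is watching; only `render` differs."""
    history = [{"role": r, "content": c} for r, c in
               db.execute("SELECT role, content FROM messages WHERE session_id=?", (session_id,))]
    reply = model(history + [{"role": "user", "content": prompt}])
    for role, content in (("user", prompt), ("assistant", reply)):
        db.execute("INSERT INTO messages VALUES (?, ?, ?)", (session_id, role, content))
    db.commit()
    render(reply)  # TUI widget, stdout, or HTTP response -- the loop doesn't care

db = open_store()
echo_model = lambda history: f"context: {len(history)} messages"
# A "TUI" call and an "--exec" call share the loop and the store:
run_loop(db, "s1", "summarize this repo", echo_model, render=lambda out: None)
run_loop(db, "s1", "now list TODOs", echo_model, render=print)
```

The second call sees the first exchange because it reads the same rows, which is the property the real runtime gets from `~/.cagent/session.db`.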
Four YAML blocks control the runtime
Open a cagent config and change one block at a time. Before each rerun, predict what will change in agent behavior.
- `agents`: Names, roles, prompts, tool access, and the delegation graph. Every runnable flow starts at one root agent. Remove a specialist from `sub_agents` and delegation stops at that boundary -- the root agent will not route tasks to an agent it cannot see.
- `models`: Decouples model references from agent roles. Swap `gpt-5-mini` for `claude-sonnet-4-5` without touching agent definitions. Fallback and routing rules handle provider failure, so a model outage does not halt the pipeline.
- `toolsets` + `permissions`: Grants capability first, then narrows execution. `toolsets` declares what tools an agent can access (filesystem, shell, think, todo). `permissions` applies allow, ask, and deny patterns to shell commands. Deny `rm -rf*` and the agent cannot run it even if the model asks. No pattern match, no execution.
- `rag`: Adds retrieval without code. Strategies include chunked embeddings and BM25 (vector, lexical, or hybrid search), and post-processing controls fusion method, reranking, deduplication, and result limits. If `vector_dimensions` mismatches your embedding model's output, retrieval fails at index time with a clear error instead of returning silently wrong results.
A complete multi-agent config with routing, permissions, and RAG
```yaml
version: "5"

models:
  fast:
    provider: openai
    model: gpt-5-mini
    thinking_budget: low
  deep:
    provider: anthropic
    model: claude-sonnet-4-5

rag:
  codebase:
    docs: [./src, ./README.md]
    strategies:
      - type: chunked-embeddings
        model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./rag.db
      - type: bm25
        database: ./bm25.db
    results:
      fusion:
        strategy: rrf
        k: 60
      limit: 8

permissions:
  allow:
    - shell:cmd=go test*
  deny:
    - shell:cmd=rm -rf*

agents:
  root:
    model: fast
    description: Coordinates implementation tasks
    instruction: Delegate coding to developer and return concise output.
    sub_agents: [developer]
    toolsets:
      - type: think
      - type: todo
  developer:
    model: deep
    description: Writes and verifies code
    instruction: Implement requested changes and run tests.
    rag: [codebase]
    toolsets:
      - type: filesystem
      - type: shell
```
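The `permissions` block reads as ordered pattern checks: deny wins over allow, and anything unmatched falls back to an interactive approval prompt. A minimal sketch of that decision order, assuming the patterns behave like shell-style globs (cagent's actual matcher may differ in details):

```python
# Hypothetical sketch of deny-over-allow permission checks.
# Patterns are treated as shell-style globs here; cagent's real
# matcher may differ, but the precedence order is the point.
from fnmatch import fnmatch

ALLOW = ["go test*"]
DENY = ["rm -rf*"]

def decide(cmd, allow=ALLOW, deny=DENY):
    """deny > allow > ask: a denied command never runs, an unmatched one prompts."""
    if any(fnmatch(cmd, p) for p in deny):
        return "deny"
    if any(fnmatch(cmd, p) for p in allow):
        return "allow"
    return "ask"  # fall back to interactive tool approval

print(decide("go test ./..."))   # allow
print(decide("rm -rf /tmp/x"))   # deny, even if the model asks
print(decide("git push"))        # ask: no pattern match, human decides
```

The deny-first ordering is what makes "Deny `rm -rf*` and the agent cannot run it" hold even when an allow pattern would also match.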
Same config, three entrypoints, zero adapter code
Three separate runtimes would mean three config files, three adapter layers, three places for behavior to drift. cagent runs one config file across cagent run, cagent run --exec, and cagent api. The team schema, tool pipeline, permission enforcement, and session lifecycle are shared.
Model-provider swaps are ref changes. Retrieval changes are strategy blocks. Agent distribution uses cagent share push and cagent pull instead of custom packaging. One YAML file, read three times, not three codebases maintained in parallel.
If your workflow includes handoffs between local dev and CI, keep one source config and run it in both TUI and `--exec`. Drift usually comes from duplicated configs, not from model choice.
Five operations from prompt to answer
Trace a single prompt through the runtime:
- Resolve source -- accepts a local file, directory, alias, or OCI reference and resolves it to one canonical agent source. An OCI ref means the agent config was pulled from a registry, not written locally. This is where `cagent pull` artifacts enter the pipeline.
- Load team -- parses the schema: models, providers, toolsets, prompt files, sub-agent graph. Structural validation happens here. A missing field or type mismatch fails before any model call. Config errors surface at startup, not mid-conversation.
- Start or resume session -- loads by session ID or creates a new row in SQLite. Session history gives the model conversational continuity across runs. Resume a session started in TUI from an exec call, and the model sees the full prior exchange.
- Execute agentic loop -- the model reasons, calls tools, and may invoke `transfer_task` to hand off to sub-agents. Each step routes through tool approval and permission checks. The loop runs until the model signals completion or hits a token limit.
- Render and persist -- streams output to TUI, CLI JSON, or API transport while writing session state to SQLite. The conversation is replayable from the database.
Delegation is the critical boundary. In multi-agent mode, transfer_task is always auto-approved so specialists can work without human confirmation at every handoff. A root agent fans out to five specialists without five approval dialogs -- but a poorly scoped specialist with broad shell access can execute destructive commands with no human in the loop.
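That carve-out can be pictured as a special case in the approval gate. A hedged sketch, with invented names rather than cagent's internals:

```python
# Illustrative approval gate: side-effect tools prompt by default,
# transfer_task never does, and --yolo removes the prompt entirely.
# Tool names here are illustrative, not cagent's actual identifiers.
SIDE_EFFECT_TOOLS = {"shell", "filesystem_write"}

def needs_approval(tool_name, yolo=False):
    if tool_name == "transfer_task":
        return False  # delegation is always auto-approved
    if yolo:
        return False  # --yolo skips the prompt for every tool, not just safe ones
    return tool_name in SIDE_EFFECT_TOOLS

print(needs_approval("shell"))           # True: human confirms by default
print(needs_approval("transfer_task"))   # False: handoffs flow without friction
print(needs_approval("shell", yolo=True))  # False: the brake is gone
```

The first branch is why a root agent can fan out to five specialists without five dialogs, and also why a specialist's over-broad shell grant is reachable with no human in the loop.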
Broad autonomy spends safety margin for speed
Grant broad shell access, enable --yolo, issue a vague task, and watch: the agent executes commands immediately because approval friction is gone. That speed is the point, but the safety margin you spent does not come back.
What matters in practice:
- Tool approval is the primary brake, and `--yolo` removes it. Side-effect tools (shell, filesystem writes) prompt by default. Auto-approve skips that prompt for every tool call, not just safe ones.
- `transfer_task` auto-approval amplifies weak role boundaries. If a specialist has broader access than it needs, the root agent can delegate destructive work to it without friction. One agent's permissions mistake becomes the whole team's problem.
- RAG quality depends on indexing choices, not retrieval method. Semantic search can return high-confidence results that miss operationally relevant context. Chunk size, overlap, and fusion strategy are tuning decisions that affect correctness, not just performance.
- One config governs every runtime surface. A bad config commit affects TUI, exec, and API simultaneously. No per-environment override exists unless you maintain separate config files, which defeats the single-source design.
- Long sessions accumulate stale context. Session continuity helps until the model makes decisions based on information from 200 turns ago that no longer applies. Reset sessions deliberately.
Treat `--yolo` as an environment-level policy, not a convenience flag. In shared repos, pair it with strict `permissions.deny` patterns for destructive commands.
Multi-agent design is not "more agents." Each agent needs a narrow role, minimal toolset, and an explicit handoff contract. Without those, coordination cost erases the parallelism benefit.
Config-driven vs code-first vs hosted
| Axis | cagent | Code-first agent framework | Hosted no-code builder |
|---|---|---|---|
| Startup time | Minutes; cagent run works immediately | Slower; scaffold app and wiring first | Fast if your use case fits product limits |
| Change velocity | High for config-level edits | High for custom behavior in code | Medium; fast in UI, slower at edge cases |
| Operational control | Strong local control; files, shell, OCI, API | Maximum; you own every layer | Lowest; vendor runtime constraints |
| Safety model | Declarative permissions plus approval flow | Customizable but easy to misconfigure | Vendor-guarded, less transparent |
| Distribution | Native OCI push/pull for agent packages | You build your own packaging | Usually platform-specific sharing |
| When it wins | Repeatable ops with local execution | Novel orchestration logic | Managed simplicity |
cagent is a poor fit when your differentiator is runtime behavior that cannot be expressed in schema fields or plugin surfaces. If you need custom tool implementations, novel orchestration patterns, or control over the inference loop itself, a code-first framework gives you those layers. cagent bets that most agent work is configuration, not code. For teams where that holds, it removes a class of operational problems.
Docker Desktop 4.49+ ships cagent pre-installed
No separate binary download, no PATH management, no version coordination across a team. The install step disappears for anyone already on Docker tooling.
- Frequent tagged releases mean pinning versions matters for reproducibility.
- `cagent share push` and `cagent pull` treat agent configs like container images. Teams version, tag, and distribute agent definitions through the same registries they use for application images.
- API mode accepts OCI refs with `--pull-interval`, so long-running services refresh agent definitions on a schedule without restarts.
- Hybrid fusion (`rrf`, weighted, max) combines lexical and semantic retrieval signals in one config block.
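Reciprocal rank fusion, the `rrf` strategy shown in the config earlier, is simple enough to sketch. Assuming the standard formulation -- each document scores the sum of 1/(k + rank) across the rankings it appears in, with k=60 as configured -- documents ranked well by both retrievers beat documents ranked well by only one:

```python
# Standard reciprocal rank fusion over ranked lists from BM25 and
# vector search. cagent's implementation may add weighting on top.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["chunk_a", "chunk_b", "chunk_c"]    # lexical ranking
vector = ["chunk_b", "chunk_d", "chunk_a"]  # semantic ranking
print(rrf([bm25, vector]))  # chunks in both lists rise to the top
```

The large k relative to list length flattens the per-rank differences, so agreement between retrievers dominates position within any single retriever.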
Index a 120,000-character docs set with default chunk settings (size=1000, overlap=75) and the rough chunk count is 120000 / (1000 - 75) ≈ 130. That estimate predicts indexing cost, retrieval fan-out, and reranking budget before you run the first query.
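The back-of-envelope above checks out directly, assuming a sliding window where each chunk starts one stride (size minus overlap) after the last:

```python
# Chunk-count estimate for a sliding window: stride = size - overlap.
def chunk_count(total_chars, size=1000, overlap=75):
    stride = size - overlap
    # one chunk starts at every stride offset until the text is exhausted
    return len(range(0, total_chars, stride))

print(chunk_count(120_000))  # 130
```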
Validate in TUI and --exec, then ship that same file to OCI and cagent api.