Inside Docker cagent: A Technical Walkthrough
TL;DR
cagent keeps agents, models, tools, and RAG in one YAML file, then runs that same setup in TUI, --exec, API, or OCI sharing without rewriting glue code. You lose some improv freedom, but you get repeatable behavior that is easier to ship and debug.
Guide
Run these two commands back to back on Docker Desktop 4.49+:
cagent run
cagent run --exec "summarize this repo"
The first opens a TUI. The second prints to stdout and exits. Before you run the second one, predict: does the agent loop change, or only the interface? Only the surface changes. One YAML config, one Go runtime loop, multiple entrypoints. That constraint lets you move from local prompting to scripted CI automation without rewriting agent definitions. It also means you cannot do anything the schema does not anticipate.
TUI and exec share one runtime loop
Start cagent run in a terminal. You get an interactive TUI with streaming output, tool approval prompts, and session history. Kill it. Run cagent run --exec "summarize this repo". The same tools fire, the same model calls happen, the same session database (~/.cagent/session.db) records the exchange. Only the rendering differs -- TUI streams to a terminal widget, exec dumps to stdout.
The run command branches on mode after startup, not before. It loads config, restores or creates a session, and enters the same tool-calling loop regardless of whether a human is watching. Session continuity survives interface switches because the store is a shared SQLite file, not an in-memory buffer tied to a process.
Debug the loop once. If tool calls behave differently between TUI and exec, the bug is in rendering, not in orchestration. Need a third surface -- HTTP API, webhook handler, cron job -- add a transport layer, not a second runtime.
This holds as long as runtime customization fits the schema fields (toolsets, permissions, sub_agents). The moment you need behavior the schema cannot express, you are outside cagent's design.
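The shape of that shared loop can be sketched in a few lines. This is an illustrative model, not cagent's Go source: the function names and schema are invented, but it shows why session continuity survives interface switches when the store is a shared SQLite file rather than process memory.

```python
# Minimal sketch (not cagent's actual code): one loop, two surfaces.
# Session state lives in SQLite, so any entrypoint can resume it.
import sqlite3

def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT)")
    return db

def run_loop(db, session_id, prompt, model, render):
    """Same orchestration regardless of who is watching; only `render` differs."""
    history = [{"role": r, "content": c} for r, c in
               db.execute("SELECT role, content FROM messages WHERE session_id=?", (session_id,))]
    reply = model(history + [{"role": "user", "content": prompt}])
    for role, content in (("user", prompt), ("assistant", reply)):
        db.execute("INSERT INTO messages VALUES (?, ?, ?)", (session_id, role, content))
    db.commit()
    render(reply)  # TUI widget, stdout, or HTTP response -- the loop doesn't care

db = open_store()
echo_model = lambda history: f"context: {len(history)} messages"
# A "TUI" call and an "--exec" call share the loop and the store:
run_loop(db, "s1", "summarize this repo", echo_model, render=lambda out: None)
run_loop(db, "s1", "now list TODOs", echo_model, render=print)
```

The second call sees the first exchange because it reads the same rows, which is the property the real runtime gets from `~/.cagent/session.db`.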
Four YAML blocks control the runtime
Open a cagent config and change one block at a time. Before each rerun, predict what will change in agent behavior.
- `agents`: Names, roles, prompts, tool access, and the delegation graph. Every runnable flow starts at one root agent. Remove a specialist from `sub_agents` and delegation stops at that boundary -- the root agent will not route tasks to an agent it cannot see.
- `models`: Decouples model references from agent roles. Swap `gpt-5-mini` for `claude-sonnet-4-5` without touching agent definitions. Fallback and routing rules handle provider failure, so a model outage does not halt the pipeline.
- `toolsets` + `permissions`: Grants capability first, then narrows execution. `toolsets` declares what tools an agent can access (filesystem, shell, think, todo). `permissions` applies allow, ask, and deny patterns to shell commands. Deny `rm -rf*` and the agent cannot run it even if the model asks. No pattern match, no execution.
- `rag`: Adds retrieval without code. Strategies include chunked embeddings and BM25 (vector, lexical, or hybrid search), and post-processing controls fusion method, reranking, deduplication, and result limits. If `vector_dimensions` mismatches your embedding model's output, retrieval fails at index time with a clear error instead of returning silently wrong results.
A complete multi-agent config with routing, permissions, and RAG
```yaml
version: "5"

models:
  fast:
    provider: openai
    model: gpt-5-mini
    thinking_budget: low
  deep:
    provider: anthropic
    model: claude-sonnet-4-5

rag:
  codebase:
    docs: [./src, ./README.md]
    strategies:
      - type: chunked-embeddings
        model: openai/text-embedding-3-small
        vector_dimensions: 1536
        database: ./rag.db
      - type: bm25
        database: ./bm25.db
    results:
      fusion:
        strategy: rrf
        k: 60
      limit: 8

permissions:
  allow:
    - shell:cmd=go test*
  deny:
    - shell:cmd=rm -rf*

agents:
  root:
    model: fast
    description: Coordinates implementation tasks
    instruction: Delegate coding to developer and return concise output.
    sub_agents: [developer]
    toolsets:
      - type: think
      - type: todo
  developer:
    model: deep
    description: Writes and verifies code
    instruction: Implement requested changes and run tests.
    rag: [codebase]
    toolsets:
      - type: filesystem
      - type: shell
```
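The `permissions` block reads as ordered pattern checks: deny wins over allow, and anything unmatched falls back to an interactive approval prompt. A minimal sketch of that decision order, assuming the patterns behave like shell-style globs (cagent's actual matcher may differ in details):

```python
# Hypothetical sketch of deny-over-allow permission checks.
# Patterns are treated as shell-style globs here; cagent's real
# matcher may differ, but the precedence order is the point.
from fnmatch import fnmatch

ALLOW = ["go test*"]
DENY = ["rm -rf*"]

def decide(cmd, allow=ALLOW, deny=DENY):
    """deny > allow > ask: a denied command never runs, an unmatched one prompts."""
    if any(fnmatch(cmd, p) for p in deny):
        return "deny"
    if any(fnmatch(cmd, p) for p in allow):
        return "allow"
    return "ask"  # fall back to interactive tool approval

print(decide("go test ./..."))   # allow
print(decide("rm -rf /tmp/x"))   # deny, even if the model asks
print(decide("git push"))        # ask: no pattern match, human decides
```

The deny-first ordering is what makes "Deny `rm -rf*` and the agent cannot run it" hold even when an allow pattern would also match.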
Same config, three entrypoints, zero adapter code
Three separate runtimes would mean three config files, three adapter layers, three places for behavior to drift. cagent runs one config file across cagent run, cagent run --exec, and cagent api. The team schema, tool pipeline, permission enforcement, and session lifecycle are shared.
Model-provider swaps are ref changes. Retrieval changes are strategy blocks. Agent distribution uses cagent share push and cagent pull instead of custom packaging. One YAML file, read three times, not three codebases maintained in parallel.
If your workflow includes handoffs between local dev and CI, keep one source config and run it in both TUI and `--exec`. Drift usually comes from duplicated configs, not from model choice.
Five operations from prompt to answer
Trace a single prompt through the runtime:
- Resolve source -- accepts a local file, directory, alias, or OCI reference and resolves it to one canonical agent source. An OCI ref means the agent config was pulled from a registry, not written locally. This is where `cagent pull` artifacts enter the pipeline.
- Load team -- parses the schema: models, providers, toolsets, prompt files, sub-agent graph. Structural validation happens here. A missing field or type mismatch fails before any model call. Config errors surface at startup, not mid-conversation.
- Start or resume session -- loads by session ID or creates a new row in SQLite. Session history gives the model conversational continuity across runs. Resume a session started in TUI from an exec call, and the model sees the full prior exchange.
- Execute agentic loop -- the model reasons, calls tools, and may invoke `transfer_task` to hand off to sub-agents. Each step routes through tool approval and permission checks. The loop runs until the model signals completion or hits a token limit.
- Render and persist -- streams output to TUI, CLI JSON, or API transport while writing session state to SQLite. The conversation is replayable from the database.
Delegation is the critical boundary. In multi-agent mode, transfer_task is always auto-approved so specialists can work without human confirmation at every handoff. A root agent fans out to five specialists without five approval dialogs -- but a poorly scoped specialist with broad shell access can execute destructive commands with no human in the loop.
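That carve-out can be pictured as a special case in the approval gate. A hedged sketch, with invented names rather than cagent's internals:

```python
# Illustrative approval gate: side-effect tools prompt by default,
# transfer_task never does, and --yolo removes the prompt entirely.
# Tool names here are illustrative, not cagent's actual identifiers.
SIDE_EFFECT_TOOLS = {"shell", "filesystem_write"}

def needs_approval(tool_name, yolo=False):
    if tool_name == "transfer_task":
        return False  # delegation is always auto-approved
    if yolo:
        return False  # --yolo skips the prompt for every tool, not just safe ones
    return tool_name in SIDE_EFFECT_TOOLS

print(needs_approval("shell"))           # True: human confirms by default
print(needs_approval("transfer_task"))   # False: handoffs flow without friction
print(needs_approval("shell", yolo=True))  # False: the brake is gone
```

The first branch is why a root agent can fan out to five specialists without five dialogs, and also why a specialist's over-broad shell grant is reachable with no human in the loop.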
Broad autonomy spends safety margin for speed
Grant broad shell access, enable --yolo, issue a vague task, and watch: the agent executes commands immediately because approval friction is gone. That speed is the point, but the safety margin you spent does not come back.
What matters in practice:
- Tool approval is the primary brake, and `--yolo` removes it. Side-effect tools (shell, filesystem writes) prompt by default. Auto-approve skips that prompt for every tool call, not just safe ones.
- `transfer_task` auto-approval amplifies weak role boundaries. If a specialist has broader access than it needs, the root agent can delegate destructive work to it without friction. One agent's permissions mistake becomes the whole team's problem.
- RAG quality depends on indexing choices, not retrieval method. Semantic search can return high-confidence results that miss operationally relevant context. Chunk size, overlap, and fusion strategy are tuning decisions that affect correctness, not just performance.
- One config governs every runtime surface. A bad config commit affects TUI, exec, and API simultaneously. No per-environment override exists unless you maintain separate config files, which defeats the single-source design.
- Long sessions accumulate stale context. Session continuity helps until the model makes decisions based on information from 200 turns ago that no longer applies. Reset sessions deliberately.
Treat `--yolo` as an environment-level policy, not a convenience flag. In shared repos, pair it with strict `permissions.deny` patterns for destructive commands.
Multi-agent design is not "more agents." Each agent needs a narrow role, minimal toolset, and an explicit handoff contract. Without those, coordination cost erases the parallelism benefit.
Config-driven vs code-first vs hosted
| Axis | cagent | Code-first agent framework | Hosted no-code builder |
|---|---|---|---|
| Startup time | Minutes; cagent run works immediately | Slower; scaffold app and wiring first | Fast if your use case fits product limits |
| Change velocity | High for config-level edits | High for custom behavior in code | Medium; fast in UI, slower at edge cases |
| Operational control | Strong local control; files, shell, OCI, API | Maximum; you own every layer | Lowest; vendor runtime constraints |
| Safety model | Declarative permissions plus approval flow | Customizable but easy to misconfigure | Vendor-guarded, less transparent |
| Distribution | Native OCI push/pull for agent packages | You build your own packaging | Usually platform-specific sharing |
| When it wins | Repeatable ops with local execution | Novel orchestration logic | Managed simplicity |
cagent is a poor fit when your differentiator is runtime behavior that cannot be expressed in schema fields or plugin surfaces. If you need custom tool implementations, novel orchestration patterns, or control over the inference loop itself, a code-first framework gives you those layers. cagent bets that most agent work is configuration, not code. For teams where that holds, it removes a class of operational problems.
Docker Desktop 4.49+ ships cagent pre-installed
No separate binary download, no PATH management, no version coordination across a team. The install step disappears for anyone already on Docker tooling.
- Frequent tagged releases mean pinning versions matters for reproducibility.
- `cagent share push` and `cagent pull` treat agent configs like container images. Teams version, tag, and distribute agent definitions through the same registries they use for application images.
- API mode accepts OCI refs with `--pull-interval`, so long-running services refresh agent definitions on a schedule without restarts.
- Hybrid fusion (`rrf`, weighted, max) combines lexical and semantic retrieval signals in one config block.
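Reciprocal rank fusion, the `rrf` strategy shown in the config earlier, is simple enough to sketch. Assuming the standard formulation -- each document scores the sum of 1/(k + rank) across the rankings it appears in, with k=60 as configured -- documents ranked well by both retrievers beat documents ranked well by only one:

```python
# Standard reciprocal rank fusion over ranked lists from BM25 and
# vector search. cagent's implementation may add weighting on top.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["chunk_a", "chunk_b", "chunk_c"]    # lexical ranking
vector = ["chunk_b", "chunk_d", "chunk_a"]  # semantic ranking
print(rrf([bm25, vector]))  # chunks in both lists rise to the top
```

The large k relative to list length flattens the per-rank differences, so agreement between retrievers dominates position within any single retriever.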
Index a 120,000-character docs set with default chunk settings (size=1000, overlap=75) and the rough chunk count is 120000 / (1000 - 75) ≈ 130. That estimate predicts indexing cost, retrieval fan-out, and reranking budget before you run the first query.
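The back-of-envelope above checks out directly, assuming a sliding window where each chunk starts one stride (size minus overlap) after the last:

```python
# Chunk-count estimate for a sliding window: stride = size - overlap.
def chunk_count(total_chars, size=1000, overlap=75):
    stride = size - overlap
    # one chunk starts at every stride offset until the text is exhausted
    return len(range(0, total_chars, stride))

print(chunk_count(120_000))  # 130
```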
Validate in TUI and --exec, then ship that same file to OCI and cagent api.