Autonomous Agents

Agents That Run in Your Cloud, in Their Own Sandbox

Some AI tasks don't fit a DAG. They iterate. They explore. They write files, run shells, check git. Grove's agent runtime gives those tasks a goal-driven loop with an allowlisted tool surface, a per-run sandboxed workspace, durable turn history, and a budget cap — in the same Postgres-backed Rust service that runs your workflows.

What an Agent Is

An agent is a stored definition — name, system prompt, tool allowlist, optional model. Definitions are per-tenant and unique by name. A run executes one definition against a goal: POST /agents/:name/run mints a run id, allocates a workspace if the allowlist needs one, and spawns the loop on a background task. The response is an SSE stream of turn events.

The loop is dead simple: call the LLM with the running message history, dispatch any tool calls it requests, fold the results back into the history, repeat. It terminates on the finish tool, a text-only response, the turn cap, a token or USD budget cap, or external cancellation.

An agent is for…

Tasks where the steps aren't known up front — "investigate this failure", "implement this feature", "research and write a memo".
Long-horizon work that needs to iterate, backtrack, and self-correct.
Anything that benefits from filesystem and git tooling alongside LLM reasoning.
Workflows where the LLM should pick which tool to use next from a curated allowlist.

A DAG workflow is for…

Tasks where the structure is known up front — extract, classify, route, transform.
Parallelism with fan-out, merge, and conditional branches the operator wants to see in a diagram.
Compliance contexts where every step's purpose is documented in the workflow definition itself.
Generate-and-vet loops where the generator/critic pattern is the whole shape of the job.

Many real systems use both: a DAG workflow that calls an agent at one node, or an agent that delegates a structured sub-task to a workflow run. Grove ships both in the same binary; they share the broker, the persistence layer, and the auth layer.

The Tool Surface

An agent's allowlist can name tools from four buckets. The runtime validates the allowlist at definition time and builds a per-run tool registry on every POST /run.

Built-in

Server-side tools that ship with grove-core. Grove documentation lookup, the finish termination sentinel, the delegate_task sub-agent spawn.

Workspace (`ws_*`)

Seven server-executed file and shell tools: ws_file_read, ws_file_write, ws_file_edit, ws_multi_edit, ws_list_files, ws_grep, ws_shell. Bound to the per-run sandboxed workspace.

Git (`git_*`)

Twelve git operations: clone, init, status, add, commit, checkout, branch, diff, log, push, pull, fetch. The agent gets its tenant's git credentials via a SecretStore credential provider — tokens never leave the secret store.

Client-required

Stubs whose implementation lives in your application. The agent pauses on these and waits for your runtime to POST the result back. Same pause/resume mechanism Grove's DAG workflows use for external tools — credentials never reach the orchestrator.

MCP (`provider:tool`)

Your own external MCP servers, registered in a per-tenant catalog. An agent declares which providers it may use; Grove connects and calls out on its behalf mid-run — no client kept in the loop. Tools appear in the allowlist namespaced as provider:tool.

Per-Run Sandboxed Workspace

A run whose allowlist names any ws_* or git_* tool gets a fresh ephemeral workspace. The WorkspaceManager allocates it, binds it to the run, and the run's WorkspaceGuard RAII-cleans it on completion, error, or panic.

Container Backend

Default sandbox: Docker or Podman with namespaces, cgroups, --network none, and a bind-mounted workspace root. Shell-out is unprivileged; the network is off unless the agent has explicit tools for it.

Real-Path Containment

Every file op resolves symlinks and asserts the result stays within the canonicalized workspace root. The runtime refuses to start a file-writing run when no usable sandbox is available — the server boots without Docker, but ws_* runs are refused, not run unsandboxed.

Ephemeral Lifecycle

A workspace lives exactly as long as its run. Crash mid-run? The reaper sweeps the abandoned mount. Resume? The workspace is rebound if still present, or freshly allocated if it was reaped.

Tenant-Scoped Git Credentials

The git tools fetch credentials via a SecretStoreCredentialProvider bound to the run's tenant. Per-host keys (git.<host>.token or username/password) come from the tenant's namespace in the secret store — never from the agent's prompt or the request.

Durable Turn History & Resume

Each turn writes to agent_run_turns at the start; the LLM response (tool calls, tokens, USD cost, free text) is persisted immediately after the LLM call, before any tool runs. Then every tool call is bracketed by a durable marker in the turn's tool_call_progress column — Started before execution, rewritten to Completed after — so a crash mid-tool is recoverable without re-running side-effecting work.

Resume Semantics POST /agents/:name/runs/:id/resume re-enters the single incomplete turn rather than discarding it; replays the recorded LLM response verbatim where present

Per-Tool-Call Idempotency A tool that completed before a crash replays its recorded result; a tool that started but didn't complete is at-most-once — the LLM sees an "interrupted" error rather than risking a duplicate side effect

Budget Caps & Caching Per-run token and USD budget caps stop the loop the moment they trip; cost is computed from the model_prices table. The static system-prompt-plus-tools prefix is marked for provider prompt caching, so the re-sent context is billed at the cache rate turn after turn

Context Compaction Older completed turns are summarized once the message history crosses a threshold; runs in both the first-run and resume loops

Sub-Agent Delegation

Agents whose allowlist names delegate_task can spawn child runs. Delegation is synchronous — the parent's turn pauses while the child runs — and bounded by hard caps and budget containment.

Bounded Tree

Depth cap (default 5) and per-parent fan-out cap (default 10). The child-count check is a single atomic UPDATE … WHERE child_count < max RETURNING — no read-check-write race that lets siblings exceed the cap.

Budget Containment

A child's budget is the parent's remaining budget minus what earlier siblings have already consumed. The delegation tree's worst-case spend is roughly 2× the root's cap (root + tree), not unbounded.

Cancellation Cascade

Every descendant shares the root run's cancellation flag. Cancelling an ancestor stops in-flight children at the next checkpoint.

Recursive Context

A child is the parent of any grandchild. Each run builds its own delegation context and its own tool registry — no shared mutable state across siblings.

Operating & Observing Agent Runs

The HTTP surface is small and stable. The Arborist operator SPA renders the same data — agents list, agent-runs list, per-run turn-by-turn view, live SSE replay.

POST /agents — create or replace an agent definition (per-tenant unique name)

Author

POST /agents/:name/run — SSE-returning run start; CLI/SDK use this shape

Start (stream)

POST /agents/:name/runs — JSON-returning variant; returns run_id immediately and registers a broadcast for later subscribers

Start (JSON)

GET /agent-runs/:id/stream — live SSE replay; emits not_live when the run isn't currently broadcasting so the client can fall back to polling

Observe

POST /agents/:name/runs/:id/cancel — stop the run (and its descendants) at the next checkpoint

Cancel

POST /agents/:name/runs/:id/resume — continue a failed or interrupted run from its last persisted turn

Resume

Every read and cancel endpoint applies a fetch-then-fence tenant check: the agent must belong to the caller's tenant, and the run's tenant must match too. Cross-tenant access returns 404, not 403 — existence is not leaked.

Authoring Agents from Your MCP Client

Agents are also authorable through Grove's MCP server. Point Claude Desktop, Claude Code, or Cursor at POST /mcp/rpc with a Provision-scoped key and the LLM can call grove__upsert_agent, grove__list_agents_all, grove__delete_agent, and grove__list_available_agent_tools through standard tools/call. See the MCP page for the full setup and the two-key Invoke/Provision pattern that bounds the LLM's authoring rights.