A common shape in production AI systems: the user is waiting on an answer, but the workflow has more to do. Write the conversation to memory. Run a tagger so the next inbound from this customer routes correctly. File an internal ticket. Log the prompt for compliance. None of that should block the reply. None of it should be torn off into a separate fire-and-forget job either, because the moment you fragment the workflow you fragment the audit trail.
Grove’s answer is respond_here: a per-node flag that says this node’s output is what the caller is waiting for. The instant that node completes, the engine emits a response_ready event and persists the response payload on the run record. The caller can render the answer immediately. The DAG keeps executing. Run completion is signaled only once every branch has finished. There is one workflow id, one run id, and one audit row.
Here’s a customer-support skeleton:
import {
  GroveClient,
  InputNode, OutputNode, LlmNode, buildWorkflow,
} from '@thedouglenz/grove-sdk';

const grove = new GroveClient({ baseUrl: 'http://localhost:3000' });

const input = new InputNode('input');

// Critical path. respondHere() marks this as the node whose output the
// caller is waiting on — emitted via SSE the moment this node completes.
const reply = new LlmNode('reply', input, {
  systemPrompt: `You are a customer support agent. Answer the user's
question helpfully and concisely.
Question: {{inputs.question}}`,
  model: 'fast',
}).respondHere();

// Side branch. Shares only the input dependency, so it runs in parallel
// with the reply. onFailure('continue') means a tagging blip doesn't
// fail the whole run.
const tag = new LlmNode('tag', input, {
  systemPrompt: `Classify the question. Return JSON.
Question: {{inputs.question}}`,
  model: 'mid',
  responseFormat: {
    type: 'json_schema', name: 'tags',
    schema: {
      type: 'object', required: ['intent', 'urgency'],
      properties: {
        intent: { type: 'string', enum: ['billing', 'bug', 'feature', 'other'] },
        urgency: { type: 'number' },
      },
    },
  },
}).onFailure('continue');

const ticket = new LlmNode('ticket', tag, {
  systemPrompt: `Draft a one-line internal ticket title.
Tags: {{nodes.tag}}
Question: {{inputs.question}}`,
  model: 'fast',
}).onFailure('continue');

const wf = await grove.workflows.create(
  buildWorkflow('support-bot', [
    new OutputNode('reply', reply),
    new OutputNode('ticket', ticket),
  ]),
);

const run = await grove.workflows.startRun(wf.id, {
  inputs: { question: 'My invoice is showing the wrong amount this month.' },
});

for await (const e of run.events()) {
  if (e.type === 'response_ready') {
    // Render to the user NOW. The DAG keeps running.
    console.log('reply:', e.output);
  }
  if (e.type === 'node_completed' && e.node_id === 'ticket') {
    console.log('ticket queued in background:', e.output);
  }
  if (e.type === 'run_completed' ||
      e.type === 'run_completed_with_non_critical_errors') {
    console.log('run done');
  }
}
Three Grove-specific things are doing the heavy lifting.
respond_here
The reply node is on the critical path; everything else hangs off input independently. The DAG scheduler runs nodes the moment their dependencies are satisfied, so reply and tag start in parallel as soon as input is available. When reply completes, the engine writes the response payload to the run record and emits a response_ready SSE event — atomically — so a subscriber that connects late can still recover the response via GET /runs/:id.
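Because persist-then-emit is atomic, recovery doesn't depend on catching the SSE event live. A minimal late-subscriber sketch: the helper name and the response field on the run record are assumptions for illustration, not a documented contract.

// Late subscriber: the response payload is already persisted on the run
// record, so a plain GET recovers it even mid-run.
async function recoverResponse(runId: string): Promise<unknown | null> {
  const res = await fetch(`http://localhost:3000/runs/${runId}`);
  if (!res.ok) throw new Error(`GET /runs/${runId} failed: ${res.status}`);
  const record = await res.json();
  // If respond_here has fired, the payload is already durable, even while
  // the rest of the DAG is still executing. `response` is an assumed field.
  return record.response ?? null;
}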
At most one node per workflow may set respondHere(). The validator rejects workflows that put onFailure('continue') on the response node itself or on any of its ancestors, because both would make “the response was emitted” ambiguous: if a load-bearing upstream is allowed to fail-soft, the reply could be silently skipped while the run still reports success.
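Concretely, these are the shapes the validator rejects. A standalone sketch: node names and the elided configs are illustrative, and the two cases are independent of each other and of the workflow above.

// Rejected: fail-soft on the response node itself. If it fails, did the
// caller get a response? The validator refuses to let that be ambiguous.
const badReply = new LlmNode('bad-reply', input, { /* ... */ })
  .respondHere()
  .onFailure('continue');

// Rejected: fail-soft on an ancestor of the response node. If 'retrieve'
// fails and its descendants are skipped, the response node never runs,
// yet a fail-soft run could still report success.
const retrieve = new LlmNode('retrieve', input, { /* ... */ })
  .onFailure('continue');
const answer = new LlmNode('answer', retrieve, { /* ... */ })
  .respondHere();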
on_failure: continue
The default failure policy is fail_run — one failed node fails the whole run. That’s the right default for the reply path. For background work it’s the wrong default: a flaky tagging model shouldn’t poison a run whose user-facing answer succeeded. onFailure('continue') says “if this node fails, mark its descendants skipped and let the run finish anyway.” The terminal event becomes run_completed_with_non_critical_errors instead of run_completed, with the failed node ids attached, and the per-node executions ledger preserves what failed and why.
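In the event loop, that terminal state is just another event type. A sketch extending the loop from the example above, assuming the attached ids arrive on a failed_node_ids field; the field name is a guess for illustration, and the per-node executions ledger would carry the details.

for await (const e of run.events()) {
  if (e.type === 'run_completed_with_non_critical_errors') {
    // The reply already went out; only fail-soft branches broke.
    // `failed_node_ids` is an assumed field name for the attached ids.
    console.warn('non-critical failures:', e.failed_node_ids);
  }
}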
The two flags compose. A pre-response telemetry call can be continue. A post-response audit write can be fail_run. The workflow author picks per node; the validator enforces the safety rule.
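A sketch of that composition, reusing the nodes above. The telemetry and audit nodes here are illustrative, and they use the LlmNode API from the example because that's the only node type shown; a real side-effect node type may differ.

// Pre-response side branch: may fail without consequence. It hangs off
// input, not off reply, so it is never an ancestor of the response node.
const telemetry = new LlmNode('telemetry', input, {
  systemPrompt: `Summarize this question in one line for the metrics feed.
Question: {{inputs.question}}`,
  model: 'fast',
}).onFailure('continue');

// Post-response audit write: default fail_run. The user already saw the
// reply, but if this node fails, the run fails loudly on the same audit row.
const audit = new LlmNode('audit', reply, {
  systemPrompt: `Write a one-line compliance summary of this exchange.
Reply: {{nodes.reply}}`,
  model: 'fast',
});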
Multiple outputs, one DAG
buildWorkflow('support-bot', [...]) walks back from a list of terminal nodes to collect the whole graph. Each OutputNode becomes a key in run.outputs: outputs.reply is the answer the user already saw; outputs.ticket is what the support queue picks up after the run completes. One workflow id, one run id, one row in the runs table — and the compliance document for the run shows both branches and which one was the response.
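Once the run finishes, both keys are readable off that one record through the same GET /runs/:id endpoint mentioned earlier. A sketch, assuming the record mirrors run.outputs under an outputs field:

const finished = await fetch(`http://localhost:3000/runs/${run.id}`)
  .then((r) => r.json());
// outputs.reply: the answer the user already saw via response_ready.
// outputs.ticket: what the support queue picks up after completion.
console.log(finished.outputs.reply, finished.outputs.ticket);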
What you actually see
reply: Your invoice this month includes a one-time pro-rated charge for
the seat you added on the 12th. Here's the breakdown: ...
ticket queued in background: Billing — pro-rated seat charge unclear to user
run done
The reply lands in the terminal almost immediately. The ticket message arrives a few seconds later, after the tagger and ticket-drafting LLM have finished. Both come from one event stream on one run — no second workflow to correlate, no fire-and-forget side effect escaping the audit trail.
When to reach for it
Anywhere one node’s output is what the caller is waiting on, and you also want follow-on work that shouldn’t block the reply. Chat assistants with memory writes. Support bots with ticket drafting and routing. Document Q&A that logs the question to a feedback queue. Internal copilots that send a Slack ping when an answer cites a sensitive policy. Wherever you’d be tempted to fork a background job from your application code, respond_here keeps that work inside one DAG — visible to Arborist, captured in audit logs, reproducible on replay.