Generator/critic loops are everywhere in agentic AI: write a draft, score it, retry if it isn’t good enough, give up after N tries. They’re tedious to wire up by hand. You end up with bookkeeping for the iteration counter, branching logic for the accept/reject decision, and prompts that drift between “first attempt” and “revision” framings.
Grove has a node type for this: RefineNode. You declare the body subgraph, the accept criterion (as a predicate), and an iteration cap. The engine runs the loop, emits a refine_iteration_rejected event each time the predicate fails, a refine_iteration_accepted event the moment it passes, and exposes the prior iteration’s output to the next one through template variables.
Here’s a working self-improving essay writer:
import {
GroveClient,
InputNode, OutputNode, LlmNode, RefineNode,
} from '@thedouglenz/grove-sdk';
const grove = new GroveClient({ baseUrl: 'http://localhost:3000' });
const input = new InputNode('input');
// Body subgraph: draft → critic. Body nodes reach outer data via templates;
// graph edges to outer nodes are forbidden (the body is sealed).
const draft = new LlmNode('draft', undefined, {
systemPrompt: `Write a 200-word essay on: {{inputs.topic}}.
Prior attempt (score {{refine.previous_feedback}}/1.0):
{{refine.previous_output}}`,
model: 'mid',
});
const critic = new LlmNode('critic', draft, {
systemPrompt: `Score the essay 0-1 on clarity, evidence, and prose.
Return JSON. Essay:
{{body.draft}}`,
model: 'mid',
responseFormat: {
type: 'json_schema', name: 'critique',
schema: {
type: 'object', required: ['score', 'feedback'],
properties: {
score: { type: 'number' },
feedback: { type: 'string' },
},
},
},
});
const writer = new RefineNode('writer', input, {
maxIterations: 5,
acceptWhen: { left: '{{body.critic.score}}', op: '>=', right: 0.9 },
body: critic,
emit: draft,
});
const wf = await grove.workflows.create(
new OutputNode('output', writer).toWorkflow('self-improving-essay'),
);
const run = await grove.workflows.startRun(wf.id, {
inputs: { topic: 'why DAGs make AI workflows tractable' },
});
for await (const e of run.events()) {
if (e.type === 'refine_iteration_rejected')
console.log(`iter ${e.iteration}: score=${e.verdict}`);
if (e.type === 'refine_iteration_accepted')
console.log(`accepted at iter ${e.iteration}`);
if (e.type === 'run_completed')
console.log('\n' + e.outputs.output);
}
Three Grove-specific things are doing the heavy lifting here.
The Refine pattern
The body is draft → critic. The RefineNode runs that pair as a loop, evaluates {{body.critic.score}} >= 0.9 after each run, and stops when the predicate holds or maxIterations is hit. The emit: draft line says “when we accept, the draft node’s output is what comes out of the writer.” On iteration 2+, the draft prompt sees the previous attempt and its score through {{refine.previous_output}} and {{refine.previous_feedback}} — both render as the empty string on iteration 0, so the same prompt template handles “first attempt” and “revision” without branching.
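For intuition, here is roughly the loop you would otherwise hand-roll. Nothing below is Grove API: runDraft and runCritic are stand-ins for the body’s two LLM calls, and the carried-forward variables play the role of {{refine.previous_output}} and {{refine.previous_feedback}}.
// Sketch of the bookkeeping RefineNode absorbs. Plain application code,
// not the Grove API; runDraft/runCritic stand in for the body's LLM calls.
type Critique = { score: number; feedback: string };
async function handRolledRefine(
  topic: string,
  runDraft: (topic: string, prevOutput: string, prevScore: string) => Promise<string>,
  runCritic: (essay: string) => Promise<Critique>,
  maxIterations = 5,
): Promise<string> {
  let prevOutput = '';
  let prevScore = '';
  for (let i = 0; i < maxIterations; i++) {
    const essay = await runDraft(topic, prevOutput, prevScore);
    const critique = await runCritic(essay);
    if (critique.score >= 0.9) return essay; // accept: emit the draft
    prevOutput = essay;                      // carried into the next prompt
    prevScore = String(critique.score);
  }
  return prevOutput;                         // cap hit: give up
}
Everything that loop tracks by hand (the counter, the carried-forward draft, the accept branch) is exactly what the RefineNode declares in five lines.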
Structured output
The critic returns JSON conforming to the schema you pass. Grove validates the model’s response against the schema and retries on parse failures. Because the response is structured, the acceptWhen predicate can compare a numeric field ({{body.critic.score}}) directly without any string-parsing fragility — the gate between “accept” and “reject” is just 0.9.
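To make that concrete, here is an illustrative critique the schema above would admit, and what the predicate reduces to for that iteration (values invented for illustration):
// Illustrative only: one possible critic response under the schema above.
const critique = {
  score: 0.78,
  feedback: 'The second paragraph asserts benefits without giving evidence.',
};
// The acceptWhen predicate reduces to a plain numeric comparison:
const accepted = critique.score >= 0.9; // false → refine_iteration_rejected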
Templates
{{inputs.topic}} injects the run’s input. {{body.draft}} injects the prior body node’s output (within the same iteration). {{refine.previous_output}} and {{refine.previous_feedback}} carry forward across iterations. No string concatenation in application code, no message history bookkeeping — the prompt itself describes how the data flows.
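With the run input above, the draft prompt would render along these lines. The iteration-2 values are invented for illustration; only the iteration-0 emptiness follows from the behavior described above.
// Iteration 0: the refine.* variables render as empty strings.
const iteration0 = `Write a 200-word essay on: why DAGs make AI workflows tractable.
Prior attempt (score /1.0):
`;
// Iteration 2: the previous draft and its score are injected.
const iteration2 = `Write a 200-word essay on: why DAGs make AI workflows tractable.
Prior attempt (score 0.78/1.0):
DAGs make AI workflows tractable because they encode dependency...`;
Same template on every iteration; the engine just fills in more of it as the loop progresses.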
What you actually see
Stream the run and you watch the loop tighten:
iter 0: score=0.62
iter 1: score=0.78
iter 2: score=0.91
accepted at iter 2
[The accepted essay]
DAGs make AI workflows tractable because they encode the only thing
that actually matters at scale: dependency. When two LLM calls don't
depend on each other, they should run in parallel — and a DAG says so
by structure, not by convention. ...
The output is tangible. You can read iteration 0, read iteration 2, and see the improvement — concretely, on the page, in the same terminal. And because the loop lives in the workflow rather than your app code, every iteration shows up in the run’s event stream and the compliance document — visible to Arborist, captured in audit logs, reproducible on replay.
When to reach for it
Refine works when “good enough” is measurable as a comparison: a numeric score, a boolean check, a category equal to a target. If your acceptance criterion is genuinely vibes-based, no orchestration primitive will save you. But for anything where you can articulate the goal as a predicate — schema validation, scoring, factual checks, output length thresholds — RefineNode collapses what used to be 200 lines of orchestration into one declarative node.
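As a sketch of what one of those other gates looks like, here is a length threshold built from the same pieces. The node names, prompts, and the word_count field are assumptions for illustration; only the LlmNode and RefineNode shapes are taken from the essay example above.
import { InputNode, LlmNode, RefineNode } from '@thedouglenz/grove-sdk';
// Illustrative length-threshold gate; field and node names are invented.
const input = new InputNode('input');
const summary = new LlmNode('summary', undefined, {
  systemPrompt: `Summarize in at least 200 words: {{inputs.article}}
Previous attempt (came up short): {{refine.previous_output}}`,
  model: 'mid',
});
const counter = new LlmNode('counter', summary, {
  systemPrompt: `Count the words in this text and return JSON. Text: {{body.summary}}`,
  model: 'mid',
  responseFormat: {
    type: 'json_schema', name: 'count',
    schema: {
      type: 'object', required: ['word_count'],
      properties: { word_count: { type: 'number' } },
    },
  },
});
const gated = new RefineNode('gated-summary', input, {
  maxIterations: 3,
  acceptWhen: { left: '{{body.counter.word_count}}', op: '>=', right: 200 },
  body: counter,
  emit: summary,
});
From there it wires into an OutputNode and a workflow exactly as the essay example does.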