For Data Teams

The LLM Layer for Your Data Pipelines

Grove is not an Airflow, Dagster, or Prefect replacement. It is the LLM-heavy step that slots into whatever orchestrator you already run — classification, extraction, data-quality explanation, generate-and-vet checks — as a durable DAG of LLM calls, tool invocations, and routing nodes.

01

What Grove Is — And What It Isn't

Grove is narrowly scoped on purpose. If you already operate a data platform, you don't need another orchestrator competing with it. You need a clean place to run the LLM steps that don't fit cleanly into SQL or Python tasks — with the durability, observability, and retry semantics you expect from production infrastructure.

Grove Is…

  • A DAG engine for LLM workflows — classification, extraction, reasoning, generate-and-vet loops.
  • Durable: every node execution is persisted to PostgreSQL with inputs, outputs, failures, and failover chains.
  • Failover-aware: LLM calls route through model groups with automatic provider fallback, including inside tool-use continuations.
  • Structured: response-format schemas with automatic retry on parse failure.
  • Conditional: routing nodes send records down different paths based on upstream output.
  • Iterative: refine loops run a generator/critic pair until the critic approves.
  • Observable: SSE streams every node event in real time — perfect for plugging into your existing monitoring.

Grove Isn't…

  • A scheduler. Grove runs on demand; use Kubernetes CronJobs, Airflow, or your existing orchestrator to trigger it.
  • A backfill engine. There is no built-in notion of "run this for every partition between X and Y."
  • A dbt replacement. SQL transformations, incremental models, and SQL-level tests belong in dbt.
  • A data movement layer. Nodes pass small JSON values to each other. Large datasets stay in your warehouse or object storage; Grove references them by pointer.
  • A lineage catalog. Grove records node-level execution metadata; tools like OpenLineage remain the right choice for column-level dataset lineage.

02

Where Grove Earns Its Keep

Five concrete patterns where dropping Grove into your pipeline is a better answer than wiring LLM calls by hand.

LLM-Augmented Classification

A record arrives, an LLM tags it, and a conditional node routes it to the right processing branch. Extract → classify → route → transform, all with per-node retries and structured output. To the surrounding pipeline, the whole thing still looks like a regular DAG task.

Data-Quality Triage

When a dbt test fails or an anomaly detector fires, kick off a Grove workflow that inspects the failing rows, reads the upstream model, and produces a human-readable explanation. Turns cryptic test failures into incident-ready summaries.

Generate-And-Vet SQL or Checks

Use a Refine node to have one LLM propose a SQL test or transformation, a second LLM critique it against known constraints, and loop until the critic approves. Bounded iterations; if no attempt passes, the workflow returns the last attempt or fails according to policy.

Document → Warehouse Extraction

Parse contracts, filings, or email threads; extract structured fields against a JSON Schema; reconcile entities against a knowledge base; deliver rows ready for your staging layer. External tool execution lets you write directly to your warehouse without exposing credentials to the orchestration layer.

Pipeline-Failure Diagnosis

Wire an Airflow on-failure callback to start a Grove workflow that pulls logs, inspects the failing task's inputs, and produces a first-pass triage before a human is paged. Reduces the median time between alert and "I know what broke."
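
One way to wire that callback, sketched in Python. It reuses the runs endpoint shown in the scheduling section below; the input fields passed to Grove (dag_id, task_id, log_url) are illustrative, not a Grove contract.

# Sketch of an Airflow on_failure_callback that starts a Grove triage run.
# The runs URL mirrors the CronJob example later on this page; the input
# field names below are illustrative, not part of Grove's API.
import requests

GROVE_RUNS_URL = "http://grove-core:3000/workflows/<triage-workflow-id>/runs"

def start_grove_triage(context):
    ti = context["task_instance"]
    requests.post(
        GROVE_RUNS_URL,
        json={"inputs": {
            "dag_id": ti.dag_id,
            "task_id": ti.task_id,
            "log_url": ti.log_url,
        }},
        timeout=10,
    )

# Attach it per task or via default_args:
# default_args = {"on_failure_callback": start_grove_triage}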

03

Integration Patterns

A Grove workflow run is a single HTTP POST with JSON inputs and an SSE stream of events. That's enough surface area to slot in almost anywhere.
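
A minimal sketch of that surface in Python, assuming the runs endpoint shown in the scheduling section below and that the POST itself returns the text/event-stream; the contents of each event are simply printed rather than interpreted.

# Minimal sketch: start a Grove run and follow its SSE event stream.
# Assumes the /workflows/{id}/runs endpoint from the CronJob example below.
import json
import requests

GROVE_URL = "http://grove-core.grove.svc.cluster.local:3000"   # adjust per deployment
WORKFLOW_ID = "a3402464-a32f-41c1-a660-801ff701b756"

resp = requests.post(
    f"{GROVE_URL}/workflows/{WORKFLOW_ID}/runs",
    json={"inputs": {"date": "2024-06-01"}},
    headers={"Accept": "text/event-stream"},
    stream=True,                     # keep the connection open for SSE
    timeout=(5, None),               # connect timeout only; the stream stays open
)
resp.raise_for_status()

for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data:"):
        event = json.loads(line[len("data:"):].strip())
        print(event)                 # forward to logs, metrics, or a dashboard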

Airflow Task

A PythonOperator or BashOperator starts a Grove run, streams events into the task log, and surfaces final outputs via XCom. Failures propagate normally; retries compose with Airflow's retry policy.
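
A hedged sketch of that operator. The runs endpoint follows the CronJob example below; treating the last SSE event as the run's final output is an assumption for illustration, not a documented Grove contract.

# Airflow PythonOperator that runs a Grove workflow, streams events into the
# task log, and surfaces the final payload via XCom. The runs URL mirrors the
# CronJob example; "last event = final output" is an assumption, not a contract.
import json
import requests
from airflow.operators.python import PythonOperator

def run_grove_classification(ds, **_):
    resp = requests.post(
        "http://grove-core:3000/workflows/<workflow-id>/runs",
        json={"inputs": {"date": ds}},
        stream=True,
        timeout=(5, None),
    )
    resp.raise_for_status()

    last_event = None
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            last_event = json.loads(line[len("data:"):].strip())
            print(last_event)            # appears in the Airflow task log
    return last_event                    # the return value is pushed to XCom

# Inside your DAG definition (with DAG(...) as dag):
classify = PythonOperator(
    task_id="grove_classify",
    python_callable=run_grove_classification,
    retries=2,                           # composes with Grove's per-node retries
)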

dbt Post-Hook

Use dbt run-operation to trigger a Grove workflow after a model refresh — for example, to classify newly loaded rows, generate documentation, or run LLM-driven data-quality checks on top of standard dbt tests.

Dagster Op or Asset

Wrap a Grove run in a Dagster op and it participates in your asset graph like any other computation. Per-node SSE events can be forwarded as Dagster compute logs for native observability.
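
A sketch of that wrapper using Dagster's documented op/job APIs; the runs URL, input pointer, and event shape are illustrative.

# Dagster op that wraps a Grove run and forwards each SSE event to the
# compute log via context.log. Swap in your own workflow ID and inputs.
import json
import requests
from dagster import OpExecutionContext, job, op

@op
def run_grove_extraction(context: OpExecutionContext) -> dict:
    resp = requests.post(
        "http://grove-core:3000/workflows/<workflow-id>/runs",
        json={"inputs": {"source": "s3://your-bucket/contracts/2024-06/"}},
        stream=True,
        timeout=(5, None),
    )
    resp.raise_for_status()

    last_event = {}
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            last_event = json.loads(line[len("data:"):].strip())
            context.log.info(f"grove event: {last_event}")
    return last_event

@job
def document_extraction():
    run_grove_extraction()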

Event-Triggered

Webhook, SQS, Kafka, EventBridge — any system that can issue an HTTP POST can start a Grove run. A common pattern: write a small bridge service that consumes the event bus and calls Grove's startRun endpoint.
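
A minimal sketch of such a bridge as a Flask service. The route name and the idea of passing the event body straight through as run inputs are assumptions for illustration; the runs URL is the same endpoint used in the CronJob example below.

# Tiny webhook-to-Grove bridge: each incoming event becomes the inputs of a
# new Grove run. Route name and pass-through input shape are illustrative.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
GROVE_RUNS_URL = "http://grove-core:3000/workflows/<workflow-id>/runs"

@app.post("/events")
def forward_event():
    payload = request.get_json(force=True)
    resp = requests.post(
        GROVE_RUNS_URL,
        json={"inputs": payload},        # pass the event through as run inputs
        timeout=10,
    )
    resp.raise_for_status()
    return jsonify({"grove_status": resp.status_code}), 202

if __name__ == "__main__":
    app.run(port=8080)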

04

Scheduling via Kubernetes

Grove does not include a scheduler. For deployments that already run on Kubernetes — which the Helm chart supports out of the box — a CronJob is all you need. The scheduler responsibility stays where it belongs: in your cluster, alongside the rest of your scheduled work.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-classification
  namespace: data
spec:
  schedule: "0 2 * * *"   # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: trigger
            image: curlimages/curl:8
            command:
              - sh
              - -c
              - |
                TODAY=$(date -I)
                curl -sf -X POST \
                  "http://grove-core.grove.svc.cluster.local:3000/workflows/$WORKFLOW_ID/runs" \
                  -H "Content-Type: application/json" \
                  -d "{\"inputs\":{\"date\":\"$TODAY\"}}"
            env:
              - name: WORKFLOW_ID
                value: "a3402464-a32f-41c1-a660-801ff701b756"

For partitioned backfills, chained schedules, or DAG-family orchestration, use Airflow or Dagster to trigger Grove. We won't pretend a CronJob replaces them.

05

Data Boundaries: Keep the Bytes Where They Belong

Grove passes JSON values between nodes. That works well up to tens of kilobytes per node output. For real data — rows, DataFrames, parquet files, images above a few megabytes — the right pattern is reference-based: your workflow passes a pointer (warehouse table name, S3 URI, row batch IDs), and external tools fetch and write on demand.

Node Outputs Are JSON

Every node's output is stored in PostgreSQL as JSON. Small summaries, classifications, extracted fields, structured decisions — yes. Bulk row payloads — no. Keep node outputs lean so audit queries stay fast and storage stays predictable.
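
As a concrete illustration of that boundary, a lean node output carries decisions and pointers rather than rows; the field names here are examples, not a Grove schema.

# Illustrative shape of a lean node output: decisions plus a pointer, no rows.
node_output = {
    "classification": "pii_suspected",
    "confidence": 0.92,
    "row_batch": {                                   # pointer, not payload
        "warehouse_table": "staging.events_2024_06_01",
        "row_ids": [48211, 48212, 48219],
    },
}
# What stays out of node outputs: the full row payloads behind that pointer.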

External Tools Move the Data

A Grove tool call pauses the workflow, signals your runtime via SSE, and resumes when your code responds. That's the hand-off point for anything too large to travel as a node output: your tool queries the warehouse, reads the object, writes the result, and returns a reference.
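
A sketch of the warehouse-side half of that hand-off. The tool-call event fields are assumptions for illustration, and the mechanism for returning the result to Grove is left to your runtime; the point is that your code moves the bytes and hands back only a reference.

# Warehouse-side handler for an (assumed) tool-call event. Field names are
# illustrative; your runtime returns the reference to Grove through whatever
# tool-result channel your deployment exposes.
from typing import Any

def handle_tool_call(event: dict[str, Any]) -> dict[str, Any]:
    """Do the heavy lifting referenced by a tool call, return only a pointer."""
    args = event.get("arguments", {})                # assumed field name

    # Your runtime owns the warehouse connection; Grove never sees credentials.
    staging_table = export_rows_to_staging(
        source_table=args.get("source_table", "raw.events"),
        row_filter=args.get("filter", "1=1"),
    )

    # Only this reference travels back through Grove's node outputs.
    return {"staging_table": staging_table}

def export_rows_to_staging(source_table: str, row_filter: str) -> str:
    """Placeholder for the real warehouse work (CTAS, COPY, a dbt model, etc.);
    row_filter would drive the real WHERE clause."""
    return f"staging.{source_table.split('.')[-1]}_tool_output"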

Your Warehouse Stays Your Warehouse

Because external tools execute in your own code, Grove never needs credentials to your warehouse, lake, or blob store. Your runtime owns the connection; Grove owns the orchestration.

Streaming-Safe Observations

SSE events stream as the workflow runs. A dashboard can show "classifier is processing row batch 3 of 5" in real time by subscribing to the workflow's event stream — no polling, no log tailing.

06

When to Reach for Something Else

Grove is not the tool for every data job. A short, honest guide to when you should pick something purpose-built.

  • Cron-scheduling a whole DAG family with dependencies, backfills, and sensors → Airflow, Dagster, Prefect
  • SQL transformations, incremental models, SQL-level tests, documentation → dbt
  • Column-level lineage across datasets → OpenLineage, DataHub, Atlan
  • Stream processing on event firehoses at Kafka/Kinesis volumes → Flink, Beam, ksqlDB
  • Anomaly detection on numeric time series → purpose-built monitors (Prometheus, Monte Carlo, Bigeye)
  • An LLM workflow inside any of the above → Grove

Ready to deploy AI workflows in your cloud?

Contact Sales