May 13, 2026 | Documents, Structured output, Review
Build a document processing pipeline
Extract structured fields, validate them, pause for review, and retry failed document steps safely.
Document workflows fail in predictable ways: bad scans, missing fields, malformed model output, and partial external writes. This cookbook builds a pipeline that makes each failure inspectable and recoverable.
Scenario
An operations team uploads invoices. The workflow extracts fields, validates the result, pauses for review when confidence is low, and stores approved data in a system of record.
What you build
- A document ingestion workflow.
- OCR or text extraction.
- Structured field extraction.
- Validation and confidence checks.
- Human review for exceptions.
- An idempotent write to the destination system.
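As one illustration of the structured extraction piece, the sketch below parses raw model text into typed fields and raises a dedicated error when the output is malformed, so the step can fail visibly and be retried. The field names (`vendor`, `currency`, `total`), the `ExtractedInvoice` type, and the `MalformedOutput` error are assumptions made for this example, not part of any particular SDK. Missing fields are kept as `None` on purpose so that validation, not parsing, decides whether a review is needed.

```python
import json
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


class MalformedOutput(ValueError):
    """The model response could not be parsed; the step can be retried."""


@dataclass(frozen=True)
class ExtractedInvoice:
    # Hypothetical field set for this sketch.
    vendor: Optional[str]
    currency: Optional[str]
    total: Optional[Decimal]


def _to_decimal(value: object) -> Optional[Decimal]:
    """Convert a JSON number or string to Decimal; None stays None."""
    if value is None:
        return None
    try:
        amount = Decimal(str(value))
    except InvalidOperation as exc:
        raise MalformedOutput(f"total is not a number: {value!r}") from exc
    if not amount.is_finite():
        raise MalformedOutput(f"total is not finite: {value!r}")
    return amount


def parse_model_output(raw: str) -> ExtractedInvoice:
    """Turn raw model text into typed fields, or raise MalformedOutput."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedOutput(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise MalformedOutput("expected a JSON object at the top level")
    return ExtractedInvoice(
        vendor=data.get("vendor"),
        currency=data.get("currency"),
        total=_to_decimal(data.get("total")),
    )
```

A response such as `'{"vendor": "Acme", "currency": "USD", "total": "118.00"}'` parses into typed fields, while a chatty non-JSON reply raises `MalformedOutput` instead of passing bad data downstream.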
Workflow shape
@workflow
async def process_invoice(ctx: WorkflowContext, document_id: str) -> InvoiceOutcome:
    document = await ctx.step(load_document, document_id)
    text = await ctx.step(extract_text, document)
    invoice = await ctx.step(extract_invoice_fields, text)
    validation = await ctx.step(validate_invoice, invoice)

    if validation.needs_review:
        decision = await ctx.wait_for_signal(
            "invoice_review",
            timeout="10d",
            metadata={"document_id": document_id, "issues": validation.issues},
        )
        invoice = decision.corrected_invoice

    receipt = await ctx.step(store_invoice_once, document_id, invoice)
    return InvoiceOutcome(status="stored", receipt_id=receipt.id)

The review path is part of the workflow, not an out-of-band spreadsheet.
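To exercise the review branch locally, one option is a small in-memory stand-in for the workflow context. The sketch below is only that: `FakeContext`, `Validation`, `ReviewDecision`, and `review_if_needed` are hypothetical names for this example, and the fake mirrors just the two calls used above (`step` and `wait_for_signal`) rather than reproducing any real runtime. It raises `TimeoutError` when no signal was queued, which is an assumption about how a timeout might surface.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Validation:
    needs_review: bool
    issues: list[str]


@dataclass
class ReviewDecision:
    corrected_invoice: dict[str, Any]


@dataclass
class FakeContext:
    """Hypothetical test double; not a real workflow runtime."""
    signals: dict[str, Any] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)

    async def step(self, fn, *args):
        # Record the step name, then run it directly.
        self.trace.append(fn.__name__)
        return await fn(*args)

    async def wait_for_signal(self, name, timeout, metadata):
        # Return a pre-queued signal, or simulate a timeout if none exists.
        self.trace.append(f"wait:{name}")
        if name not in self.signals:
            raise TimeoutError(f"no {name!r} signal within {timeout}")
        return self.signals[name]


async def validate_invoice(invoice: dict[str, Any]) -> Validation:
    # Toy rule for the sketch: a missing total forces review.
    issues = [] if invoice.get("total") else ["total is missing"]
    return Validation(needs_review=bool(issues), issues=issues)


async def review_if_needed(ctx, document_id: str, invoice: dict[str, Any]):
    """Same branch as the workflow: pause for a reviewer only when needed."""
    validation = await ctx.step(validate_invoice, invoice)
    if not validation.needs_review:
        return invoice
    decision = await ctx.wait_for_signal(
        "invoice_review",
        timeout="10d",
        metadata={"document_id": document_id, "issues": validation.issues},
    )
    return decision.corrected_invoice


ctx = FakeContext(signals={"invoice_review": ReviewDecision({"total": "118.00"})})
result = asyncio.run(review_if_needed(ctx, "doc-1", {"total": None}))
print(result)      # {'total': '118.00'}
print(ctx.trace)   # ['validate_invoice', 'wait:invoice_review']
```

Because the fake records every step and wait, a test can assert both the outcome and the path taken, including that a clean invoice never reaches the review wait.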
Validation rules
Use deterministic validation before asking another model to judge the output.
- Required fields are present.
- Totals add up.
- Currency is supported.
- Vendor is recognized.
- Confidence passes the threshold.
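The rules above could be expressed as a plain function that collects every issue rather than stopping at the first one, so a reviewer sees the full picture. This is a minimal sketch under stated assumptions: the `Invoice` fields, the supported currencies, the vendor list, and the 0.85 threshold are all illustrative values, and "totals add up" is interpreted here as line totals plus tax equalling the stated total.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

# Illustrative values; real ones would come from configuration.
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}
KNOWN_VENDORS = {"Acme Supplies", "Globex"}
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Invoice:
    # Hypothetical shape for this sketch.
    vendor: Optional[str]
    currency: Optional[str]
    line_totals: list[Decimal]
    tax: Decimal
    total: Optional[Decimal]
    confidence: float


@dataclass
class ValidationResult:
    issues: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        return bool(self.issues)


def validate_invoice(invoice: Invoice) -> ValidationResult:
    """Apply each deterministic rule and collect all failures."""
    issues: list[str] = []

    # Required fields are present.
    for name in ("vendor", "currency", "total"):
        if getattr(invoice, name) in (None, ""):
            issues.append(f"missing required field: {name}")

    # Totals add up (Decimal avoids float rounding surprises).
    if invoice.total is not None:
        expected = sum(invoice.line_totals, Decimal("0")) + invoice.tax
        if expected != invoice.total:
            issues.append(f"total {invoice.total} != line items plus tax {expected}")

    # Currency is supported.
    if invoice.currency and invoice.currency not in SUPPORTED_CURRENCIES:
        issues.append(f"unsupported currency: {invoice.currency}")

    # Vendor is recognized.
    if invoice.vendor and invoice.vendor not in KNOWN_VENDORS:
        issues.append(f"unrecognized vendor: {invoice.vendor}")

    # Confidence passes the threshold.
    if invoice.confidence < CONFIDENCE_THRESHOLD:
        issues.append(f"confidence {invoice.confidence:.2f} below threshold")

    return ValidationResult(issues)
```

Returning `issues` alongside `needs_review` matches how the workflow passes `validation.issues` into the review signal metadata.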
Production checks
- Raw document, extracted text, structured output, and validation errors are in the trace.
- Low-confidence extractions pause for review.
- The store step uses a stable idempotency key.
- Reprocessing a document does not duplicate destination records.
- Corrected review output can become an eval case.
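The idempotency checks can be illustrated with a key derived only from the document ID, so every retry of the same document produces the same key. The sketch below assumes an in-memory store as a stand-in for the system of record; `InMemoryInvoiceStore`, `put_if_absent`, and the `"invoice:"` key prefix are hypothetical, and this version of `store_invoice_once` takes the store as an extra argument for testability. In a real destination, the uniqueness guarantee would need to be enforced by that system, for example through a unique constraint or a native idempotency-key feature.

```python
import hashlib
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Receipt:
    id: str


def idempotency_key(document_id: str) -> str:
    """Derive a key that is identical on every retry of the same document."""
    return hashlib.sha256(f"invoice:{document_id}".encode("utf-8")).hexdigest()


class InMemoryInvoiceStore:
    """Hypothetical stand-in for the destination system."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, Any]] = {}

    def put_if_absent(self, key: str, invoice: dict[str, Any]) -> Receipt:
        # Keep the first write for this key; replays return the same receipt.
        self._records.setdefault(key, invoice)
        return Receipt(id=key[:16])

    def __len__(self) -> int:
        return len(self._records)


def store_invoice_once(
    store: InMemoryInvoiceStore, document_id: str, invoice: dict[str, Any]
) -> Receipt:
    """Write the invoice under a stable key so retries do not duplicate it."""
    return store.put_if_absent(idempotency_key(document_id), invoice)


store = InMemoryInvoiceStore()
first = store_invoice_once(store, "doc-1", {"total": "118.00"})
retry = store_invoice_once(store, "doc-1", {"total": "118.00"})
print(first == retry, len(store))  # True 1
```

Note the design choice: the key depends on the document, not on the attempt, a timestamp, or a random value. Anything that changes between retries would defeat the deduplication.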