May 13, 2026 | Structured output, Tools, Traces

Build a data extraction workflow

Call tools, force JSON outputs, recover from malformed responses, and inspect every extraction step.

This cookbook builds a structured extraction workflow for AI outputs that must be parsed, validated, retried, and explained.

Scenario

An analyst submits free-form notes. The workflow extracts accounts, contacts, dates, and next actions as JSON, validates the result, and stores the structured record.
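To make the scenario concrete, here is a minimal sketch of a structured-output prompt. The field descriptions, the `build_extraction_prompt` helper, and the sample note are all hypothetical and only illustrate the shape of the request; they are not part of the AGNT5 API.

```python
import json

# Hypothetical field descriptions; the keys mirror the fields this cookbook extracts.
FIELDS = {
    "account_name": "string, the customer account the note is about",
    "contacts": "array of strings, people mentioned by name",
    "next_action": "string, the follow-up the analyst committed to",
    "due_date": "ISO 8601 date (YYYY-MM-DD) or null if no date is given",
    "confidence": "number between 0 and 1",
}

def build_extraction_prompt(note_text: str) -> str:
    """Return a prompt that asks for one JSON object and nothing else."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in FIELDS.items())
    return (
        "Extract the account update from the note below.\n"
        "Respond with a single JSON object and no surrounding prose.\n"
        "Use exactly these keys:\n"
        f"{field_lines}\n\n"
        # json.dumps escapes quotes and newlines so the note cannot break the layout.
        f"Note: {json.dumps(note_text)}"
    )

prompt = build_extraction_prompt("Call Dana at Acme about renewal by 2026-06-01.")
print(prompt)
```

Listing the keys and types in the prompt keeps it aligned with the schema, so a validation failure later points at the model output rather than at an ambiguous instruction.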

What you build

  • A structured-output prompt.
  • A schema validator.
  • A repair step for malformed JSON.
  • A retry policy for transient model failures.
  • A trace that shows raw and parsed outputs.
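The retry policy in the list above could look roughly like the following sketch. It assumes transient failures surface as a dedicated exception type; `TransientModelError` and `call_with_retry` are hypothetical names, and a workflow platform may provide its own retry configuration that should be preferred over hand-rolled code.

```python
import asyncio
import random

class TransientModelError(Exception):
    """Hypothetical error for timeouts, rate limits, and similar failures."""

async def call_with_retry(fn, *args, max_attempts: int = 3, base_delay: float = 0.5):
    """Call an async function, retrying only on TransientModelError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn(*args)
        except TransientModelError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure to the caller.
            # Exponential backoff with jitter: the base delay doubles each attempt.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))

# Demo: a stand-in model call that fails twice, then succeeds.
calls = {"count": 0}

async def flaky_model_call(text: str) -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientModelError("rate limited")
    return '{"account_name": "Acme"}'

result = asyncio.run(call_with_retry(flaky_model_call, "note text", base_delay=0.01))
print(calls["count"], result)
```

Only the transient error type is retried. A validation failure is not transient, so it should go to the repair step instead of repeating the same model call.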

Workflow shape

@workflow
async def extract_account_update(ctx: WorkflowContext, note_id: str) -> ExtractionResult:
    note = await ctx.step(load_note, note_id)
    raw = await ctx.step(call_extraction_agent, note.text)
    parsed = await ctx.step(parse_and_validate_update, raw)
    receipt = await ctx.step(store_update_once, note.id, parsed)
    return ExtractionResult(update_id=receipt.id)

Separating the model call from the parse step means the raw output is recorded before parsing runs, so malformed output is easy to inspect.
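The effect of that separation can be illustrated with a small in-memory trace. This is a sketch only: the `Trace` class and `fake_model_call` are hypothetical stand-ins, not the platform's tracing API.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Hypothetical in-memory trace; a real platform would persist these records."""
    records: list = field(default_factory=list)

    def step(self, name, fn, *args):
        """Run fn, recording its output or its error under the step name."""
        try:
            result = fn(*args)
        except Exception as exc:
            self.records.append({"step": name, "ok": False, "error": repr(exc)})
            raise
        self.records.append({"step": name, "ok": True, "output": result})
        return result

def fake_model_call(text: str) -> str:
    # Stand-in for the model: returns JSON with a trailing comma (malformed).
    return '{"account_name": "Acme",}'

trace = Trace()
raw = trace.step("call_extraction_agent", fake_model_call, "note text")
try:
    trace.step("parse_and_validate_update", json.loads, raw)
except json.JSONDecodeError:
    pass  # The parse step failed, but the raw output is already recorded.

for record in trace.records:
    print(record)
```

If the model call and the parse were a single step, the failure record would hold only the exception, and the string that caused it would be lost.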

Schema-first extraction

Define the expected output before writing the prompt.

from datetime import date

from pydantic import BaseModel

class AccountUpdate(BaseModel):
    account_name: str
    contacts: list[str]
    next_action: str
    due_date: date | None  # None when the note gives no date
    confidence: float

The validator should reject missing required fields and values that do not match business rules.
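As a rough illustration of business-rule checks, the sketch below uses a standard-library dataclass as a stand-in for `AccountUpdate` so that it runs on its own; the same checks could be written as Pydantic validators. The specific rules (non-blank names, confidence within 0 to 1, no past due dates) are assumptions chosen for the example, not rules stated by this cookbook.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Update:
    """Stdlib stand-in for the AccountUpdate model."""
    account_name: str
    contacts: list
    next_action: str
    due_date: Optional[date]
    confidence: float

def business_rule_errors(update: Update, today: date) -> list:
    """Return a list of rule violations; an empty list means the update passes."""
    errors = []
    if not update.account_name.strip():
        errors.append("account_name must not be blank")
    if not update.next_action.strip():
        errors.append("next_action must not be blank")
    if not 0.0 <= update.confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    if update.due_date is not None and update.due_date < today:
        errors.append("due_date must not be in the past")
    return errors

bad = Update("  ", ["Dana"], "Send renewal quote", date(2026, 5, 1), 1.4)
for message in business_rule_errors(bad, today=date(2026, 5, 13)):
    print(message)
```

Collecting every violation rather than stopping at the first makes the trace entry more useful when a failed extraction is reviewed later.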

Malformed output recovery

If parsing fails, run a bounded repair step and keep both versions in the trace.

from pydantic import ValidationError

@function
async def parse_and_validate_update(raw: str) -> AccountUpdate:
    try:
        return AccountUpdate.model_validate_json(raw)
    except ValidationError:
        # One repair attempt only; a second failure propagates and fails the step.
        repaired = await repair_json(raw)
        return AccountUpdate.model_validate_json(repaired)
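The cookbook does not show `repair_json`. One possible shape is sketched below under stated assumptions: a synchronous, deterministic repair that keeps only the outermost `{...}` span, `json.loads` in place of the Pydantic model so the block runs standalone, and a hypothetical `parse_with_bounded_repair` wrapper that makes the repair budget explicit and returns every version it tried. A real repair step might call a model instead.

```python
import json

def repair_json(raw: str) -> str:
    """Hypothetical deterministic repair: keep only the outermost {...} span."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return raw  # Nothing to salvage; the caller's next parse will fail.
    return raw[start : end + 1]

def parse_with_bounded_repair(raw: str, max_repairs: int = 1):
    """Parse raw JSON, applying at most max_repairs repairs.

    Returns (parsed, versions), where versions lists every string that was tried.
    """
    versions = [raw]
    for attempt in range(max_repairs + 1):
        try:
            return json.loads(versions[-1]), versions
        except json.JSONDecodeError:
            if attempt == max_repairs:
                raise  # Repair budget exhausted: fail before storage.
            versions.append(repair_json(versions[-1]))

parsed, versions = parse_with_bounded_repair(
    'Sure! {"account_name": "Acme"} Hope that helps.'
)
print(parsed)
print(len(versions))
```

Returning `versions` alongside the parsed value gives the caller both the original and the repaired text to record in the trace.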

Production checks

  • Raw model output and parsed output are both trace-visible.
  • Repair attempts are bounded.
  • Invalid data fails before the storage step.
  • The storage step is idempotent.
  • Failed extractions can be converted into eval cases.
