---
title: Debug and replay a failed AI workflow
description: Build a support workflow that fails on malformed LLM output, inspect the trace, patch the step, and recover without repeating completed work.
tags: ["Traces", "Replay", "Recovery"]
date: 2026-05-13
last_verified: 2026-05-13
audience: both
---

This cookbook builds one production-shaped failure from start to finish: a
customer-support workflow calls an LLM, gets malformed structured output, fails
before any external side effect happens, and then gets debugged from the trace.

By the end, you should be able to answer the questions that matter during an AI
workflow incident:

- Which step failed?
- What input, prompt, model output, and parsed state led to the failure?
- Which steps are already checkpointed?
- Is it safe to fix the code and let the workflow continue?
- How do we turn this failure into a regression case later?

## What you build

A support reply workflow with five steps:

1. Load the ticket.
2. Load the customer profile.
3. Classify the ticket.
4. Draft a structured reply with an agent.
5. Create an internal note after the draft validates.

The failure is deliberately placed in step 4. The model returns JSON without a
required `confidence` field, so validation fails before `create_internal_note`
can run. That gives you a clean incident to debug: earlier reads are
checkpointed, later side effects have not happened.

## Prerequisites

- The AGNT5 CLI is installed and authenticated.
- Python 3.12 or newer.
- An OpenAI API key in your project environment.
- A local AGNT5 dev session.

Start from a support-style project:

```bash
agnt5 create support-debug --template python/support-triage
cd support-debug
```

Run the dev session in one terminal:

```bash
agnt5 dev
```

## Add the failing workflow

Create a small workflow dedicated to this incident. The important design choice
is the step boundary: each external read, model call, and side effect is a
separate `ctx.step(...)`.

```python
from typing import Literal

from agnt5 import WorkflowContext, function, workflow
from pydantic import BaseModel, Field, ValidationError


class Ticket(BaseModel):
    ticket_id: str
    customer_id: str
    subject: str
    body: str


class CustomerProfile(BaseModel):
    customer_id: str
    plan: str
    refund_eligible: bool


class Classification(BaseModel):
    category: Literal["billing", "technical", "account"]
    priority: Literal["low", "normal", "high"]


class DraftReply(BaseModel):
    body: str
    confidence: float = Field(ge=0, le=1)


class InternalNote(BaseModel):
    note_id: str
    ticket_id: str


@function
async def load_ticket(ticket_id: str) -> Ticket:
    return Ticket(
        ticket_id=ticket_id,
        customer_id="cus_123",
        subject="Need a refund",
        body="I upgraded by mistake and would like my money back.",
    )


@function
async def load_customer_profile(customer_id: str) -> CustomerProfile:
    return CustomerProfile(
        customer_id=customer_id,
        plan="pro",
        refund_eligible=True,
    )


@function
async def classify_ticket(ticket: Ticket, profile: CustomerProfile) -> Classification:
    return Classification(category="billing", priority="high")


@function
async def draft_structured_reply(
    ticket: Ticket,
    profile: CustomerProfile,
    classification: Classification,
) -> DraftReply:
    # In a real project this is an Agent or model call. The malformed payload
    # simulates the incident: `confidence` is missing.
    model_output = """
    {
      "body": "You're eligible for a refund. I can start that process now."
    }
    """

    return DraftReply.model_validate_json(model_output)


@function
async def create_internal_note(ticket: Ticket, draft: DraftReply) -> InternalNote:
    # This is the side effect we do not want to run until the draft validates.
    return InternalNote(note_id=f"note_{ticket.ticket_id}", ticket_id=ticket.ticket_id)


@workflow
async def support_reply_debug(ctx: WorkflowContext, ticket_id: str) -> dict:
    ticket = await ctx.step(load_ticket, ticket_id)
    profile = await ctx.step(load_customer_profile, ticket.customer_id)
    classification = await ctx.step(classify_ticket, ticket, profile)
    draft = await ctx.step(draft_structured_reply, ticket, profile, classification)
    note = await ctx.step(create_internal_note, ticket, draft)

    return {
        "ticket_id": ticket.ticket_id,
        "note_id": note.note_id,
        "draft": draft.model_dump(),
    }
```

Import this module from `app.py` or your project package so the workflow is
registered when the worker starts.

## Run the failure

Trigger the workflow from another terminal:

```bash
agnt5 run support_reply_debug --input '{"ticket_id":"TCK-1001"}'
```

The run should fail in `draft_structured_reply`. List recent runs:

```bash
agnt5 inspect runs ls --status failed --limit 5
```

Then inspect the failed run:

```bash
agnt5 inspect runs describe <run-id>
agnt5 inspect trace -r <run-id> --verbose
```

In the trace, confirm the incident shape:

- `load_ticket` completed.
- `load_customer_profile` completed.
- `classify_ticket` completed.
- `draft_structured_reply` failed with a validation error.
- `create_internal_note` did not run.

That last point is the recovery line: no user-visible side effect has happened
yet, so it is safe to patch the draft step and retry from the failed boundary.

## Patch the failed step

Now make the draft step production-ready. Keep the raw model output visible,
attempt one bounded repair, then validate again.

```python
def repair_draft_payload(raw: str) -> str:
    # Keep this deliberately conservative. In production, make the repair
    # explicit and trace-visible rather than silently accepting bad data.
    if '"confidence"' not in raw:
        return raw.rstrip().rstrip("}") + ', "confidence": 0.62 }'
    return raw


@function
async def draft_structured_reply(
    ticket: Ticket,
    profile: CustomerProfile,
    classification: Classification,
) -> DraftReply:
    model_output = """
    {
      "body": "You're eligible for a refund. I can start that process now."
    }
    """

    try:
        return DraftReply.model_validate_json(model_output)
    except ValidationError:
        repaired = repair_draft_payload(model_output)
        return DraftReply.model_validate_json(repaired)
```
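Before restarting the worker, you can sanity-check the repair helper on the
captured payload in isolation. This is a standalone sketch; the `0.62`
fallback is the same placeholder value used in the helper above:

```python
import json


def repair_draft_payload(raw: str) -> str:
    # Same conservative repair as in the patched step.
    if '"confidence"' not in raw:
        return raw.rstrip().rstrip("}") + ', "confidence": 0.62 }'
    return raw


# The malformed payload from the incident.
malformed = """
{
  "body": "You're eligible for a refund. I can start that process now."
}
"""

repaired = json.loads(repair_draft_payload(malformed))
print(repaired["confidence"])  # 0.62

# Already-valid payloads pass through untouched.
valid = '{"body": "ok", "confidence": 0.9}'
assert repair_draft_payload(valid) == valid
```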

Restart the worker so the new function code is registered.

## Re-run and compare traces

Run the same input again:

```bash
agnt5 run support_reply_debug --input '{"ticket_id":"TCK-1001"}'
```

Inspect the new trace:

```bash
agnt5 inspect runs ls --limit 5
agnt5 inspect trace -r <new-run-id> --verbose
```

Compare it with the failed trace. The first three steps should have the same
inputs. The draft step should now return a valid `DraftReply`, and the
`create_internal_note` side effect should run once after validation succeeds.

## What replay proves

AGNT5 replay is what makes the trace trustworthy:

- Completed step results are journaled.
- Workflow body code can be re-entered after a crash or restart.
- Replay walks the same `ctx.step(...)` sequence.
- Completed steps return their recorded outputs instead of calling external
  systems again.
- The first step without a successful journal entry is where work resumes.

In this incident, replay tells you the failed run had not crossed the side
effect boundary. That is why the fix is safe.
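The journaling mechanics can be illustrated with a toy model. This is a
conceptual sketch only, not the AGNT5 implementation: step outputs are
recorded by position, and replay returns recorded outputs instead of
re-executing the step.

```python
class ToyWorkflow:
    """Toy illustration of step journaling and replay."""

    def __init__(self):
        self.journal = []  # recorded step outputs, in sequence order
        self.cursor = 0    # current position in the step sequence

    def step(self, fn, *args):
        i, self.cursor = self.cursor, self.cursor + 1
        if i < len(self.journal):   # replay: return the recorded output
            return self.journal[i]
        result = fn(*args)          # first execution: run and record
        self.journal.append(result)
        return result


calls = []


def load_ticket(ticket_id):
    calls.append("load_ticket")
    return {"ticket_id": ticket_id}


wf = ToyWorkflow()
ticket = wf.step(load_ticket, "TCK-1001")

# Simulate re-entering the workflow body after a restart: keep the journal,
# reset the cursor, and walk the same step sequence again.
wf.cursor = 0
replayed = wf.step(load_ticket, "TCK-1001")

print(calls)               # ['load_ticket'] — the external call ran once
print(replayed == ticket)  # True
```

The first step without a journal entry is where real execution resumes, which
is exactly the boundary the failed run stopped at.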

## Turn the failure into a regression case

After patching the incident, keep the bad model output as an eval case. The eval
should fail if a future prompt, model, or parser change allows a draft without
`confidence` to pass validation.

At minimum, save:

- workflow input,
- raw model output,
- validation error,
- expected repaired output,
- expected side-effect behavior.

That eval case is the difference between "we fixed the incident" and "this
incident stays fixed."
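One way to encode the regression case is a plain pytest-style test. This is a
sketch with illustrative names; the captured payload is the one from the
failed run's trace:

```python
from pydantic import BaseModel, Field, ValidationError


class DraftReply(BaseModel):
    body: str
    confidence: float = Field(ge=0, le=1)


# Raw model output captured from the failed run's trace.
INCIDENT_PAYLOAD = (
    '{"body": "You\'re eligible for a refund. '
    'I can start that process now."}'
)


def test_incident_payload_still_fails_validation():
    # If a future prompt, model, or parser change ever lets this payload
    # through validation, this test fails and the regression is caught.
    try:
        DraftReply.model_validate_json(INCIDENT_PAYLOAD)
    except ValidationError:
        return
    raise AssertionError("a draft without 'confidence' must not validate")


test_incident_payload_still_fails_validation()
```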

## Production checklist

- Every external read, model call, and side effect is inside `ctx.step(...)`.
- The trace shows step input, output, error, and retry attempts.
- The failed step is before the first user-visible side effect.
- The patch changes the failing step only.
- The fixed trace proves the side effect runs once after validation.
- The malformed output is added to an eval dataset.

## Next steps

- [Retry AI workflow steps without duplicate side effects](/cookbooks/retry-without-duplicate-side-effects.md)
- [Turn a failed production AI run into an eval](/cookbooks/production-run-to-eval.md)
- [Debug AI workflows with traces, not scattered logs](/cookbooks/workflow-native-observability.md)
