> For the complete documentation index, see [llms.txt](/llms.txt).
> A full single-fetch corpus is available at [llms-full.txt](/llms-full.txt).
---
title: Debug AI workflows with traces, not scattered logs
description: Compare log-only debugging with workflow-native traces that preserve inputs, outputs, retries, and state.
tags: ["Observability", "Traces", "Logs"]
date: 2026-05-13
last_verified: 2026-05-13
audience: both
---

Logs are still useful, but they rarely preserve the full execution context of an
AI workflow. This cookbook shows the same failure debugged with scattered logs
and then with AGNT5 workflow-native traces.

## Scenario

A lead-enrichment workflow returns the wrong company summary. The log line says
the LLM call succeeded. The support team needs to know which source documents,
tool outputs, prompts, retries, and state produced the answer.

## What you build

- A workflow with step-level trace capture.
- Minimal logs for infrastructure symptoms.
- Trace inspection for inputs, outputs, state, retries, and parent-child calls.
- A root-cause review flow that ends in a reproducible case.

## Workflow shape

Use steps to make the execution graph explicit.

```python
# Assumes AGNT5's Python SDK exposes the `workflow` decorator,
# `WorkflowContext`, and your own step functions; import paths
# follow your SDK installation.
@workflow
async def enrich_lead(ctx: WorkflowContext, lead_id: str) -> LeadBrief:
    lead = await ctx.step(load_lead, lead_id)
    search_results = await ctx.step(search_company_sources, lead.company)
    facts = await ctx.step(extract_company_facts, search_results)
    brief = await ctx.step(write_lead_brief, lead, facts)
    return await ctx.step(save_brief_once, lead.id, brief)
```

Each step boundary becomes a trace boundary. The trace is the system of record
for what the workflow did.
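As a rough sketch of what each boundary captures, consider a simplified trace record. The field names here are illustrative, not the AGNT5 trace schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StepTrace:
    # Illustrative fields only; the real AGNT5 schema may differ.
    step_name: str
    inputs: dict
    output: object = None
    attempts: int = 1              # retry count, including the final attempt
    parent: Optional[str] = None   # parent step for nested calls
    children: list = field(default_factory=list)

# A run's trace is then a tree of records, one per ctx.step call,
# rooted at the workflow invocation itself.
root = StepTrace("enrich_lead", {"lead_id": "lead_123"})
root.children.append(
    StepTrace("search_company_sources", {"company": "Acme"},
              parent="enrich_lead")
)
```

Because every step boundary produces one of these records, the parent-child links reconstruct the full execution graph after the fact.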

## Log-only debugging

With logs alone, you usually see symptoms:

```text
INFO write_lead_brief completed model=gpt-4.1 latency_ms=1821
WARN user_reported_bad_summary lead_id=lead_123
```

This does not answer which source was wrong, whether a retry changed the output,
or whether the save step used the intended brief.
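The correlation gap is concrete: the model-call log line carries no `lead_id`, so grepping by the identifier from the complaint finds only the complaint. A minimal illustration, using the two hypothetical log lines above:

```python
logs = [
    "INFO write_lead_brief completed model=gpt-4.1 latency_ms=1821",
    "WARN user_reported_bad_summary lead_id=lead_123",
]

# Correlating by lead_id surfaces the symptom but not the LLM call
# that produced it, because the INFO line never logged the lead.
matches = [line for line in logs if "lead_123" in line]
```

The only hit is the user report; the call that actually produced the bad summary is invisible to this query.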

## Trace debugging

With the AGNT5 trace, inspect:

- `search_company_sources` input and source list,
- `extract_company_facts` output and confidence,
- `write_lead_brief` prompt, model output, and token usage,
- retry attempts and final selected result,
- the saved brief receipt.

The trace points to the root cause: an outdated source was ranked first and
passed through extraction.
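A check like this root-cause finding can be sketched directly against the trace data. Assuming the `search_company_sources` record carries the ranked source list with publication dates (hypothetical fields, not a fixed AGNT5 schema):

```python
from datetime import date

# Hypothetical ranked sources, as recorded in the
# search_company_sources step trace.
sources = [
    {"url": "https://old.example/acme-2019", "published": date(2019, 3, 1)},
    {"url": "https://news.example/acme-2026", "published": date(2026, 4, 2)},
]

# The trace preserves the order the extractor saw, so a stale
# top-ranked source is detectable without re-running anything.
cutoff = date(2024, 1, 1)
stale_first = sources[0]["published"] < cutoff
```

Once found, the same predicate can run against future traces as a regression check.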

## Production checks

- Every user-visible result links to a run ID.
- The trace has enough data to explain the output.
- Logs link to run IDs instead of duplicating trace payloads.
- Failed traces can be replayed or turned into eval cases.
- Sensitive fields are redacted before trace storage where required.
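The redaction check can be as simple as scrubbing known-sensitive keys before the trace payload is persisted. A minimal sketch, with an assumed key list rather than any built-in AGNT5 policy:

```python
# Assumed set of sensitive keys; in practice this comes from your
# data-handling policy, not from the framework.
SENSITIVE_KEYS = {"email", "phone"}

def redact(payload: dict) -> dict:
    """Replace sensitive values before the trace is stored."""
    return {
        key: ("[REDACTED]" if key in SENSITIVE_KEYS else value)
        for key, value in payload.items()
    }

record = {"company": "Acme", "email": "ceo@acme.test"}
safe = redact(record)
```

Running redaction at the trace-storage boundary keeps the non-sensitive fields intact for debugging while the regulated ones never leave the process.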

## Next steps

- [Turn a failed production AI run into an eval](/cookbooks/production-run-to-eval.md)
- [Debug and replay a failed AI workflow](/cookbooks/debug-production-run.md)
- [Build a data extraction workflow](/cookbooks/data-extraction.md)
