Debug AI workflows with traces, not scattered logs
Compare log-only debugging with workflow-native traces that preserve inputs, outputs, retries, and state.
Logs are still useful, but they rarely preserve the full execution context of an AI workflow. This cookbook shows the same failure debugged with scattered logs and then with AGNT5 workflow-native traces.
Scenario
A lead-enrichment workflow returns the wrong company summary. The log line says the LLM call succeeded. The support team needs to know which source documents, tool outputs, prompts, retries, and state produced the answer.
What you build
- A workflow with step-level trace capture.
- Minimal logs for infrastructure symptoms.
- Trace inspection for inputs, outputs, state, retries, and parent-child calls.
- A root-cause review flow that ends in a reproducible case.
Workflow shape
Use steps to make the execution graph explicit.
```python
@workflow
async def enrich_lead(ctx: WorkflowContext, lead_id: str) -> LeadBrief:
    lead = await ctx.step(load_lead, lead_id)
    search_results = await ctx.step(search_company_sources, lead.company)
    facts = await ctx.step(extract_company_facts, search_results)
    brief = await ctx.step(write_lead_brief, lead, facts)
    return await ctx.step(save_brief_once, lead.id, brief)
```
Each step boundary becomes a trace boundary. The trace is the system of record for what the workflow did.
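If the trace is the system of record, each step boundary needs to capture enough to replay and explain the step. A minimal sketch of such a record in plain Python (this is an illustration, not the AGNT5 SDK; all names here are hypothetical):

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class StepTrace:
    """One record per step boundary: inputs, output, retries, and lineage."""
    run_id: str                   # links the record to the whole workflow run
    step_name: str                # e.g. "write_lead_brief"
    inputs: dict[str, Any]        # arguments the step was called with
    output: Any = None            # the value the step returned
    attempt: int = 1              # increments on each retry
    parent: Optional[str] = None  # parent step name for nested calls
    started_at: float = field(default_factory=time.time)
    error: Optional[str] = None   # set when this attempt failed

def record_step(trace: list[StepTrace], run_id: str, step_name: str,
                inputs: dict, output: Any, attempt: int = 1) -> StepTrace:
    """Append a completed-step record to the run's trace."""
    rec = StepTrace(run_id=run_id, step_name=step_name,
                    inputs=inputs, output=output, attempt=attempt)
    trace.append(rec)
    return rec
```

With records of this shape, a run's trace is a list you can walk step by step, which is exactly what the debugging flow below relies on.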
Log-only debugging
With logs alone, you usually see symptoms:
```
INFO write_lead_brief completed model=gpt-4.1 latency_ms=1821
WARN user_reported_bad_summary lead_id=lead_123
```
This does not answer which source was wrong, whether a retry changed the output, or whether the save step used the intended brief.
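Logs like these stay useful when they carry the symptom plus a pointer into the trace, rather than duplicating payloads. A tiny sketch of that discipline (the field names are illustrative, not an AGNT5 convention):

```python
def step_log_line(run_id: str, step: str, latency_ms: int) -> str:
    """Format an INFO line carrying only the symptom and a run ID.

    Inputs, outputs, and prompts stay in the trace; the log just points at it.
    """
    return f"INFO {step} completed run_id={run_id} latency_ms={latency_ms}"
```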
Trace debugging
With the AGNT5 trace, inspect:
- `search_company_sources` input and source list,
- `extract_company_facts` output and confidence,
- `write_lead_brief` prompt, model output, and token usage,
- retry attempts and the final selected result,
- the saved brief receipt.
The trace points to the root cause: an outdated source was ranked first and passed through extraction.
Production checks
- Every user-visible result links to a run ID.
- The trace has enough data to explain the output.
- Logs link to run IDs instead of duplicating trace payloads.
- Failed traces can be replayed or turned into eval cases.
- Sensitive fields are redacted before trace storage where required.
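The last check can be sketched as a redaction pass applied to the trace payload before storage (the sensitive field names are illustrative, not a fixed AGNT5 schema):

```python
from typing import Any

SENSITIVE_KEYS = {"email", "phone", "api_key"}  # illustrative field names

def redact(payload: Any) -> Any:
    """Recursively replace sensitive values before the trace is stored."""
    if isinstance(payload, dict):
        return {k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v)
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    return payload
```

Running redaction at the storage boundary keeps the in-memory trace complete for the workflow itself while the persisted copy stays safe to share with support.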