> For the complete documentation index, see [llms.txt](/llms.txt).
> A full single-fetch corpus is available at [llms-full.txt](/llms-full.txt).
---
page_type: explanation
audience: both
sdk: python
title: Workflows
description: Durable orchestrators — async functions whose progress survives crashes through journaled step boundaries.
last_verified: 2026-04-30
---

> A **workflow** is a `@workflow`-decorated `async` function whose body orchestrates steps and whose progress survives crashes. Each `ctx.step(...)` call is the unit of replay.

```python
from agnt5 import WorkflowContext, function, workflow


@function
async def validate_order(ctx, order_id: str, items: list) -> dict:
    return {"valid": len(items) > 0, "item_count": len(items)}


@function
async def charge_card(ctx, order_id: str) -> str:
    return await payments.charge(order_id)


@function
async def create_shipment(ctx, order_id: str, txn: str) -> str:
    return await shipping.create(order_id, txn)


@workflow
async def order_fulfillment(ctx: WorkflowContext, order_id: str, items: list) -> dict:
    validation = await ctx.step(validate_order, order_id, items)
    if not validation["valid"]:
        return {"order_id": order_id, "status": "rejected"}

    txn = await ctx.step(charge_card, order_id)
    tracking = await ctx.step(create_shipment, order_id, txn)
    return {"order_id": order_id, "status": "fulfilled", "txn": txn, "tracking": tracking}
```

If the worker crashes between `charge_card` and `create_shipment`, the next attempt skips `validate_order` and `charge_card` (their results are journaled) and runs `create_shipment` against the recorded `txn`.

## The mental model

A workflow body looks like ordinary `async` Python: variable assignments, branches, loops, exception handlers. The runtime treats the body as a deterministic recipe and the journal as the cooked-pot history. On every replay, the runtime walks the recipe and asks one question at each `ctx.step(...)`: do I have a recorded result for this call in this run? If yes, replay returns the journaled value and continues. If no, the runtime executes the step, writes the input and output to the journal, then returns.

The unit of durability is the **step**, not the line. Code between two `ctx.step(...)` calls — branches, variable assignments, calls to deterministic helpers — re-executes on every replay. Code inside a step is a side effect that runs at most once per run, modulo the [durable-execution gotcha](/docs/concepts/durable-execution.md#edge-cases-and-gotchas) about partial side effects.

`WorkflowContext` is richer than `FunctionContext`. It carries the workflow's run identifier, session and user identifiers for memory scoping, an entity for state changes, and the step counter the runtime uses for journaling. The context is your handle on the durability machinery; the body is the recipe.

## Why it works this way

Step boundaries are explicit so you can see where the durability bargain is being made. Implicit checkpointing — at every `await`, every line, every function call — produces unreadable code and unbounded journals. Boundary-only checkpointing makes the journal proportional to your business logic, not your control flow.

The cost is a constraint on workflow code: the body must be deterministic. Replay must arrive at the same `ctx.step(...)` call sites in the same order, every time. AGNT5 trades this constraint for an automatic recovery model. Without it, the system would have no way to tell which journaled result belongs to which call site.

## Edge cases and gotchas

- **The body must be deterministic.** Wall-clock reads, random numbers, network calls, and in-process caches in the workflow body are replay hazards. Move them inside a step. See [Determinism](/docs/concepts/determinism.md).
- **Three forms of `ctx.step`.** `ctx.step(handler, *args)` calls a `@function` (the recommended form). `ctx.step("name", awaitable)` checkpoints arbitrary async work. `ctx.step("name", lambda: ...)` checkpoints a synchronous callable. Pick one form per workflow and stay with it.
- **Long-running steps hold a lease.** A step that takes hours blocks the run from progressing past it. Surface progress through smaller steps instead of waiting indefinitely inside one call.
- **Runs are not deduplicated by input.** Re-invoking the same workflow with the same input creates a new run with a new ID and a new journal. Dedupe at the caller if you need at-most-once semantics across submissions.
- **`ctx.task(...)` still works.** Older code uses `ctx.task` for the same shape. New code uses `ctx.step` everywhere; both currently coexist.
- **In-flight runs stay on their version.** When a new deployment ships, runs that started on the previous version keep running on it. New runs use the new version. See [Versioning and deployment model](/docs/concepts/versioning-and-deployment.md).

## Related concepts

- [Functions](/docs/concepts/functions.md) — the units a workflow calls through `ctx.step`.
- [Durable execution](/docs/concepts/durable-execution.md) — the runtime guarantee a workflow provides.
- [Determinism — why workflows have rules](/docs/concepts/determinism.md) — the constraint replay imposes on the body.
- [Picking the right primitive](/docs/concepts/picking-the-right-primitive.md) — when to reach for a workflow versus a plain function.


**Code primitive**: `@workflow` decorator (Python) / `workflow(...)` factory (TypeScript)
**Anatomy**: async function whose body is a sequence of `await ctx.step(...)` calls; arguments and return values are JSON-serializable
**Related CLI**: [agnt5 deploy](/cli/deploy.md), [agnt5 logs](/cli/deployments.md)