How AGNT5 pauses a running workflow for a human decision — and why the durability model makes this a one-line primitive.
The awkward question with human-in-the-loop flows is not the UI. It is: what happens to the process that is mid-execution when a human needs to step in? A workflow that is halfway through a refund decision, or a code review, or an outbound email that it is not allowed to send without approval, cannot simply block a worker thread and wait three hours for a Slack reply.
AGNT5 treats this as durability, not as concurrency. The workflow reaches a pause point, writes a checkpoint, and the worker is free to drop the run entirely. When the human replies, the coordinator rehydrates the run and resumes it from the checkpoint with the response in hand.
From the SDK, it is one call:
```python
from agnt5 import workflow, WorkflowContext

@workflow
async def refund_flow(ctx: WorkflowContext, order_id: str, amount_cents: int) -> dict:
    order = await ctx.step("fetch_order", lambda: get_order(order_id))

    # Small refunds auto-approve.
    if amount_cents < 5_000:
        await ctx.step("issue_refund",
                       lambda: issue_refund(order_id, amount_cents))
        return {"status": "auto_approved", "amount_cents": amount_cents}

    # Anything larger pauses for a human.
    decision = await ctx.wait_for_user(
        question=f"Approve ${amount_cents/100:.2f} refund for order {order_id}?",
        input_type="approval",
        metadata={"order_id": order_id, "customer": order["customer_email"]},
    )

    if decision["approved"]:
        await ctx.step("issue_refund",
                       lambda: issue_refund(order_id, amount_cents))
        return {"status": "approved_by_human", "approver": decision["user_id"]}
    else:
        return {"status": "rejected", "reason": decision.get("note", "")}
```

ctx.wait_for_user is not an async sleep. It is a cooperative stop.
What the runtime does on pause
When the workflow calls ctx.wait_for_user, the SDK packages the question, input type, and metadata and sends an awaiting_signal entry to the coordinator. The coordinator appends the entry to the run’s journal, marks the run’s status as paused, and returns control to the worker. The worker can now serve other runs or shut down cleanly — nothing about this run is still holding a handle.
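The pause sequence can be sketched in a few lines. This is a minimal illustration, not AGNT5's actual wire format: the entry fields, the status string, and the pause_run helper are all assumed names chosen for clarity.

```python
# Hypothetical shape of the awaiting_signal entry and the coordinator's
# pause step. Field names are illustrative, not the real AGNT5 schema.
from dataclasses import dataclass, field, asdict
from typing import Any

@dataclass
class AwaitingSignalEntry:
    run_id: str
    question: str
    entry_type: str = "awaiting_signal"
    input_type: str = "approval"
    metadata: dict[str, Any] = field(default_factory=dict)

def pause_run(journal: list[dict], run: dict, entry: AwaitingSignalEntry) -> None:
    # Coordinator appends the entry and flips the run to paused.
    # After this returns, no worker holds a handle to the run.
    journal.append(asdict(entry))
    run["status"] = "paused"

journal: list[dict] = []
run = {"id": "run-1", "status": "running"}
pause_run(journal, run, AwaitingSignalEntry(
    run_id="run-1",
    question="Approve $120.00 refund for order o-42?",
))
```

The point of the shape is that the pause is nothing but an append plus a status flip; there is no in-memory continuation to keep alive.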
On the gateway side, the pause shows up as an SSE event on the run’s stream. Studio or any subscribing client sees the run transition to paused and surfaces the prompt. The stream stays open; the run is still live, just not executing.
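A subscribing client only has to interpret standard SSE frames. The event name and payload fields below are assumptions for illustration, not the documented gateway contract:

```python
# Minimal SSE frame parser, to show what a client sees when the run
# transitions to paused. "run.paused" and the payload keys are assumed.
import json

def parse_sse_event(raw: str) -> tuple[str, dict]:
    """Parse one SSE frame into (event_name, data)."""
    event, data = "", {}
    for line in raw.strip().splitlines():
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
    return event, data

frame = (
    "event: run.paused\n"
    'data: {"run_id": "run-1", "question": "Approve deploy?"}\n\n'
)
name, payload = parse_sse_event(frame)
```

Studio does the same thing: on a paused event it renders the question from the payload and leaves the stream open for the eventual resume.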
Meanwhile the engine-processor crate’s timer poller has scheduled a timeout — if configured — and the run’s state machine is in the awaiting_user state. No worker is consuming CPU on its behalf.
What happens on reply
A human clicks “approve” in Studio, or an API caller posts to the signal endpoint. The gateway hands the payload to the coordinator, which appends a signal_received entry to the journal and re-dispatches the run.
A worker picks up the run. The SDK replays from the journal: fetch_order returns its memoized output, the awaiting_signal entry is replaced by the signal_received payload, and ctx.wait_for_user returns {"approved": true, "user_id": "..."}. Execution continues into issue_refund as if no time had passed.
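Replay can be sketched as a single pass over the journal, assuming entry shapes like the ones below (the type and field names are illustrative):

```python
# Sketch of journal replay on resume: memoized step outputs are served
# from the log without re-executing, and the pending wait resolves to
# the signal_received payload. Entry shapes are assumptions.
def replay(journal: list[dict]) -> dict:
    memo = {}      # step name -> cached output, returned on re-execution
    signal = None  # payload that resolves the pending wait_for_user
    for entry in journal:
        if entry["type"] == "step_completed":
            memo[entry["name"]] = entry["output"]
        elif entry["type"] == "signal_received":
            signal = entry["payload"]
    return {"memo": memo, "signal": signal}

journal = [
    {"type": "step_completed", "name": "fetch_order",
     "output": {"customer_email": "a@example.com"}},
    {"type": "awaiting_signal", "question": "Approve refund?"},
    {"type": "signal_received",
     "payload": {"approved": True, "user_id": "u-7"}},
]
state = replay(journal)
```

In this sketch, fetch_order never runs a second time, and the workflow code resumes at the await with the approval in hand, which is why the resuming worker needs nothing beyond the journal itself.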
From the workflow’s perspective, wait_for_user is a normal await. From the runtime’s perspective, the worker that issued the wait and the worker that received the reply are not necessarily the same process — often not even the same node.
Why this is the right primitive
There are a few alternative shapes we considered and rejected.
“Fire a webhook and exit.” The workflow calls an external approval service, then returns. The caller is responsible for kicking off a new workflow when the decision arrives. This works but loses the single-execution mental model — the developer now has to reason about two workflows, correlation IDs, and partial state.
“Block a worker until the signal arrives.” Simple to implement, terrible to operate. Human approvals take minutes to days. A worker thread held open for that long is wasted capacity, and every deploy or autoscale event breaks in-flight decisions.
“Poll a database.” A worker wakes up every thirty seconds and checks whether the approval arrived. Wastes CPU, delays the reply, and still does not survive worker restarts cleanly.
The durable-pause shape works because the journal already exists. We are not building a separate signaling system — we are using the same append-only log that handles every other checkpoint, and the same state machine that handles every other run transition. The pause is a journal entry. The resume is a journal entry. The identity of the worker that handled each phase is irrelevant.
Timeouts and escalations
wait_for_user takes an optional timeout. When it fires, the SDK raises a typed exception that the workflow can catch:
```python
from agnt5.exceptions import SignalTimeout

try:
    decision = await ctx.wait_for_user(
        question="Approve deploy to production?",
        input_type="approval",
        timeout_seconds=3600,
    )
except SignalTimeout:
    await ctx.step("notify_slack",
                   lambda: post_slack("Deploy approval timed out, escalating."))
    decision = await ctx.wait_for_user(
        question="ESCALATION: Approve deploy?",
        input_type="approval",
        metadata={"escalation": True},
    )
```

The timeout is itself durable — the processor's timer poller schedules it in the journal. If the runtime restarts, the timeout is re-registered on recovery. An escalation that is supposed to fire at T+1h fires at T+1h, regardless of what happened to the worker in between.
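Durable recovery falls out of storing the deadline as an absolute timestamp in the journal. A minimal sketch, assuming timer_scheduled and timer_fired entry shapes that are illustrative rather than the real schema:

```python
# Sketch: after a restart, pending timers are re-derived from the
# journal, not from in-memory state. Entry shapes are assumptions.
def pending_timers(journal: list[dict], now: float) -> list[dict]:
    fired = {e["timer_id"] for e in journal if e["type"] == "timer_fired"}
    return [e for e in journal
            if e["type"] == "timer_scheduled"
            and e["timer_id"] not in fired
            and e["deadline"] <= now]

journal = [
    {"type": "timer_scheduled", "timer_id": "t-1", "deadline": 1_000.0},
    {"type": "timer_scheduled", "timer_id": "t-2", "deadline": 9_999.0},
]
due = pending_timers(journal, now=2_000.0)
```

Because the deadline is absolute, a crash between scheduling and firing does not shift it: the poller that wakes up after recovery sees the same T+1h deadline the original worker wrote.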
Why this matters
AI systems that make real decisions need a human in the loop on a meaningful fraction of requests. If that loop is where your reliability story falls apart — lost decisions, duplicate approvals, workflows stuck in a weird zombie state — the rest of the durable execution story does not save you. Making the pause a first-class primitive, checkpointed and resumable like any other step, is how the operational story stays consistent from end to end.
One worker issues the wait. A different worker handles the reply. The run itself does not notice. That is the shape.