Run

Deploy, execute durably, and inspect every running workflow

Build gets your code written. Run is how that code actually executes in production — where durability, scale, and visibility stop being SDK concerns and start being platform guarantees.

Most agent frameworks stop at “here’s a decorator.” That works until the first real failure: an LLM call times out three minutes into a five-minute workflow, a worker process crashes mid-execution, a retry storm compounds a bad deploy. These aren’t prompting problems. They’re execution problems — and solving them in application code is exactly the wrong layer.

The Run section of AGNT5 is the half of the platform you can’t write with a library. It’s what turns the code you built into a running system.

The three pillars

Run is organized around three things the platform does that an SDK alone can’t:

Deploy and run

You ship AGNT5 code the same way you ship anything else — one command, from local to production. agnt5 deploy packages your agents, workflows, and functions and rolls them out to workers in the runtime. Workers are the processes that actually execute your code; AGNT5 manages their lifecycle, scaling, and recovery, so your application code doesn’t have to think about any of it.

No YAML. No custom orchestrator config. No DSL for “here’s what a production deployment looks like.” If it ran locally with the dev server, the same code runs on the runtime.

Durable execution

Once your code is running, the runtime’s job is to keep it running — through crashes, timeouts, redeploys, and the ordinary chaos of distributed systems. AGNT5 does this by treating every function call, step, and state transition as a checkpointed event. The workflow journal is the single source of truth: if a worker dies mid-execution, another worker picks up where the first left off, replays the journal to the last checkpoint, and keeps going.
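The replay mechanics can be modeled with a toy journal in plain Python. This is a conceptual sketch of the pattern, not the AGNT5 API — the `Journal` class and the step names are illustrative:

```python
class Journal:
    """Append-only record of completed steps, keyed by step name."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def step(self, name, fn):
        # On replay, a completed step returns its recorded result
        # instead of re-executing — this is what makes resumption safe.
        if name in self.entries:
            return self.entries[name]
        result = fn()
        self.entries[name] = result  # checkpoint before moving on
        return result


def workflow(journal):
    a = journal.step("fetch", lambda: 21)
    b = journal.step("double", lambda: a * 2)
    return b


# First worker completes "fetch", then crashes before "double".
j = Journal()
j.step("fetch", lambda: 21)

# A second worker resumes from the same journal: "fetch" replays from
# the checkpoint (no re-execution), "double" runs for the first time.
print(workflow(j))  # 42
```

The key property is that replay is a read, not a re-run: the second worker never repeats the side effects of steps the first worker already completed.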

This is what durable execution means in practice: you write code as if nothing will fail, and the platform catches the cases where it does. Retries with backoff, crash recovery, exactly-once semantics via idempotency keys — all handled by the runtime, not by your application code.
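What "handled by the runtime" covers can be sketched in plain Python. This is a conceptual model of the two mechanisms named above — retries with backoff, and exactly-once via idempotency keys — not the AGNT5 implementation; the helper names `retry_with_backoff` and `call_once` are invented for illustration:

```python
import time


def retry_with_backoff(fn, attempts=4, base_delay=0.01):
    """Retry a transient failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))


# Exactly-once via idempotency keys: a side effect is recorded under a
# key, so a retried or replayed call becomes a no-op with the same result.
_effects = {}


def call_once(key, fn):
    if key not in _effects:
        _effects[key] = fn()
    return _effects[key]


charge = call_once("charge:order-1", lambda: "charged")
again = call_once("charge:order-1", lambda: "charged-twice")
print(charge, again)  # charged charged
```

The point of keeping both in the platform layer is that a retry and a crash-replay go through the same dedup path, so "the call happened twice" can never leak into your application's state.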

For the underlying execution model, see the Workflows foundation.

Inspectable by default

The same journal that makes execution durable makes it inspectable. Every workflow invocation is captured as a Run — a full, scrubbable record of every function call, retry, checkpoint, tool invocation, and LLM request, with inputs, outputs, timing, and state transitions. Across the fleet, Metrics aggregate throughput, latency percentiles, and error rates, and LLM Usage breaks out cost and model distribution across every call.

None of this requires instrumentation. You don’t call span.start() or configure an OpenTelemetry exporter — the platform captures what the execution engine already sees, because to run your workflow durably it already has to see all of it. Observability as a platform property, not a library.
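The claim that the execution record is the observability record can be shown with a toy: given journal entries that already carry timing and status, fleet metrics are just aggregation. The entry shape here is invented for illustration, not AGNT5's actual schema:

```python
# Hypothetical journal entries — the runtime already records these in
# order to execute durably; no instrumentation call produced them.
runs = [
    {"step": "plan",  "status": "ok",    "ms": 120},
    {"step": "tool",  "status": "ok",    "ms": 340},
    {"step": "tool",  "status": "error", "ms": 95},
    {"step": "reply", "status": "ok",    "ms": 210},
]


def error_rate(entries):
    """Fraction of recorded steps that failed."""
    return sum(e["status"] == "error" for e in entries) / len(entries)


def latency_p50(entries):
    """Median step latency in milliseconds."""
    ms = sorted(e["ms"] for e in entries)
    return ms[len(ms) // 2]


print(error_rate(runs))   # 0.25
print(latency_p50(runs))  # 210
```

Metrics and LLM Usage are, in this framing, just queries over the journal — which is why they come for free once the journal exists.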

Why the three pillars are one section

Deploy, execute, observe — these look like three separate concerns, and most platforms treat them as three separate products. AGNT5 collapses them because they’re all views onto the same runtime. The workflow that’s deployed is the workflow that’s executing; the workflow that’s executing is the workflow that’s being recorded; the recording is the substrate for every Run, Metric, and LLM-usage cut.

This is the point of the Run layer. One runtime, three views, zero stitching.

Where to go next

Once your workflows are running reliably and you can see how they behave in production, the next question isn’t “is it working” — it’s “how do I make it better.” That’s the job of the Improve section: Experiments, Datasets, Scorers, and Prompts built on top of the same runtime, so the loop from production observation to measured improvement never leaves the platform.