Durable runtime for AI agents & workflows

Crashes resume. Failures replay. Fixes ship proven.

Every tool call, branch, and pause in every run is written to one durable journal. That journal is the recovery layer, the trace, and the eval harness. What you debug is exactly what ran.

The production gap

Building agentic workflows is easy. Keeping them working isn't.

Your first agent ran in a notebook. But in production, agents and workflows make decisions, call tools, wait on humans, and run for hours. When they fail, your stack has no answer, because it treats an eight-hour agentic workflow like a 200ms API request.

Long-running work loses state when it fails

Agentic workflows aren't request-response functions. They branch, loop, call external services, and pause for human input, sometimes for hours. When something fails mid-run, the execution context and completed work are gone. You can't resume from step forty-one. You start over from step one.

The debug-fix-verify loop is broken

Something went wrong, but your tools don't agree on what. Your traces capture inputs and outputs but miss what happened in between. So you piece it together across three dashboards, make a fix, push a deploy, wait, and find out it didn't help. Then you do it again.

The agent runtime

One journal. Not four tools.

Recovery, replay, observability, and evals all come from the same journal that runs your agents and workflows. Ship to production, replay when it breaks, and prove fixes against the runs that failed.
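To make "one journal, several uses" concrete, here is a minimal sketch built on a single append-only log. The `Journal` and `Entry` classes and their methods are hypothetical illustrations, not the agnt5 data model. The same entries answer three questions: what already finished (recovery), what ran and in what order (observability), and which real inputs and outputs to score later (evals).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Entry:
    step: str
    args: Any
    result: Any


@dataclass
class Journal:
    """Hypothetical append-only journal; not the agnt5 data model."""
    entries: list[Entry] = field(default_factory=list)

    def record(self, step: str, args: Any, result: Any) -> None:
        # Append only: past entries are never rewritten.
        self.entries.append(Entry(step, args, result))

    def completed(self) -> dict[str, Any]:
        # Recovery view: results of steps that already finished.
        return {e.step: e.result for e in self.entries}

    def trace(self) -> list[str]:
        # Observability view: what ran, in order.
        return [f"{e.step}({e.args!r}) -> {e.result!r}" for e in self.entries]

    def eval_cases(self, step: str) -> list[tuple[Any, Any]]:
        # Eval view: real (args, result) pairs for one step.
        return [(e.args, e.result) for e in self.entries if e.step == step]

    def to_jsonl(self) -> str:
        # Durable form: one JSON object per line.
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)


journal = Journal()
journal.record("classify", "refund please", "billing")
journal.record("draft_reply", "billing", "Your refund is on the way.")
for line in journal.trace():
    print(line)
```

Because every view is derived from the same entries, the trace can't disagree with what recovery or evals see.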

Build

Write agents that survive production

Write agents and workflows in Python or TypeScript. Add a decorator and every completed step is checkpointed. If a run crashes, it picks up at the step where it stopped. The runtime handles state, recovery, and coordination, so you can focus on what your agent actually does.
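Step-level checkpointing can be sketched in a few lines, assuming a local JSON file as the checkpoint store. `DurableRun` and `step` are hypothetical names, not the agnt5 SDK. The example simulates a crash after the first step, then restarts and shows that the finished step is reused rather than executed again.

```python
import json
import os
import tempfile
from typing import Any, Callable


class DurableRun:
    """Hypothetical step checkpointer (not the agnt5 API).

    Each completed step's result is saved to a JSON file, so a restarted run
    reuses finished results instead of executing those steps again.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: dict[str, Any] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)  # checkpoints left by the earlier attempt

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.done:
            return self.done[name]  # finished before the crash: skip re-execution
        result = fn()
        self.done[name] = result
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.done, f)
        os.replace(tmp, self.path)  # atomic on POSIX: no half-written checkpoint
        return result


executions: list[str] = []  # records which step bodies actually ran


def fetch() -> str:
    executions.append("fetch")
    return "raw data"


def summarize(text: str) -> str:
    executions.append("summarize")
    return f"summary of {text}"


def workflow(run: DurableRun, crash: bool) -> str:
    data = run.step("fetch", fetch)
    if crash:
        raise RuntimeError("simulated crash after the first step")
    return run.step("summarize", lambda: summarize(data))


path = os.path.join(tempfile.mkdtemp(), "run-42.json")
try:
    workflow(DurableRun(path), crash=True)  # first attempt dies mid-run
except RuntimeError:
    pass
result = workflow(DurableRun(path), crash=False)  # restart from the checkpoint file
print(result)      # summary of raw data
print(executions)  # ['fetch', 'summarize']: fetch ran only once
```

This sketch assumes step results are JSON-serializable and that step names are unique and stable across restarts; a production runtime has to handle both concerns more carefully.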

Durable SDKs for Python and TypeScript

Add @durable.function and your function gains automatic checkpointing, retries, and crash recovery. The learning surface is just two APIs and a decorator, but it changes what your code can survive.
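Retries are one of the behaviors a decorator like this adds. The sketch below shows the general pattern of a retry decorator with exponential backoff. `with_retries` is a hypothetical name and is not how `@durable.function` is implemented; it only illustrates how a decorator can wrap a step without changing its body.

```python
import functools
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def with_retries(
    max_attempts: int = 3, base_delay: float = 0.1
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical retry decorator (not agnt5's @durable.function)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
            raise RuntimeError("unreachable")

        return wrapper

    return decorator


attempts = 0


@with_retries(max_attempts=3, base_delay=0.01)
def flaky_tool_call(query: str) -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient failure")
    return f"result for {query}"


result = flaky_tool_call("weather")
print(result, attempts)  # result for weather 3
```

For brevity this retries on any `Exception`; a real runtime would typically retry only errors it knows are transient.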

Human-in-the-loop that actually works

Build workflows that pause for human approval and resume where they left off. The runtime suspends the run, persists its full state, and continues from the paused step when the decision comes back.
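The suspend-and-resume pattern can be sketched as follows. This is a hypothetical illustration, not the agnt5 API: `Suspended` and `refund_workflow` are invented names, and the serialized state is kept in a variable where a real system would write it to durable storage. Instead of blocking a worker for hours, the run raises with its state attached and is resumed later with the human's decision.

```python
import json
from typing import Optional


class Suspended(Exception):
    """Hypothetical signal that a run is paused waiting for a human."""

    def __init__(self, state: str) -> None:
        super().__init__("waiting for human approval")
        self.state = state  # serialized run state to persist


def refund_workflow(
    amount: float, saved_state: Optional[str] = None, decision: Optional[bool] = None
) -> str:
    if saved_state is None:
        # First pass: do the work that must not be redone after the pause.
        state = {"amount": amount, "draft": f"Refund ${amount:.2f} to customer"}
    else:
        state = json.loads(saved_state)  # resume: restore the persisted state

    if decision is None:
        # No decision yet: hand the state back and stop instead of blocking.
        raise Suspended(json.dumps(state))

    return state["draft"] + (" (approved)" if decision else " (rejected)")


try:
    refund_workflow(120.0)
except Suspended as s:
    stored = s.state  # a real system would write this to durable storage

# Later, when the approver responds:
print(refund_workflow(120.0, saved_state=stored, decision=True))
# Refund $120.00 to customer (approved)
```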

Run

Ship fast, recover from anything

Your agents and workflows run on a Rust runtime that records every step and recovers from crashes automatically. Deploy from your laptop to production with one command.

Deploy anywhere — from a laptop to a cluster

The entire runtime ships as a single binary. Run it on your laptop during development, deploy to a VPS for production, or scale out behind a Kubernetes operator when you outgrow a single node.

Crashes don't lose work

When a run crashes partway through, the runtime resumes it from where it left off. Every step is recorded as it happens, so completed work isn't lost and doesn't need to be re-executed.

Improve

See what happened. Fix it. Prove it works.

Every run is recorded automatically. When something goes wrong, you have the full picture, and the runs you've already served become the eval set that proves the fix.

Replay any run, locally or in Studio

Pull any production run to your laptop with agnt5 replay and step through every decision, tool call, and state change exactly as it happened. Find the failure in minutes instead of hours.
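The core idea behind replay is that tool calls return the results recorded in production instead of hitting live services, so the agent's decisions can be re-run and inspected step by step. The sketch below illustrates that idea. It does not use the `agnt5 replay` command. `RecordedTools`, `ReplayDivergence`, and the recording format are hypothetical, and the recording is illustrative, not real data.

```python
from typing import Any


class ReplayDivergence(Exception):
    """The replayed code made a call the recording doesn't contain."""


class RecordedTools:
    """Hypothetical replay harness (not the agnt5 replay command).

    Serves tool results from a production recording, in order, instead of
    calling live services, and logs each step for inspection.
    """

    def __init__(self, recording: list[dict[str, Any]]) -> None:
        self.recording = recording
        self.pos = 0
        self.steps: list[str] = []

    def call(self, tool: str, args: dict[str, Any]) -> Any:
        if self.pos >= len(self.recording):
            raise ReplayDivergence(f"extra call {tool}{args}: not in the recording")
        entry = self.recording[self.pos]
        if entry["tool"] != tool or entry["args"] != args:
            raise ReplayDivergence(
                f"step {self.pos}: recorded {entry['tool']}{entry['args']}, got {tool}{args}"
            )
        self.pos += 1
        self.steps.append(f"{tool}({args}) -> {entry['result']!r}")
        return entry["result"]


def support_agent(order_id: str, tools: RecordedTools) -> str:
    # The agent receives its tools as a parameter, so the same code can run
    # against live services in production and against a recording here.
    order = tools.call("lookup_order", {"id": order_id})
    if order["status"] == "shipped":
        return "Your order is on its way."
    return "Let me check with the warehouse."


recording = [
    {"tool": "lookup_order", "args": {"id": "A17"}, "result": {"status": "lost"}},
]
replay = RecordedTools(recording)
print(support_agent("A17", replay))  # Let me check with the warehouse.
print(replay.steps)
```

Raising on divergence matters: if the code under replay asks for a different call than the one recorded, the replay is no longer showing what actually ran.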

Fix prompts and prove it works — before it ships

Create a new prompt version and set it as active. Future runs pick up the new version without a redeploy. Re-run the failed production runs against the updated prompt and score the results with built-in evaluators.
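The prompt-fix loop can be sketched with a versioned prompt registry whose active version is read at run time, plus a scoring pass over previously failed runs. Everything here is hypothetical: `PromptRegistry`, `FailedRun`, `fake_model` (a deterministic stand-in for an LLM call so the example runs offline), and `contains_evaluator` (a stand-in for a real evaluator). The runs and prompts are illustrative, not real data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PromptRegistry:
    """Hypothetical: versioned prompts with one active version, read at run time."""
    versions: dict[str, str]
    active: str

    def set_active(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(version)
        self.active = version

    def current(self) -> str:
        return self.versions[self.active]


@dataclass
class FailedRun:
    run_id: str
    user_input: str
    must_mention: str  # text a correct answer has to include


def fake_model(prompt: str, user_input: str) -> str:
    # Stand-in for an LLM call so the sketch runs offline.
    if "cite the order id" in prompt:
        return f"Order {user_input}: refund issued."
    return "Refund issued."


def contains_evaluator(output: str, run: FailedRun) -> float:
    # Simplest possible evaluator: 1.0 if the required text appears, else 0.0.
    return 1.0 if run.must_mention in output else 0.0


def rerun_failed(
    registry: PromptRegistry,
    runs: list[FailedRun],
    model: Callable[[str, str], str],
    evaluator: Callable[[str, FailedRun], float],
) -> float:
    if not runs:
        raise ValueError("no runs to score")
    scores = [evaluator(model(registry.current(), r.user_input), r) for r in runs]
    return sum(scores) / len(scores)


registry = PromptRegistry(
    versions={
        "v1": "You are a support agent.",
        "v2": "You are a support agent. Always cite the order id.",
    },
    active="v1",
)
failed = [FailedRun("run-1", "A17", "A17"), FailedRun("run-2", "B42", "B42")]

before = rerun_failed(registry, failed, fake_model, contains_evaluator)
registry.set_active("v2")  # later runs read the new version; no redeploy
after = rerun_failed(registry, failed, fake_model, contains_evaluator)
print(before, after)  # 0.0 1.0
```

Scoring the same failed runs before and after the switch is what turns "the fix looks right" into a measured comparison.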

Your first workflow, deployed before lunch.

Start free. Add a decorator and ship. Every run is journaled from day one, so when the first incident hits, you'll be glad it was recording.

agent.py

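from agnt5 import durable

@durable
def my_agent(query: str) -> str:
    # your agent logic goes here; every completed step is checkpointed
    answer = f"Echo: {query}"  # placeholder for your agent's real result
    return answer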