Gateway, Engine, and Coordinator ship as one binary — and split apart when you need them to. Here is how the --target flag works, and why.
A durable execution platform has a natural three-role split. The gateway accepts HTTP traffic and streams events back out. The engine owns the journal and drives run state machines. The coordinator dispatches work to workers and manages their lifecycle. Most systems in this space ship those three as separate services from day one, wire them together with gRPC, and call it done.
We did not. The AGNT5 runtime ships as one binary with a --target flag:
```sh
# Local dev and solo production: run everything in one process.
agnt5-runtime --target=all

# Scale out by splitting roles across processes.
agnt5-runtime --target=engine
agnt5-runtime --target=gateway
agnt5-runtime --target=coordinator
```

Both modes go through the same main.rs, the same tokio runtime, the same set of crates. The only difference is which subsystems are wired up on boot.
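Concretely, the flag can be as small as one enum in main.rs. A minimal sketch, assuming a clap-style CLI; the variant and field names are illustrative, not the actual AGNT5 definitions:

```rust
use clap::{Parser, ValueEnum};

/// Illustrative stand-in for the real CLI; names are guesses,
/// not the actual AGNT5 definitions.
#[derive(Clone, Copy, Debug, ValueEnum)]
enum Target {
    All,
    Engine,
    Gateway,
    Coordinator,
}

#[derive(Parser)]
struct Args {
    /// Which subsystems to wire up on boot.
    #[arg(long, value_enum, default_value = "all")]
    target: Target,
}

fn main() {
    let args = Args::parse();
    println!("booting with --target={:?}", args.target);
}
```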
Why one binary
The easy argument is operational: one thing to build, one image to push, one health check to wire into your orchestrator. That is real, but it is not the main argument.
The main argument is that the three roles want to share a lot of state in the common deployment. When gateway, engine, and coordinator live in the same process, the gateway can reach the engine’s segment cache through a direct function call rather than a gRPC round trip. The coordinator can hand a dispatch record to the engine’s consumer without serializing anything. The engine’s SSE streams flow back to the gateway’s HTTP handlers over an in-memory channel.
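To make that last point concrete, here is a sketch of the SSE path under --target=all, assuming an axum-style gateway and a tokio mpsc channel as the in-process "wire"; every name here is invented, not taken from the AGNT5 codebase:

```rust
use axum::response::sse::{Event, Sse};
use std::convert::Infallible;
use tokio::sync::mpsc;
use tokio_stream::{wrappers::ReceiverStream, Stream, StreamExt};

// Hypothetical event type the engine emits as a run progresses.
struct RunEvent {
    payload: String,
}

// In --target=all the engine holds the Sender and this handler holds the
// Receiver: no socket, no serialization, just a bounded in-memory channel.
async fn stream_run(
    rx: mpsc::Receiver<RunEvent>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    Sse::new(ReceiverStream::new(rx).map(|ev| Ok(Event::default().data(ev.payload))))
}
```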
For the single-node deployment — which covers local development, self-hosted starter tiers, and a meaningful fraction of production installs — this kills a lot of gRPC chatter that would otherwise sit on the hot path of every run.
How the split works
Each role is defined as a crate with a stable interface. The gateway crate exposes run_disaggregated(). The coordinator crate exposes the same. The engine’s entry points live under engine_processor and engine_store. In --target=all mode, main.rs instantiates all three crates as in-process subsystems and wires them through Rust trait objects — Backend for gateway-to-engine calls, Dispatcher for coordinator-to-engine calls.
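The trait boundary might look something like the following sketch. The method names are invented (the real Backend surface is larger), but the shape is the point: async methods over opaque payloads, implementable by either a local struct or a gRPC client:

```rust
use async_trait::async_trait;

/// Sketch of the gateway-to-engine contract. Method names are invented;
/// the real Backend surface is larger.
#[async_trait]
pub trait Backend: Send + Sync {
    /// Append a journal entry for a run and return the new offset.
    async fn append_journal(&self, run_id: &str, entry: Vec<u8>) -> anyhow::Result<u64>;
    /// Read a run's journal entries back, starting at an offset.
    async fn read_journal(&self, run_id: &str, from: u64) -> anyhow::Result<Vec<Vec<u8>>>;
}
```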
In disaggregated mode, each process instantiates one subsystem and replaces the in-process backend with a gRPC client pointed at the others:
```rust
match args.target {
    Target::All => {
        run_engine(true).await // engine + coordinator in-process
    }
    Target::Engine => {
        run_engine(false).await // engine only, coordinator dials in
    }
    Target::Gateway => {
        run_gateway().await // HTTP server that dials engine + coordinator
    }
    Target::Coordinator => {
        run_coordinator().await // dispatcher that dials engine
    }
}
```

The key constraint: the in-process and disaggregated paths use the same traits. A gateway handler does not know whether its Backend is a direct EngineServiceImpl or a tonic::Channel client. That uniformity is what makes the split cheap.
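To illustrate, here is a one-method version of the hypothetical Backend sketch from above, alongside the two implementations a handler might be handed; only the wiring code in main.rs knows which one it got:

```rust
use std::sync::Arc;
use async_trait::async_trait;

// One method of the Backend sketch, repeated so this snippet stands alone.
#[async_trait]
trait Backend: Send + Sync {
    async fn append_journal(&self, run_id: &str, entry: Vec<u8>) -> anyhow::Result<u64>;
}

// Stand-in for the engine's service type.
struct EngineServiceImpl;
impl EngineServiceImpl {
    async fn append(&self, _run_id: &str, entry: Vec<u8>) -> u64 {
        entry.len() as u64 // placeholder for a real journal append
    }
}

// --target=all: a direct method call, no serialization, no socket.
struct InProcessBackend {
    engine: Arc<EngineServiceImpl>,
}

#[async_trait]
impl Backend for InProcessBackend {
    async fn append_journal(&self, run_id: &str, entry: Vec<u8>) -> anyhow::Result<u64> {
        Ok(self.engine.append(run_id, entry).await)
    }
}

// Disaggregated: the same trait backed by a gRPC client. Shape only; a
// real impl would call a tonic-generated stub over a tonic::transport::Channel.
struct GrpcBackend;

#[async_trait]
impl Backend for GrpcBackend {
    async fn append_journal(&self, _run_id: &str, _entry: Vec<u8>) -> anyhow::Result<u64> {
        todo!("serialize the request, send it over the channel, await the reply")
    }
}
```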
When splitting pays off
The “all” topology tops out somewhere around a few thousand concurrent runs on a well-provisioned node, limited mostly by RocksDB write throughput and worker pool size. Past that, the coordinator’s dispatch loop starts to contend with the engine’s append loop for CPU, and tail latency on journal appends climbs. That is the point at which --target=coordinator on its own box starts to matter.
The gateway splits last. HTTP and SSE are not CPU-bound for our workloads — they are IO-bound. We only see real benefit from --target=gateway when a single node cannot absorb the connection count, typically with thousands of concurrent SSE streams.
We deliberately did not split earlier. Premature disaggregation is one of the failure modes of this kind of system — you end up with three services, three config files, three deploy pipelines, and the same throughput you had on one node.
What stays the same across modes
The segment layout on disk does not change. A RocksDB path written by --target=all is read cleanly by --target=engine after a mode switch — same column families, same offsets, same meta. The replication crate’s quorum protocol works identically in both modes; peers just happen to dial localhost when they all live in one process.
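That invariant falls out of every mode opening the store the same way. A sketch using the rocksdb crate, with made-up column family names:

```rust
use rocksdb::{ColumnFamilyDescriptor, Options, DB};

// Column family names here are invented; the point is that every mode
// opens the same set, so a path written by --target=all reads cleanly
// under --target=engine.
fn open_store(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);
    let cfs = ["journal", "segments", "meta"]
        .into_iter()
        .map(|name| ColumnFamilyDescriptor::new(name, Options::default()));
    DB::open_cf_descriptors(&opts, path, cfs)
}
```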
The worker protocol does not change either. A Python or TypeScript worker built with the SDK has no idea whether its coordinator is a dedicated process or a thread inside the engine. It registers, polls, gets work, returns results. That contract is what lets operators scale the runtime without asking users to redeploy.
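For illustration, that contract reduces to a loop like the one below, written here in Rust with invented method names; the real SDKs hide this loop behind their own APIs and speak gRPC on the wire:

```rust
use async_trait::async_trait;

// Invented shapes for the worker-side contract.
struct Task {
    id: String,
    input: Vec<u8>,
}

#[async_trait]
trait Coordinator {
    async fn register(&self, worker_id: &str) -> anyhow::Result<()>;
    async fn poll(&self, worker_id: &str) -> anyhow::Result<Option<Task>>;
    async fn report(&self, task_id: &str, output: Vec<u8>) -> anyhow::Result<()>;
}

// The loop is identical whether `coord` fronts a dedicated coordinator
// process or a thread inside the engine.
async fn worker_loop(coord: &dyn Coordinator, worker_id: &str) -> anyhow::Result<()> {
    coord.register(worker_id).await?;
    loop {
        if let Some(task) = coord.poll(worker_id).await? {
            let output = task.input; // placeholder for real user code
            coord.report(&task.id, output).await?;
        }
    }
}
```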
The tradeoff
The cost of this design is in the Backend trait itself. Every cross-role call in the hot path goes through a trait object, even in --target=all mode where it is “really” a local call. The indirection costs a few nanoseconds per dispatch. In exchange we get topology flexibility at no behavioral cost.
We think that is the right trade. A durable runtime is not a latency-shaving exercise — it is a correctness exercise. Keeping the wire contract between roles identical whether they share a process or not means less drift, less special-case code, and fewer bugs that only reproduce at production scale.
Why this matters
Most teams deploying AGNT5 start with --target=all and a single node. Many never move past it. For those who do, the migration is a rollout of a second binary and a config change, not a re-architecture. That continuity — from laptop, to staging, to a fleet — is what the one-binary story buys. The runtime does not ask you to think about topology until you have to.