Durable Entities and the Single-Writer Guarantee

Most state in an AI application is awkward to hold. Conversation memory wants to outlive any one request. A user’s balance wants one writer at a time, even when a dozen workflows are touching it. A pending approval wants to live somewhere between “in a database row” and “in a running process.” AGNT5 entities are the primitive for this middle ground.

An entity is a typed object keyed by a string. Each key has at most one active writer. Method calls on the same key serialize. State is persisted automatically, without the author writing a save call.

Here is a real one:

from pydantic import BaseModel
from agnt5 import Entity
from agnt5.entity import query


class AccountState(BaseModel):
    balance_cents: int = 0
    status: str = "active"
    transactions: list[dict] = []


class Account(Entity[AccountState]):
    async def deposit(self, amount_cents: int, memo: str) -> int:
        if self.state.status != "active":
            raise ValueError(f"account is {self.state.status}")
        self.state.balance_cents += amount_cents
        self.state.transactions.append(
            {"type": "deposit", "amount": amount_cents, "memo": memo}
        )
        return self.state.balance_cents

    async def withdraw(self, amount_cents: int, memo: str) -> int:
        if self.state.status != "active":
            raise ValueError(f"account is {self.state.status}")
        if amount_cents > self.state.balance_cents:
            raise ValueError("insufficient funds")
        self.state.balance_cents -= amount_cents
        self.state.transactions.append(
            {"type": "withdraw", "amount": amount_cents, "memo": memo}
        )
        return self.state.balance_cents

    @query
    def get_balance(self) -> int:
        return self.state.balance_cents

Two write methods, one query method. The author mutates self.state directly — no explicit save, no explicit lock. The runtime handles both.

What the runtime is doing

When a caller invokes Account("user-423").deposit(5000, "payroll"), the coordinator routes the call to whatever worker currently owns the entity key Account/user-423. Ownership is tracked as a lease inside the coordinator’s routing table. If no worker currently owns the key, the coordinator picks one and grants it the lease.
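The routing step can be modeled in a few lines. This is a toy sketch, not the coordinator's implementation: `RoutingTable`, `route`, and the round-robin placement policy are all hypothetical, since the article does not say how the coordinator picks a worker for an unowned key.

```python
import itertools


class RoutingTable:
    """Toy model of a coordinator routing table: entity key -> lease holder."""

    def __init__(self, workers: list[str]) -> None:
        self.leases: dict[str, str] = {}
        # Hypothetical placement policy: hand out workers round-robin.
        self._placement = itertools.cycle(workers)

    def route(self, entity_type: str, key: str) -> str:
        entity_key = f"{entity_type}/{key}"  # e.g. "Account/user-423"
        owner = self.leases.get(entity_key)
        if owner is None:
            # Nobody owns the key yet: pick a worker and grant it the lease.
            owner = next(self._placement)
            self.leases[entity_key] = owner
        return owner


table = RoutingTable(["worker-a", "worker-b"])
first = table.route("Account", "user-423")
again = table.route("Account", "user-423")  # same key, same owner
other = table.route("Account", "user-424")  # new key, next worker
print(first, again, other)  # worker-a worker-a worker-b
```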

Concurrent calls to the same key queue up behind the lease holder. A second caller issuing withdraw(2000, ...) at the same time does not race — it waits until deposit finishes and the state has been persisted. The queue is per-key; Account("user-424") proceeds in parallel, unaffected.
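Per-key queuing behaves like one lock per key. The sketch below models it inside a single process with `asyncio.Lock`; `KeyedSerializer` and `update_balance` are illustrative names, and the real runtime queues calls across workers rather than inside one event loop.

```python
import asyncio
from collections import defaultdict

balances = {"user-423": 0, "user-424": 0}
log: list[str] = []


class KeyedSerializer:
    """Toy model: one lock per entity key, so calls on a key run one at a time."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)  # created lazily, inside the loop

    async def call(self, key: str, method, *args):
        async with self._locks[key]:  # waiters queue FIFO behind the holder
            return await method(*args)


async def update_balance(key: str, delta: int) -> int:
    log.append(f"start {key} {delta}")
    current = balances[key]
    await asyncio.sleep(0.01)  # without the lock, another call on this key could interleave here
    balances[key] = current + delta
    log.append(f"end {key} {delta}")
    return balances[key]


async def main() -> None:
    keys = KeyedSerializer()
    await asyncio.gather(
        keys.call("user-423", update_balance, "user-423", 5000),   # deposit
        keys.call("user-423", update_balance, "user-423", -2000),  # withdraw, queued
        keys.call("user-424", update_balance, "user-424", 100),    # other key, runs in parallel
    )


asyncio.run(main())
print(balances)  # {'user-423': 3000, 'user-424': 100}
```

Without the per-key lock, both `user-423` calls would read a balance of 0 and one update would be lost.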

Before the method body runs, the SDK rehydrates self.state from the last durable snapshot. After the body returns, the SDK hashes the state; if the hash differs from the snapshot’s, it serializes the new state and writes it to the journal, then acknowledges the caller. Read-only methods (marked with @query) skip the post-call hash entirely: the decorator declares that they do not mutate state, so there is nothing to compare or persist.
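A minimal model of that invocation cycle follows, assuming state is a JSON-serializable dict and the journal is an in-memory list. `ToyJournal`, `invoke`, and the SHA-256-over-canonical-JSON hash are illustrative choices; the article does not say which hash or serialization format the SDK uses.

```python
import hashlib
import json
from typing import Any, Callable

State = dict[str, Any]


def state_hash(state: State) -> str:
    # Canonical JSON (sorted keys) so logically equal states hash equally.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


class ToyJournal:
    """Append-only list of (key, serialized state); the newest entry per key wins."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []

    def latest(self, key: str) -> State:
        for entry_key, blob in reversed(self.entries):
            if entry_key == key:
                return json.loads(blob)
        return {"balance_cents": 0}  # hypothetical initial state

    def append(self, key: str, state: State) -> None:
        self.entries.append((key, json.dumps(state, sort_keys=True)))


def invoke(journal: ToyJournal, key: str, method: Callable[..., Any], *args: Any,
           is_query: bool = False) -> Any:
    state = journal.latest(key)          # 1. rehydrate from the last snapshot
    if is_query:
        return method(state, *args)      # read-only: no hash, no write
    before = state_hash(state)
    result = method(state, *args)        # 2. run the body, which may mutate state
    if state_hash(state) != before:      # 3. persist only if something changed
        journal.append(key, state)
    return result                        # 4. acknowledge the caller


def deposit(state: State, amount: int) -> int:
    state["balance_cents"] += amount
    return state["balance_cents"]


def get_balance(state: State) -> int:
    return state["balance_cents"]


journal = ToyJournal()
invoke(journal, "Account/user-423", deposit, 5000)
invoke(journal, "Account/user-423", deposit, 0)  # hash unchanged: nothing written
print(invoke(journal, "Account/user-423", get_balance, is_query=True))  # 5000
print(len(journal.entries))  # 1
```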

The single-writer guarantee falls out of the lease. At any given instant, at most one process believes it holds the lease for a given entity key, and that process is the only one executing method bodies for that key. When it dies, the lease expires, the coordinator grants a new lease to another worker, and the new worker rehydrates state from the journal, including the most recent successful mutation.
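Lease expiry and failover can be modeled with an expiry timestamp per key. In this sketch `LeaseTable`, the 10-second TTL, and the explicit `now` argument are assumptions for illustration; the article does not describe AGNT5's lease duration or renewal protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseTable:
    """Toy lease table: a key has at most one holder with an unexpired lease."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.leases: dict[str, Lease] = {}

    def holder(self, key: str, now: float) -> Optional[str]:
        lease = self.leases.get(key)
        return lease.holder if lease is not None and lease.expires_at > now else None

    def acquire(self, key: str, worker: str, now: float) -> bool:
        current = self.holder(key, now)
        if current is not None and current != worker:
            return False  # someone else holds a live lease
        self.leases[key] = Lease(worker, now + self.ttl)  # grant, or renew for the holder
        return True


KEY = "Account/user-423"
journal = {KEY: {"balance_cents": 5000}}  # last durable snapshot
leases = LeaseTable(ttl=10.0)

a_first = leases.acquire(KEY, "worker-a", now=0.0)   # True: key was free
b_early = leases.acquire(KEY, "worker-b", now=5.0)   # False: a's lease is live
# worker-a crashes and stops renewing; once its lease expires, b can take over.
b_later = leases.acquire(KEY, "worker-b", now=11.0)  # True
state = dict(journal[KEY])  # the new holder rehydrates from the journal
print(a_first, b_early, b_later, leases.holder(KEY, now=11.0), state)
# True False True worker-b {'balance_cents': 5000}
```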

The replay contract for entity methods

Entity methods are not workflows. They do not have a ctx.step() mechanism. They are expected to be short, focused state transitions. The runtime’s durability model for entities is different: each method invocation is itself the journal entry. If deposit(5000, ...) succeeds and persists a new balance, that success is durable. If it crashes halfway through, the journal has no completion record, the caller sees a retriable error, and the next attempt runs against the pre-crash state.
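The all-or-nothing contract can be sketched by running each method against a copy of the committed state and appending to the journal only when the body returns. Here an exception named `CrashMidway` stands in for a worker dying mid-call; `run_method` and the list-based journal are hypothetical, not SDK APIs.

```python
import copy
from typing import Any, Callable


class CrashMidway(Exception):
    """Stands in for a worker dying partway through a method body."""


def run_method(journal: list[dict], method: Callable[..., Any], *args: Any) -> Any:
    committed = journal[-1] if journal else {"balance_cents": 0}
    working = copy.deepcopy(committed)  # the body never touches committed state
    result = method(working, *args)     # if this raises, no completion record is written
    journal.append(working)             # success: this invocation is the journal entry
    return result


attempts = 0


def deposit(state: dict, amount: int) -> int:
    global attempts
    attempts += 1
    state["balance_cents"] += amount
    if attempts == 1:
        raise CrashMidway("died after mutating, before completing")
    return state["balance_cents"]


journal: list[dict] = []
try:
    run_method(journal, deposit, 5000)
except CrashMidway:
    pass  # the caller sees a retriable error
print(journal)                             # []
print(run_method(journal, deposit, 5000))  # 5000, not 10000
```

The half-finished mutation from the first attempt is simply discarded, so the retry starts from the pre-crash balance.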

This is why entity method bodies should avoid doing long-running or effectful work directly. If you need to call an LLM or hit a slow external API as part of mutating an entity, do it from a workflow that reads the entity first and writes the entity after. The workflow gets step-level checkpointing; the entity stays a fast, atomic state transition.
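The read-effect-write pattern might look like the following. Because the AGNT5 workflow API is not shown in this article, this is a plain-Python model: `ToyContext.step` is a hypothetical stand-in for `ctx.step()` that caches each named step's result, and `Notes` and `call_llm` are invented for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable

llm_calls = 0


class Notes:
    """Hypothetical entity: fast, atomic state transitions only."""

    def __init__(self) -> None:
        self.state = {"summary": ""}

    async def get_summary(self) -> str:
        return self.state["summary"]

    async def set_summary(self, text: str) -> None:
        self.state["summary"] = text


async def call_llm(prompt: str) -> str:
    """Hypothetical slow, effectful call; kept out of the entity body."""
    global llm_calls
    llm_calls += 1
    await asyncio.sleep(0.01)
    return f"summary of: {prompt}"


class ToyContext:
    """Hypothetical workflow context: records each step's result by name."""

    def __init__(self, checkpoints: dict[str, Any]) -> None:
        self.checkpoints = checkpoints

    async def step(self, name: str, fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        if name in self.checkpoints:
            return self.checkpoints[name]  # replay: reuse the recorded result
        result = await fn(*args)
        self.checkpoints[name] = result
        return result


async def summarize(ctx: ToyContext, notes: Notes, message: str) -> str:
    current = await ctx.step("read", notes.get_summary)  # read the entity
    updated = await ctx.step("llm", call_llm, f"{current} {message}".strip())
    await notes.set_summary(updated)                     # write the entity
    return updated


notes = Notes()
checkpoints: dict[str, Any] = {}
asyncio.run(summarize(ToyContext(checkpoints), notes, "ship v2 Friday"))
# Simulated replay after a crash: checkpointed steps are not re-executed.
asyncio.run(summarize(ToyContext(checkpoints), notes, "ship v2 Friday"))
print(notes.state["summary"], llm_calls)  # summary of: ship v2 Friday 1
```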

Why single-writer instead of optimistic concurrency

Optimistic concurrency (read a version, compute a new state, then compare-and-swap it back) is the default move for a lot of durable state systems. It works, and it scales horizontally.

It also pushes retry logic out to every caller. When two callers collide, one has to lose, observe the loss, refetch, recompute, and retry. Writing that retry loop correctly is not hard, but doing it in every call site, in multiple languages, is the kind of tax we try not to charge.
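For contrast, here is the loop that optimistic concurrency asks every call site to carry. `VersionedStore` and `deposit_with_retry` are a generic, hypothetical sketch of compare-and-swap on a version number, not any particular database's API.

```python
import threading


class VersionedStore:
    """Toy optimistic-concurrency store: each key holds (version, balance)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, int]] = {}
        self._lock = threading.Lock()  # makes each individual CAS atomic

    def read(self, key: str) -> tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, new_balance: int) -> bool:
        with self._lock:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False  # someone else wrote first: this caller lost
            self._data[key] = (version + 1, new_balance)
            return True


def deposit_with_retry(store: VersionedStore, key: str, amount: int,
                       max_attempts: int = 10) -> int:
    # The per-call-site tax: refetch, recompute, retry until the CAS wins.
    for _ in range(max_attempts):
        version, balance = store.read(key)
        if store.compare_and_set(key, version, balance + amount):
            return balance + amount
    raise RuntimeError(f"gave up on {key} after {max_attempts} collisions")


store = VersionedStore()
stale_version, stale_balance = store.read("acct")  # caller A reads (0, 0)
deposit_with_retry(store, "acct", 5000)            # caller B commits first
lost = store.compare_and_set("acct", stale_version, stale_balance - 2000)
print(lost)                                        # False: A must start over
print(deposit_with_retry(store, "acct", -2000))    # 3000
```

Every caller in every language has to reproduce something like `deposit_with_retry`, including a policy for when to give up.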

Single-writer with a lease keeps the concurrency logic in one place — the coordinator’s routing table. Callers write method calls. The runtime serializes them. When an entity becomes hot, the queue fills; when it cools off, the queue drains. There is no application-level retry to write.

The tradeoff is that a single entity key does not scale past one worker’s throughput. If you find yourself wanting Account("global") to handle millions of deposits per second, you are modeling it wrong. Split the key. Shard the entity. That is a design conversation, not a runtime feature.
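One common way to split a hot key is to derive a shard key from a stable attribute of each request and aggregate on read. The sketch below assumes 16 shards and uses `zlib.crc32` because, unlike Python's built-in `hash()` for strings, it is stable across processes. `shard_key` is a hypothetical helper, and each shard stands in for an entity such as `Account("global/shard-3")`.

```python
import zlib
from collections import Counter

NUM_SHARDS = 16  # assumption: size this to the per-key throughput you need


def shard_key(base: str, routing_id: str, shards: int = NUM_SHARDS) -> str:
    # Deterministic: the same routing_id always lands on the same shard.
    return f"{base}/shard-{zlib.crc32(routing_id.encode()) % shards}"


# Each Counter entry models one shard entity's balance.
shard_balances: Counter = Counter()
for i in range(100):
    shard_balances[shard_key("global", f"user-{i}")] += 100  # one deposit per user

# Reads of the global total now fan out across shards and sum.
total = sum(shard_balances.values())
print(total, len(shard_balances))  # 10000, plus the number of shards actually used
```

The cost moves to the read side: a global total now means querying every shard rather than one key.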

What this enables

Conversation memory as ConversationMemory("thread-42"). Rate limits as RateLimiter("api_key/xyz"). Feature flags as FeatureFlag("checkout_v2"). Running tallies, drafts, saved carts, session state — anywhere you would otherwise reach for a Redis hash and an optimistic counter, the entity gives you the same shape with durability and ordering built in.
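As an example of that shape, a `RateLimiter` could hold a token bucket as its state. The sketch is plain Python so it runs without the SDK; in AGNT5 the class would presumably subclass `Entity[...]` like `Account` above. The capacity, refill rate, and explicit `now` parameter are assumptions, the last one chosen so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class RateLimiterState:
    tokens: float = 10.0
    last_refill: float = 0.0


class RateLimiter:
    """Token bucket as a single-writer state transition (illustrative only)."""

    capacity = 10.0       # maximum burst
    refill_per_sec = 1.0  # steady-state rate

    def __init__(self) -> None:
        self.state = RateLimiterState()

    def try_acquire(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.state.last_refill)
        self.state.tokens = min(self.capacity, self.state.tokens + elapsed * self.refill_per_sec)
        self.state.last_refill = max(self.state.last_refill, now)
        if self.state.tokens >= 1:
            self.state.tokens -= 1
            return True
        return False


limiter = RateLimiter()  # would be RateLimiter("api_key/xyz") as an entity
burst = [limiter.try_acquire(now=0.0) for _ in range(12)]
print(burst.count(True), burst.count(False))  # 10 2
print(limiter.try_acquire(now=1.0))           # True: one token refilled
```

Because calls on one key never overlap, the read-refill-decrement sequence needs no lock and no compare-and-swap.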

The payoff of single-writer semantics is that your state-mutation code looks like plain Python. No locks, no CAS, no retry loops, no “what if two of these fire at once” annotations in the comments. The runtime enforces the invariant once, at the edge, and everything inside runs in peace.
