Quickstart

Build a workflow that handles failures automatically and pauses for human approval

What you’ll build

A support triage workflow that demonstrates AGNT5’s core guarantees without any hand-written retry loops, state management, or recovery code.

The workflow shows:

Automatic recovery: A flaky API call fails and retries automatically—visible in the timeline
Human-in-the-loop: The workflow pauses for human approval, and that pause survives restarts
Exactly-once: Side effects (like posting a reply) happen exactly once, even after failures

Prerequisites:

AGNT5 CLI installed
Python 3.12+

No API keys required — this template uses mock responses by default.

Time to complete: ~5 minutes

Create your project

agnt5 create --template python/support-triage my-support-agent
cd my-support-agent

This scaffolds a support triage workflow with functions, human-in-the-loop pauses, and simulated side effects.

Deploy to AGNT5

agnt5 auth login
agnt5 deploy

This deploys your workflow to AGNT5 cloud. Wait for the deployment to complete.

Open the Dashboard

After deployment completes, the CLI displays your dashboard URL. Open it in your browser.

You’ll see the Dashboard with your registered components in the left sidebar:

Functions: fetch_customer_info, analyze_ticket, post_reply
Workflows: support_triage

Run your first workflow

Click support_triage in the sidebar

In the Body panel, enter:

{
  "ticket_id": "TCK-1001",
  "subject": "Need refund",
  "body": "I upgraded by mistake and need my money back"
}

Click Run

Watch the Events timeline as each step executes:

fetch_customer_info fails on first attempt (simulated flaky API)
Platform automatically retries — second attempt succeeds
analyze_ticket runs with the customer context
Workflow pauses — waiting for human approval

The workflow is now paused at a human-in-the-loop checkpoint. In the timeline, you’ll see a prompt asking you to approve the draft reply.

Click Send Reply to approve

Watch post_reply execute and the workflow complete — all streaming live in the timeline.

See AGNT5 in action

You just saw three core guarantees at work, plus built-in observability:

Automatic recovery

Look at the timeline. The fetch_customer_info step shows:

First attempt failed (CRM API timeout)
Platform automatically retried after a short backoff
Second attempt succeeded

You didn’t write any retry logic. The function is decorated with a retry policy, and the platform handles the rest:

@function(retries={"max_attempts": 3, "initial_interval_ms": 500})
async def fetch_customer_info(ctx: FunctionContext, ticket_id: str) -> dict:
    # Fail on first attempt to demonstrate automatic retry
    if ctx.attempt == 0:
        raise ConnectionError("CRM API timeout")

    return CUSTOMER_DB.get(ticket_id)

The ctx.attempt counter is zero-based, so 0 is the first attempt. The template checks it only to simulate a failure. Production code would not branch on it, because a genuinely transient failure clears up on a later attempt.
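To make the retry policy concrete, here is a minimal plain-Python sketch of what a policy like max_attempts=3 with initial_interval_ms=500 amounts to. It is not AGNT5's implementation: call_with_retries and flaky_fetch are hypothetical names, the customer record is made up, and the doubling backoff is an assumption, since this page does not state the actual schedule.

```python
import asyncio


async def call_with_retries(fn, *, max_attempts=3, initial_interval_ms=500,
                            sleep=asyncio.sleep):
    """Call fn(attempt) until it succeeds or max_attempts is exhausted."""
    delay = initial_interval_ms / 1000  # seconds
    for attempt in range(max_attempts):
        try:
            return await fn(attempt)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(delay)
            delay *= 2  # assumed doubling; the platform's schedule may differ


async def flaky_fetch(attempt):
    # Mirrors the template: fail on the first (zero-based) attempt only.
    if attempt == 0:
        raise ConnectionError("CRM API timeout")
    return {"customer_id": "C-42", "attempt": attempt}  # hypothetical record


if __name__ == "__main__":
    # A short interval keeps the demo fast.
    print(asyncio.run(call_with_retries(flaky_fetch, initial_interval_ms=10)))
    # prints {'customer_id': 'C-42', 'attempt': 1}
```

The sleep parameter is injectable so the loop can be exercised without real waiting.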

Durable pauses

The approval step isn’t stored in browser memory or session state — it’s a durable checkpoint in the platform.

This pause could last 5 seconds or 5 days. Close your browser, restart the server, come back later. The approval is still waiting.

This is how real workflows work: a compliance review might take hours, a manager approval might take days. AGNT5 handles this naturally.
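The idea behind a durable pause can be sketched in a few lines of plain Python: the pending approval lives in storage outside the process, so a fresh process finds it again. This is an illustration under assumptions, not how AGNT5 stores checkpoints. ApprovalStore and the JSON file are hypothetical stand-ins.

```python
import json
import tempfile
from pathlib import Path


class ApprovalStore:
    """Hypothetical file-backed store for pending approvals."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def _save(self, data):
        self.path.write_text(json.dumps(data))

    def request(self, run_id, question):
        data = self._load()
        # setdefault: asking again after a replay must not erase a decision
        data.setdefault(run_id, {"question": question, "decision": None})
        self._save(data)

    def decide(self, run_id, decision):
        data = self._load()
        data[run_id]["decision"] = decision
        self._save(data)

    def pending(self):
        return [rid for rid, rec in self._load().items()
                if rec["decision"] is None]


path = Path(tempfile.mkdtemp()) / "approvals.json"
ApprovalStore(path).request("run-1", "Approve this reply?")

# Simulated restart: a brand-new object still sees the pending approval.
restarted = ApprovalStore(path)
```

Because nothing is held in memory between calls, the wait can last as long as the storage does.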

Exactly-once side effects

The post_reply function uses an idempotency key derived from the run ID and the ticket ID:

@function
async def post_reply(ctx: FunctionContext, ticket_id: str, message: str) -> dict:
    idempotency_key = f"{ctx.run_id}:{ticket_id}"

    # Check if already posted
    if idempotency_key in seen_keys:
        return {"posted": False, "reason": "duplicate_suppressed"}

    # Post the reply...

Even if the platform crashes mid-execution and replays the workflow, the reply posts exactly once. Check the log to verify:

cat .agnt5_demo/replies.log
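The excerpt above stops before the actual post. A self-contained sketch of the same duplicate-suppression pattern, assuming an in-memory set and a made-up reply_id format, looks like this. ReplyPoster is a hypothetical name. An in-memory set is lost on process restart, so a real system would need durable storage for the keys.

```python
class ReplyPoster:
    """Hypothetical poster that suppresses duplicate sends by key."""

    def __init__(self):
        self.seen_keys = set()
        self.posted = []  # stand-in for the ticket system

    def post_reply(self, run_id, ticket_id, message):
        key = f"{run_id}:{ticket_id}"  # same key shape as the excerpt above
        if key in self.seen_keys:
            return {"posted": False, "reason": "duplicate_suppressed"}
        self.seen_keys.add(key)
        self.posted.append((ticket_id, message))
        # reply_id format is invented for this sketch
        return {"posted": True, "reply_id": f"reply-{len(self.posted)}"}


poster = ReplyPoster()
first = poster.post_reply("run-1", "TCK-1001", "Your refund is on its way.")
replay = poster.post_reply("run-1", "TCK-1001", "Your refund is on its way.")
```

The second call represents a replay of the same run: it returns the suppression result and the ticket system receives only one message.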

Zero-config observability

Click any event in the timeline. You get:

Input Data — What was passed to the function
Output Data — What it returned
Logs — Any log messages from your code
Timing — How long each step took

No tracing decorators. No OpenTelemetry setup. No trace backend. This comes free with every function.
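For contrast, this is roughly what capturing the same four fields by hand could look like. It is a sketch only: traced, events, and analyze are hypothetical names, and the event shape is an assumption rather than the platform's format.

```python
import functools
import time

events = []  # stand-in for the timeline


def traced(fn):
    """Hypothetical wrapper that records one event per call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        events.append({
            "step": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": output,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })
        return output
    return wrapper


@traced
def analyze(subject):
    # Toy classifier, only here to give the wrapper something to record.
    category = "billing" if "refund" in subject.lower() else "general"
    return {"category": category}


result = analyze("Need refund")
```

After the call, events holds one record with the step name, its input, its output, and how long it took.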

Understand the code

The template has a clean structure:

src/support_triage/
├── workflows.py   # @workflow - orchestration with HITL pauses
├── functions.py   # @function - fetch, analyze, post
├── tools.py       # Agent tools and mock data
└── __init__.py

The workflow (workflows.py):

@workflow
async def support_triage(ctx: WorkflowContext, ticket: dict) -> dict:
    # Step 1: Fetch customer info (demonstrates automatic retry)
    customer = await ctx.task(fetch_customer_info, ticket_id=ticket["ticket_id"])

    # Step 2: Analyze ticket with AI agent
    analysis = await ctx.task(analyze_ticket, ticket=ticket, customer=customer)

    # Step 3: Human approval (durable pause)
    decision = await ctx.wait_for_user(
        question=f"Approve this reply?\n\n{analysis['draft_reply']}",
        input_type="selection",
        options=[
            {"id": "approve", "label": "Send Reply"},
            {"id": "reject", "label": "Reject"},
        ],
    )

    if decision == "reject":
        return {"status": "rejected"}

    # Step 4: Post reply (exactly-once side effect)
    result = await ctx.task(post_reply, ticket_id=ticket["ticket_id"], message=analysis["draft_reply"])
    # .get() because a suppressed duplicate returns no "reply_id"
    return {"status": "sent", "reply_id": result.get("reply_id")}

Key concepts:

ctx.task() calls functions with durability guarantees — results are checkpointed
ctx.wait_for_user() creates a durable pause that survives restarts
Each step is memoized — if the server restarts, completed steps don’t re-execute

Why don’t steps re-run? AGNT5 records inputs and outputs at each checkpoint. On recovery, it rehydrates saved results instead of re-executing — that’s what makes replay deterministic.
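The record-and-replay idea can be sketched without the SDK. The Journal class below is hypothetical, and keying results by step name is a simplification that assumes each step runs once per workflow. The function bodies are mock stand-ins for the template's functions.

```python
class Journal:
    """Hypothetical journal that stores each step's output once."""

    def __init__(self):
        self.results = {}
        self.executions = 0  # counts real executions, for illustration

    def task(self, step, fn, **kwargs):
        if step in self.results:
            return self.results[step]  # replay: reuse the saved output
        self.executions += 1
        self.results[step] = fn(**kwargs)
        return self.results[step]


def fetch_customer_info(ticket_id):
    return {"ticket_id": ticket_id, "plan": "pro"}  # mock data


def analyze_ticket(ticket, customer):
    return {"draft_reply": f"Re: {ticket['subject']} ({customer['plan']} plan)"}


def support_triage(journal, ticket):
    customer = journal.task("fetch_customer_info", fetch_customer_info,
                            ticket_id=ticket["ticket_id"])
    return journal.task("analyze_ticket", analyze_ticket,
                        ticket=ticket, customer=customer)


journal = Journal()
ticket = {"ticket_id": "TCK-1001", "subject": "Need refund"}
first = support_triage(journal, ticket)
replayed = support_triage(journal, ticket)  # simulates recovery after a crash
```

Running the workflow a second time against the same journal returns the same result while executing no step again, which is the property the checkpointing provides.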

The functions (functions.py):

@function(retries={"max_attempts": 3, "initial_interval_ms": 500})
async def fetch_customer_info(ctx: FunctionContext, ticket_id: str) -> dict:
    # Transient failures are automatically retried by the platform
    return await fetch_from_crm(ticket_id)

@function
async def post_reply(ctx: FunctionContext, ticket_id: str, message: str) -> dict:
    # Side effect with exactly-once semantics via idempotency key
    key = f"{ctx.run_id}:{ticket_id}"  # run ID plus ticket ID
    return await post_to_ticket_system(ticket_id, message, idempotency_key=key)

What AGNT5 handled for you:

Checkpointing and recovery (no database schema, no serialization code)

Automatic retry with backoff (no retry loops, no exponential backoff math)

Durable HITL pause (no external queue, no webhook to resume)

Exactly-once side effects (no idempotency table, no distributed locks)

Real-time observability (no OpenTelemetry setup, no trace backend)

This would be 500+ lines with other frameworks. With AGNT5, it’s ~50.

Deploy to cloud (optional)

You already deployed once in Deploy to AGNT5. To ship changes you make to the template, run the same commands again:

agnt5 auth login
agnt5 deploy

Your workflow is now running in the cloud. See Deploy to Production for environment configuration and production setup.

Next steps

You’ve seen AGNT5’s core guarantees in action: automatic recovery, durable human-in-the-loop, and exactly-once side effects. Now explore:

Concepts: Functions — How durability and checkpointing work
Concepts: Workflows — Orchestration and recovery
Deploy to Production — Staging and production deployment

Using a real LLM

The template uses mock responses by default. To use a real LLM, set environment variables for your deployment:

agnt5 env set AGNT5_USE_REAL_LLM=1
agnt5 env set OPENAI_API_KEY=sk-your-key
agnt5 deploy

The analyze_ticket function will now call GPT-4o-mini to generate replies.
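How the template reads these variables is not shown on this page. One plausible shape for such a switch, offered as a sketch under that assumption, is below. The variable names come from the commands above. analyze_ticket_sketch, its return shape, and the error message are hypothetical, and the real LLM call is deliberately left out.

```python
import os


def use_real_llm(env=os.environ):
    """True when the flag from the commands above is set to 1."""
    return env.get("AGNT5_USE_REAL_LLM") == "1"


def analyze_ticket_sketch(ticket, env=os.environ):
    if use_real_llm(env):
        if not env.get("OPENAI_API_KEY"):
            raise RuntimeError(
                "OPENAI_API_KEY is required when AGNT5_USE_REAL_LLM=1")
        # The real model call would go here; omitted in this sketch.
        return {"mode": "llm", "draft_reply": None}
    # Default path: canned response, no API key needed.
    return {"mode": "mock",
            "draft_reply": f"Thanks for contacting us about: {ticket['subject']}"}


mock = analyze_ticket_sketch({"subject": "Need refund"}, env={})
```

Passing env explicitly keeps the switch easy to test without touching the process environment.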

