Quickstart
Build a workflow that handles failures automatically and pauses for human approval
What you’ll build
A support triage workflow that demonstrates AGNT5’s core guarantees—all without writing retry loops, state management, or recovery code.
This support triage agent shows:
- Automatic recovery: A flaky API call fails and retries automatically—visible in the timeline
- Human-in-the-loop: The workflow pauses for human approval, and that pause survives restarts
- Exactly-once: Side effects (like posting a reply) happen exactly once, even after failures
Prerequisites:
- AGNT5 CLI installed
- Python 3.12+
No API keys required — this template uses mock responses by default.
Time to complete: ~5 minutes
Create your project
agnt5 create --template python/support-triage my-support-agent
cd my-support-agent
This scaffolds a support triage workflow with functions, human-in-the-loop pauses, and simulated side effects.
Deploy to AGNT5
agnt5 auth login
agnt5 deploy
This deploys your workflow to AGNT5 cloud. Wait for the deployment to complete.
Open the Dashboard
After deployment completes, the CLI displays your dashboard URL. Open it in your browser.
You’ll see the Dashboard with your registered components in the left sidebar:
- Functions: fetch_customer_info, analyze_ticket, post_reply
- Workflows: support_triage
Run your first workflow
- Click support_triage in the sidebar
- In the Body panel, enter:
{ "ticket_id": "TCK-1001", "subject": "Need refund", "body": "I upgraded by mistake and need my money back" }
- Click Run
Watch the Events timeline as each step executes:
- fetch_customer_info fails on first attempt (simulated flaky API)
- Platform automatically retries — second attempt succeeds
- analyze_ticket runs with the customer context
- Workflow pauses — waiting for human approval
The workflow is now paused at a human-in-the-loop checkpoint. In the timeline, you’ll see a prompt asking you to approve the draft reply.
- Click Send Reply to approve
Watch post_reply execute and the workflow complete — all streaming live in the timeline.
See AGNT5 in action
You just witnessed three core capabilities:
Automatic recovery
Look at the timeline. The fetch_customer_info step shows:
- First attempt failed (CRM API timeout)
- Platform automatically retried after a short backoff
- Second attempt succeeded
You didn’t write any retry logic. The function is decorated with a retry policy, and the platform handles the rest:
@function(retries={"max_attempts": 3, "initial_interval_ms": 500})
async def fetch_customer_info(ctx: FunctionContext, ticket_id: str) -> dict:
    # Fail on first attempt to demonstrate automatic retry
    if ctx.attempt == 0:
        raise ConnectionError("CRM API timeout")
    return CUSTOMER_DB.get(ticket_id)
The ctx.attempt counter tells you which attempt you’re on, starting at 0. In production, you wouldn’t check this; transient failures would naturally succeed on retry.
Durable pauses
The approval step isn’t stored in browser memory or session state — it’s a durable checkpoint in the platform.
This pause could last 5 seconds or 5 days. Close your browser, restart the server, come back later. The approval is still waiting.
This is how real workflows work: a compliance review might take hours, a manager approval might take days. AGNT5 handles this naturally.
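If you want a mental model for why the pause survives restarts, the sketch below stores a pending approval on disk instead of in memory. It is an illustration only: ApprovalStore and the JSON file layout are hypothetical and say nothing about how AGNT5 actually persists checkpoints.

```python
import json
import tempfile
from pathlib import Path


class ApprovalStore:
    """Hypothetical file-backed store for pending approvals."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def request(self, run_id, question):
        # Record the pause; a repeated request for the same run is a no-op.
        state = self._load()
        state.setdefault(run_id, {"question": question, "decision": None})
        self.path.write_text(json.dumps(state))

    def decide(self, run_id, decision):
        state = self._load()
        state[run_id]["decision"] = decision
        self.path.write_text(json.dumps(state))

    def get(self, run_id):
        return self._load().get(run_id)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "approvals.json"

    # "Process 1" asks for approval, then goes away.
    ApprovalStore(path).request("run-1", "Approve this reply?")

    # "Process 2" starts later with no in-memory state and still sees the pause.
    restarted = ApprovalStore(path)
    pending = restarted.get("run-1")

    restarted.decide("run-1", "approve")
    final = ApprovalStore(path).get("run-1")

print(pending)
print(final)
```

Because the pending request lives outside the process, nothing is lost when the process that created it exits.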
Exactly-once side effects
The post_reply function uses an idempotency key derived from the run ID:
@function
async def post_reply(ctx: FunctionContext, ticket_id: str, message: str) -> dict:
    idempotency_key = f"{ctx.run_id}:{ticket_id}"
    # Check if already posted
    if idempotency_key in seen_keys:
        return {"posted": False, "reason": "duplicate_suppressed"}
    # Post the reply...
Even if the platform crashes mid-execution and replays the workflow, the reply posts exactly once. Check the log to verify:
cat .agnt5_demo/replies.log
Zero-config observability
Click any event in the timeline. You get:
- Input Data — What was passed to the function
- Output Data — What it returned
- Logs — Any log messages from your code
- Timing — How long each step took
No tracing decorators. No OpenTelemetry setup. No trace backend. This comes free with every function.
Understand the code
The template has a clean structure:
src/support_triage/
├── workflows.py # @workflow - orchestration with HITL pauses
├── functions.py # @function - fetch, analyze, post
├── tools.py # Agent tools and mock data
└── __init__.py
The workflow (workflows.py):
@workflow
async def support_triage(ctx: WorkflowContext, ticket: dict) -> dict:
    # Step 1: Fetch customer info (demonstrates automatic retry)
    customer = await ctx.task(fetch_customer_info, ticket_id=ticket["ticket_id"])
    # Step 2: Analyze ticket with AI agent
    analysis = await ctx.task(analyze_ticket, ticket=ticket, customer=customer)
    # Step 3: Human approval (durable pause)
    decision = await ctx.wait_for_user(
        question=f"Approve this reply?\n\n{analysis['draft_reply']}",
        input_type="selection",
        options=[
            {"id": "approve", "label": "Send Reply"},
            {"id": "reject", "label": "Reject"},
        ],
    )
    if decision == "reject":
        return {"status": "rejected"}
    # Step 4: Post reply (exactly-once side effect)
    result = await ctx.task(post_reply, ticket_id=ticket["ticket_id"], message=analysis["draft_reply"])
    return {"status": "sent", "reply_id": result["reply_id"]}
Key concepts:
- ctx.task() calls functions with durability guarantees: results are checkpointed
- ctx.wait_for_user() creates a durable pause that survives restarts
- Each step is memoized: if the server restarts, completed steps don’t re-execute
Why don’t steps re-run? AGNT5 records inputs and outputs at each checkpoint. On recovery, it rehydrates saved results instead of re-executing — that’s what makes replay deterministic.
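The replay idea can be sketched in a few lines. The Journal class below is a hypothetical, in-memory stand-in for the platform's checkpoint store, and triage is a simplified two-step workflow, not the template's code. It shows a step's saved result being reused after a simulated crash, so the side effect runs once.

```python
class Journal:
    """Hypothetical step journal: maps a step name to its saved result."""

    def __init__(self):
        self.results = {}

    def run_step(self, name, fn, *args):
        if name in self.results:
            return self.results[name]  # replay: reuse the saved result
        result = fn(*args)
        self.results[name] = result  # checkpoint the output
        return result


sent = []  # stands in for the external ticket system


def post_reply(ticket_id, message):
    sent.append((ticket_id, message))
    return {"reply_id": f"reply-{len(sent)}"}


def triage(journal, ticket_id, crash_after_post=False):
    draft = journal.run_step("analyze", lambda: f"Refund started for {ticket_id}")
    result = journal.run_step("post_reply", post_reply, ticket_id, draft)
    if crash_after_post:
        raise RuntimeError("simulated crash after post_reply")
    return {"status": "sent", "reply_id": result["reply_id"]}


journal = Journal()
try:
    triage(journal, "TCK-1001", crash_after_post=True)
except RuntimeError:
    pass  # the run died, but the journal kept both step results

outcome = triage(journal, "TCK-1001")  # replay from the journal
print(outcome, len(sent))
```

Note the gap this sketch leaves open: a crash after the side effect but before the result is saved would still repeat the call. That gap is what the idempotency key in post_reply is there to close.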
The functions (functions.py):
@function(retries={"max_attempts": 3, "initial_interval_ms": 500})
async def fetch_customer_info(ctx: FunctionContext, ticket_id: str) -> dict:
    # Transient failures are automatically retried by the platform
    return await fetch_from_crm(ticket_id)

@function
async def post_reply(ctx: FunctionContext, ticket_id: str, message: str) -> dict:
    # Side effect with exactly-once semantics via idempotency key
    return await post_to_ticket_system(ticket_id, message, idempotency_key=ctx.run_id)
What AGNT5 handled for you:
- Checkpointing and recovery (no database schema, no serialization code)
- Automatic retry with backoff (no retry loops, no exponential backoff math)
- Durable HITL pause (no external queue, no webhook to resume)
- Exactly-once side effects (no idempotency table, no distributed locks)
- Real-time observability (no OpenTelemetry setup, no trace backend)
This would be 500+ lines with other frameworks. With AGNT5, it’s ~50.
Deploy to cloud (optional)
Ready to ship? Authenticate and deploy:
agnt5 auth login
agnt5 deploy
Your workflow is now running in the cloud. See Deploy to Production for environment configuration and production setup.
Next steps
You’ve seen AGNT5’s core guarantees in action: automatic recovery, durable human-in-the-loop, and exactly-once side effects. Now explore:
- Concepts: Functions — How durability and checkpointing work
- Concepts: Workflows — Orchestration and recovery
- Deploy to Production — Staging and production deployment
Using a real LLM
The template uses mock responses by default. To use a real LLM, set environment variables for your deployment:
agnt5 env set AGNT5_USE_REAL_LLM=1
agnt5 env set OPENAI_API_KEY=sk-your-key
agnt5 deploy
The analyze_ticket function will now call GPT-4o-mini to generate replies.
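As a rough picture of how a function might branch on these variables, consider the sketch below. The template's actual code may look quite different: use_real_llm and draft_reply are hypothetical helpers, the rule that only the value "1" enables the real LLM is an assumption, and the LLM call itself is deliberately left out.

```python
import os


def use_real_llm(env=os.environ):
    # Assumption: only the exact value "1" switches on the real LLM.
    return env.get("AGNT5_USE_REAL_LLM") == "1"


def draft_reply(ticket, env=os.environ):
    if use_real_llm(env):
        if not env.get("OPENAI_API_KEY"):
            raise RuntimeError("OPENAI_API_KEY must be set when AGNT5_USE_REAL_LLM=1")
        # A real implementation would call the LLM provider here.
        raise NotImplementedError("LLM call omitted from this sketch")
    # Default path: a canned response, so no API key is needed.
    return f"[mock] Thanks for contacting us about: {ticket['subject']}"


print(draft_reply({"subject": "Need refund"}, env={}))
```

Failing fast on a missing key is a design choice for the sketch: it surfaces a misconfigured deployment immediately rather than at the first ticket.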