Build a durable research agent with approval and recovery
Checkpoint search, extraction, artifacts, and final human approval across a long-running report workflow.
Research agents are useful when they survive real work: slow searches, document downloads, extraction failures, intermediate artifacts, and human approval before the final report is sent.
Scenario
A research agent investigates a vendor, gathers sources, extracts notes, drafts a report, waits for approval, and then publishes the report to a workspace.
What you build
- A multi-step research workflow.
- Checkpoints after search, fetch, extraction, synthesis, and approval.
- Artifact records for downloaded files and notes.
- A recovery path after a failed source fetch.
- Human approval before final publication.
Workflow shape
The workflow is long-running, but each unit of work is small.
@workflow
async def vendor_research(ctx: WorkflowContext, vendor: str) -> ResearchReport:
plan = await ctx.step(plan_research, vendor)
sources = await ctx.step(search_sources, plan)
documents = await ctx.step(fetch_documents, sources)
notes = await ctx.step(extract_notes, documents)
draft = await ctx.step(write_report, vendor, notes)
decision = await ctx.wait_for_signal(
"report_approval",
timeout="5d",
metadata={"vendor": vendor, "draft_artifact_id": draft.artifact_id},
)
if decision.status != "approved":
return ResearchReport(status="needs_changes", draft_id=draft.artifact_id)
published = await ctx.step(publish_report_once, draft.artifact_id)
return ResearchReport(status="published", url=published.url)If the worker stops after fetching documents, replay resumes from the journaled documents and continues at extraction.
Artifact checkpoints
Store artifact references in the journal instead of large blobs.
class ResearchArtifact(BaseModel):
artifact_id: str
kind: Literal["source", "notes", "draft", "report"]
uri: str
checksum: strThe trace should let a reviewer open the source list, extracted notes, and draft without rerunning the agent.
Recovery drill
Before shipping, force one source download to fail.
agnt5 runs replay --run-id run_01JRESEARCH --local
agnt5 runs resume run_01JRESEARCHThe recovered run should not repeat successful downloads, and the final report should include a trace back to the notes and sources used.
Production checks
- Every long external call is inside a step.
- Artifacts have stable IDs and checksums.
- A worker restart during approval does not lose the draft.
- The publish step is idempotent.
- Reviewers can inspect sources before approving.