May 13, 2026 · RAG · Memory · Tenancy
Build a RAG chatbot with memory
Retrieve knowledge, preserve user context, isolate tenants, and trace each answer back to evidence.
This cookbook builds a RAG chatbot that behaves like a production workflow: tenant-aware retrieval, durable memory updates, traceable evidence, and recoverable answer generation.
Scenario
A SaaS user asks a product question. The chatbot retrieves relevant docs, combines them with user memory, generates an answer, and records useful context for the next turn.
What you build
- Tenant-scoped retrieval.
- Session memory lookup.
- Evidence-grounded answer generation.
- A memory update step.
- Trace evidence for every answer.
Workflow shape
@workflow
async def answer_chat_turn(ctx: WorkflowContext, request: ChatRequest) -> ChatAnswer:
    memory = await ctx.step(load_session_memory, request.session_id)
    passages = await ctx.step(retrieve_docs, request.tenant_id, request.message)
    answer = await ctx.step(generate_grounded_answer, request.message, memory, passages)
    await ctx.step(update_memory_once, request.session_id, request.message, answer)
    return answer

The retrieval step must receive the tenant ID. Do not rely on a global vector index without tenant filters.
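To make tenant-scoped retrieval concrete, here is a minimal sketch of a `retrieve_docs` step. It is an assumption-laden stand-in: a real deployment would query a vector index with a tenant metadata filter, while this version uses keyword overlap over an in-memory list. The `Passage` type and the sample documents are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    tenant_id: str
    doc_id: str
    text: str

# Hypothetical corpus standing in for a tenant-tagged vector index.
DOCS = [
    Passage("acme", "doc-1", "How to rotate API keys in the dashboard"),
    Passage("acme", "doc-2", "Billing cycles and invoices"),
    Passage("globex", "doc-3", "How to rotate API keys via the CLI"),
]

def retrieve_docs(tenant_id: str, query: str, k: int = 2) -> list[Passage]:
    # The tenant filter is applied BEFORE scoring, so another tenant's
    # documents can never enter the candidate set.
    candidates = [p for p in DOCS if p.tenant_id == tenant_id]
    terms = set(query.lower().split())
    scored = sorted(
        candidates,
        key=lambda p: len(terms & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The important property is the order of operations: filter by tenant first, then rank. Ranking a global index and filtering afterward can silently leak cross-tenant passages into the candidate set.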
Evidence model
Return citations as structured data.
class ChatAnswer(BaseModel):
    answer: str
    citations: list[DocumentCitation]
    memory_updates: list[str]

This lets the UI show citations and lets the trace explain the answer.
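The snippet above leaves `DocumentCitation` undefined. A minimal sketch of a shape that works (fields here are an assumption; plain dataclasses are used so the example is dependency-free, where the cookbook itself uses pydantic):

```python
from dataclasses import dataclass, field

@dataclass
class DocumentCitation:
    doc_id: str    # stable identifier of the source document
    snippet: str   # the quoted passage shown in the UI
    score: float   # retrieval score, useful when auditing the trace

@dataclass
class ChatAnswer:
    answer: str
    citations: list[DocumentCitation] = field(default_factory=list)
    memory_updates: list[str] = field(default_factory=list)
```

Keeping the snippet and score on each citation means the trace can show not just which document backed an answer, but which passage and how strongly it matched.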
Production checks
- Direct HTTP calls include `X-TENANT-ID` and `X-DEPLOYMENT-ID`.
- Retrieval filters by tenant.
- The answer stores citations.
- Memory updates are idempotent per turn.
- A bad answer can be replayed with the same retrieved passages.
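The idempotency check deserves a sketch. One way to make `update_memory_once` safe under replay (an assumption: the real step may take the full `ChatAnswer`; here it takes a summary string, and the in-memory `MEMORY` dict stands in for a durable store) is to derive a deterministic key from the session and the user message, so re-running the turn overwrites the same entry instead of appending a duplicate:

```python
import hashlib

# Hypothetical durable store: session_id -> {turn_key: memory summary}.
MEMORY: dict[str, dict[str, str]] = {}

def update_memory_once(session_id: str, message: str, summary: str) -> None:
    # The key is deterministic in (session, message), so a replayed
    # turn writes the same slot: the update is idempotent per turn.
    turn_key = hashlib.sha256(f"{session_id}:{message}".encode()).hexdigest()
    MEMORY.setdefault(session_id, {})[turn_key] = summary
```

With this shape, replaying a failed workflow from the memory-update step cannot double-write user context, which is what "idempotent per turn" requires.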