May 13, 2026 · RAG · Memory · Tenancy

Build a RAG chatbot with memory

Retrieve knowledge, preserve user context, isolate tenants, and trace each answer back to evidence.

This cookbook builds a RAG chatbot that behaves like a production workflow: tenant-aware retrieval, durable memory updates, traceable evidence, and recoverable answer generation.

Scenario

A SaaS user asks a product question. The chatbot retrieves relevant docs, combines them with user memory, generates an answer, and records useful context for the next turn.

What you build

  • Tenant-scoped retrieval.
  • Session memory lookup.
  • Evidence-grounded answer generation.
  • A memory update step.
  • Trace evidence for every answer.

Workflow shape

@workflow
async def answer_chat_turn(ctx: WorkflowContext, request: ChatRequest) -> ChatAnswer:
    memory = await ctx.step(load_session_memory, request.session_id)
    passages = await ctx.step(retrieve_docs, request.tenant_id, request.message)
    answer = await ctx.step(generate_grounded_answer, request.message, memory, passages)
    await ctx.step(update_memory_once, request.session_id, request.message, answer)
    return answer
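The generation step in the workflow above can be sketched as follows. The model call is stubbed out (`call_llm` is a hypothetical client, not part of any SDK shown here); the point of the sketch is that the prompt is built only from the retrieved passages and that citations are taken directly from them.

```python
def call_llm(prompt: str) -> str:
    """Stub for a model client; a real step would call your LLM provider."""
    return "stubbed answer"

def generate_grounded_answer(message: str, memory: list[str],
                             passages: list[dict]) -> dict:
    """Sketch of the generation step: answer only from retrieved evidence,
    and cite exactly the passages that were retrieved."""
    prompt = (
        "Answer using ONLY the passages below.\n"
        + "\n".join(f"[{p['doc_id']}] {p['text']}" for p in passages)
        + f"\nUser memory: {memory}\nQuestion: {message}"
    )
    answer_text = call_llm(prompt)
    return {
        "answer": answer_text,
        "citations": [p["doc_id"] for p in passages],
        "memory_updates": [f"asked about: {message}"],
    }
```

Because the citations list is derived from the passages rather than from model output, the trace can always map an answer back to the evidence it saw.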

The retrieval step must receive the tenant ID. Do not rely on global vector indexes without tenant filters.

Evidence model

Return citations as structured data.

class ChatAnswer(BaseModel):
    answer: str
    citations: list[DocumentCitation]
    memory_updates: list[str]

This lets the UI show citations and lets the trace explain the answer.
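`DocumentCitation` is not defined above. A minimal shape might carry the document ID, the quoted snippet, and the retrieval score; the field names are illustrative, and a plain dataclass is used here to keep the sketch dependency-free (in the cookbook's code it would subclass `BaseModel` like `ChatAnswer` does):

```python
from dataclasses import dataclass, asdict

@dataclass
class DocumentCitation:
    """One piece of evidence behind an answer (field names are illustrative)."""
    doc_id: str    # which document the evidence came from
    snippet: str   # the quoted passage shown in the UI
    score: float   # retrieval score, useful when inspecting a trace
```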

Production checks

  • Direct HTTP calls to the workflow include the X-TENANT-ID and X-DEPLOYMENT-ID headers.
  • Retrieval filters by tenant.
  • The answer stores citations.
  • Memory updates are idempotent per turn.
  • A bad answer can be replayed with the same retrieved passages.
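Per-turn idempotency can be enforced with a key scoped to the session and turn, so that a retried step does not append the same memory twice. The store below is a toy sketch; a real implementation would be a database table with a unique constraint on `(session_id, turn_id)`:

```python
class MemoryStore:
    """Toy in-memory store; stands in for a table with a unique
    constraint on (session_id, turn_id)."""
    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], str] = {}

    def update_memory_once(self, session_id: str, turn_id: str,
                           summary: str) -> bool:
        """Write the turn's memory update exactly once.
        Returns False when this turn was already recorded (a retry)."""
        key = (session_id, turn_id)
        if key in self._entries:  # retry of the same turn: no-op
            return False
        self._entries[key] = summary
        return True
```

Keying on the turn rather than on the summary text means a retried step is a no-op even if the regenerated summary differs slightly.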

Next steps

© 2026 AGNT5