May 13, 2026 · RAG · Memory · Tenancy
Build a RAG chatbot with memory
Retrieve knowledge, preserve user context, isolate tenants, and trace each answer back to evidence.
This cookbook builds a RAG chatbot that behaves like a production workflow: tenant-aware retrieval, durable memory updates, traceable evidence, and recoverable answer generation.
Scenario
A SaaS user asks a product question. The chatbot retrieves relevant docs, combines them with user memory, generates an answer, and records useful context for the next turn.
What you build
- Tenant-scoped retrieval.
- Session memory lookup.
- Evidence-grounded answer generation.
- A memory update step.
- Trace evidence for every answer.
Workflow shape
@workflow
async def answer_chat_turn(ctx: WorkflowContext, request: ChatRequest) -> ChatAnswer:
    memory = await ctx.step(load_session_memory, request.session_id)
    passages = await ctx.step(retrieve_docs, request.tenant_id, request.message)
    answer = await ctx.step(generate_grounded_answer, request.message, memory, passages)
    await ctx.step(update_memory_once, request.session_id, request.message, answer)
    return answer

The retrieval step must receive the tenant ID. Do not rely on a global vector index without tenant filters.
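To make tenant-scoped retrieval concrete, here is a minimal sketch of a `retrieve_docs` step. It is an assumption-laden stand-in: a real deployment would query a vector index with a tenant metadata filter, while this version uses keyword overlap over an in-memory list. The `Passage` type and the sample documents are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    tenant_id: str
    doc_id: str
    text: str

# Hypothetical corpus standing in for a tenant-tagged vector index.
DOCS = [
    Passage("acme", "doc-1", "How to rotate API keys in the dashboard"),
    Passage("acme", "doc-2", "Billing cycles and invoices"),
    Passage("globex", "doc-3", "How to rotate API keys via the CLI"),
]

def retrieve_docs(tenant_id: str, query: str, k: int = 2) -> list[Passage]:
    # The tenant filter is applied BEFORE scoring, so another tenant's
    # documents can never enter the candidate set.
    candidates = [p for p in DOCS if p.tenant_id == tenant_id]
    terms = set(query.lower().split())
    scored = sorted(
        candidates,
        key=lambda p: len(terms & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The important property is the order of operations: filter by tenant first, then rank. Ranking a global index and filtering afterward can silently leak cross-tenant passages into the candidate set.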
Evidence model
Return citations as structured data.
class ChatAnswer(BaseModel):
    answer: str
    citations: list[DocumentCitation]
    memory_updates: list[str]

This lets the UI show citations and lets the trace explain the answer.
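The snippet above leaves `DocumentCitation` undefined. A minimal sketch of a shape that works (fields here are an assumption; plain dataclasses are used so the example is dependency-free, where the cookbook itself uses pydantic):

```python
from dataclasses import dataclass, field

@dataclass
class DocumentCitation:
    doc_id: str    # stable identifier of the source document
    snippet: str   # the quoted passage shown in the UI
    score: float   # retrieval score, useful when auditing the trace

@dataclass
class ChatAnswer:
    answer: str
    citations: list[DocumentCitation] = field(default_factory=list)
    memory_updates: list[str] = field(default_factory=list)
```

Keeping the snippet and score on each citation means the trace can show not just which document backed an answer, but which passage and how strongly it matched.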
Production checks
- Direct HTTP calls include `X-TENANT-ID` and `X-DEPLOYMENT-ID`.
- Retrieval filters by tenant.
- The answer stores citations.
- Memory updates are idempotent per turn.
- A bad answer can be replayed with the same retrieved passages.
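The idempotency check deserves a sketch. One way to make `update_memory_once` safe under replay (an assumption: the real step may take the full `ChatAnswer`; here it takes a summary string, and the in-memory `MEMORY` dict stands in for a durable store) is to derive a deterministic key from the session and the user message, so re-running the turn overwrites the same entry instead of appending a duplicate:

```python
import hashlib

# Hypothetical durable store: session_id -> {turn_key: memory summary}.
MEMORY: dict[str, dict[str, str]] = {}

def update_memory_once(session_id: str, message: str, summary: str) -> None:
    # The key is deterministic in (session, message), so a replayed
    # turn writes the same slot: the update is idempotent per turn.
    turn_key = hashlib.sha256(f"{session_id}:{message}".encode()).hexdigest()
    MEMORY.setdefault(session_id, {})[turn_key] = summary
```

With this shape, replaying a failed workflow from the memory-update step cannot double-write user context, which is what "idempotent per turn" requires.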