---
title: Improve
description: Close the loop — add an eval, fix the failure, see the diff.
page_type: tutorial
audience: both
last_verified: 2026-05-13
---

This is stage **5 of 5** of [the AGNT5 loop](/docs/get-started/loop.md) — the part that makes the loop a loop. You already see runs in [Observe](/docs/get-started/observe.md); this stage turns observation into action.

**The flow:**

1. Pick a bad run from Studio (a regression, a model that hallucinated, a tool that timed out).
2. Capture its input into a dataset.
3. Write an eval — a function that grades a run's output against expected behavior.
4. Make a change — prompt, model, retry policy, or code.
5. Replay the dataset against the new version. Read the diff in Studio.
6. Gate the deploy on the eval if you want it enforced in CI.
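
Step 3 above can be sketched as a plain function. This is a hypothetical example, not the AGNT5 SDK's eval API: the `Run` shape and the substring check are illustrative assumptions, standing in for whatever grading logic your evals actually need.

```python
# Hypothetical eval sketch: grade a run's output against expected behavior.
# `Run` and `contains_expected` are illustrative, not AGNT5 SDK names.
from dataclasses import dataclass


@dataclass
class Run:
    input: str
    output: str


def contains_expected(run: Run, expected: str) -> bool:
    """Pass if the run's output mentions the expected answer."""
    return expected.lower() in run.output.lower()


# Grade a small dataset of captured (run, expected) pairs.
dataset = [
    (Run("What is 2+2?", "The answer is 4."), "4"),
    (Run("Capital of France?", "I'm not sure."), "Paris"),
]
results = [contains_expected(run, expected) for run, expected in dataset]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # → pass rate: 50%
```

A substring check is deliberately crude; the point is the shape: an eval takes a run, returns pass/fail, and a dataset of runs yields a pass rate you can compare across versions.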

This is how `gpt-5-mini → claude` swaps stop being scary and become measurable. Deeper guides on datasets, eval functions, and CI gating are in progress.
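
Enforcing that measurability in CI (step 6) can be as simple as a threshold check. This is a minimal sketch under stated assumptions: the `gate` function, the 0.85 threshold, and the single pass-rate number are all hypothetical, not AGNT5's actual CI integration.

```python
# Hypothetical CI gate: block the deploy when the eval pass rate regresses.
# The threshold, function name, and exit-code convention are illustrative.
import sys


def gate(pass_rate: float, threshold: float = 0.85) -> bool:
    """Return True if the candidate version clears the eval threshold."""
    return pass_rate >= threshold


if __name__ == "__main__":
    candidate_pass_rate = 0.90  # e.g. measured after a model swap
    if not gate(candidate_pass_rate):
        print("eval gate failed: blocking deploy")
        sys.exit(1)
    print("eval gate passed")
```

A nonzero exit code is all most CI systems need to fail the pipeline, so the same script works whether the gate runs locally or in a deploy job.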
