
Datasets

Curated input collections for repeatable experiments

A Dataset is a curated collection of inputs the agent should handle correctly: the cases you want to evaluate against. Datasets can be authored by hand, generated, or captured directly from production Runs. Capturing from Runs is the common case: with one action, a Run that surfaced a novel failure mode becomes a Dataset entry, and future Experiments are measured against it.
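To make the Run-to-entry relationship concrete, here is a minimal sketch of the data model, assuming a Run carries at least an ID, an input, and an output. The classes `Run`, `DatasetEntry`, and `Dataset` and the method `add_from_run` are hypothetical illustrations, not Studio's API; in Studio this capture is a single UI action.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one production agent Run."""
    run_id: str
    input: str
    output: str


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical Dataset entry: an input plus where it came from."""
    input: str
    source: str  # e.g. "manual", "generated", or "run:<run_id>"


@dataclass
class Dataset:
    """Hypothetical in-memory Dataset holding curated entries."""
    name: str
    entries: list[DatasetEntry] = field(default_factory=list)

    def add_from_run(self, run: Run) -> DatasetEntry:
        # Capture only the Run's input; the (possibly wrong) output it
        # produced is not stored as an expected answer.
        entry = DatasetEntry(input=run.input, source=f"run:{run.run_id}")
        self.entries.append(entry)
        return entry


regressions = Dataset("regressions")
failed = Run(run_id="r-123", input="refund order 42", output="wrong answer")
regressions.add_from_run(failed)
```

Recording the source on each entry keeps a link back to the production Run that motivated the case, which is useful when reviewing why an entry exists.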

You’ll find Datasets under Improve → Datasets in Studio.

Datasets grow as production surfaces new cases. Every Experiment is measured against a specific Dataset version, so comparisons stay stable as the Dataset evolves: adding new cases does not silently invalidate earlier scores.
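The versioning idea can be sketched as an append-only sequence of immutable snapshots, where each Experiment result records the version it ran against. This is a hypothetical model for illustration only: `VersionedDataset`, `DatasetVersion`, and `run_experiment` are invented names, and an "agent" is simplified to a function that returns whether it handled an input correctly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical immutable snapshot of a Dataset's inputs."""
    version: int
    inputs: tuple[str, ...]


class VersionedDataset:
    """Hypothetical append-only Dataset; every change creates a new version."""

    def __init__(self) -> None:
        # Version 0 is the empty Dataset; version n lives at index n.
        self._versions: list[DatasetVersion] = [DatasetVersion(0, ())]

    @property
    def latest(self) -> DatasetVersion:
        return self._versions[-1]

    def add(self, new_input: str) -> DatasetVersion:
        # Build a new snapshot; earlier snapshots are never modified.
        prev = self.latest
        snapshot = DatasetVersion(prev.version + 1, prev.inputs + (new_input,))
        self._versions.append(snapshot)
        return snapshot

    def get(self, version: int) -> DatasetVersion:
        return self._versions[version]


def run_experiment(snapshot: DatasetVersion, agent: Callable[[str], bool]) -> dict:
    """Score an agent on one pinned snapshot; the result records its version."""
    passed = sum(1 for x in snapshot.inputs if agent(x))
    return {"version": snapshot.version, "passed": passed, "total": len(snapshot.inputs)}
```

Because an old score is tied to its version, a later agent can be compared on exactly the same inputs with `run_experiment(ds.get(old["version"]), new_agent)`, even after new cases have been added to the latest version.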

A deeper guide is in progress. For the full improvement loop, see the Improve overview.