An Experiment is a single run of Agent × Dataset × Scorer: the Agent is executed on every case in the Dataset, and the Scorer grades each output, producing a result you can compare against other Experiments. It’s the unit of measurement in the Improve loop: two Experiments on the same Dataset and Scorer, with different Prompts, tell you whether a change actually made things better before that change reaches production.
You’ll find Experiments under Improve → Experiments in Studio.
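To make the Agent × Dataset × Scorer shape concrete, here is a minimal sketch in plain Python. It is a conceptual model only, not the product's SDK: `Case`, `ExperimentResult`, `run_experiment`, `exact_match`, and the two stand-in agents are hypothetical names invented for illustration, and the per-case scores are averaged only as one plausible way to aggregate.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical types for illustration; none of these names come from a real SDK.

@dataclass(frozen=True)
class Case:
    input: str
    expected: str

@dataclass(frozen=True)
class ExperimentResult:
    label: str
    scores: list[float]  # one score per Dataset case, in Dataset order

    @property
    def aggregate(self) -> float:
        # Assumption: the aggregate is the mean of the per-case scores.
        return mean(self.scores)

def run_experiment(
    label: str,
    agent: Callable[[str], str],
    dataset: list[Case],
    scorer: Callable[[str, str], float],
) -> ExperimentResult:
    """Run the agent on every case and score each output."""
    scores = [scorer(agent(case.input), case.expected) for case in dataset]
    return ExperimentResult(label, scores)

def exact_match(output: str, expected: str) -> float:
    """A toy scorer: 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if output.strip() == expected else 0.0

dataset = [Case("2+2", "4"), Case("3*3", "9"), Case("10-7", "3")]

# Stand-ins for the same Agent under two different Prompts.
def agent_v1(text: str) -> str:
    return {"2+2": "4", "3*3": "6", "10-7": "3"}[text]

def agent_v2(text: str) -> str:
    return {"2+2": "4", "3*3": "9", "10-7": "3"}[text]

# Same Dataset, same Scorer, different Prompt: the two results are comparable.
baseline = run_experiment("prompt-v1", agent_v1, dataset, exact_match)
candidate = run_experiment("prompt-v2", agent_v2, dataset, exact_match)
print(f"{baseline.label}: {baseline.aggregate:.2f}")
print(f"{candidate.label}: {candidate.aggregate:.2f}")
```

Holding the Dataset and Scorer fixed is what makes the comparison meaningful: the only variable between the two results is the change under test.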
Experiments execute against the same durable runtime as production Runs. Same code path, same observability, same guarantees. An Experiment’s results link back to the individual Runs it produced, so you can drill from an aggregate score into the specific cases where the new version regressed or improved.
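The drill-down from aggregate score to individual cases can be sketched as a per-case diff between two Experiments. This is an illustrative model under stated assumptions, not the product's API: `CaseResult`, `diff_by_case`, the `run_id` field, and the sample IDs are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record for illustration; field names are assumptions.
@dataclass(frozen=True)
class CaseResult:
    case_id: str
    run_id: str   # stand-in for a link back to the Run that produced this score
    score: float

def diff_by_case(baseline: list[CaseResult], candidate: list[CaseResult]):
    """Pair results by case and split them into regressions and improvements."""
    base = {r.case_id: r for r in baseline}
    regressed, improved = [], []
    for cand in candidate:
        prev = base.get(cand.case_id)
        if prev is None:
            continue  # case is missing from the baseline, so nothing to compare
        if cand.score < prev.score:
            regressed.append((prev, cand))
        elif cand.score > prev.score:
            improved.append((prev, cand))
    return regressed, improved

baseline = [
    CaseResult("c1", "run-a1", 1.0),
    CaseResult("c2", "run-a2", 0.0),
    CaseResult("c3", "run-a3", 1.0),
]
candidate = [
    CaseResult("c1", "run-b1", 1.0),
    CaseResult("c2", "run-b2", 1.0),
    CaseResult("c3", "run-b3", 0.0),
]

regressed, improved = diff_by_case(baseline, candidate)
for prev, cand in regressed:
    print(f"regressed {cand.case_id}: {prev.score} -> {cand.score} (inspect {cand.run_id})")
for prev, cand in improved:
    print(f"improved {cand.case_id}: {prev.score} -> {cand.score} (inspect {cand.run_id})")
```

In this sample data both Experiments have the same average score, yet one case regressed and another improved. That is the situation where following a result back to its individual Run is more informative than the aggregate alone.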
A deeper guide is in progress. For the full improvement loop, see the Improve overview.