Improve

Turn observation into structured improvement — evaluate, iterate, ship

Most teams shipping agents get stuck in the same place. The first version works well enough on the cases they thought of. Production turns up cases they didn’t. They fix a prompt, redeploy, and hope nothing else broke. They can’t tell whether the new version is actually better, only that the last complaint went away.

This is not a prompting problem. It’s a missing loop. Observing production shows you where the agent is wrong. Fixing it requires a way to measure whether a change made things better, on a representative set of inputs, before the change reaches anyone who cares.

That loop is the Improve section of AGNT5. It turns the same Run data you already have into a structured workflow for measuring and improving agent behavior — without stitching together an external eval harness or a separate observability vendor.

The flywheel

Improvement in AGNT5 is organized around four primitives that compose into a single loop:

Datasets of inputs, scored by Scorers, run through Experiments against a candidate Prompt — compared to a baseline.

Each primitive has one job:

  • Datasets — curated collections of inputs. The cases you want the agent to handle correctly, captured from real production Runs or authored by hand. A Dataset is the thing you’re evaluating against.
  • Scorers — how you judge whether an output is good. Built-in LLM-as-judge scorers for open-ended outputs, exact-match and structured comparison for deterministic outputs, custom scorers for anything domain-specific.
  • Prompts — versioned prompt artifacts. Not strings buried inside code, but first-class objects you can iterate on, reference from an Agent, roll back, and compare across versions without redeploying.
  • Experiments — a single run of Agent × Dataset × Scorer, producing a comparable result. Experiments are the unit of measurement. Two Experiments on the same Dataset and Scorer, with different Prompts, tell you whether the change made things better.
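The way these primitives compose can be sketched as a minimal data model. Everything below is illustrative Python, not the AGNT5 SDK; the names and shapes are assumptions made only to show how the pieces fit together:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch only; these names are assumptions, not the AGNT5 SDK.

@dataclass
class DatasetCase:
    input: str      # what the agent receives
    expected: str   # the behavior you want to see

@dataclass
class Prompt:
    version: str    # a first-class, versioned artifact
    template: str

# A Scorer judges an (output, expected) pair and returns a score.
Scorer = Callable[[str, str], float]

def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer: 1.0 on an exact match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

@dataclass
class Experiment:
    """One run of Agent x Dataset x Scorer, reduced to comparable scores."""
    prompt: Prompt
    scores: list[float]

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores)
```

The key property is that an Experiment reduces to a comparable number: two Experiments over the same Dataset and Scorer differ only in the Prompt, so any difference in score is attributable to the Prompt change.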

One full turn of the loop looks like this:

  1. A production Run surfaces a failure mode you didn’t have coverage for.
  2. You add the input (and the expected behavior) to a Dataset.
  3. You draft a new Prompt version — still in Studio, no code change yet.
  4. You run an Experiment against the updated Dataset, with a Scorer that captures whether the new version actually handles the failure case.
  5. You compare the Experiment to the baseline. If it’s an improvement, you ship the new Prompt version and move on. If it’s not, you iterate on the Prompt — not the code.
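One turn of the loop can be sketched end to end. The agent here is a stub and every name is hypothetical, not the AGNT5 SDK; the point is the shape of the comparison, not the API:

```python
# Illustrative sketch only; the agent is a stub and all names are
# assumptions, not the AGNT5 SDK.

def agent(prompt: str, case_input: str) -> str:
    """Stand-in for a real agent run: obeys an UPPERCASE instruction."""
    text = case_input.upper() if "UPPERCASE" in prompt else case_input
    return f"Reply: {text}"

def scorer(output: str, expected: str) -> float:
    """Exact-match scorer for deterministic outputs."""
    return 1.0 if output == expected else 0.0

def run_experiment(prompt: str, dataset: list[tuple[str, str]]) -> float:
    """One Experiment: prompt x dataset x scorer -> one comparable score."""
    scores = [scorer(agent(prompt, inp), want) for inp, want in dataset]
    return sum(scores) / len(scores)

# Steps 1-2: the production failure becomes a Dataset case.
dataset = [
    ("hello", "Reply: HELLO"),
    ("hi there", "Reply: HI THERE"),  # the newly captured failure case
]

# Step 3: draft a candidate Prompt version alongside the baseline.
baseline = "Respond with 'Reply: <input>'."
candidate = "Respond with 'Reply: <input>' in UPPERCASE."

# Steps 4-5: run both Experiments and compare before shipping.
baseline_score = run_experiment(baseline, dataset)    # 0.0
candidate_score = run_experiment(candidate, dataset)  # 1.0
ship = candidate_score > baseline_score               # True: ship the Prompt
```

If the candidate does not beat the baseline, only the Prompt is iterated on; the Dataset and Scorer stay fixed so results remain comparable across attempts.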

None of this requires leaving AGNT5. The Dataset came from Runs you already had. The Prompt is referenced from the Agent without a redeploy. The Experiment’s execution uses the same durable runtime as production. The Scorer output is stored alongside the Run, ready to be drilled into.

Why it lives in one platform

Most teams assemble this loop from three or four separate tools: one for execution, one for tracing, one for evaluation, one for prompt management. Each piece works on its own. The seams between them are where the loop breaks: a trace from one system can’t be replayed as an eval case in another; a Prompt version tracked in one place isn’t the one actually being called in production; an evaluation score can’t be correlated with the production Run that inspired it.

AGNT5 collapses the loop because every stage runs against the same execution record:

  • A Run captured by the runtime can become a Dataset entry with a single action.
  • A Prompt version referenced from an Agent can be swapped without redeploying code.
  • An Experiment is executed by the same engine that runs production — same code path, same observability, same guarantees.
  • An Experiment result links back to the individual Runs it produced, which are inspectable the same way production Runs are.

This is what it means to run and improve in one platform. The improvement loop isn’t a separate surface bolted onto the execution engine — it’s the execution engine turned inward, replaying its own history to measure itself against itself.

Where to go next

  • Experiments — the unit of measurement.
  • Datasets — the cases you’re measuring against.
  • Scorers — how you decide what “better” means.
  • Prompts — the versioned artifact you’re iterating on.

If you haven’t yet, read the Run overview first — it’s the substrate the Improve loop builds on.