Improve

Capture feedback, build datasets, run evals, compare models, and gate CI.

Close the loop with the runs you ship — collect feedback, turn it into datasets, run replay-based evals, compare model versions, and gate deploys in CI.

This section is being built out.