---
title: Enable online evals
description: Sample production runs, score them asynchronously, and alert when pass rate drops below a threshold.
last_verified: 2026-06-07
---

Online evals let you measure production behavior without blocking user traffic. You attach a published **scorer** (a rule that grades a run or trace) to a **deployment** (a running AGNT5 worker). AGNT5 then samples completed runs and scores them in the background. After setup, Studio shows live score aggregates, sample decisions, alerts, and recent scores for that deployment.

## Prerequisites

- A deployment that is receiving production runs.
- At least one enabled scorer with a published version.
- A user or service token with developer access to the project.

## Set up online evals in Studio

1. Open the deployment in Studio.
2. Select the **Quality** tab.
3. Choose a **Scorer**.
4. Set **Sample %** for ordinary runs.
5. Set **Slow-run %** and **Slow-run threshold ms** when slow runs should be sampled more often.
6. Set **Pass-rate floor** and **Min count** for the optional alert.
7. Select **Preview** to estimate observed runs, selected runs, scorer jobs, and alert status.
8. Select **Enable** to create the online eval policy and alert.

From the policy table, you can enable or disable a policy, enable or disable its alert, or select **Edit** to create a new policy version with updated sampling and scorer settings.
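
The Studio fields above appear to correspond to the `sampling_config` object used in the API requests later on this page. The following is a minimal sketch under that assumption: **Sample %** becomes `rate`, **Slow-run %** becomes the boost `rate`, and **Slow-run threshold ms** becomes the boost `value`. `build_sampling_config` is a hypothetical helper for illustration, not part of AGNT5.

```bash
# Hypothetical helper: convert Studio-style percentages into a sampling_config body.
# Assumption: the field mapping described above; verify it against your own policies.
# Arguments: SAMPLE_PCT SLOW_RUN_PCT SLOW_RUN_THRESHOLD_MS
build_sampling_config() {
  awk -v s="$1" -v b="$2" -v ms="$3" 'BEGIN {
    # Percentages are divided by 100 because the API examples use fractions (0.02, 0.10).
    printf "{\"type\":\"uniform\",\"rate\":%.4g,\"boost\":[{\"field\":\"duration_ms\",\"op\":\"gte\",\"value\":%d,\"rate\":%.4g}]}\n", s / 100, ms, b / 100
  }'
}

# 2% of ordinary runs, 10% of runs that take 30 seconds or longer.
build_sampling_config 2 10 30000
```

The output can be pasted into the `sampling_config` field of a preview or policy request.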

## Preview with the API

Previewing does not create a policy. It evaluates the proposed sampling config against recent completed runs and returns selected/skipped counts, reason buckets, scorer job estimates, and optional alert status.

```bash
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/eval/online/preview" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "lookback_seconds": 86400,
    "max_runs": 1000,
    "policy": {
      "mode": "async_online",
      "binding_scope": "deployment",
      "deployment_id": "<deployment-id>",
      "sampling_config": {
        "type": "uniform",
        "rate": 0.02,
        "boost": [
          {
            "field": "duration_ms",
            "op": "gte",
            "value": 30000,
            "rate": 0.10
          }
        ]
      },
      "scorers": [
        {
          "scorer_id": "<scorer-id>",
          "scorer_version_id": "<scorer-version-id>",
          "scope": "run",
          "ordinal": 1,
          "required": true,
          "threshold": 0.9,
          "weight": 1
        }
      ]
    },
    "alert": {
      "name": "Production quality drop",
      "severity": "warning",
      "deployment_id": "<deployment-id>",
      "scorer_id": "<scorer-id>",
      "scorer_version_id": "<scorer-version-id>",
      "window_seconds": 1800,
      "metric": "pass_rate",
      "operator": "lt",
      "threshold": 0.9,
      "min_count": 50,
      "action_type": "notify"
    }
  }'
```

The preview response reports estimated scorer job volume, not a price estimate. Actual scorer cost depends on the model and provider configuration and on custom scorer runtime.
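
To sanity-check the selected-run count that a preview returns, you can estimate the expected volume by hand. The sketch below assumes that a run matching the boost condition is sampled at the boost `rate` instead of the base `rate`; the page does not state how the two rates combine, so treat this as an approximation. `expected_selected` is a hypothetical helper.

```bash
# Hypothetical helper: expected number of sampled runs in a lookback window.
# Assumption: slow runs use the boost rate in place of the base rate.
# Arguments: TOTAL_RUNS SLOW_RUNS BASE_RATE BOOST_RATE
expected_selected() {
  awk -v total="$1" -v slow="$2" -v base="$3" -v boost="$4" 'BEGIN {
    printf "%.1f\n", (total - slow) * base + slow * boost
  }'
}

# 1000 runs, 50 of them at or above 30000 ms:
# 950 * 0.02 + 50 * 0.10 = 19 + 5 = 24
expected_selected 1000 50 0.02 0.10
```

Because sampling is random, the count in a real preview will vary around this figure.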

## Enable with the API

Create the policy first, then create an alert linked to the returned policy ID.

```bash
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/eval/policies" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "async_online",
    "binding_scope": "deployment",
    "deployment_id": "<deployment-id>",
    "sampling_config": {
      "type": "uniform",
      "rate": 0.02,
      "boost": [
        {
          "field": "duration_ms",
          "op": "gte",
          "value": 30000,
          "rate": 0.10
        }
      ]
    },
    "scorers": [
      {
        "scorer_id": "<scorer-id>",
        "scorer_version_id": "<scorer-version-id>",
        "scope": "run",
        "ordinal": 1,
        "required": true,
        "threshold": 0.9,
        "weight": 1
      }
    ]
  }'
```

```bash
curl -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/eval/online/alerts" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production quality drop",
    "severity": "warning",
    "evaluation_policy_id": "<policy-id>",
    "deployment_id": "<deployment-id>",
    "scorer_id": "<scorer-id>",
    "scorer_version_id": "<scorer-version-id>",
    "window_seconds": 1800,
    "metric": "pass_rate",
    "operator": "lt",
    "threshold": 0.9,
    "min_count": 50,
    "action_type": "notify"
  }'
```
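
When scripting the two requests, the policy ID has to be carried from the first response into the alert body. The sketch below assumes the create-policy response is JSON with the new ID in a top-level `"id"` string field; this page does not show the response shape, so confirm it before relying on this. `extract_id` is a hypothetical helper that uses `sed` to avoid extra dependencies; a JSON-aware tool such as `jq` is more robust if you have one installed.

```bash
# Hypothetical helper: read a JSON response on stdin and print the first "id" value.
# Assumption: the response contains a string field named exactly "id".
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Demonstration on a stand-in response body (not real API output).
printf '%s\n' '{"id":"<policy-id>","mode":"async_online"}' | extract_id

# Usage sketch, with policy.json holding the policy body from the first request:
#   policy_id=$(curl -sS -X POST "https://api.agnt5.com/api/v1/projects/<project-id>/eval/policies" \
#     -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
#     -d @policy.json | extract_id)
#   echo "Use as evaluation_policy_id: $policy_id"
```

Keys such as `"scorer_id"` are not matched, because the pattern requires the key to be exactly `"id"`.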

## Operate online evals

Use these endpoints to inspect or change a running setup:

| Task | Endpoint |
| --- | --- |
| List policies | `GET /api/v1/projects/<project-id>/eval/policies?mode=async_online&deployment_id=<deployment-id>` |
| Disable a policy | `PATCH /api/v1/projects/<project-id>/eval/policies/<policy-id>` with `{ "enabled": false }` |
| Create a new policy version | `PATCH /api/v1/projects/<project-id>/eval/policies/<policy-id>` with new `sampling_config` or `scorers` |
| List sample decisions | `GET /api/v1/projects/<project-id>/eval/online/sample-decisions?deployment_id=<deployment-id>` |
| List live scores | `GET /api/v1/projects/<project-id>/eval/scores?source=live&deployment_id=<deployment-id>` |
| Get live score aggregate | `GET /api/v1/projects/<project-id>/eval/online/scores/aggregate?source=live&deployment_id=<deployment-id>` |
| Disable an alert | `PATCH /api/v1/projects/<project-id>/eval/online/alerts/<alert-id>` with `{ "enabled": false }` |
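
When you compare a live score aggregate against an alert rule, it helps to spell out how `metric`, `operator`, `threshold`, and `min_count` interact. The sketch below is one plausible reading of the example rule (`pass_rate` `lt` `0.9` with `min_count` `50`): the rule is evaluated only once the window holds at least `min_count` scores. The status names and the `alert_status` helper are hypothetical, not AGNT5 output.

```bash
# Hypothetical helper: evaluate a pass-rate alert rule by hand.
# Assumption: windows with fewer than MIN_COUNT scores are not evaluated.
# Arguments: PASSED TOTAL [THRESHOLD, default 0.9] [MIN_COUNT, default 50]
alert_status() {
  awk -v passed="$1" -v total="$2" -v thr="${3:-0.9}" -v minc="${4:-50}" 'BEGIN {
    if (total < minc) { print "insufficient_data"; exit }
    if (passed / total < thr) print "firing"; else print "ok"
  }'
}

alert_status 40 50   # pass rate 0.80 is below 0.9
alert_status 48 50   # pass rate 0.96 is not below 0.9
alert_status 10 12   # fewer than 50 scores in the window
```

A higher `min_count` makes the alert less sensitive to a handful of failed scores on low-traffic deployments, at the cost of slower detection.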

## Next steps

- [Improve with AGNT5](/docs/improve/overview.md): understand how production runs feed scorers, datasets, and experiments.
- [Deploying](/docs/run/deploying.md): deploy a worker before attaching online evals.
- [Workflows](/docs/build/workflows.md): structure runs so scorers can inspect stable inputs and outputs.
