LLM Usage breaks out every model call across every Run — token counts, costs, model distribution, prompt/completion split, cache hits. It answers the questions that raw latency and error-rate metrics can't: which workflow is driving this month's Anthropic bill? Did that prompt change reduce token spend, or just shift it around? Which models are producing cache hits?
You’ll find LLM Usage under Observe → LLM Usage in Studio.
Like Metrics, LLM Usage is derived from Run data rather than collected separately — so every aggregate drills back down to the individual LLM calls that produced it. No separate instrumentation, no sampling, no accounting drift.
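To make the aggregation-with-drill-down idea concrete, here is a minimal sketch of rolling per-call usage records up into per-workflow cost while keeping the underlying calls attached. The record shape, field names, and per-million-token prices are illustrative assumptions, not the product's actual schema or billing data.

```python
from collections import defaultdict

# Illustrative (prompt, completion) prices per million tokens — assumed values.
PRICES = {"model-a": (3.00, 15.00), "model-b": (0.25, 1.25)}

def call_cost(call):
    """Cost of one call: prompt and completion tokens priced separately."""
    prompt_price, completion_price = PRICES[call["model"]]
    return (call["prompt_tokens"] * prompt_price
            + call["completion_tokens"] * completion_price) / 1_000_000

def usage_by_workflow(calls):
    """Aggregate cost per workflow, keeping each call for drill-down."""
    agg = defaultdict(lambda: {"cost": 0.0, "calls": []})
    for call in calls:
        bucket = agg[call["workflow"]]
        bucket["cost"] += call_cost(call)
        bucket["calls"].append(call)  # every aggregate points back at its calls
    return dict(agg)

calls = [
    {"workflow": "summarize", "model": "model-a",
     "prompt_tokens": 1200, "completion_tokens": 300},
    {"workflow": "summarize", "model": "model-b",
     "prompt_tokens": 800, "completion_tokens": 200},
    {"workflow": "classify", "model": "model-b",
     "prompt_tokens": 500, "completion_tokens": 50},
]
report = usage_by_workflow(calls)
```

Because the aggregate keeps its constituent calls, answering "which workflow is driving the bill" and "which exact call was expensive" is the same data walked at two depths.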
When you want to act on this data — iterating on a prompt to cut token spend, or experimenting with a cheaper model — that work moves into Improve.
A deeper guide is in progress. For the full Run story, see the Run overview.