This is a demo report. Numbers come from PitCrew’s real engine on a hand-curated agent description.
Audit · May 3, 2026 · Default: anthropic · Sonnet 4.6

Your forecast is in

Slack bot answering HR benefits questions for an 800-person company. People mostly ask about health insurance, 401k, and PTO. Escalates legal/medical questions to a human.

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt: 62 tokens (estimated)
Avg user input: 250 tokens
Avg output: 400 tokens
Calls per month: 7,500
Batch share: 0%
Pricing as of: Apr 28, 2026
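The inputs above are enough to reproduce the steady-state figure. A minimal sketch of the arithmetic, assuming list prices of $3 per million input tokens and $15 per million output tokens (an assumed rate — check the provider's current pricing page before relying on it):

```python
# Steady-state inference cost from the audit inputs.
SYSTEM_TOKENS = 62        # estimated system prompt
INPUT_TOKENS = 250        # avg user input per call
OUTPUT_TOKENS = 400       # avg output per call
CALLS_PER_MONTH = 7_500

INPUT_PRICE = 3 / 1_000_000    # $/token, assumed list rate
OUTPUT_PRICE = 15 / 1_000_000  # $/token, assumed list rate

input_cost = (SYSTEM_TOKENS + INPUT_TOKENS) * CALLS_PER_MONTH * INPUT_PRICE
output_cost = OUTPUT_TOKENS * CALLS_PER_MONTH * OUTPUT_PRICE
monthly = input_cost + output_cost
print(f"${monthly:,.2f}/mo")  # ≈ $52/mo, matching the default build below
```

Output is dominated by generated tokens ($45 of the $52), which is why swapping the output-heavy model is where most of the savings come from.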
How precise is this?
Savings band spans 161% of the central estimate. Top sources of uncertainty:
  • Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
  • Conversation length is a coarse bucket — actual tokens vary by ±40% per call.
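One standard way to see how the two uncertainties above interact is to combine independent relative errors in quadrature. This is illustrative only — PitCrew's engine evidently tracks additional sources, since its reported band (161%) is wider than these two alone produce:

```python
import math

# Combine independent relative uncertainties in quadrature (illustrative
# sketch; not necessarily PitCrew's actual method).
volume_err = 0.50   # call volume: ±50% typical pre-deploy error
tokens_err = 0.40   # tokens per call: ±40% per-call variation

combined = math.sqrt(volume_err**2 + tokens_err**2)  # ≈ 0.64
band_span = 2 * combined  # full band width as a fraction of the center
print(f"band spans {band_span:.0%} of the central estimate")  # 128% from these two sources alone
```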

Real-bill expectation

PitCrew forecasts steady-state inference cost — the dollars the LLM provider bills for the deterministic, no-extras workload your wizard described. Real production bills typically run 1.2–1.5× the steady-state figure, because the steady-state model excludes:

  • Dev / eval loops (often 10-30% of total spend)
  • Retries, error recovery, idempotency replays
  • Background batch jobs (summaries, classification of past data)
  • A/B traffic on alternate models
  • Embeddings + fine-tunes that ride alongside the agent
| Scenario | Steady-state (PitCrew) | Expected real bill |
| --- | --- | --- |
| Default build | $52/mo | $62–$78/mo |
| PitCrew plan | $2/mo | $2–$3/mo |

The 20–50% uplift comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team runs tight eval loops and generates minimal retry traffic, expect the low end of the band.
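The real-bill column is just the steady-state forecast scaled by the 1.2–1.5× band. A minimal sketch:

```python
def real_bill_band(steady_state: float, low: float = 1.2, high: float = 1.5):
    """Scale a steady-state forecast by the real-bill multiplier band."""
    return steady_state * low, steady_state * high

lo, hi = real_bill_band(52)       # default build from the table
print(f"${lo:.0f}-${hi:.0f}/mo")  # $62-$78/mo
```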

Run another audit for a different build

Tweak inputs, swap the model, see how the forecast moves.
