This is a demo report. Numbers come from PitCrew’s real engine on a hand-curated agent description.
Audit · May 3, 2026 · Default: anthropic · Opus 4.7

Your forecast is in

IDE-integrated agent that reviews diffs before code review. Reads the changed files, runs static analysis tools, summarizes risks, and suggests fixes. About 80 PRs per day across the engineering org.

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt: 55 tokens (estimated)
Avg user input: 1,500 tokens
Avg output: 800 tokens
Calls per month: 2,400
Batch share: 0%
Pricing as of: Apr 28, 2026
How precise is this?
The savings band spans 0% of the central estimate. Top sources of uncertainty:
  • Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
  • Conversation length is a coarse bucket — actual tokens vary by ±40% per call.
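Those two error bars compound: cost scales linearly with both call volume and tokens per call, so at the extremes the factors multiply. A rough sketch of the resulting worst-case band (the function and its bounds are our illustration, not PitCrew's published method):

```python
def cost_band(central_cost, volume_err=0.50, token_err=0.40):
    """Worst-case cost band: the ±50% volume error and ±40% token
    error multiply, since cost is linear in both."""
    low = central_cost * (1 - volume_err) * (1 - token_err)
    high = central_cost * (1 + volume_err) * (1 + token_err)
    return low, high

low, high = cost_band(67.0)
print(f"${low:.0f}-${high:.0f}/mo")  # → $20-$141/mo at the extremes
```

In practice the errors are unlikely to both hit their extremes in the same direction, so the realistic band is narrower; this is the outer envelope.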

Real-bill expectation

PitCrew forecasts steady-state inference cost — the dollars the LLM provider bills for the deterministic, no-extras workload your wizard described. Real production bills are typically 1.2-1.5× higher because the steady-state model excludes:

  • Dev / eval loops (often 10-30% of total spend)
  • Retries, error recovery, idempotency replays
  • Background batch jobs (summaries, classification of past data)
  • A/B traffic on alternate models
  • Embeddings + fine-tunes that ride alongside the agent
Scenario        Steady-state (PitCrew)   Expected real bill
Default build   $67/mo                   $80–$100/mo
PitCrew plan    $67/mo                   $80–$100/mo
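The real-bill column is just the steady-state figure scaled by the 1.2–1.5× overhead band; a one-liner (our sketch) reproduces it:

```python
def real_bill_band(steady_state, low_mult=1.2, high_mult=1.5):
    """Scale the steady-state forecast by the overhead multiplier
    to get the expected real-bill range."""
    return steady_state * low_mult, steady_state * high_mult

lo, hi = real_bill_band(67)
print(f"${lo:.0f}-${hi:.0f}/mo")  # → $80-$100/mo
```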

The 20-50% multiplier comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, target the low end.

Run another audit for a different build

Tweak inputs, swap the model, see how the forecast moves.
