This is a demo report. Numbers come from PitCrew’s real engine on a hand-curated agent description.
Audit · May 3, 2026
Default: anthropic · Opus 4.7

Your forecast is in

IDE-integrated agent that reviews diffs before code review. Reads the changed files, runs static analysis tools, summarizes risks, and suggests fixes. About 80 PRs per day across the engineering org.

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt: 55 tokens (estimated)
Avg user input: 1,500 tokens
Avg output: 800 tokens
Calls per month: 2,400
Batch share: 0%
Pricing as of: Apr 28, 2026
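
The steady-state figure follows from these inputs by plain token arithmetic. The sketch below is a minimal illustration in Python, not PitCrew's engine. The two per-million-token rates (`INPUT_PRICE_PER_MTOK`, `OUTPUT_PRICE_PER_MTOK`) and the function name `monthly_cost` are hypothetical: this report does not state the rates it used, so substitute your provider's current pricing. It also assumes one model call per request and, matching the 0% batch share above, no batch discount.

```python
# Hypothetical per-million-token prices in USD. These are placeholders,
# NOT values taken from this report.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def monthly_cost(system_tokens, user_tokens, output_tokens, calls_per_month,
                 input_price=INPUT_PRICE_PER_MTOK,
                 output_price=OUTPUT_PRICE_PER_MTOK):
    """Steady-state monthly API spend in USD for a one-call-per-request workload."""
    # Every call sends the system prompt plus the user input.
    input_total = (system_tokens + user_tokens) * calls_per_month
    output_total = output_tokens * calls_per_month
    return (input_total * input_price + output_total * output_price) / 1_000_000


# Inputs listed above: 55 + 1,500 input tokens, 800 output tokens, 2,400 calls.
cost = monthly_cost(55, 1500, 800, 2400)
print(f"${cost:.2f}/mo")
```

With these placeholder rates the result is about $66.66/mo, which is close to the $67/mo steady-state figure shown later in this report. That agreement is illustrative only and does not confirm which rates the engine applied.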

How precise is this?

Savings band spans 0% of the central estimate. Top sources of uncertainty:
  • Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
  • Conversation length is a coarse bucket — actual tokens vary by ±40% per call.
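
To get a feel for how these two uncertainties could compound, one option is to scale a central estimate by both error factors at once. The sketch below is an assumption about how to combine them (a simple worst-case product), not a description of how PitCrew computes its band. The function name `spend_band` is hypothetical, and the central value of $67/mo is the steady-state figure from this report.

```python
def spend_band(central, volume_err=0.50, tokens_err=0.40):
    """Worst-case (low, high) monthly spend when both errors stack multiplicatively."""
    low = central * (1 - volume_err) * (1 - tokens_err)
    high = central * (1 + volume_err) * (1 + tokens_err)
    return low, high


# Central estimate: the $67/mo steady-state figure from this report.
low, high = spend_band(67.0)
print(f"${low:.0f} to ${high:.0f} per month")
```

This treats the ±40% token variation as if it applied to the monthly average. Per-call variation tends to average out over thousands of calls, so the real band is likely narrower than this worst case.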

What’s not included

PitCrew forecasts steady-state AI API spend: the dollars the LLM or embedding provider bills for the deterministic workload you described in the wizard. A production bill carries two kinds of cost on top that PitCrew doesn’t model:

1. Inference overhead — proportional (20–50% on top of steady-state)

  • Dev / eval loops (often 10-30% of total spend)
  • Retries, error recovery, idempotency replays
  • Background batch jobs (summaries, classification of past data)
  • A/B traffic on alternate models
  • Embeddings + fine-tunes that ride alongside the agent

Scenario        Steady-state (PitCrew)    With inference overhead
Default build   $67/mo                    $80–$100/mo
PitCrew plan    $67/mo                    $80–$100/mo

2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)

  • Cloud hosting (Vercel / Render / Fly / AWS / etc.)
  • Database (Supabase / Postgres / Mongo / etc.)
  • Managed vector DB or search — Pinecone, Weaviate, OpenSearch typically $25–100/mo (if not already entered in Step 5)
  • CDN, scraping APIs, telephony minutes, transport (Twilio, LiveKit, Zyte, etc.)
  • Vendor SaaS margin if going through a wrapper (Cursor, Vapi, Evee, etc.) instead of direct API

The 20-50% inference multiplier comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, expect the low end. The hosting/infra range is highly workload-dependent: small RAG bots may spend nothing extra, while voice agents add telephony costs on top.
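
The two add-ons can be combined into a rough production range. The sketch below assumes the inference overhead multiplies steady-state spend and that hosting is added as a flat amount, using the 20-50% and $10–80/mo ranges quoted in this report. The function name `production_bill` is hypothetical, and your own ranges may differ.

```python
def production_bill(steady_state, overhead=(0.20, 0.50), hosting=(10.0, 80.0)):
    """Return (low, high) monthly bill: proportional overhead plus flat hosting."""
    low = steady_state * (1 + overhead[0]) + hosting[0]
    high = steady_state * (1 + overhead[1]) + hosting[1]
    return low, high


# Steady-state figure from this report: $67/mo.
low, high = production_bill(67.0)
print(f"${low:.0f} to ${high:.0f} per month")
```

For the $67/mo forecast this gives roughly $90 to $180 per month. Passing `hosting=(0.0, 0.0)` leaves only the inference overhead and reproduces the $80–$100/mo column in the table above.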
