This is a demo report. Numbers come from PitCrew’s real engine on a hand-curated agent description.
Audit · May 7, 2026
Brain: anthropic · Haiku 4.5
Generation: google · Veo 3 720p

Your forecast is in

Internal tool for a 15-person creative agency. Marketers describe a campaign, the agent drafts a 4-shot script, then generates per-scene Veo prompts and triggers the video render. Roughly 10 finished videos a day across…

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt: 77 tokens (estimated)
Avg user input: 1,500 tokens
Avg output: 800 tokens
Calls per month: 300
Batch share: 30%
Pricing as of: Apr 28, 2026
Output per call (sec): 5
Generations per agent call: 4
Regeneration rate: 1.30×
Resolution: 720p
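
As a rough cross-check of these inputs, the sketch below turns them into monthly volumes. It is a minimal illustration, not PitCrew's engine: the names `Assumptions` and `monthly_volumes` are hypothetical, and it assumes that "Output per call (sec)" means seconds per generated clip and that the regeneration rate multiplies video output only. Batch share and resolution affect price rather than volume, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    # Values copied from the "What we assumed" list above.
    system_prompt_tokens: int = 77
    avg_input_tokens: int = 1500
    avg_output_tokens: int = 800
    calls_per_month: int = 300
    seconds_per_generation: float = 5.0  # assumption: 5 s per generated clip
    generations_per_call: int = 4
    regeneration_rate: float = 1.30


def monthly_volumes(a: Assumptions) -> dict:
    """Return monthly LLM token counts and video seconds (hypothetical helper)."""
    # Every call sends the system prompt plus the average user input.
    input_tokens = a.calls_per_month * (a.system_prompt_tokens + a.avg_input_tokens)
    output_tokens = a.calls_per_month * a.avg_output_tokens
    # Assumption: regenerations scale the video seconds, not the LLM tokens.
    video_seconds = (
        a.calls_per_month
        * a.generations_per_call
        * a.seconds_per_generation
        * a.regeneration_rate
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "video_seconds": round(video_seconds, 1),
    }


print(monthly_volumes(Assumptions()))
```

Under these assumptions the workload is about 473,100 input tokens, 240,000 output tokens, and 7,800 seconds of video per month. Multiplying each volume by the provider's current unit price gives the steady-state figure.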
How precise is this?
Savings band spans 263% of the central estimate. Top sources of uncertainty:
  • Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
  • Conversation length is a coarse bucket — actual tokens vary by ±40% per call.

What’s not included

PitCrew forecasts steady-state AI API spend: the dollars the LLM / embedding provider bills for the deterministic workload you described in the wizard. A production bill carries two kinds of cost on top that PitCrew doesn’t model:

1. Inference overhead — proportional (20–50% on top of steady-state)

  • Dev / eval loops (often 10-30% of total spend)
  • Retries, error recovery, idempotency replays
  • Background batch jobs (summaries, classification of past data)
  • A/B traffic on alternate models
  • Embeddings + fine-tunes that ride alongside the agent

Scenario        Steady-state (PitCrew)   With inference overhead
Default build   $3,902/mo                $4,682–$5,853/mo
PitCrew plan    $390/mo                  $469–$586/mo
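
The "With inference overhead" column can be reproduced by scaling the steady-state figure by 1.2 and 1.5. The sketch below shows that arithmetic; the function name `with_inference_overhead` is hypothetical and not part of PitCrew.

```python
def with_inference_overhead(
    steady_state: float, low: float = 0.20, high: float = 0.50
) -> tuple[float, float]:
    """Scale a steady-state monthly figure by the 20-50% overhead band."""
    return steady_state * (1 + low), steady_state * (1 + high)


low, high = with_inference_overhead(3902)
print(f"${low:,.0f}-${high:,.0f}/mo")  # prints "$4,682-$5,853/mo"
```

Applying the same band to the rounded $390 gives $468 to $585, one dollar below the table's $469–$586. Presumably the report applies the multiplier to the unrounded steady-state figure.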

2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)

  • Cloud hosting (Vercel / Render / Fly / AWS / etc.)
  • Database (Supabase / Postgres / Mongo / etc.)
  • Managed vector DB or search — Pinecone, Weaviate, OpenSearch typically $25–100/mo (if not already entered in Step 5)
  • CDN, scraping APIs, telephony minutes, transport (Twilio, LiveKit, Zyte, etc.)
  • Vendor SaaS margin if going through a wrapper (Cursor, Vapi, Evee, etc.) instead of direct API

The 20–50% inference multiplier comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, target the low end. The hosting/infra range is highly workload-dependent: small RAG bots may spend nothing extra, while voice agents add telephony costs on top.
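
To see how the two unmodeled costs combine, the sketch below adds the proportional overhead and the flat hosting range to a steady-state figure. It assumes the two simply sum and uses the "typically $10–80/mo" hosting range quoted above; the function `production_bill_range` is hypothetical.

```python
def production_bill_range(
    steady_state: float,
    overhead: tuple[float, float] = (0.20, 0.50),  # proportional band
    hosting: tuple[float, float] = (10.0, 80.0),   # flat $/mo, assumed range
) -> tuple[float, float]:
    """Rough low/high monthly bill: scaled API spend plus flat hosting."""
    low = steady_state * (1 + overhead[0]) + hosting[0]
    high = steady_state * (1 + overhead[1]) + hosting[1]
    return low, high


low, high = production_bill_range(3902)
print(f"Default build: ${low:,.0f} to ${high:,.0f} per month")
```

For the default build the flat hosting cost is small next to the proportional overhead. For a workload with low API spend, the hosting range can dominate instead.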
