Your forecast is in
Internal tool for a 15-person creative agency. Marketers describe a campaign, the agent drafts a 4-shot script, then generates per-scene Veo prompts and triggers the video render. Roughly 10 finished videos a day across…
How PitCrew gets you to $390/mo
Each recommendation below is one change you make at design time, with the dollars it shaves and the running total saved before you ship.
Action plan
The full reasoning behind each recommendation — copy into your build doc.
runwayml Gen-3 Turbo prices the same per-second video workload at $0.05/sec vs. your current $0.50/sec. Same modality, comparable workflow.
For the LLM calls that draft scripts and Veo prompts, minimax M2.7 runs the same workload at lower cost (budget tier, same quality bucket). Its spec lists it as good for agentic and productivity tasks. Verify quality on a sample of your traffic before fully switching.
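One common way to verify a new model on a sample of traffic is deterministic hash-based sampling: hash a stable request or campaign ID and send a fixed share of requests to the candidate, so a retried request always lands on the same model. The sketch below is a minimal illustration under assumptions: the model labels and the 10% default share are hypothetical placeholders, and wiring each label to a real provider call is left to your stack.

```python
import hashlib

# Hypothetical model labels; map them to real provider calls in your own code.
CURRENT_MODEL = "current-llm"
CANDIDATE_MODEL = "minimax-m2.7"


def pick_model(request_id: str, candidate_share: float = 0.10) -> str:
    """Deterministically route a fixed share of requests to the candidate model."""
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as a uniform integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Integer threshold avoids float rounding at the top of the range.
    threshold = int(candidate_share * 2**64)
    return CANDIDATE_MODEL if bucket < threshold else CURRENT_MODEL


if __name__ == "__main__":
    routed = [pick_model(f"campaign-{i}") for i in range(1000)]
    share = routed.count(CANDIDATE_MODEL) / len(routed)
    print(f"candidate share: {share:.1%}")
```

Keying on the campaign rather than on each individual call keeps one campaign's script and its Veo prompts on the same model, which makes side-by-side quality review easier.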
Considered, didn’t apply
PitCrew checks every lever — model fit, prompt caching, batch lanes, prompt trimming. Here’s why the rest didn’t make the cut on this build.
- Prompt caching: Your system prompt is 77 tokens; caching only applies to prompts of at least 1,024 tokens, so there is no prefix worth caching.
- Trim system prompt: No redundancy detected; your 77-token prompt is already tight.
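- Batch API: Anthropic's Message Batches API is asynchronous (results can take up to 24 hours), which doesn't fit an interactive script-drafting flow, and no quality-equivalent provider in our pricing table offers a batch lane that does.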
Alternative video models
Same modality, rendered at your wizard's output settings. Savings are shown against your current default, google Veo 3 720p.
| Model | Tags | Unit cost | Resolution | Monthly cost | vs default |
|---|---|---|---|---|---|
| runwayml Gen-3 Turbo | fast iteration, social-first | $0.05/sec | 720p | $390/mo | -$3,510/mo |
| fal.ai Multi-Model Router | multi-model, experimentation, flexible routing | $0.05/sec | 720p | $390/mo | -$3,510/mo |
| kling Kling 1.5 Standard | consistent characters, smooth motion | $0.07/sec | 720p | $546/mo | -$3,354/mo |
| openai Sora 480p | quick previews, B-roll | $0.10/sec | 480p | $780/mo | -$3,120/mo |
| runwayml Gen-3 Alpha | cinematic, narrative shots | $0.12/sec | 720p | $936/mo | -$2,964/mo |
| openai Sora 1080p | hero shots, cinematic | $0.30/sec | 1080p | $2,340/mo | -$1,560/mo |
| luma Dream Machine | dream-like, surreal visuals | $0.40/sec | 720p | $3,120/mo | -$780/mo |
| google Veo 3 720p (current default) | realistic motion, photorealism, physics-accurate | $0.50/sec | 720p | $3,900/mo | baseline |
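Every monthly figure in the table is the model's per-second rate multiplied by one shared render volume. The report does not state that volume; the sketch below back-calculates it from the default row ($3,900/mo at $0.50/sec, about 7,800 rendered seconds a month) as an assumption, then recomputes each row. It is a quick way to sanity-check the table, for example that every unit cost is quoted per second.

```python
# Back-calculated from the default row: $3,900/mo at $0.50/sec (not stated in the report).
DEFAULT_MONTHLY = 3_900.00
DEFAULT_RATE = 0.50  # $/sec, google Veo 3 720p
SECONDS_PER_MONTH = DEFAULT_MONTHLY / DEFAULT_RATE  # 7,800 rendered seconds

# Per-second unit costs copied from the table.
RATES = {
    "runwayml Gen-3 Turbo": 0.05,
    "fal.ai Multi-Model Router": 0.05,
    "kling Kling 1.5 Standard": 0.07,
    "openai Sora 480p": 0.10,
    "runwayml Gen-3 Alpha": 0.12,
    "openai Sora 1080p": 0.30,
    "luma Dream Machine": 0.40,
    "google Veo 3 720p": 0.50,
}


def monthly_cost(rate_per_sec: float, seconds: float = SECONDS_PER_MONTH) -> float:
    """Monthly spend for a per-second model at a fixed render volume."""
    return round(rate_per_sec * seconds, 2)


if __name__ == "__main__":
    for name, rate in RATES.items():
        cost = monthly_cost(rate)
        delta = cost - DEFAULT_MONTHLY
        print(f"{name:<28} ${cost:>8,.0f}/mo  {delta:>+8,.0f}/mo vs default")
```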
What we assumed
These are the inputs we used. If anything looks off, re-run the audit with better numbers.
- Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
- Conversation length is a coarse bucket — actual tokens vary by ±40% per call.
What’s not included
PitCrew forecasts steady-state AI API spend: the dollars your LLM, embedding, and video-generation providers bill for the deterministic workload your wizard described. A production bill carries two kinds of cost on top that PitCrew doesn't model:
1. Inference overhead — proportional (20–50% on top of steady-state)
- Dev / eval loops (often 10-30% of total spend)
- Retries, error recovery, idempotency replays
- Background batch jobs (summaries, classification of past data)
- A/B traffic on alternate models
- Embeddings + fine-tunes that ride alongside the agent
| Scenario | Steady-state (PitCrew) | With inference overhead |
|---|---|---|
| Default build | $3,902/mo | $4,682–$5,853/mo |
| PitCrew plan | $390/mo | $469–$586/mo |
2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)
- Cloud hosting (Vercel / Render / Fly / AWS / etc.)
- Database (Supabase / Postgres / Mongo / etc.)
- Managed vector DB or search — Pinecone, Weaviate, OpenSearch typically $25–100/mo (if not already entered in Step 5)
- CDN, scraping APIs, telephony minutes, transport (Twilio, LiveKit, Zyte, etc.)
- Vendor SaaS margin if going through a wrapper (Cursor, Vapi, Evee, etc.) instead of direct API
The 20–50% inference overhead (a 1.2–1.5× multiplier on steady-state spend) comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, target the low end. The hosting/infra range is highly workload-dependent: small RAG bots may spend nothing extra, while voice agents add telephony costs on top.
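The two cost kinds combine differently: inference overhead scales with steady-state spend, while hosting and infra add a flat amount. A minimal sketch of that arithmetic, assuming the 20–50% overhead and the typical $10–80/mo hosting range stated above as defaults; swap in your own numbers.

```python
def production_bill(
    steady_state: float,
    overhead: tuple[float, float] = (0.20, 0.50),
    hosting: tuple[float, float] = (10.0, 80.0),
) -> tuple[int, int]:
    """Low/high monthly bill: proportional inference overhead plus flat hosting."""
    low = steady_state * (1 + overhead[0]) + hosting[0]
    high = steady_state * (1 + overhead[1]) + hosting[1]
    return round(low), round(high)


if __name__ == "__main__":
    print(production_bill(3_902, hosting=(0.0, 0.0)))  # (4682, 5853): AI spend only
    print(production_bill(3_902))  # (4692, 5933): plus hosting & infra
```

The first call reproduces the default-build row of the overhead table; the second shows how little the flat hosting range moves a bill of this size.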
How sensitive is this forecast?
Pre-deploy estimates are guesses, so the savings shift if the volume or conversation length you guessed turns out to be off.
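A rough way to see the shift is to scale the two inputs separately: call volume moves both video and LLM spend, while conversation length only moves LLM tokens. The sketch below assumes figures derived from this report rather than stated in it: about 7,800 rendered seconds a month ($3,900 ÷ $0.50/sec) and about $2/mo of LLM spend ($3,902 − $3,900). For simplicity it gives both builds the same LLM spend, so only the video rate differs.

```python
# Derived from the default build (assumptions, not stated in the report):
VIDEO_SECONDS = 7_800  # $3,900 video spend / $0.50 per sec
LLM_MONTHLY = 2.00     # $3,902 steady-state - $3,900 video
DEFAULT_RATE, PLAN_RATE = 0.50, 0.05  # $/sec: Veo 3 720p vs. Gen-3 Turbo


def monthly_spend(video_rate: float, volume: float = 1.0, length: float = 1.0) -> float:
    """Video spend scales with volume; LLM spend scales with volume and tokens per call.

    Simplification: both builds get the same LLM spend, so only the video
    rate differs (the MiniMax switch would lower the plan slightly further).
    """
    video = VIDEO_SECONDS * volume * video_rate
    llm = LLM_MONTHLY * volume * length
    return video + llm


def savings(volume: float = 1.0, length: float = 1.0) -> float:
    """Monthly dollars saved by the plan relative to the default build."""
    return monthly_spend(DEFAULT_RATE, volume, length) - monthly_spend(PLAN_RATE, volume, length)


if __name__ == "__main__":
    # ±50% on call volume, ±40% on conversation length, per "What we assumed".
    for volume in (0.5, 1.0, 1.5):
        for length in (0.6, 1.0, 1.4):
            print(f"volume x{volume}, length x{length}: saved ${savings(volume, length):,.0f}/mo")
```

Under these assumptions the savings track call volume almost one-for-one, while conversation length barely matters because the LLM share of this build is a few dollars a month.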