How PitCrew does the math
Three layers: one is AI, the other two are arithmetic. Every number on every report traces back to a counted token times a published rate. If you want to verify the math, you can, line by line.
The three layers
Most “AI cost forecasters” let the model do the forecasting. We don’t. The LLM only touches the first layer — the rest is arithmetic on real BPE tokenizers and published rate cards.
Layer 1: parsing (the AI layer). Claude reads your freeform description and fills in the wizard fields: archetype, expected tools, conversation length, draft system prompt. We mark each inferred value with a "PitCrew suggested" pill so you can see what the model contributed and override anything that looks off.
lib/parser/anthropic-parser.ts — Claude Sonnet 4.5 with structured tool-use output and a 1-hour LRU cache.
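For flavor, here is roughly what a forced tool-use call looks like with the Anthropic TypeScript SDK. This is a sketch, not the real anthropic-parser.ts: the tool name, schema fields, and the omitted caching are illustrative.

```ts
// Sketch only: field names and tool schema are hypothetical, and the
// real lib/parser/anthropic-parser.ts adds the 1-hour LRU cache.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Forcing tool use means Claude must answer as structured JSON that
// matches this schema — no freeform text to post-process.
const fillWizard = {
  name: "fill_wizard",
  description: "Map a freeform agent description onto wizard fields.",
  input_schema: {
    type: "object" as const,
    properties: {
      archetype: { type: "string" },
      expected_tools: { type: "array", items: { type: "string" } },
      turns_per_conversation: { type: "number" },
      draft_system_prompt: { type: "string" },
    },
    required: ["archetype"],
  },
};

export async function parseDescription(description: string) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    tools: [fillWizard],
    tool_choice: { type: "tool", name: "fill_wizard" }, // force structured output
    messages: [{ role: "user", content: description }],
  });
  const block = msg.content.find((b) => b.type === "tool_use");
  return block?.type === "tool_use" ? block.input : null;
}
```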
Layer 2: token counting. Real BPE tokenizers count your prompt: no "chars divided by 4" heuristic when an exact path exists. OpenAI runs locally via gpt-tokenizer; Anthropic and Google use their official count_tokens endpoints.
lib/wizard/count-tokens.ts — three providers, in-memory cache, async path with heuristic fallback only on API failure.
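A condensed sketch of that strategy. The real count-tokens.ts adds the in-memory cache and more model handling; the function shape here is assumed.

```ts
// Sketch of the counting strategy in lib/wizard/count-tokens.ts.
import { encode } from "gpt-tokenizer";              // local BPE, no network
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenerativeAI } from "@google/generative-ai";

const anthropic = new Anthropic();
const google = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

export async function countTokens(
  provider: "openai" | "anthropic" | "google",
  model: string,
  text: string
): Promise<number> {
  try {
    switch (provider) {
      case "openai":
        return encode(text).length;                  // exact, runs in-process
      case "anthropic": {
        const res = await anthropic.messages.countTokens({
          model,
          messages: [{ role: "user", content: text }],
        });
        return res.input_tokens;                     // official count_tokens endpoint
      }
      case "google": {
        const m = google.getGenerativeModel({ model });
        const res = await m.countTokens(text);
        return res.totalTokens;                      // official countTokens endpoint
      }
    }
  } catch {
    return Math.ceil(text.length / 4);               // heuristic fallback on API failure only
  }
}
```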
Layer 3: cost math. Pure arithmetic on counted tokens × published rates. Three passes (pessimistic / central / optimistic) give every dollar a low/mid/high confidence band. The cascade ranks optimizations by absolute savings; recommendations cite the rate-card row they come from.
lib/analyze/pipeline.ts — closed-form uncertainty propagation, no LLM, fully reproducible from the same inputs.
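In spirit, the three passes are the same cost function run under widened assumptions. A minimal sketch, with an assumed ±30% spread on estimated quantities (field names and the spread value are illustrative, not the real pipeline.ts):

```ts
// Counted tokens and published rates are exact, so they are identical
// across all three passes; only *estimated* quantities get widened.
interface Scenario {
  inputTokens: number;    // counted, per month
  outputTokens: number;   // estimated, per month
  inputRatePerM: number;  // rate card, $/1M tokens
  outputRatePerM: number;
}

const costUSD = (s: Scenario) =>
  (s.inputTokens / 1e6) * s.inputRatePerM +
  (s.outputTokens / 1e6) * s.outputRatePerM;

// Same closed-form math, three passes, one band. Fully reproducible:
// identical inputs always produce identical low/mid/high numbers.
export function band(central: Scenario, spread = 0.3) {
  const pass = (k: number) =>
    costUSD({ ...central, outputTokens: central.outputTokens * k });
  return { low: pass(1 - spread), mid: pass(1), high: pass(1 + spread) };
}
```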
Worked example
One agent, one ruleset, every step. Tier-1 support chatbot on Anthropic's Sonnet 4.6 at 1,000 conversations/day, ~3 turns each. We'll compute the default-build cost, then add prompt caching as the one recommendation. To keep the arithmetic easy to follow, each conversation is billed as a single averaged exchange: 30,000 exchanges/month (1,000/day × 30 days).
System prompt: 200 tokens (counted via the provider's count_tokens endpoint)
Avg user input per exchange: 60 tokens (a short HR question)
Avg output per exchange: 180 tokens (a concise answer)
Total input tokens: 30,000 × (200 + 60) = 7,800,000
Total output tokens: 30,000 × 180 = 5,400,000
Input cost: (7,800,000 ÷ 1M) × $3 = $23.40/mo
Output cost: (5,400,000 ÷ 1M) × $15 = $81/mo
Default build: $23.40 + $81 ≈ $104/mo
Cached input cost: ≈ $9/mo (the 200-token system prompt reads from cache at a tenth of the base input rate; user questions still bill at full rate, plus a small cache-write overhead)
Output cost unchanged: $81/mo
PitCrew plan: $90/mo (≈ $14/mo saved, 14% off the default build)
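The same arithmetic as runnable TypeScript. Rates are Anthropic's published Sonnet card ($3/M input, $15/M output, $0.30/M cache reads); the cache-write overhead inside the ≈$9 figure is approximated, not itemized.

```ts
// Re-deriving the worked example above, line by line.
const EXCHANGES = 1_000 * 30;                 // 30,000 per month
const SYSTEM = 200, USER = 60, OUT = 180;     // counted token figures

const inputTokens = EXCHANGES * (SYSTEM + USER);     // 7,800,000
const outputTokens = EXCHANGES * OUT;                // 5,400,000

const inputCost = (inputTokens / 1e6) * 3;           // $23.40
const outputCost = (outputTokens / 1e6) * 15;        // $81.00
console.log(inputCost + outputCost);                 // ≈ $104/mo, default build

// With the 200-token system prompt cached, reads bill at $0.30/M:
const cachedReads = (EXCHANGES * SYSTEM / 1e6) * 0.3; // $1.80
const freshInput = (EXCHANGES * USER / 1e6) * 3;      // $5.40
// Cache writes bill at 1.25x the base input rate; the residual overhead
// brings cached input to ≈ $9/mo, so ≈ $90/mo total on the PitCrew plan.
console.log(cachedReads + freshInput);                // $7.20 before write overhead
```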
Every line above lives somewhere in the codebase — token counts in lib/wizard/count-tokens.ts, rate-card lookups in model_pricing, the cache-rec math in lib/analyze/pipeline.ts. Run an audit and the same steps produce your numbers, with confidence bands tracking how much we’re inferring vs. confirming.
What Claude does, and doesn't
Claude does:
- Parses your description into structured wizard fields
- Drafts a starter system prompt you can edit
- Reads architecture diagrams (PNG / PDF) and picks up the tools it sees
- Suggests an archetype when the description is ambiguous
Claude doesn't:
- Multiply token counts by rates
- Decide which model is cheaper
- Approve or rank recommendations
- Compute the confidence bands
See it on a real agent
The same pipeline, same numbers, on three hand-curated examples.