How PitCrew does the math

Three layers: one is AI, the other two are arithmetic. Every number on every report traces back to a counted token times a published rate. If you want to verify the math, you can — line by line.

The three layers

Most “AI cost forecasters” let the model do the forecasting. We don’t. The LLM only touches the first layer — the rest is arithmetic on real BPE tokenizers and published rate cards.

01
Input layer (AI)

Claude reads your freeform description and fills in the wizard fields: archetype, expected tools, conversation length, draft system prompt. We mark each inferred value with a “PitCrew suggested” pill so you can see what the model contributed and override anything that looks off.

lib/parser/anthropic-parser.ts — Claude Sonnet 4.5 with structured tool-use output and 1-hour LRU cache.
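The contract this layer has to honor is simple: anything you typed wins, anything Claude inferred gets the pill. A minimal sketch of that merge — the field and type names here are illustrative assumptions, not the real `lib/parser/anthropic-parser.ts` types:

```typescript
// Illustrative sketch of the input layer's contract. Field names are
// assumptions for the example, not PitCrew's actual wizard schema.
type Provenance = "user" | "suggested";

interface WizardField<T> {
  value: T;
  source: Provenance; // "suggested" renders the PitCrew-suggested pill
}

interface WizardFields {
  archetype: WizardField<string>;
  expectedTools: WizardField<string[]>;
  turnsPerConversation: WizardField<number>;
  systemPromptDraft: WizardField<string>;
}

// User-entered fields always win; gaps are filled by Claude's inference
// and tagged so the UI can show what the model contributed.
function mergeSuggestions(
  userInput: Partial<Record<keyof WizardFields, unknown>>,
  inferred: Record<keyof WizardFields, unknown>
): WizardFields {
  const build = <T>(key: keyof WizardFields): WizardField<T> =>
    key in userInput
      ? { value: userInput[key] as T, source: "user" }
      : { value: inferred[key] as T, source: "suggested" };

  return {
    archetype: build<string>("archetype"),
    expectedTools: build<string[]>("expectedTools"),
    turnsPerConversation: build<number>("turnsPerConversation"),
    systemPromptDraft: build<string>("systemPromptDraft"),
  };
}
```

Tagging provenance at the field level is what makes the override story work: the report can later show which inputs were confirmed and which were inferred.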

02
Counting layer (deterministic)

Real BPE tokenizers count your prompt — no “chars divided by 4” heuristic when an exact path exists. OpenAI runs locally via gpt-tokenizer; Anthropic and Google use their official count_tokens endpoints.

lib/wizard/count-tokens.ts — three providers, in-memory cache, async path with heuristic fallback only on API failure.
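The dispatch policy can be sketched in a few lines. This is an assumed shape, not the real `lib/wizard/count-tokens.ts` (which is async, per-provider, and cached); it is synchronous here for brevity:

```typescript
// Sketch of the counting layer's fallback policy. The exact counter is
// gpt-tokenizer locally for OpenAI and the official count_tokens
// endpoints for Anthropic and Google; the chars-divided-by-4 heuristic
// runs only when the exact path throws.
type ExactCounter = (text: string) => number;

const HEURISTIC_CHARS_PER_TOKEN = 4;

function countTokens(
  text: string,
  exact: ExactCounter
): { tokens: number; exact: boolean } {
  try {
    return { tokens: exact(text), exact: true };
  } catch {
    // API failure: fall back to the coarse heuristic and flag it,
    // so downstream confidence bands can widen accordingly.
    return {
      tokens: Math.ceil(text.length / HEURISTIC_CHARS_PER_TOKEN),
      exact: false,
    };
  }
}
```

Carrying the `exact` flag forward is the point: a heuristic count is still usable, but it should widen the band rather than masquerade as a measurement.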

03
Math layer (deterministic)

Pure arithmetic on counted tokens × published rates. Three passes (pessimistic / central / optimistic) give every dollar a low/mid/high confidence band. The cascade ranks optimizations by absolute savings; recommendations cite the rate-card row they come from.

lib/analyze/pipeline.ts — closed-form uncertainty propagation, no LLM, fully reproducible from the same inputs.
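The core of this layer fits in two small functions. A minimal sketch — the real `lib/analyze/pipeline.ts` does closed-form uncertainty propagation, and the flat ±20% spread below is an illustrative assumption, not PitCrew's actual model:

```typescript
// Minimal sketch of the math layer: counted tokens × published rates,
// then an optimistic / central / pessimistic pass per figure.
interface Band {
  low: number;
  mid: number;
  high: number;
}

// counted tokens × published rate, normalized to dollars per month
function monthlyCost(
  tokensPerCall: number,
  callsPerMonth: number,
  ratePerMtok: number
): number {
  return ((tokensPerCall * callsPerMonth) / 1_000_000) * ratePerMtok;
}

// The three passes collapse into one low/mid/high confidence band.
// A flat spread is used here only to keep the sketch short.
function withBand(mid: number, spread = 0.2): Band {
  return { low: mid * (1 - spread), mid, high: mid * (1 + spread) };
}
```

Because both functions are pure, rerunning the pipeline on the same inputs reproduces the same report to the cent.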

Worked example

One agent, one ruleset, every step. A Tier-1 support chatbot on Anthropic’s Sonnet 4.6 at 1,000 single-turn conversations a day. We’ll compute the default-build cost, then add prompt caching as the one recommendation.

Step 1
Count the tokens
System prompt: 200 tokens counted by count_tokens
Avg user input per turn: 60 tokens (short HR question)
Avg output per turn: 180 tokens (concise answer)
Step 2
Multiply by volume
Calls per month: 1,000 × 30 = 30,000
Total input tokens: 30,000 × (200 + 60) = 7,800,000
Total output tokens: 30,000 × 180 = 5,400,000
Step 3
Apply Sonnet 4.6’s published rates
Input cost: (7,800,000 ÷ 1M) × $3 = $23.40/mo
Output cost: (5,400,000 ÷ 1M) × $15 = $81/mo
Default build: ≈ $104/mo
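Steps 1–3 reduce to a few lines of arithmetic. The same figures in TypeScript (rates are the published Sonnet prices used above, $3/Mtok in and $15/Mtok out):

```typescript
// Steps 1–3 of the worked example as plain arithmetic.
const calls = 1_000 * 30;                           // 30,000 calls/month
const inputTokens = calls * (200 + 60);             // 7,800,000
const outputTokens = calls * 180;                   // 5,400,000

const inputCost = (inputTokens / 1_000_000) * 3;    // $23.40/mo
const outputCost = (outputTokens / 1_000_000) * 15; // $81.00/mo
const defaultBuild = inputCost + outputCost;        // $104.40/mo, ≈ $104
```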
Step 4
One recommendation: prompt caching
The system prompt rides along on every call. With caching, ~90% of the system-prompt input is re-read at the cache-read rate ($0.30/Mtok, about 10% of the full input rate).
Cached input cost ≈ $9/mo
Output unchanged.
PitCrew plan: ≈ $90/mo (≈ $15/mo saved, 14% off)
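Step 4 replayed the same way. The 90% reuse share and the flat $0.30/Mtok cache-read rate are the assumptions stated above; cache-write surcharges are ignored here to keep the sketch simple:

```typescript
// Prompt caching: only the system-prompt tokens are cacheable;
// user input and output are unchanged.
const monthlyCalls = 30_000;
const systemTokens = monthlyCalls * 200;  // 6,000,000 system-prompt tokens
const userTokens = monthlyCalls * 60;     // 1,800,000 user-input tokens

const cachedShare = 0.9;                  // ~90% of system-prompt input reused
const fullRate = 3;                       // $/Mtok, standard input
const cacheReadRate = 0.3;                // $/Mtok, ≈10% of the full rate

const cachedInputCost =
  (userTokens / 1e6) * fullRate +                           // $5.40 user turns
  ((systemTokens * (1 - cachedShare)) / 1e6) * fullRate +   // $1.80 cache misses
  ((systemTokens * cachedShare) / 1e6) * cacheReadRate;     // $1.62 cache reads
// ≈ $8.82/mo, which the text rounds to $9

const planTotal = cachedInputCost + 81;   // output unchanged → ≈ $90/mo
```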

Every line above lives somewhere in the codebase — token counts in lib/wizard/count-tokens.ts, rate-card lookups in model_pricing, the caching-recommendation math in lib/analyze/pipeline.ts. Run an audit and the same steps produce your numbers, with confidence bands tracking how much we’re inferring vs. confirming.

What Claude does, and doesn’t

Claude does
  • Parses your description into structured wizard fields
  • Drafts a starter system prompt you can edit
  • Reads architecture diagrams (PNG / PDF) and picks up the tools it sees
  • Suggests an archetype when the description is ambiguous
Claude does not
  • Multiply token counts by rates
  • Decide which model is cheaper
  • Approve or rank recommendations
  • Compute the confidence bands

See it on a real agent

The same pipeline, same numbers, on three hand-curated examples.

See a demo report
Audit my own agent