How PitCrew does the math

Three layers: one is AI, the other two are arithmetic. Every number on every report traces back to a counted token times a published rate. If you want to verify the math, you can — line by line.

The three layers

Most “AI cost forecasters” let the model do the forecasting. We don’t. The LLM only touches the first layer — the rest is arithmetic on real BPE tokenizers and published rate cards.

01
Input layer (AI)

Claude reads your freeform description and fills in the wizard fields: archetype, expected tools, conversation length, draft system prompt. We mark each inferred value with a “PitCrew suggested” pill so you can see what the model contributed and override anything that looks off.

lib/parser/anthropic-parser.ts — Claude Sonnet 4.5 with structured tool-use output and 1-hour LRU cache.
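The contract this layer has to honor is simple: anything you typed wins, anything Claude inferred gets the pill. A minimal sketch of that merge — the field and type names here are illustrative assumptions, not the real `lib/parser/anthropic-parser.ts` types:

```typescript
// Illustrative sketch of the input layer's contract. Field names are
// assumptions for the example, not PitCrew's actual wizard schema.
type Provenance = "user" | "suggested";

interface WizardField<T> {
  value: T;
  source: Provenance; // "suggested" renders the PitCrew-suggested pill
}

interface WizardFields {
  archetype: WizardField<string>;
  expectedTools: WizardField<string[]>;
  turnsPerConversation: WizardField<number>;
  systemPromptDraft: WizardField<string>;
}

// User-entered fields always win; gaps are filled by Claude's inference
// and tagged so the UI can show what the model contributed.
function mergeSuggestions(
  userInput: Partial<Record<keyof WizardFields, unknown>>,
  inferred: Record<keyof WizardFields, unknown>
): WizardFields {
  const build = <T>(key: keyof WizardFields): WizardField<T> =>
    key in userInput
      ? { value: userInput[key] as T, source: "user" }
      : { value: inferred[key] as T, source: "suggested" };

  return {
    archetype: build<string>("archetype"),
    expectedTools: build<string[]>("expectedTools"),
    turnsPerConversation: build<number>("turnsPerConversation"),
    systemPromptDraft: build<string>("systemPromptDraft"),
  };
}
```

Tagging provenance at the field level is what makes the override story work: the report can later show which inputs were confirmed and which were inferred.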

02
Counting layer (deterministic)

Real BPE tokenizers count your prompt — no “chars divided by 4” heuristic when an exact path exists. OpenAI runs locally via gpt-tokenizer; Anthropic and Google use their official count_tokens endpoints.

lib/wizard/count-tokens.ts — three providers, in-memory cache, async path with heuristic fallback only on API failure.
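The dispatch policy can be sketched in a few lines. This is an assumed shape, not the real `lib/wizard/count-tokens.ts` (which is async, per-provider, and cached); it is synchronous here for brevity:

```typescript
// Sketch of the counting layer's fallback policy. The exact counter is
// gpt-tokenizer locally for OpenAI and the official count_tokens
// endpoints for Anthropic and Google; the chars-divided-by-4 heuristic
// runs only when the exact path throws.
type ExactCounter = (text: string) => number;

const HEURISTIC_CHARS_PER_TOKEN = 4;

function countTokens(
  text: string,
  exact: ExactCounter
): { tokens: number; exact: boolean } {
  try {
    return { tokens: exact(text), exact: true };
  } catch {
    // API failure: fall back to the coarse heuristic and flag it,
    // so downstream confidence bands can widen accordingly.
    return {
      tokens: Math.ceil(text.length / HEURISTIC_CHARS_PER_TOKEN),
      exact: false,
    };
  }
}
```

Carrying the `exact` flag forward is the point: a heuristic count is still usable, but it should widen the band rather than masquerade as a measurement.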

03
Math layer (deterministic)

Pure arithmetic on counted tokens × published rates. Three passes (pessimistic / central / optimistic) give every dollar a low/mid/high confidence band. The cascade ranks optimizations by absolute savings; recommendations cite the rate-card row they come from.

lib/analyze/pipeline.ts — closed-form uncertainty propagation, no LLM, fully reproducible from the same inputs.
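The core of this layer fits in two small functions. A minimal sketch — the real `lib/analyze/pipeline.ts` does closed-form uncertainty propagation, and the flat ±20% spread below is an illustrative assumption, not PitCrew's actual model:

```typescript
// Minimal sketch of the math layer: counted tokens × published rates,
// then an optimistic / central / pessimistic pass per figure.
interface Band {
  low: number;
  mid: number;
  high: number;
}

// counted tokens × published rate, normalized to dollars per month
function monthlyCost(
  tokensPerCall: number,
  callsPerMonth: number,
  ratePerMtok: number
): number {
  return ((tokensPerCall * callsPerMonth) / 1_000_000) * ratePerMtok;
}

// The three passes collapse into one low/mid/high confidence band.
// A flat spread is used here only to keep the sketch short.
function withBand(mid: number, spread = 0.2): Band {
  return { low: mid * (1 - spread), mid, high: mid * (1 + spread) };
}
```

Because both functions are pure, rerunning the pipeline on the same inputs reproduces the same report to the cent.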

Worked example

One agent, one ruleset, every step. A Tier-1 support chatbot on Anthropic’s Sonnet 4.6 at 1,000 single-turn conversations a day. We’ll compute the default-build cost, then add prompt caching as the one recommendation.

Step 1
Count the tokens
System prompt: 200 tokens counted by count_tokens
Avg user input per turn: 60 tokens (short HR question)
Avg output per turn: 180 tokens (concise answer)
Step 2
Multiply by volume
Calls per month: 1,000 × 30 = 30,000
Total input tokens: 30,000 × (200 + 60) = 7,800,000
Total output tokens: 30,000 × 180 = 5,400,000
Step 3
Apply Sonnet 4.6’s published rates
Input cost: (7,800,000 ÷ 1M) × $3 = $23.40/mo
Output cost: (5,400,000 ÷ 1M) × $15 = $81/mo
Default build: ≈ $104/mo
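Steps 1–3 reduce to a few lines of arithmetic. The same figures in TypeScript (rates are the published Sonnet prices used above, $3/Mtok in and $15/Mtok out):

```typescript
// Steps 1–3 of the worked example as plain arithmetic.
const calls = 1_000 * 30;                           // 30,000 calls/month
const inputTokens = calls * (200 + 60);             // 7,800,000
const outputTokens = calls * 180;                   // 5,400,000

const inputCost = (inputTokens / 1_000_000) * 3;    // $23.40/mo
const outputCost = (outputTokens / 1_000_000) * 15; // $81.00/mo
const defaultBuild = inputCost + outputCost;        // $104.40/mo, ≈ $104
```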
Step 4
One recommendation: prompt caching
The system prompt rides along on every call. With caching, ~90% of the system-prompt input is re-read at the cache-read rate ($0.30/Mtok, about 10% of the full input rate).
Cached input cost ≈ $9/mo
Output unchanged.
PitCrew plan: ≈ $90/mo (≈ $15/mo saved, 14% off)
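Step 4 replayed the same way. The 90% reuse share and the flat $0.30/Mtok cache-read rate are the assumptions stated above; cache-write surcharges are ignored here to keep the sketch simple:

```typescript
// Prompt caching: only the system-prompt tokens are cacheable;
// user input and output are unchanged.
const monthlyCalls = 30_000;
const systemTokens = monthlyCalls * 200;  // 6,000,000 system-prompt tokens
const userTokens = monthlyCalls * 60;     // 1,800,000 user-input tokens

const cachedShare = 0.9;                  // ~90% of system-prompt input reused
const fullRate = 3;                       // $/Mtok, standard input
const cacheReadRate = 0.3;                // $/Mtok, ≈10% of the full rate

const cachedInputCost =
  (userTokens / 1e6) * fullRate +                           // $5.40 user turns
  ((systemTokens * (1 - cachedShare)) / 1e6) * fullRate +   // $1.80 cache misses
  ((systemTokens * cachedShare) / 1e6) * cacheReadRate;     // $1.62 cache reads
// ≈ $8.82/mo, which the text rounds to $9

const planTotal = cachedInputCost + 81;   // output unchanged → ≈ $90/mo
```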

Every line above lives somewhere in the codebase — token counts in lib/wizard/count-tokens.ts, rate-card lookups in model_pricing, the caching-recommendation math in lib/analyze/pipeline.ts. Run an audit and the same steps produce your numbers, with confidence bands tracking how much we’re inferring vs. confirming.

What Claude does, and doesn’t

Claude does
  • Parses your description into structured wizard fields
  • Drafts a starter system prompt you can edit
  • Reads architecture diagrams (PNG / PDF) and picks up the tools it sees
  • Suggests an archetype when the description is ambiguous
Claude does not
  • Multiply token counts by rates
  • Decide which model is cheaper
  • Approve or rank recommendations
  • Compute the confidence bands

See it on a real agent

The same pipeline, same numbers, on three hand-curated examples.

See a demo report
Audit my own agent