Code review assistant — PitCrew demo

Default build

$67/mo

$1/user/mo

Anthropic Opus 4.7 on every call, no caching

PitCrew plan

$67/mo

$1/user/mo

already efficient — no cheaper alternatives at this quality

Real bill typically runs $80–$100/mo: PitCrew computes steady-state inference only, and production adds dev, eval, and retry overhead.

Action plan

No optimizations apply to this build; it is already efficient for the workload you described. The levers PitCrew checked, and why each one wasn't a fit, are listed below.

Considered, didn’t apply

PitCrew checks every lever: model fit, prompt caching, batch lanes, and prompt trimming. Here's why none of them made the cut on this build.

Prompt caching: Your system prompt is 55 tokens; caching needs ≥1,024 tokens to amortize the cache-write cost.
Trim system prompt: No redundancy detected; your 55-token prompt is already tight.
Switch to a cheaper model: No candidate beats Opus 4.7 by more than $1.00/mo on your inputs; your model choice is already efficient.
Batch API: This is a real-time agent (0% async traffic), so there is no work to route to a batch lane.
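The two threshold-based checks above (prompt caching and batch) can be expressed as a minimal sketch. The function name, parameter names, and return shape are hypothetical and are not PitCrew's actual implementation; the 1,024-token minimum and the 0% async share are taken from the report. The prompt-trimming and model-switch checks depend on analysis the report does not show, so they are left out.

```python
# Hypothetical sketch of two lever checks; names are illustrative only.
MIN_CACHE_TOKENS = 1024  # caching threshold stated in the report


def check_levers(system_prompt_tokens: int, async_share: float) -> dict:
    """Return whether each lever applies for the given inputs."""
    return {
        # Caching only pays off once the cacheable prefix is long enough.
        "prompt_caching": system_prompt_tokens >= MIN_CACHE_TOKENS,
        # A batch lane needs some traffic that can tolerate delay.
        "batch_api": async_share > 0.0,
    }


# Inputs from this build: 55-token system prompt, 0% async traffic.
result = check_levers(system_prompt_tokens=55, async_share=0.0)
print(result)
```

With this build's inputs both checks come back false, which matches the report's conclusion that neither lever applies.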

Alternative models

Same quality tier, your wizard inputs. No caching or batch applied, so every row is a directly comparable raw monthly cost.

Provider	Model	Tags	Input $/Mtok	Output $/Mtok	Context	Monthly cost	vs default
OpenAI	GPT-5.4	general purpose	$3	$10	—	$29/mo	−$38/mo
xAI	Grok 4.1	real-time data, X integration	$3	$15	—	$40/mo	−$27/mo
Anthropic	Opus 4.7 (default)	complex reasoning, code, agentic	$5	$25	—	$67/mo	—
Anthropic	Opus 4.6	complex reasoning, code, agentic	$5	$25	—	$67/mo	+$0/mo
OpenAI	o3	deep reasoning	$10	$40	—	$114/mo	+$47/mo
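The "vs default" column is the difference between each row's monthly cost and the default model's. A minimal sketch, using the rounded monthly figures from the table; the function and variable names are hypothetical:

```python
# Rounded monthly costs in dollars, copied from the table above.
MONTHLY_COST = {
    "GPT-5.4": 29,
    "Grok 4.1": 40,
    "Opus 4.7": 67,
    "Opus 4.6": 67,
    "o3": 114,
}


def vs_default(costs: dict, default: str) -> dict:
    """Dollar difference of every non-default model against the default."""
    base = costs[default]
    return {model: cost - base for model, cost in costs.items() if model != default}


def format_delta(delta: int) -> str:
    """Render a delta with an explicit sign, e.g. +$47/mo."""
    sign = "-" if delta < 0 else "+"
    return f"{sign}${abs(delta)}/mo"


deltas = vs_default(MONTHLY_COST, default="Opus 4.7")
for model, delta in deltas.items():
    print(f"{model}: {format_delta(delta)}")
```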

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt: 55 tokens (estimated)
Avg user input: 1,500 tokens
Avg output: 800 tokens
Calls per month: 2,400
Batch share: 0%
Pricing as of: Apr 28, 2026
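These inputs are enough to reproduce the headline figure. The sketch below assumes that each call bills the system prompt plus the average user input as input tokens and the average output as output tokens, with no caching or batch discount. That formula is an assumption, not PitCrew's documented method, but with the Opus 4.7 prices listed in the report ($5 input, $25 output per Mtok) it gives about $66.66, which rounds to the reported $67/mo.

```python
def monthly_cost(system_tokens: int, input_tokens: int, output_tokens: int,
                 calls_per_month: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Steady-state monthly cost in dollars (assumed formula, no discounts)."""
    total_in = (system_tokens + input_tokens) * calls_per_month
    total_out = output_tokens * calls_per_month
    dollars = total_in * input_price_per_mtok + total_out * output_price_per_mtok
    return dollars / 1_000_000  # prices are quoted per million tokens


# Inputs from the report; prices are the Opus 4.7 row of the model table.
opus_cost = monthly_cost(55, 1500, 800, 2400, 5, 25)
print(f"${opus_cost:.2f}/mo")
```

Under the same assumption, the Grok 4.1 ($3/$15) and o3 ($10/$40) prices give roughly $40 and $114, in line with the reported figures.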

How precise is this?

Savings band spans 0% of the central estimate. Top sources of uncertainty:

Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
Conversation length is a coarse bucket — actual tokens vary by ±40% per call.

Real-bill expectation

PitCrew forecasts steady-state inference cost: the dollars the LLM provider bills for the deterministic, no-extras workload your wizard described. Real production bills are typically 1.2–1.5× that figure because the steady-state model excludes:

Dev / eval loops (often 10-30% of total spend)
Retries, error recovery, idempotency replays
Background batch jobs (summaries, classification of past data)
A/B traffic on alternate models
Embeddings + fine-tunes that ride alongside the agent

Scenario	Steady-state (PitCrew)	Expected real bill
Default build	$67/mo	$80–$100/mo
PitCrew plan	$67/mo	$80–$100/mo

The 20–50% uplift (a 1.2–1.5× multiplier) comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, target the low end.
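The expected-bill band is the steady-state figure scaled by the two ends of the multiplier range. A minimal sketch; the function name is hypothetical, and the 1.2 and 1.5 factors are the ones stated in this section:

```python
def real_bill_band(steady_state: float,
                   low_multiplier: float = 1.2,
                   high_multiplier: float = 1.5) -> tuple:
    """Scale a steady-state monthly cost to an expected real-bill range."""
    return steady_state * low_multiplier, steady_state * high_multiplier


# $67/mo steady state gives roughly $80.40 to $100.50,
# which the report shows as $80-$100/mo.
low, high = real_bill_band(67)
print(f"low: ${low:.2f}/mo, high: ${high:.2f}/mo")
```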

How sensitive is this forecast?

Pre-deploy estimates are guesses. Here’s how the savings shift if the volume or conversation length you guessed turns out to be off.

If your volume is different

Scenario	Volume	Monthly cost (default and PitCrew plan)	Savings
Half the volume	40 calls/day (0.5×)	$33	$0/mo
As you estimated	80 calls/day	$67	$0/mo
Double the volume	160 calls/day (2×)	$133	$0/mo
5× the volume	400 calls/day (5×)	$333	$0/mo
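The volume scenarios scale linearly with call count. The sketch below assumes a 30-day month (2,400 calls per month at 80 calls per day) and an unrounded per-call cost built from the report's inputs: 1,555 input tokens and 800 output tokens at $5 and $25 per Mtok. Both the per-call formula and the names are assumptions for illustration, not PitCrew's published method.

```python
DAYS_PER_MONTH = 30  # assumption: 2,400 calls/month at 80 calls/day

# Assumed per-call cost in dollars: (55 + 1,500) input and 800 output tokens.
COST_PER_CALL = ((55 + 1500) * 5 + 800 * 25) / 1_000_000


def monthly_at(calls_per_day: int) -> float:
    """Monthly steady-state cost in dollars at a given daily call volume."""
    return COST_PER_CALL * calls_per_day * DAYS_PER_MONTH


for calls in (40, 80, 160, 400):
    print(f"{calls} calls/day: ${monthly_at(calls):.2f}/mo")
```

Rounded to whole dollars, these come out to $33, $67, $133, and $333, matching the four scenarios.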

If conversations run shorter or longer

Scenario	Conversation length	Monthly cost (default and PitCrew plan)	Savings
One bucket shorter	Medium (5-15 turns)	$44	$0/mo
As you estimated	Long (15+ turns)	$67	$0/mo

