Slack HR helper bot — PitCrew demo

Default build

$52/mo

$0.07/user/mo

anthropic Sonnet 4.6 on every call, no caching

PitCrew plan

$2/mo

<$0.01/user/mo

with 1 recommended change

$24–$105 saved every month ($50 most likely)

96% off the default build

Real bill typically runs $2–$3/mo — PitCrew prices AI API spend at steady-state; production adds inference overhead and separate hosting/infra costs. What’s not included →

How PitCrew gets you to $2/mo

Each recommendation below is one change you make at design time, with the dollars it shaves and the running total saved before you ship.

01

Default build

anthropic Sonnet 4.6 on every call, no caching, real-time pricing on async work

$52/mo

starting point

02

Switch to V3.2 Chat

V3.2 Chat from deepseek runs the same workload at lower cost (budget tier, one tier below).

$2/mo

−$50/mo

$50 saved before build

Action plan

The full reasoning behind each recommendation — copy into your build doc.

01

Switch to V3.2 Chat

low confidence — assumptions vary widely

−$24–$105/mo

V3.2 Chat from deepseek runs the same workload at lower cost (budget tier, one tier below). Spec lists it as good for: general purpose. Verify quality on a sample of your traffic before fully switching.

V3.2 Chat is one tier below Sonnet 4.6 — A/B test on a small slice of traffic before fully switching.

Different provider (deepseek vs anthropic) — you'll need a separate API key and may see different latency.

Considered, didn’t apply

PitCrew checks every lever — model fit, prompt caching, batch lanes, prompt trimming. Here’s why the rest didn’t make the cut on this build.

Prompt caching
Your system prompt is 62 tokens; caching needs ≥1,024 tokens to amortize the cache-write cost.
Trim system prompt
No redundancy detected — your 62-token prompt is already tight.
Batch API
This is a real-time agent (0% async traffic). No work to route to a batch lane.

Alternative models

Same quality tier, your wizard inputs. No caching or batch applied — every row is a directly-comparable raw monthly cost. Click Try as default to re-render this report with that model as the new baseline.

Model	Input $/Mtok	Output $/Mtok	Context	Monthly cost	vs default	Open in audit
deepgramVoice Agent voicelow-latencyaccurate-stt	$0	$0	—	$0/mo	$-52/mo	Try as default →
cartesiaConversational voicelow-costhigh-throughput	$0	$0	—	$0/mo	$-52/mo	Try as default →
voyagevoyage-3 embeddingsemantic-searchcode-friendly	$0.06	$0	—	$0.14/mo	$-52/mo	Try as default →
openaitext-embedding-ada-002 embeddinglegacysemantic-search	$0.10	$0	—	$0.23/mo	$-52/mo	Try as default →
cohereembed-english-v3 embeddingenglish-onlyhigh-quality	$0.10	$0	—	$0.23/mo	$-52/mo	Try as default →
cohereembed-multilingual-v3 embeddingmultilingual100+ languages	$0.10	$0	—	$0.23/mo	$-52/mo	Try as default →
deepseekV4 codingreasoning	$0.30	$0.50	—	$2/mo	$-50/mo	Try as default →
deepseekR1 complex reasoning	$0.55	$2	—	$8/mo	$-44/mo	Try as default →
mistralLarge 2 multilingualreasoning	$2	$6	—	$23/mo	$-29/mo	Try as default →
googleGemini 2.5 Pro long contextmultimodal	$1	$10	—	$33/mo	$-19/mo	Try as default →
openaiGPT-4o multimodal	$3	$10	—	$36/mo	$-16/mo	Try as default →
openaiGPT-5.2 balanced	$2	$14	—	$46/mo	$-6/mo	Try as default →
anthropicSonnet 4.6 Default general purposebalanced	$3	$15	—	$52/mo	—	Try as default →

What we assumed

These are the inputs we used. If anything looks off, re-run the audit with better numbers.

System prompt

62 tokens (estimated)

Avg user input

250 tokens

Avg output

400 tokens

Calls per month

7,500

Batch share

0%

Pricing as of

Apr 28, 2026

How precise is this?

Savings band spans 161% of the central estimate. Top sources of uncertainty:

Call volume is your guess — typical pre-deploy estimates land within ±50% of actual.
Conversation length is a coarse bucket — actual tokens vary by ±40% per call.

What’s not included

PitCrew forecasts steady-state AI API spend — the dollars the LLM / embedding provider bills for the deterministic workload your wizard described. A production bill carries two kinds of cost on top that PitCrew doesn’t model:

1. Inference overhead — proportional (20–50% on top of steady-state)

Dev / eval loops (often 10-30% of total spend)
Retries, error recovery, idempotency replays
Background batch jobs (summaries, classification of past data)
A/B traffic on alternate models
Embeddings + fine-tunes that ride alongside the agent

Scenario	Steady-state (PitCrew)	With inference overhead
Default build	$52/mo	$62–$78/mo
PitCrew plan	$2/mo	$2–$3/mo

2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)

Cloud hosting (Vercel / Render / Fly / AWS / etc.)
Database (Supabase / Postgres / Mongo / etc.)
Managed vector DB or search — Pinecone, Weaviate, OpenSearch typically $25–100/mo (if not already entered in Step 5)
CDN, scraping APIs, telephony minutes, transport (Twilio, LiveKit, Zyte, etc.)
Vendor SaaS margin if going through a wrapper (Cursor, Vapi, Evee, etc.) instead of direct API

The 20-50% inference multiplier comes from public engineering postmortems and the validation cases in docs/accuracy-validation.md. If your team has tight eval loops and minimal retry traffic, target the low end. The hosting/infra range is highly workload-dependent — small RAG bots may spend nothing extra, voice agents add telephony costs on top.

How sensitive is this forecast?

Pre-deploy estimates are guesses. Here’s how the savings shift if the volume or conversation length you guessed turns out to be off.

If your volume is different

Scenario

Default monthly

PitCrew monthly

Savings

Half the volume

125 calls/day (0.5×)

$26

$0.96

−$25/mo

As you estimated

250 calls/day

$52

$2

−$50/mo

Double the volume

500 calls/day (2×)

$104

$4

−$100/mo

5× the volume

1,250 calls/day (5×)

$260

$10

−$251/mo

If conversations run shorter or longer

Scenario

Default monthly

PitCrew monthly

Savings

One bucket shorter

Single-turn (no memory)

$26

$0.97

−$25/mo

As you estimated

Short (2-5 turns)

$52

$2

−$50/mo

One bucket longer

Medium (5-15 turns)

$82

$3

−$79/mo

Your forecast is in

How PitCrew gets you to $2/mo

Action plan

Considered, didn’t apply

Alternative models

What we assumed

What’s not included

1. Inference overhead — proportional (20–50% on top of steady-state)

2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)

How sensitive is this forecast?

Run another audit
for a different build

Your forecast is in

How PitCrew gets you to $2/mo

Action plan

Considered, didn’t apply

Alternative models

What we assumed

What’s not included

1. Inference overhead — proportional (20–50% on top of steady-state)

2. Hosting & infra — flat (workload-dependent, typically $10–80/mo)

How sensitive is this forecast?

Run another auditfor a different build

Run another audit
for a different build