MODULES

Audit

not started

Caching

not started

Routing

not started

Output

not started

Batch API

locked

GRIMReady when you are.

Grimoire

Cut Your Claude API Costs by 80%

5 techniques Anthropic doesn't document.
Each module ends with a real exercise. Leave with code you can ship today.

🔒

$???

your potential monthly savings

Complete the Module 1 calculator to reveal your number

"My Claude API bill hit $340 this month and I have no idea why." — r/ClaudeAI · real dev, real pain · sound familiar?

Haiku $0.25/MTok → Sonnet $3.00/MTok 12× gap · 5 modules to use it

… devs already competing

Loading…

0XP Points

0Day Streak

0mTime Today

⚡ First Step 🔥 Momentum 💎 Cost Master ⏱ Speed Run

MODULE 01 / 05not started

Why Your Claude Bill Is Probably 4× Higher Than It Should Be

⚡ ~8 min+20 XP🟢 Beginner

CONTEXT

Most API bills are 3–4× higher than they should be. Output tokens cost 5× more than input — and that's where the real waste hides.

THE NUMBERS

4×

more expensive than it should be.
That's the average overpaying rate for teams that have never audited their API usage.

The three biggest sources of waste:

max_tokens with no real limit — you set max_tokens: 4096 when your average response is 200 tokens. Claude writes more when given space.
Repeated system prompts without caching — you send the same 800-token prompt on every call, no cache. At 1,000 calls/day: 800,000 tokens wasted daily.
Wrong model for the task — using Sonnet for things Haiku handles perfectly. Sonnet costs 12× more per token.

5×

output tokens cost more than input.
Focusing only on input reduction while ignoring output is leaving your biggest expense untouched.

KNOWLEDGE CHECK

QUICK CHECK — MODULE 01

Which type of token costs more per unit in the Claude API?

EXERCISE

🧮

API Cost Calculator

Enter your numbers — see your real waste

+20 XP

Enter your usage numbers. The calculator shows exactly how much you're wasting and what each technique saves.

Daily API calls

1,000

System prompt size (tokens)

800

Avg response length (tokens)

500

Model

CURRENT / MONTH

→

OPTIMIZED / MONTH

Cost Tracker — Python

import anthropic client = anthropic.Anthropic() # Prices per million tokens (update if model changes) PRICES = { "in": 3.00, "out": 15.00, "cache_write": 3.75, "cache_read": 0.30 } def cost_of(usage) -> dict: u = usage breakdown = { "input": u.input_tokens * PRICES["in"] / 1e6, "output": u.output_tokens * PRICES["out"] / 1e6, "cache_write": getattr(u,"cache_creation_input_tokens",0) * PRICES["cache_write"] / 1e6, "cache_read": getattr(u,"cache_read_input_tokens",0) * PRICES["cache_read"] / 1e6, } breakdown["total"] = sum(breakdown.values()) return breakdown # Usage — wrap any API call response = client.messages.create( model="claude-sonnet-4-6", max_tokens=256, messages=[{"role":"user","content":"Hello"}] ) c = cost_of(response.usage) print(f"input ${c['input']:.4f} | output ${c['output']:.4f} | total ${c['total']:.4f}")

MODULE 02 / 05not started

Prompt Caching — The 2-Line Fix That Cuts Input Costs by 90%

⚡ ~10 min+30 XP🟡 Intermediate

CONTEXT

Two lines of code. Your system prompt is cached for 5 minutes at 10% of the original cost. Break-even: 2 calls.

HOW IT WORKS

Add "cache_control": {"type": "ephemeral"} to the block you want to cache:

BEFORE — no caching

{"role": "system", "content": "You are a specialized analyst..."}

AFTER — 90% cheaper

{"role":"user","content":[{"type":"text","text":"You are a specialized analyst...","cache_control":{"type":"ephemeral"}}]}

90%

reduction in input cost for cached system prompts.
Cache miss on the first call: 125%. Break-even at 2 calls.

KNOWLEDGE CHECK

QUICK CHECK — MODULE 02

After how many calls does prompt caching become profitable?

EXERCISE

⚡

Cache Savings Simulator

Enter your numbers — see your real savings

+30 XP

Set your actual usage. The simulator calculates how much prompt caching saves per month.

System prompt size (tokens)

800

Daily API calls

500

WITHOUT CACHE / month

$—

WITH CACHE / month

$—

YOU SAVE / month

$—

cache_control — Python

import anthropic client = anthropic.Anthropic() # System prompt cached after 1st call (90% discount on re-reads) response = client.messages.create( model="claude-sonnet-4-6", max_tokens=1024, system=[{ "type": "text", "text": your_long_system_prompt, "cache_control": {"type": "ephemeral"} # ← this line }], messages=[{"role": "user", "content": user_message}] ) # Check cache hit in response headers print(response.usage.cache_read_input_tokens) # >0 = cache hit print(response.usage.cache_creation_input_tokens) # >0 = cache write

MODULE 03 / 05not started

Model Routing — Use Haiku for 80% of Your Calls

⚡ ~10 min+25 XP🟡 Intermediate

CONTEXT

One model for everything = maximum cost. Use a 3-tier router:

Haiku — classify, extract, tag, translate (12× cheaper)
Sonnet — write, reason, debug, analyze
Opus — architecture, novel problems, 10k+ context

THE CODE

ROUTING_LAYER.js

function route(task) {
const light = ['classify','extract','summarize','format'];
return light.includes(task.type) ? 'claude-haiku-4-5' : 'claude-sonnet-4-6';
}

12×

cheaper than Sonnet per token.
For classification, extraction and short generation, Haiku isn't a downgrade — it's the right choice.

KNOWLEDGE CHECK

QUICK CHECK — MODULE 03

How much cheaper is Haiku compared to Sonnet per token?

EXERCISE

🎯

Route These Calls

Haiku or Sonnet? 5 tasks — need 4/5 right

+25 XP

For each task below: which model would you use? Tap Haiku or Sonnet — next task loads instantly.

Model Router — Python

def route_model(task: str, complexity: str) -> str: """Route to the cheapest model that can handle the task.""" haiku_tasks = [ "classify", "extract", "summarize", "sentiment", "translate", "tag" ] if any(t in task.lower() for t in haiku_tasks) or complexity == "low": return "claude-haiku-4-5" # ~20× cheaper than Sonnet if complexity == "high": return "claude-opus-4-7" # full power — use sparingly return "claude-sonnet-4-6" # default: fast + capable # Usage model = route_model("classify this email", complexity="low") response = client.messages.create(model=model, ...)

MODULE 04 / 05not started

Output Control — Stop Paying for Tokens You Never Use

⚡ ~8 min+30 XP🟠 Advanced

CONTEXT

Claude writes longer when given more space. Tighten max_tokens to your real P95 — same quality, 40% fewer output tokens.

THE TECHNIQUE

Analyze your last 30 responses. Calculate mean and P95 in tokens
Set max_tokens to P95 + 20% buffer — not the theoretical maximum
Add length instructions to your prompt: "Answer in under 3 sentences"

96%

of budget wasted when max_tokens is 4096 and the real response is 180 tokens.
Adjusting to P95 + 20% fixes this with zero quality impact.

KNOWLEDGE CHECK

QUICK CHECK — MODULE 04

What happens when you reduce max_tokens close to your real average?

EXERCISE

✂️

Token Budget Calibrator

Find your optimal max_tokens

+30 XP

Set your real average response length. Then drag max_tokens to a smart value — the zone indicator tells you if you're in range.

Your real average response (tokens)

180

Your new max_tokens setting

2,000

Set your average response length to start

max_tokens + early stop — Python

# Set max_tokens to P95 response length × 1.2 buffer response = client.messages.create( model="claude-sonnet-4-6", max_tokens=512, # ← measured P95, not 4096 messages=[{"role": "user", "content": prompt}] ) # Detect when the model was actually cut short if response.stop_reason == "max_tokens": print("Warning: response truncated — raise max_tokens") # Output tokens used vs limit used = response.usage.output_tokens print(f"Used {used} / 512 tokens ({used/512*100:.0f}%)")

MODULE 05 / 05🔒 locked

Batch API — Cut Another 50% While You Sleep

⚡ ~12 min+40 XP🔴 Expert

CONTEXT

Non-realtime calls via Batch API = 50% off, automatically. Submit → wait 24h → fetch results. That's it.

WHAT QUALIFIES

What qualifies for batch:

Content generation (descriptions, summaries, posts)
Data enrichment (classify, tag, extract from datasets)
Nightly reports or analyses
Any workflow that runs on a schedule, not on-demand

→ COMBINED SAVINGS

Caching (90% off input) + Model routing (12× cheaper on 80% of calls) + Output control (40% off output) + Batch (50% off total) = effective reduction of 85–92%.

92%

effective cost reduction combining all 4 techniques.

KNOWLEDGE CHECK

QUICK CHECK — MODULE 05

What discount does the Batch API offer compared to the standard price?

EXERCISE

🌙

Batch or Real-time?

Sort 6 tasks — need 5/6 right

+40 XP

Real-time or Batch? Tap one — next task loads instantly.

Batch API — Python

import anthropic client = anthropic.Anthropic() # Submit a batch (50% cheaper, results within 24h) batch = client.messages.batches.create( requests=[ { "custom_id": f"req-{i}", "params": { "model": "claude-haiku-4-5", "max_tokens": 256, "messages": [{"role": "user", "content": prompt}] } } for i, prompt in enumerate(prompts) ] ) print(f"Batch ID: {batch.id} — check results later") # Retrieve results (poll or webhook) results = client.messages.batches.results(batch.id) for result in results: print(result.custom_id, result.result.message.content[0].text)

CERTIFICATE OF COMPLETION

All 5 modules completed.

You now know what 92% of API users don't — and you have the math to prove it.

0 TOTAL XP EARNED

completion #— of this Tome

You're in the top 8% of Claude API users by cost efficiency. Most developers never audit their token usage. You did — and you built the systems to stay there.

THIS WEEK'S MISSION

These 5 changes take under 2 hours. Each one pays back the cost of this course.

0 of 5 done

⚡ NEXT MISSION UNLOCKED

Tome #2 is being forged.

The next Tome tackles what took down Uber — orchestrating multiple Claude agents without the cost exploding. Each mission, a real system you ship.

🔮 Back Tome #2 — pay what you want →

WHAT DO YOU WANT IN TOME #2?

STATS

XP EARNED

LV 1

⚡ Apprentice

0 / 30 XP to next level

0/5

MODULES DONE

savings implemented

💰

0/5 techniques applied

DAILY QUEST

📅 TODAY

Loading quest...

+5 XP bonus

THIS SESSION

TIME 0m

XP TODAY 0

STREAK 🔥 0

SNIPPETS 0 / 5

ACHIEVEMENTS

🌅

🎯

🔥

⚡

🌄

🏆

⏱

🧙

🔮

⚗️

💎

TROPHIES

🥉

Bronze

🥈

Silver

🥇

Gold

💎

Platinum

GLOBAL RANKINGS

Loading rankings…

You made it to the advanced tier.

Cut Your Claude API Costs by 80%

Why Your Claude Bill Is Probably 4× Higher Than It Should Be

Prompt Caching — The 2-Line Fix That Cuts Input Costs by 90%

Model Routing — Use Haiku for 80% of Your Calls

Output Control — Stop Paying for Tokens You Never Use

🔓 Bonus module unlocked

Batch API — Cut Another 50% While You Sleep

All 5 modules completed.

Tome #2 is being forged.