Grimoire

Cut Your Claude API Costs by 80%

5 techniques Anthropic doesn't teach you. Each module has an exercise — you only move forward when you actually understand it. Not a blog post. Real training.

MODULE 01 / 05

Why Your Claude Bill Is Probably 4× Higher Than It Should Be

⚡ ~8 min · +20 XP · 🟢 Beginner
CONTEXT

Most API bills are 3–4× higher than they should be. Output tokens cost 5× more than input — and that's where the real waste hides.

THE NUMBERS
4×
more expensive than it should be.
That's the average overpayment for teams that have never audited their API usage.

The three biggest sources of waste:

  • max_tokens with no real limit — you set max_tokens: 4096 when your average response is 200 tokens. Claude writes more when given space.
  • Repeated system prompts without caching — you send the same 800-token prompt on every call, no cache. At 1,000 calls/day: 800,000 tokens wasted daily.
  • Wrong model for the task — using Sonnet for things Haiku handles perfectly. Sonnet costs 12× more per token.
5×
output tokens cost more than input.
Focusing only on input reduction while ignoring output is leaving your biggest expense untouched.
KNOWLEDGE CHECK
QUICK CHECK — MODULE 01

Which type of token costs more per unit in the Claude API?

EXERCISE
🧮
API Cost Calculator
Enter your numbers — see your real waste
+20 XP

Enter your usage numbers. The calculator shows exactly how much you're wasting and what each technique saves.

Daily API calls: 1,000
System prompt size (tokens): 800
Avg response length (tokens): 500
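The calculator's arithmetic can be approximated offline. The sketch below is a rough estimate, not the course's calculator: it assumes the Sonnet prices from this module's Cost Tracker snippet ($3 input and $15 output per million tokens), a flat 30-day month, and that the system prompt is the only input. The function name `monthly_cost` and its parameters are hypothetical.

```python
# Assumed prices in USD per million tokens (Sonnet figures used elsewhere in this module)
PRICE_IN = 3.00
PRICE_OUT = 15.00
DAYS_PER_MONTH = 30  # assumption: flat 30-day month


def monthly_cost(calls_per_day: int, system_tokens: int, response_tokens: int) -> dict:
    """Estimate monthly spend, treating the system prompt as the only input."""
    calls = calls_per_day * DAYS_PER_MONTH
    input_cost = calls * system_tokens * PRICE_IN / 1e6
    output_cost = calls * response_tokens * PRICE_OUT / 1e6
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "output_share": output_cost / total if total else 0.0,
    }


# Default values from the exercise: 1,000 calls/day, 800-token prompt, 500-token replies
estimate = monthly_cost(1_000, 800, 500)
print(f"input ${estimate['input']:.2f} | output ${estimate['output']:.2f} | "
      f"total ${estimate['total']:.2f} | output share {estimate['output_share']:.0%}")
```

With the exercise defaults this gives $72 of input and $225 of output per month, so output accounts for about 76% of the bill.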
Cost Tracker — Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Prices in USD per million tokens (update if the model or pricing changes)
PRICES = {"in": 3.00, "out": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def cost_of(usage) -> dict:
    """Break one response's usage into USD cost per token category."""
    # Cache fields can be missing or None, so fall back to 0
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    breakdown = {
        "input": usage.input_tokens * PRICES["in"] / 1e6,
        "output": usage.output_tokens * PRICES["out"] / 1e6,
        "cache_write": cache_write * PRICES["cache_write"] / 1e6,
        "cache_read": cache_read * PRICES["cache_read"] / 1e6,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown

# Usage: wrap any API call
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
)
c = cost_of(response.usage)
print(f"input ${c['input']:.4f} | output ${c['output']:.4f} | total ${c['total']:.4f}")
MODULE 02 / 05

Prompt Caching — The 2-Line Fix That Cuts Input Costs by 90%

⚡ ~10 min · +30 XP · 🟡 Intermediate
CONTEXT

Two lines of code. Your system prompt is cached for 5 minutes, and every cache read is billed at 10% of the normal input price. Break-even: 2 calls.

HOW IT WORKS

Add "cache_control": {"type": "ephemeral"} to the block you want to cache:

BEFORE: no caching (the system prompt is a top-level "system" field, not a message role)

"system": "You are a specialized analyst..."

AFTER: 90% cheaper on cache reads

"system": [{"type":"text","text":"You are a specialized analyst...","cache_control":{"type":"ephemeral"}}]

90%
reduction in input cost for cached system prompts.
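The first call writes the cache at 125% of the normal input price; later calls read it at 10%. Two calls cost 135% with caching versus 200% without, so break-even arrives at call 2.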
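The break-even claim can be checked with a few lines of arithmetic. This is a minimal sketch using the 125% write and 10% read rates quoted above, and it assumes every call after the first lands inside the 5-minute cache window. The helper names `relative_cost` and `break_even_calls` are hypothetical.

```python
CACHE_WRITE = 1.25  # first call: cache write at 125% of the normal input price
CACHE_READ = 0.10   # later calls: cache read at 10%


def relative_cost(calls: int, cached: bool) -> float:
    """Cost of sending the same prompt `calls` times, in units of one uncached send."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(calls)
    # Assumption: every call after the first lands inside the 5-minute window
    return CACHE_WRITE + CACHE_READ * (calls - 1)


def break_even_calls() -> int:
    """Smallest number of calls at which caching is cheaper than not caching."""
    n = 1
    while relative_cost(n, cached=True) >= relative_cost(n, cached=False):
        n += 1
    return n


for n in (1, 2, 10):
    print(n, relative_cost(n, cached=False), round(relative_cost(n, cached=True), 2))
print("break-even at call", break_even_calls())
```

A single call is more expensive with caching (1.25 vs 1.0), two calls are cheaper (1.35 vs 2.0), and ten calls cost 2.15 units instead of 10.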
KNOWLEDGE CHECK
QUICK CHECK — MODULE 02

After how many calls does prompt caching become profitable?

EXERCISE
Cache Savings Simulator
Enter your numbers — see your real savings
+30 XP

Set your actual usage. The simulator calculates how much prompt caching saves per month.

System prompt size (tokens): 800
Daily API calls: 500
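The simulator's monthly figures can be approximated as follows. This is a sketch under stated assumptions: prices come from the PRICES table in Module 01, a month is a flat 30 days, and `hit_rate` is a hypothetical knob for the fraction of calls that find the cache warm (the rest are billed as cache writes). Real hit rates depend on how your traffic is spaced relative to the 5-minute window.

```python
# Prices in USD per million tokens, taken from the PRICES table in Module 01
PRICE_IN = 3.00
PRICE_CACHE_WRITE = 3.75
PRICE_CACHE_READ = 0.30
DAYS_PER_MONTH = 30  # assumption: flat 30-day month


def cache_savings(system_tokens: int, calls_per_day: int, hit_rate: float = 1.0) -> dict:
    """Monthly system-prompt cost with and without caching.

    hit_rate is a hypothetical knob: the fraction of calls that read the cache.
    The remaining calls are billed as cache writes.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    tokens = system_tokens * calls_per_day * DAYS_PER_MONTH
    without = tokens * PRICE_IN / 1e6
    blended = hit_rate * PRICE_CACHE_READ + (1 - hit_rate) * PRICE_CACHE_WRITE
    with_cache = tokens * blended / 1e6
    return {"without": without, "with": with_cache, "saved": without - with_cache}


# Exercise defaults: 800-token system prompt, 500 calls/day
for rate in (1.0, 0.95):
    r = cache_savings(800, 500, hit_rate=rate)
    print(f"hit rate {rate:.0%}: ${r['without']:.2f} -> ${r['with']:.2f} (saves ${r['saved']:.2f})")
```

With the exercise defaults, the uncached system prompt costs $36.00 per month. A perfect hit rate brings that to $3.60, and a 95% hit rate to $5.67.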
cache_control — Python
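import anthropic

client = anthropic.Anthropic()

# Placeholders: substitute your own prompt and message
your_long_system_prompt = "You are a specialized analyst..."
user_message = "Summarize this report."

# System prompt is written to the cache on the 1st call, then re-read at a 90% discount.
# Prompts shorter than the model's minimum cacheable length are not cached.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": your_long_system_prompt,
        "cache_control": {"type": "ephemeral"},  # ← this line
    }],
    messages=[{"role": "user", "content": user_message}],
)

# Check for a cache hit in the response usage (not the headers)
print(response.usage.cache_read_input_tokens)      # >0 = cache hit
print(response.usage.cache_creation_input_tokens)  # >0 = cache write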
MODULE 03 / 05

Model Routing — Use Haiku for 80% of Your Calls

⚡ ~10 min · +25 XP · 🟡 Intermediate
CONTEXT

One model for everything = maximum cost. Use a 3-tier router:

  • Haiku — classify, extract, tag, translate (12× cheaper)
  • Sonnet — write, reason, debug, analyze
  • Opus — architecture, novel problems, 10k+ context
THE CODE
ROUTING_LAYER.js

function route(task) {
  const light = ['classify','extract','summarize','format'];
  return light.includes(task.type) ? 'claude-haiku-4-5' : 'claude-sonnet-4-6';
}

12×
cheaper than Sonnet per token.
For classification, extraction and short generation, Haiku isn't a downgrade — it's the right choice.
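To see what routing does to the overall bill, the blended cost can be worked out directly. The sketch below assumes light and heavy calls use the same number of tokens, and takes the 80% share and 12× ratio from this module as inputs. The function name `blended_cost_factor` is hypothetical.

```python
def blended_cost_factor(light_share: float, price_ratio: float) -> float:
    """Cost relative to an all-Sonnet baseline (1.0 = no savings).

    light_share: fraction of calls routed to the cheaper model.
    price_ratio: how many times cheaper that model is per token.
    Assumption: light and heavy calls use the same number of tokens.
    """
    if not 0.0 <= light_share <= 1.0:
        raise ValueError("light_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return (1 - light_share) + light_share / price_ratio


# Figures from this module: 80% of calls on Haiku, quoted as 12x cheaper
factor = blended_cost_factor(0.80, 12)
print(f"cost factor {factor:.3f}, reduction {1 - factor:.0%}")
```

With the module's figures the bill drops to about 27% of the all-Sonnet baseline, a reduction of roughly 73%. The saving is sensitive to the price ratio, so substitute the value from the current price list: a 3× ratio, for example, yields a reduction of about 53%.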
KNOWLEDGE CHECK
QUICK CHECK — MODULE 03

How much cheaper is Haiku compared to Sonnet per token?


Model Router — Python
import anthropic

client = anthropic.Anthropic()

def route_model(task: str, complexity: str) -> str:
    """Route to the cheapest model that can handle the task."""
    haiku_tasks = ["classify", "extract", "summarize", "sentiment", "translate", "tag"]
    # Check "high" first so a keyword match cannot downgrade a hard task
    if complexity == "high":
        return "claude-opus-4-7"   # full power, use sparingly
    if complexity == "low" or any(t in task.lower() for t in haiku_tasks):
        return "claude-haiku-4-5"  # cheapest tier; check current pricing for the exact ratio
    return "claude-sonnet-4-6"     # default: fast + capable

# Usage (the message content is a placeholder)
model = route_model("classify this email", complexity="low")
response = client.messages.create(
    model=model,
    max_tokens=256,
    messages=[{"role": "user", "content": "Classify this email: ..."}],
)
MODULE 04 / 05

Output Control — Stop Paying for Tokens You Never Use

⚡ ~8 min · +30 XP · 🟠 Advanced
CONTEXT

Claude writes longer when given more space. Tighten max_tokens to your real P95 — same quality, 40% fewer output tokens.

THE TECHNIQUE
  • Analyze your last 30 responses. Calculate mean and P95 in tokens
  • Set max_tokens to P95 + 20% buffer — not the theoretical maximum
  • Add length instructions to your prompt: "Answer in under 3 sentences"
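The first two steps can be sketched as follows. This is one possible way to do it, assuming a nearest-rank percentile and a made-up sample of 30 token counts; in practice the counts would come from `response.usage.output_tokens` in your own logs. The helper names `p95` and `suggest_max_tokens` are hypothetical.

```python
import math


def p95(values: list[int]) -> int:
    """Nearest-rank 95th percentile: the smallest value covering 95% of samples."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def suggest_max_tokens(output_token_counts: list[int], buffer_pct: int = 20) -> int:
    """P95 plus a percentage buffer, rounded up to a whole token."""
    return math.ceil(p95(output_token_counts) * (100 + buffer_pct) / 100)


# Hypothetical sample: output token counts from 30 recent responses
# (in practice, collect response.usage.output_tokens from your own logs)
sample = [150, 160, 170, 175, 180, 180, 185, 190, 195, 200] * 3
sample[-1] = 420  # one unusually long answer

mean = sum(sample) / len(sample)
print(f"mean {mean:.0f} tokens, P95 {p95(sample)} tokens, "
      f"suggested max_tokens {suggest_max_tokens(sample)}")
```

For this made-up sample the mean is about 186 tokens, the P95 is 200, and the suggested limit is 240. The single 420-token outlier sits above the P95 and would be truncated at that limit, which is why checking `stop_reason` matters.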
96%
of the max_tokens ceiling goes unused when max_tokens is 4096 and the real response is 180 tokens (180 / 4096 ≈ 4.4% used).
Adjusting to P95 + 20% closes that gap with zero quality impact.
KNOWLEDGE CHECK
QUICK CHECK — MODULE 04

What happens when you reduce max_tokens close to your real average?

EXERCISE
✂️
Token Budget Calibrator
Find your optimal max_tokens
+30 XP

Set your real average response length. Then drag max_tokens to a smart value — the zone indicator tells you if you're in range.

Your real average response (tokens): 180
Your new max_tokens setting: 2,000
max_tokens + early stop — Python
import anthropic

client = anthropic.Anthropic()

MAX_TOKENS = 512  # ← measured P95 × 1.2 buffer, not 4096
prompt = "Answer in under 3 sentences: what is prompt caching?"  # placeholder prompt

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=MAX_TOKENS,
    messages=[{"role": "user", "content": prompt}],
)

# Detect when the model was actually cut short
if response.stop_reason == "max_tokens":
    print("Warning: response truncated, raise max_tokens")

# Output tokens used vs limit
used = response.usage.output_tokens
print(f"Used {used} / {MAX_TOKENS} tokens ({used / MAX_TOKENS * 100:.0f}%)")
MODULE 05 / 05

Batch API — Cut Another 50% While You Sleep

⚡ ~12 min · +40 XP · 🔴 Expert
CONTEXT

Non-realtime calls via Batch API = 50% off, automatically. Submit → wait up to 24h → fetch results. That's it.

WHAT QUALIFIES

What qualifies for batch:

  • Content generation (descriptions, summaries, posts)
  • Data enrichment (classify, tag, extract from datasets)
  • Nightly reports or analyses
  • Any workflow that runs on a schedule, not on-demand
→ COMBINED SAVINGS

Caching (90% off input) + Model routing (12× cheaper on 80% of calls) + Output control (40% off output) + Batch (50% off total) = effective reduction of 85–92%.

92%
effective cost reduction combining all 4 techniques.
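How the four discounts stack can be worked through explicitly. The sketch below assumes the discounts are independent and simply multiply, which is a simplification. `input_share` (the fraction of the baseline bill spent on input) and `batch_share` (the fraction of calls that can be batched) are hypothetical parameters, as is the function name `combined_cost_factor`.

```python
def combined_cost_factor(
    input_share: float,
    cache_discount: float = 0.90,   # caching: 90% off input
    light_share: float = 0.80,      # routing: 80% of calls on the cheaper model
    price_ratio: float = 12.0,      # routing: quoted as 12x cheaper
    output_discount: float = 0.40,  # output control: 40% off output
    batch_discount: float = 0.50,   # batch: 50% off
    batch_share: float = 1.0,       # hypothetical: fraction of calls that can be batched
) -> float:
    """Remaining cost as a fraction of the baseline bill (1.0 = no savings).

    input_share is the fraction of the baseline bill spent on input tokens.
    Assumption: the four discounts are independent and simply multiply.
    """
    output_share = 1.0 - input_share
    tokens = input_share * (1 - cache_discount) + output_share * (1 - output_discount)
    routing = (1 - light_share) + light_share / price_ratio
    batch = 1 - batch_share * batch_discount
    return tokens * routing * batch


# Hypothetical baseline: input is 24% of the bill (800 input vs 500 output tokens at $3/$15)
for share in (1.0, 0.5):
    factor = combined_cost_factor(input_share=0.24, batch_share=share)
    print(f"batch share {share:.0%}: reduction {1 - factor:.1%}")
```

Under these assumptions the result depends heavily on how much traffic can actually be batched: about 90% when half the calls qualify, and slightly above the quoted range when every call does.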
KNOWLEDGE CHECK
QUICK CHECK — MODULE 05

What discount does the Batch API offer compared to the standard price?


Batch API — Python
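import time
import anthropic

client = anthropic.Anthropic()

prompts = ["Write a product description for ...", "Tag this review: ..."]  # placeholder inputs

# Submit a batch (50% cheaper, results within 24h)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]
)
print(f"Batch ID: {batch.id} (check results later)")

# Results exist only once processing has ended, so poll before fetching
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(60)

for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":  # skip errored, canceled, or expired requests
        print(entry.custom_id, entry.result.message.content[0].text)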
FELIPE PAZINI × CLAUDE CODE