5 techniques Anthropic doesn't teach you. Each module has an exercise — you only move forward when you actually understand it. Not a blog post. Real training.
Most API bills are 3–4× higher than they should be. Output tokens cost 5× more than input — and that's where the real waste hides.
The three biggest sources of waste:
max_tokens: 4096 when your average response is 200 tokens. Claude writes more when given space.Which type of token costs more per unit in the Claude API?
Enter your usage numbers. The calculator shows exactly how much you're wasting and what each technique saves.
Two lines of code. Your system prompt is cached for 5 minutes at 10% of the original cost. Break-even: 2 calls.
Add "cache_control": {"type": "ephemeral"} to the block you want to cache:
{"role": "system", "content": "You are a specialized analyst..."}
{"role":"user","content":[{"type":"text","text":"You are a specialized analyst...","cache_control":{"type":"ephemeral"}}]}
After how many calls does prompt caching become profitable?
Set your actual usage. The simulator calculates how much prompt caching saves per month.
One model for everything = maximum cost. Use a 3-tier router:
function route(task) { const light = ['classify','extract','summarize','format']; return light.includes(task.type) ? 'claude-haiku-4-5' : 'claude-sonnet-4-6';}
How much cheaper is Haiku compared to Sonnet per token?
Haiku or Sonnet? Tap one — next task loads instantly.
Claude writes longer when given more space. Tighten max_tokens to your real P95 — same quality, 40% fewer output tokens.
What happens when you reduce max_tokens close to your real average?
Set your real average response length. Then drag max_tokens to a smart value — the zone indicator tells you if you're in range.
Non-realtime calls via Batch API = 50% off, automatically. Submit → wait 24h → fetch results. That's it.
What qualifies for batch:
Caching (90% off input) + Model routing (12× cheaper on 80% of calls) + Output control (40% off output) + Batch (50% off total) = effective reduction of 85–92%.
What discount does the Batch API offer compared to the standard price?
Real-time or Batch? Tap one — next task loads instantly.
You now know what 92% of API users don't — and you have the math to prove it.
0 TOTAL XP EARNEDYou're in the top 8% of Claude API users by cost efficiency. Most developers never audit their token usage. You did — and you built the systems to stay there.
No próximo Tome você vai dominar o que derrubou a Uber — orquestrar múltiplos agentes Claude sem o custo explodir. Cada missão, um sistema real.
🔮 Apoiar o Tome #2 — pague o que quiser →