Justy: Exploring Next, episode 346. One user cut their Claude Code bill from $1,389 a month to $200—not by switching models or skipping work, just by fixing how they set it up.
Cody: Yeah, and the postmortem from Anthropic was interesting. They fixed three bugs in late April, reset subscriber limits. But the author's argument is the real lever isn't on their side—it's on ours.
Justy: So what's actually eating the budget? Is it just people running long sessions?
Cody: Four things, and they're all preventable. Prompt caching—if you're not hitting it, you're paying full price on every turn. Context bloat, wrong model routing, and token-expensive inputs like screenshots. Read cost is 0.1× input price. Write is 1.25× or 2×. But once cached, a hit is free.
Justy: Free? So why isn't everyone just... caching everything?
Cody: Because you have to lock your tools and model at session start. Add a tool mid-session or switch models—the cached prefix dies. You're back to paying full price. Most people don't even know it's happening.
Justy: That's a brutal gotcha. So the setup is the trap.
Cody: Exactly. The author got ~90% cache hit rate by not touching tools or models once the session started. The other big move is disabling 1M context. Cap it at 200K, compact early. For parallel work, spawn subagents in cheaper models. Haiku for grunt work, Sonnet for research, Opus for planning.
Justy: Right. But adoption barrier is real. Most people don't think about prompt caching. They hit a bill shock and either switch tools or accept it.
Cody: True. But the author gives you copy-paste config blocks. Set CLAUDE_CODE_DISABLE_1M_CONTEXT=1, lock your tools, use subagents. For inputs, swap screenshots for agent-browser—it returns the accessibility tree instead. ~90% fewer tokens. Or use pdftotext instead of Claude's PDF reader.
Justy: So if someone's running Claude Code right now and seeing spend creep, where do they start?
Cody: Watch the cache hit rate in session stats. If it's below 75%, you're invalidating the prefix. Lock your tools and model at start, compact early, and route subtasks to cheaper models. Then swap your inputs—agent-browser for web scraping, pdftotext for PDFs. Those two things alone probably drop your bill 40–50%.
Justy: And if someone wants to experiment over a weekend?
Cody: Grab agent-browser from npm, set up one CLAUDE.md task-delegation block in a project, and run a long session with subagents instead of everything in the parent. You'll see the token count drop immediately. Or set up the 1M context disable and compact threshold in your env, then run the same workflow twice and compare.
Justy: Alright. If you're hitting Claude Code limits, Cody's right—the fixes are on your side. Go lock your setup.