// blog

How to Save Tokens in Claude Code: 12 Techniques That Actually Work

Q: Does Plan Mode actually save tokens?

Yes. Plan Mode (/model opus-plan) routes planning through Opus and execution through Sonnet. Since execution typically consumes 70-80% of session tokens, switching that phase to Sonnet cuts your effective spend on the execution stage by roughly 75-80% while keeping Opus-quality planning.

Q: How often should I run /compact?

Run /compact yourself after any exploration phase, before you start executing. Don't wait for the automatic compaction at ~83.5% - by then you've already paid to re-read the noise on every turn. A good rule: if you've inspected 5+ files chasing a problem, compact before you switch into building.

Q: Will Sonnet do the same job as Opus?

For execution - writing tests, simple edits, mechanical refactors - Sonnet is usually fine and costs a fraction of Opus. For deep reasoning, multi-step planning, and architectural decisions, Opus is meaningfully better. The pragmatic default is Sonnet for execution, Opus for planning, switched via /model.

May 2026 · 12 min read

// tl;dr

You save tokens in Claude Code by reducing what Claude has to re-discover every session. The biggest wins are structural - a .claude/ directory with skills and memory files. The rest are in-session tactics: Plan Mode, proactive /compact, defaulting to Sonnet, and tighter prompts. Combined, they cut typical token usage by 60-80%.

I've been tracking my Claude Code token usage across real projects for the last few months. The thing that surprised me most: the biggest waste isn't bad prompting or the wrong model. It's that Claude starts every session blind, and most of your bill goes to it re-learning what it already knew yesterday.

Below are the twelve techniques that actually moved my numbers, roughly in order of impact. The first one is structural and saves the most. The other eleven are in-session tactics anyone can use today.

# the 12 techniques

01. Pre-build your project's intelligence layer

02. Default to Sonnet, escalate to Opus only when you need it

03. Use Plan Mode for anything non-trivial

04. Run /compact proactively, not reactively

05. Run /clear between unrelated tasks

06. Inspect with /context when sessions feel heavy

07. Be specific: paths, line numbers, function names

08. Delegate verbose work to subagents

09. Keep CLAUDE.md small and load-bearing

10. Compress CLI output before pasting

11. Lower /effort for mechanical tasks

12. Work in focused bursts to keep the cache warm

1. Pre-build your project's intelligence layer

This is the single highest-impact change you can make, and almost no one writes about it.

Claude doesn't remember previous sessions. Every time you open a project, it starts from scratch - reading your directory, inferring your conventions, re-discovering where your auth lives, re-figuring out that you use Prisma not Drizzle. On a fresh codebase, this orientation phase typically eats 8,000-12,000 tokens before Claude writes a single line. I tracked this across six projects and it consistently ran 60-70% of total token spend.

The fix is a .claude/ directory in your repo containing files written specifically for Claude to read at session start:

CODEBASE.md - a ~2,000-token map of your project: tree, data model, routes, integrations.
.claude/skills/ - task-specific instructions (“how to add an API route in this project”).
.claude/memory/ - architecture, conventions, and decisions Claude keeps forgetting.
.claude/patterns/ - reference implementations from your actual code.

Two to three hours to build well. Pays back from the second session. After I added one to my main project, orientation dropped from ~10,000 tokens per session to under 2,000. Same work, ~75% less cost.

If you want the full walkthrough with the actual benchmark, the receipts are in this post.

2. Default to Sonnet, escalate to Opus only when you need it

Most people run Opus by default and pay for it. Sonnet handles the majority of day-to-day coding - writing tests, simple edits, mechanical refactors, debugging - at a fraction of the cost. Save Opus for genuinely hard reasoning: architectural decisions, multi-step planning, gnarly debugging.

Switch mid-session with /model sonnet or set it as your default in /config. The discipline isn't “never use Opus” - it's “don't use Opus for work Sonnet can do.”

3. Use Plan Mode for anything non-trivial

/model opus-plan splits the cognitive labor: Opus plans, Sonnet executes. Since execution typically consumes 70-80% of session tokens, routing that phase through the cheaper model cuts execution spend by roughly 75-80% while keeping Opus-quality reasoning at the front.

The hidden benefit: plans force you to review before Claude spends. You catch “wait, that's not what I meant” at the planning stage instead of after Claude has trial-and-errored its way through three wrong implementations.

4. Run /compact proactively, not reactively

Claude Code auto-compacts at around 83.5% context usage. By that point you've already spent tokens re-reading all the noise on every turn - and one developer's analysis found 98.5% of his tokens were going to re-reading history, not generating output.

The fix: compact /compact yourself, after any exploration phase, before you start building. A useful rule of thumb: if you've inspected 5+ files chasing a lead, compact before you switch into execution. The summary keeps the decisions you made; the noise gets dropped.

5. Run /clear between unrelated tasks

When you switch contexts - done with the auth refactor, moving on to a new feature - stale context taxes every subsequent message. /clear is the cheapest reset you have. Use it the moment you change tracks. Don't carry yesterday's exploration into today's work.

6. Inspect with /context when sessions feel heavy

/context shows a category-by-category breakdown of what's occupying your window: system prompt, tool definitions, open files, conversation turns. Run it when a session feels slow or expensive. You'll usually find something surprising - a massive log Claude read three turns ago that's still riding along, or an MCP tool dumping verbose JSON on every call.

Treat it like a profiler. You can't fix waste you can't see.

7. Be specific: paths, line numbers, function names

Vague prompts trigger broad scanning. Specific prompts let Claude go straight to the code. Compare:

# vague - Claude searches

“fix the bug in the login page”

# specific - Claude jumps

“fix the null pointer in app/login/page.tsx around line 45 - handleSubmit is called before useState resolves”

The first prompt costs you a directory scan, three file reads, and an inference pass. The second costs you one targeted read. Same fix; very different bill.

8. Delegate verbose work to subagents

Running tests, fetching documentation, scanning logs - these produce enormous output that bloats your main context. A subagent runs the work in its own window and returns only the summary you need. The verbose middle stays out of your main thread.

Rule of thumb: if a task is going to produce more than a screen of output that you don't need to keep reading later, it belongs in a subagent.

9. Keep CLAUDE.md small and load-bearing

CLAUDE.md is read every session, forever. Every line you add is paid every session, forever. A bloated 5,000-token CLAUDE.md eats meaningful working context before you've done anything.

CLAUDE.md should give Claude the shape of your project, not the full documentation. Five rules and three file pointers is the right size. The deep knowledge lives in skills and memory files that Claude pulls in on demand - not in the always-on header.

Skip the setup, keep the savings.

LaunchPaid ships the full .claude/ intelligence layer pre-built - early bird up to 80% off.

Claim early bird

10. Compress CLI output before pasting

Test output, install logs, stack traces, scan results - these are often 90% noise. Don't paste the whole thing. Grep, head, tail, or just summarize the failure into a couple of lines before you ask Claude to act on it. 1,000 lines of npm install spam becomes 20 lines of “these three packages failed with this error.”

Tools like rtk automate this. So does a five-line shell function. Either way, the goal is the same: pay tokens on signal, not noise.

11. Lower /effort for mechanical tasks

Thinking tokens are billed at the output rate. For tasks that don't need deep reasoning - formatting, renaming, repetitive edits - lower the effort level with /effort or cap it via MAX_THINKING_TOKENS. Claude doesn't need to deliberate for ten thousand tokens to rename a variable.

12. Work in focused bursts to keep the cache warm

Claude Code uses a prompt cache to avoid re-billing identical prefixes. After Anthropic's recent change, the cache TTL is about five minutes. Take a 20-minute break mid-feature and the next turn has to rebuild the entire context - at the write rate, not the cheaper read rate.

One developer's tracked spend went from $6.28/day to $15.54/day after the cache change, driven entirely by missed cache hits. The workaround isn't magic: structure your work into focused bursts, do the boring “wait, what was I doing” recovery before your timer, or accept the miss as the cost of a long break.

FAQ

What wastes the most tokens in Claude Code?

Orientation - Claude re-reading your files, re-inferring your conventions, and re-discovering where things live at the start of every session. On most codebases this is 60-70% of your token spend. The fix is structural: pre-build a .claude/ directory so Claude starts every session already oriented.

Does Plan Mode actually save tokens?

Yes. Plan Mode (/model opus-plan) routes planning through Opus and execution through Sonnet. Since execution typically consumes 70-80% of session tokens, switching that phase to Sonnet cuts your effective spend on the execution stage by roughly 75-80% while preserving Opus-quality planning.

How often should I run /compact?

Run /compact yourself after any exploration phase, before you start executing. Don't wait for the automatic compaction at ~83.5% - by then you've already paid to re-read the noise on every turn. Rule of thumb: if you've inspected 5+ files chasing a problem, compact before you switch into building.

Will Sonnet do the same job as Opus?

For execution - tests, simple edits, mechanical refactors - Sonnet is usually fine and costs a fraction of Opus. For deep reasoning, multi-step planning, and architectural decisions, Opus is meaningfully better. The pragmatic default is Sonnet for execution, Opus for planning, switched via /model.

Will upgrading my Claude plan reduce my token usage?

No. Plan upgrades change rate caps and access, not how many tokens your work consumes. If your bill is too high, the fix is structural - reduce what Claude has to re-discover each session - not a higher subscription tier.

The pattern across all twelve techniques is the same: every token you save comes from making Claude do less unnecessary work. Less re-reading. Less re-inferring. Less wandering through your codebase trying to figure out the shape of things.

The in-session tactics - Plan Mode, Sonnet defaults, proactive compacts, specific prompts - get you most of the way there. But the biggest leverage is upstream of any session: a project that tells Claude what it needs to know, before it has to ask.

One more thing.

Everything in technique #1 - the .claude/ directory, the skills, the memory files, the CODEBASE.md snapshot - takes a couple of focused afternoons to build well from scratch. The DIY guide in my benchmark post walks through it.

If you'd rather not spend the afternoons, LaunchPaid is a full Next.js boilerplate with the complete .claude/ intelligence layer already built in - skills, memory, 30+ production patterns, and the codebase snapshot. Clone the repo, wire up your env vars, and Claude starts every session already oriented.

We're live. While we ramp, we're running an early bird discount of up to 80% off - that's $5 for the boilerplate, $8 with the community and all future updates. Once early bird ends, prices return to $15 / $30.

Claim early birdEarly bird pricing · limited time

// keep reading