7 Token Hacks That Cut Claude Code Usage

Q: When should I use /compact vs /clear?

Use /compact when you want to continue the current task but reduce context bloat from earlier in the session. Use /clear when switching to a completely unrelated task where the existing context has no value. /compact preserves working state. /clear discards it.

Per Anthropic's June 2026 cost management documentation, the average Claude Code developer spends about $13 per active day, and 90% of users stay under $30. That cost isn't random. It comes from accumulated context, verbose output, and exploratory prompts that could have been avoided. These 7 habits address exactly that.

Where the $13/day goes, 7 habits that address it, and the savings rates that make them worth doing

Hack 1: /compact Is Your Biggest Single Lever

In 2026, Anthropic's Claude Code docs list /compact as the primary tool for managing context window growth in long sessions. It compresses your full conversation history into a tight summary, dropping accumulated token count without losing your working state.

Use it when sessions run long — after an hour of active work, or when you notice responses getting slower. The savings compound over a multi-hour session faster than any other change you can make. Type /compact in the session. That's the whole hack.

For even deeper cuts on repeated API calls outside Claude Code, prompt caching reduces input token costs by 90% on cached content.

Hack 2: CLAUDE.md Loads Context Once, Not on Every Prompt

Per Anthropic's Claude Code documentation (retrieved June 2026), placing a CLAUDE.md file at your project root lets Claude read project-specific instructions automatically at session start. You write them once. You stop repeating them at the top of every conversation.

The official docs recommend keeping CLAUDE.md under 200 lines. Every line loads into context on every message you send. A bloated CLAUDE.md increases your per-message cost across every session on that project.

Keep it tight. Only what Claude needs every session: project structure, stack, constraints, naming conventions. Move specialized workflow instructions to separate skill files that load on demand.

# CLAUDE.md example (keep it ruthlessly short)

## Stack
Next.js 16 App Router, TypeScript, Tailwind CSS v4, Supabase

## Project structure
src/app/ — routes, src/components/ — UI, src/lib/ — utils

## Rules
- No new npm packages without flagging first
- Server components only in page.tsx files
- All animations use IntersectionObserver pattern

Hack 3: Vague Prompts Are the Hidden Cost

"Fix my app" is not a prompt. It's an invitation for Claude to explore, guess context, and burn tokens before doing real work. "Fix the auth bug in src/auth/login.ts line 42 — JWT not validated" goes straight to the fix.

This isn't a style difference. Vague prompts front-load context gathering. Surgical prompts go straight to the answer.

Vague prompt	Surgical prompt
"Fix my app"	"Fix auth bug in `src/auth/login.ts` line 42. JWT not validated."
"Update the button"	"Update `/components/Button.tsx`. Change hover state to `bg-gray-100`."
"Something's wrong with the API"	"GET /api/users returns 500 when userId is undefined. Check `src/app/api/users/route.ts`."

Precision is the cheapest optimization here. No tooling, no setup, no API changes.

Try This

Before sending any prompt, ask: does Claude need to figure out what you mean, or is that already in the prompt? If Claude has to guess, rewrite it.

Hack 4: Kill the Explanation

Claude Code defaults to explaining its reasoning. Useful when learning. Expensive when you just need the output.

End your prompts with "no explanation needed" or "just output the code." This cuts output tokens on every single request. On a day with fifty prompts, that adds up to real money on your bill. It costs you nothing in output quality on routine tasks.

Where to skip it: novel architecture decisions or complex debugging where seeing Claude's reasoning chain helps you catch wrong assumptions. For anything routine (refactoring, simple fixes, code generation with clear specs), default to skipping the explanation.

Hack 5: /clear Between Unrelated Tasks

Carrying context from one task into an unrelated one doesn't help Claude. It adds tokens that have no bearing on the new problem.

Use /clear whenever you shift to a genuinely different task. Reviewing a PR, then switching to build a new feature, then debugging an unrelated module: each is a clear candidate for a fresh session.

One command, clean slate. The session that was helping you debug authentication has no business being present while you write a data migration.

Hack 6: Reference Files, Not Directories

"Update the button component" forces Claude to scan the directory, identify candidates, and load context for multiple files before settling on the one you meant. "Update /components/Button.tsx" goes directly there.

The extra context from a directory scan isn't neutral. It costs tokens, it costs latency, and it occasionally introduces noise from adjacent files that Claude then has to reason around.

Exact paths remove all of that. Make it a habit: always give Claude the exact file path. Every time.

Hack 7: Forced-Choice Questions Over Open Questions

"What should I do about the database schema?" opens an exploration. Claude will weigh options, cover trade-offs, and produce a multi-paragraph response covering angles you didn't ask about.

"Should I normalize this to 3NF or keep it denormalized for read performance?" gets a direct answer.

Forced-choice questions define the decision space before Claude starts generating. Constrained questions remove the exploratory overhead without affecting the quality of the answer.

When you know the options, put them in the prompt. This applies to architecture decisions, debugging strategies, code style, anything where you already have the candidates in mind. The batch and model routing guide uses the same principle at the API level: constrain the problem, reduce the cost.

What These Hacks Save in Practice

In 2026, Anthropic's cost docs show that prompt cache reads cost 10% of standard input price — $0.30/MTok vs $3.00/MTok for Sonnet 4.6 (Anthropic pricing, June 2026). Anything that reduces context reprocessing (like /compact and tight CLAUDE.md files) compounds at that same 90% discount rate.

Agent team mode is the other extreme. Per Anthropic's cost docs, each agent runs its own full context window as a separate Claude instance, using approximately 7x more tokens than single-agent sessions. Every bad habit above gets multiplied by 7 in agent mode. These hacks matter most there.

None of the seven changes affect what you build. They only change what you send, which is a meaningfully different ask.

For a full breakdown of API-level cost reduction including the Batch API and model routing, see the Claude API cost reduction guide.

For the broader picture on Claude's features from Projects to extended thinking, the 11 Claude AI Things guide covers it all.

Note

This is part of an ongoing token saving series. The next post covers better alternatives to some of the commands above, including smarter replacements for /compact and "kill the explanation" that work in more situations. Subscribe below to get it when it drops.

Frequently Asked Questions

When should I use /compact vs /clear?

Use /compact when you want to continue the current task but reduce context bloat from earlier in the session. Use /clear when switching to a completely unrelated task where the existing context has no value. /compact preserves working state. /clear discards it.

How long should a CLAUDE.md file be?

Anthropic's official docs recommend under 200 lines. In practice, 50-100 lines covers most projects well. Only include information Claude needs on every session: stack, structure, constraints, conventions. Move specialized instructions to separate skill files that load on demand rather than on every message.

Does "no explanation needed" ever hurt output quality?

On routine tasks like refactoring, simple fixes, or code generation with clear specs, no. Where it can backfire: novel architecture decisions or complex debugging where Claude's reasoning chain helps you spot wrong assumptions. Default to it on anything routine. Skip it when you need to see the thinking.

Do these hacks work with Claude Code agent mode?

Yes, and they matter more there. Per Anthropic's cost documentation (June 2026), agent teams use approximately 7x more tokens than single-agent sessions because each sub-agent runs its own full context window. Tight prompts, exact file paths, and /compact on the orchestrator session all reduce the base cost that gets multiplied across every agent.

What's the fastest win from this list?

Adding "no explanation needed" to prompts you send more than five times a day. It takes two seconds, cuts output tokens on each one, and requires no change to how you work. Start there, then tackle CLAUDE.md once you see the bill drop.

Sources

Anthropic, "Manage costs effectively — Claude Code," retrieved June 2026 - code.claude.com/docs/en/costs
Anthropic, "Models Overview and Pricing," retrieved June 2026 - platform.claude.com/docs/en/about-claude/pricing