For the first few weeks I used Claude Code seriously, I focused on keeping my prompts tight. Shorter questions, less context, more specific requests. I thought that was where the savings were. Then I actually read Anthropic's pricing page. Output tokens on Claude Sonnet 4.6 cost $15 per million. Input tokens cost $3. I was trimming the cheap side while Claude's verbose-by-default responses ran up the expensive side at a 5x premium.
Two skills changed how I work: caveman mode, which I now type before I've even written my first question in any session, and session handoff, which stopped me from paying for the same context two or three sessions in a row.
Neither skill ships with Claude Code by default. Both are community skills you install manually in under a minute. Here's how to install them if you haven't already.
The Gap Nobody Talks About
In June 2026, Anthropic's pricing page puts Claude Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens. Five times more expensive to receive than to send. And the default Claude Code behavior is to explain things thoroughly, wrap up each answer with a summary, hedge when uncertain, and add a friendly closing. None of that is wrong. It's how the model was trained to be helpful. But you're paying for every word of it at the $15 rate.
Most token advice targets the input side: write shorter prompts, tighten your CLAUDE.md, use /compact before context bloats. That advice is real and worth following — see the full breakdown of input-side token hacks. But it's attacking the cheaper half of the equation. If you want to move the needle on your bill, the output side is where to start.
Caveman Mode: This One Is My Personal Favourite
I type /caveman before I've asked my first question. Every single session. It takes one second and it changes the shape of every response I get for the rest of that conversation.
What caveman mode does is strip Claude's output down to pure signal. The answer, nothing else. No "Great question!", no preamble warming you up to the actual information, no summary at the end restating what it just said. The technical substance is completely untouched. Code comes back exactly as it would normally. Error messages are quoted precisely. What disappears is everything around that substance that you were reading past anyway.
Here's a real example. Same question, both modes:
Default Claude:
"Certainly! I'd be happy to help with that. The issue you're encountering is most likely related to how the authentication middleware handles token expiry. When a token reaches its expiry timestamp, the middleware needs to compare the values correctly. Based on the code you've shared, it looks like the comparison is using
<instead of<=, which means a token that expires at exactly the current timestamp will slip through. Here's how to fix it..."
Caveman full:
"Auth middleware. Token expiry check uses
<not<=. Expired-at-boundary token passes. Fix:"
Same information. 71 words versus 13. You pay $15 per million for both versions.
The skill has three levels you can dial between. Full is what I use daily: drops articles and filler, keeps complete technical reasoning. Ultra goes further, abbreviates everything, uses arrows for cause and effect. I switch to ultra when I'm deep in a debugging loop and need raw signal. Lite is closer to a tighter professional tone, useful if I'm copying a response somewhere else.
Activate it: /caveman. Switch: /caveman ultra or /caveman lite. Turn off: stop caveman. One thing I appreciated when I found this: the skill automatically drops caveman mode for anything irreversible. Security warnings and destructive operation confirmations come back in full, clear English. Then it resumes. That's a thoughtful default.
Type /caveman as your very first message in any new Claude Code session, before you've asked anything else. It sets the register for the entire conversation and compounds across every response.
Why Long Sessions Get Expensive
This one took me longer to understand than it should have.
Every message in a Claude Code session re-sends the full conversation history as input, not a compressed version. The whole thing, from message one. Message 10 in a session sends the content of messages 1 through 9, plus your new message. By message 40, you're paying for the accumulated weight of everything said before, including the debugging tangent you went down in message 12 that turned out to be a dead end, and the three long explanations Claude gave before you found the right approach.
Anthropic's compaction docs put the auto-compaction trigger at 150,000 input tokens. At Sonnet 4.6 rates, 150,000 input tokens is $0.45 in context alone, before any output. And you hit that threshold in a serious session faster than you'd expect.
I had a four-hour session working through a refactor. By the end, I was burning tokens re-explaining things we'd covered in the first hour. When I closed that session and came back the next morning, I had to explain the entire project state from scratch. That's when I found session handoff.
Session Handoff: My Mornings Improved
Session handoff creates a focused document with what a fresh Claude Code session needs to continue your work. Not a transcript of everything that happened. A structured snapshot: what's done, what's next, which files matter, what decisions were made and why, where the blockers are.
When you come back, you load that document into a new session. Instead of starting from zero or starting from a bloated history full of wrong turns, you start with a few hundred tokens of precise context and pick up exactly where you left off.
A four-hour session's worth of context, compressed into a focused handoff document, costs a fraction of re-sending that full conversation. And the fresh session doesn't carry the drift that long sessions accumulate, where the context gets noisy and responses start hedging.
The workflow is two commands.
Before you close out:
/session-handoff
The skill generates a structured document: timestamp, current git state, modified files, decisions and their reasoning, blockers, and immediate next steps. The mechanical parts are auto-populated. You add any judgment calls that wouldn't be obvious from the file diff, then close the session.
When you come back:
Open a fresh Claude Code window. Run /clear to start with a clean context. Then send this:
"Hey, let's continue where we left off. Check the handoff and let me know."
Claude reads the handoff document, gives you a quick state-of-play summary, and you're back in. No re-explaining the project, no dead ends from the previous session filling your context. There's a staleness check built in: if the codebase changed significantly since you made the handoff, it flags that before you resume so you don't start from outdated state.
For API-level savings on repeated inputs, prompt caching is the companion technique worth knowing.
Session handoff and /compact solve different problems. /compact squishes your current session history and keeps going in the same conversation. Use that mid-session when things are getting long. Session handoff creates a document for a completely new session. Use it when you're done for now and coming back later.
Running Both Together
My session startup takes about 30 seconds now.
Open Claude Code. Type /caveman. If I'm continuing a project, load the handoff from the last session. Adjust intensity if needed: /caveman lite if I'll want to share responses, /caveman ultra for a pure debugging session. Then work.
Before I close out for the day, if the session went longer than an hour or touched decisions I'd want to explain to myself tomorrow, I run /session-handoff. That's what I load next time.
They solve different problems. Caveman compresses what Claude generates within a session. Session handoff stops you paying to re-send all of that accumulated history into the next one. Together they cut both sides of the bill.
Anthropic's cost docs put the average Claude Code developer at $13 per active day, with 90% of users staying under $30 per active day. Those numbers reflect default behavior running unchecked, not a hard floor. These two habits moved mine faster than anything else I tried.
Frequently Asked Questions
Does caveman mode affect the accuracy of Claude's responses?
No. The rules for what gets removed are strict: only filler, hedging, and pleasantries. Code blocks, error messages, technical terms, and commands come back exactly as they would in a normal response. The answer doesn't change. The words around the answer do.
What's the actual difference between /compact and session handoff?
/compact summarises your current conversation and keeps going in the same session. It's for when a session is getting long but you're not done yet. Session handoff creates a structured document you load into a new session. It's for when you're finishing for the day and want to start clean tomorrow. They're not the same tool used in different places.
How often should I make a handoff document?
After any session longer than an hour, or any session where you made decisions you'd want to explain to your future self. A short focused session that touched one file probably doesn't need one. Any session involving architecture decisions, debugging detours, or significant file changes does.
What do I actually say when resuming with a session handoff?
Run /clear in a fresh Claude Code window, then send one message: "Hey, let's continue where we left off. Check the handoff and let me know." Claude reads the handoff document, summarises the current state, and picks up from there. That's the full resume sequence.
The default Claude Code experience is designed around thoroughness. Full explanations, careful caveats, friendly responses. That's appropriate for someone learning. At $13 a day for a developer who already knows what they're doing, most of it is noise you're paying to generate and then skip past.
Neither changes what Claude can do. What they change is how much of what comes back is actually worth reading. The savings follow.
For more Claude Code optimizations including /compact and prompt sculpting, see 7 Claude Code Token Hacks That Actually Cut Your Usage. For broader Claude tips and features worth knowing, the 11 Claude AI Things I Wish I Knew Earlier covers the full picture.