Claude AI

11 Claude AI Things I Wish I Knew Earlier

11 Claude AI tips from a founder's real workflow: persistent Projects memory, 200K context windows, 90% cost cuts via prompt caching, and more.

Most people use Claude at a fraction of what it can do. Claude Projects give persistent 200K context across sessions, prompt caching cuts API costs 90%, and model routing drops blended cost per request by up to 80%. These are the 11 things that actually changed how I work with it every day.


The Claude.ai Features Most People Skip

Tip 1: Projects give Claude a persistent memory

Most people treat Claude like a search engine. New tab, new question, move on. That means Claude starts from zero every single time. Projects change that.

In June 2024, Anthropic launched Claude Projects. Each project includes a persistent 200K context window, equivalent to a 500-page book. Files you upload, instructions you write, and conversations you have all persist inside that project.

Before Projects, every conversation I opened started cold. Claude didn't know my stack, my naming conventions, or what I'd shipped last week. Now I have a "Product Build" project that knows my stack, design tokens, and deployment process. I open it, describe what I want to build, and skip the 500-word context dump that used to open every conversation.

Try This

Create a separate Project for every ongoing area of work. One for product, one for content, one for client work. The setup takes five minutes. The payoff compounds over months.


Tip 2: The context window is bigger than you've tested

People split large documents across multiple conversations because they assume Claude can't hold much at once. That assumption was true two years ago. It isn't now.

As of June 2026, Anthropic's models overview lists Claude Haiku 4.5 at a 200K token context window, roughly 150,000 words. Claude Sonnet 4.6 and Opus both sit at 1M tokens. That's an entire mid-sized codebase in a single conversation.

If you're manually splitting a 50-page document into chunks and running separate queries, stop. Paste the whole thing. On Sonnet or Opus, you have room for roughly five of those documents simultaneously.

One real caveat: Claude weights the beginning and end of its context window more heavily than the middle. If you have critical instructions, put them at the top of your system prompt or first message. Don't bury them on page 12 of a 50-page paste and wonder why Claude ignored them.


Tip 3: Files in Projects are your knowledge base

Uploading your codebase before starting a new feature isn't optional if you want Claude to write code that actually fits your project. Without codebase context, Claude guesses at your patterns. And it's usually wrong in small, annoying ways.

When I started uploading the relevant modules from my codebase to a Project before any new feature work, the quality of first-draft code jumped. Claude can see my naming conventions, my actual types, my existing patterns. It stops hallucinating abstractions that don't match what I've already built.

A developer working on multiple screens in a dark office environment with coding interfaces

Watch Out

Files count against your context window. Don't upload your entire monorepo. Trim to the modules relevant to what you're building. For larger codebases, use a Sonnet or Opus project, which has a 1M token window.


Tip 4: Custom instructions change Claude's defaults, not just its output

Claude's default behavior is calibrated for a general audience. That's not you. You have a specific stack, a specific writing style, specific things you never want Claude to say.

Go to Settings, then "Claude.ai," and write your custom instructions. They apply to every conversation you open. Here's what I run: respond concisely, use markdown formatting, no conclusion paragraph unless explicitly asked, skip filler phrases like "certainly" or "of course."

If you write code: add your stack, your preferred libraries, your error-handling conventions. If you write content: add your tone, what you want avoided, your preferred structure. These instructions compound. Every conversation starts closer to what you actually want.


The API Moves That Cut My Costs 90%

Tip 5: Model selection is the fastest cost lever

Before you touch caching or batching, get your model routing right. The price spread between Claude models is large, and most people default to Sonnet for everything.

As of June 2026, Anthropic's models overview shows Claude Haiku 4.5 at $1 input / $5 output per million tokens. Sonnet 4.6 runs $3 / $15. Opus sits at $5 / $25. That's a 5x difference between Haiku and Opus.

Claude API Model Pricing per Million Tokens (June 2026)Input prices per MTok: Haiku $1, Sonnet $3, Opus $5. Output prices per MTok: Haiku $5, Sonnet $15, Opus $25. Source: Anthropic Models Overview, platform.claude.com/docs/en/about-claude/models/overview, June 2026.Claude Model Pricing per Million Tokens (June 2026)InputOutputHaiku 4.5Sonnet 4.6Opus$1$5$3$15$5$25Source: Anthropic Models Overview, platform.claude.com/docs/en/about-claude/models/overview, June 2026

The routing rule is simple. Haiku handles speed and scale: summarization, classification, simple Q&A, anything where you need volume and don't need deep reasoning. Sonnet handles most code and writing tasks. Opus handles hard strategic problems where multi-step reasoning actually matters.

For production pipelines, see how to combine this with batching in the Claude API Batch and Model Routing guide.

Computer screens displaying code with vibrant neon lighting in a modern developer workstation


Tip 6: Prompt caching can cut your API bill by 90%

This is the one I wish I'd found in month one instead of month seven.

In August 2025, Anthropic published detailed prompt caching benchmarks. Caching cuts costs up to 90% and latency up to 85% on long prompts. Cache reads cost 10% of the standard input price. To show what that means in practice: on a 100K-token book, latency dropped from 11.5 seconds to 2.4 seconds, a 79% reduction, alongside the 90% cost reduction.

The mechanism: add a cache_control parameter to your system prompt. Anthropic stores a hash of that content. Subsequent calls with the same prefix read from cache instead of reprocessing. You pay 1/10th the normal rate on the cached portion.

One threshold to know: in June 2026, Anthropic's caching documentation confirmed the minimum prompt length is 1,024 tokens for Sonnet and 4,096 tokens for Opus and Haiku 4.5. Below the threshold, caching is silently skipped. No error, no warning. Just a permanent zero in cache_read_input_tokens.

I dropped a client project from $0.48 to $0.043 per request by moving a 25K-token knowledge base into a cached system prompt. The code change was three lines. The cost change was 91%.

Result

Combined with model routing, prompt caching is the second lever that makes real production deployments affordable. Don't ship an API product without it. The full implementation guide with code examples is in Save Tokens with Claude: Prompt Caching Deep Dive.


Tip 7: The Batch API is for everything you're running synchronously

Most Claude API calls don't need an immediate response. Data enrichment, classification pipelines, content generation, nightly analysis. You're probably running these synchronously right now because that's the default. It's costing you twice as much as it needs to.

The Batch API processes requests within 24 hours and costs 50% less than real-time calls across all models. If you're running a loop of API calls today, you're paying a 2x premium for speed you don't need.

The switch is one endpoint change. You get results back in a file instead of a stream. For anything that runs in the background, it's the simplest discount available.

For the full implementation with polling code and the stacked savings math (caching plus batching gets you to 95% off), see the Claude API Batch and Model Routing guide.


The Builder Moves Most Tutorials Skip

Tip 8: Extended thinking isn't a gimmick

Extended thinking gives Claude up to 128K thinking tokens of internal reasoning before it responds. That's not a marketing number. It's the model working through a problem step-by-step before generating output, visible or not in your UI.

As of June 2026, Anthropic's models overview confirms the 128K thinking token ceiling. The jump in Claude 3.7 Sonnet's SWE-bench Verified score, reaching 63.7% standard and 70.3% with high-compute in February 2025, wasn't from a larger model. It was from giving the model room to think before answering.

Don't use it for product descriptions or simple rewrites. Use it for complex debugging sessions, mathematical proofs, architectural decisions with a lot of competing constraints, multi-step reasoning problems where getting the reasoning wrong cascades into a wrong answer.

My personal heuristic: if I'd give this problem to a smart person and expect them to spend 15 minutes thinking before answering, I use extended thinking. If it's a 10-second answer, I don't.


Tip 9: Claude can read images and PDFs without extra setup

No separate vision endpoint. No special SDK version. Same API call you already use.

As of June 2026, Anthropic's context window documentation confirms that a single API request can include up to 100 images or PDF pages on 200K-context models.

The use cases that changed my workflow: PDF invoice data extraction without custom parsers, screenshot-based error analysis from teammates who don't read logs, design mockup vs. spec comparison to find divergence, and extracting tables from scanned documents that defeated every other tool I tried.

If you're building a pipeline that extracts data from documents and you haven't tested Claude on it, test it. The accuracy on structured extraction from PDFs is genuinely good.

A hand checking off items on a to-do list representing organized workflow and task management


Tip 10: Your first message sets the quality floor

Most people write one-liners and get one-liner quality back. Then they conclude that Claude isn't that good. The model isn't the problem.

I ran the same code review twice on the same file with the same Claude Sonnet model. One-liner version: "review this code." Structured version: "You are a senior backend engineer reviewing Node.js. Stack: Express, Postgres, Railway. Task: find security vulnerabilities. Format: issue name, severity (high/medium/low), line number, fix." The one-liner found 3 issues. The structured version found 9, each with severity ratings and specific fixes. Same model, same code, same temperature.

A four-part setup that works every time: Role, Context, Task, Format. Tell Claude what role it's playing, give it the relevant context about your environment, define the specific task, and specify the output format you want. It takes 30 seconds to write. The quality difference is not subtle.

Short punchy output comes from short punchy prompts. Detailed, structured output comes from detailed, structured prompts. Claude follows what you model in the input.


Tip 11: Claude Code's benchmark numbers are a real reason to trust it on production work

Most people treat AI coding assistants as suggestion tools. You review every line, assume it's probably wrong, and use it for autocomplete. That mental model made sense in 2023. The benchmarks have moved.

In 2025, Anthropic Engineering published Claude Sonnet 4.5's SWE-bench Verified score: 77.2%. SWE-bench Verified uses real issues from real GitHub repositories with real test suites. Not a toy benchmark. Not curated examples. Real open-source issues that real developers filed.

77% means Claude resolves more than three out of four real GitHub issues it has never seen before.

That's not "AI can help with code." That's "AI can close tickets."

If you've stayed conservative about using Claude for production code because you didn't trust the output, the benchmark data is the argument worth reading. I'm not saying ship without review. I'm saying your review time should reflect a 77% baseline accuracy, not a 40% one.


Frequently Asked Questions

What is Claude Projects and how is it different from a regular conversation?

A regular Claude conversation has no memory of past sessions. Claude Projects, launched by Anthropic in June 2024, give you a persistent workspace with a 200K context window per project, equivalent to a 500-page book. Files you upload and instructions you write persist across all conversations inside that project. Regular conversations reset every time.

Does prompt caching work on all Claude models?

Yes, with different thresholds. Per Anthropic's June 2026 caching documentation, the minimum prompt length is 1,024 tokens for Sonnet models and 4,096 tokens for Opus and Haiku 4.5. Cache reads cost 10% of the standard input price. The cache TTL is 5 minutes for ephemeral caching. Below the token threshold, caching is silently skipped.

When should I use extended thinking?

Use it on multi-step reasoning problems where getting the reasoning chain wrong cascades into a wrong answer: complex debugging, mathematical proofs, architectural decisions with many competing constraints, or analysis tasks that require synthesizing conflicting information. Don't use it for simple generation tasks. In 2025, Anthropic reported that Claude 3.7 Sonnet reached 70.3% SWE-bench Verified with high-compute extended thinking, compared to 63.7% standard.

What's the difference between Claude Haiku, Sonnet, and Opus?

They're calibrated for different cost and capability tradeoffs. Per Anthropic's June 2026 models overview: Haiku 4.5 runs $1 / $5 per million input/output tokens, Sonnet 4.6 at $3 / $15, and Opus at $5 / $25. Route classification, extraction, and simple summarization to Haiku. Route most code and writing to Sonnet. Reserve Opus for hard reasoning problems only.

Can Claude read files directly in Claude.ai?

Yes. You can attach files to any conversation, and you can upload persistent files inside a Claude Project. The context window determines how much Claude can hold at once: 200K tokens for Haiku 4.5 projects, 1M tokens for Sonnet 4.6 and Opus projects per Anthropic's models overview. For very large codebases, use a Sonnet or Opus project to avoid hitting the limit.


Sources

  1. Anthropic, "Claude Projects," June 25, 2024 - anthropic.com/news/projects
  2. Anthropic, "Prompt Caching," August 14, 2025 - claude.com/blog/prompt-caching
  3. Anthropic, "Models Overview," retrieved June 2026 - platform.claude.com/docs/en/about-claude/models/overview
  4. Anthropic, "Context Windows," retrieved June 2026 - platform.claude.com/docs/en/build-with-claude/context-windows
  5. Anthropic, "Prompt Caching Docs," retrieved June 2026 - platform.claude.com/docs/en/build-with-claude/prompt-caching
  6. Anthropic, "Claude 3.7 Sonnet," February 24, 2025 - anthropic.com/news/claude-3-7-sonnet
  7. Anthropic Engineering, "SWE-bench Sonnet," 2025 - anthropic.com/engineering/swe-bench-sonnet
  8. Anthropic, "Extended Thinking," retrieved June 2026 - platform.claude.com/docs/en/about-claude/models/overview
  9. Anthropic, "Prompt Caching: Minimum Thresholds," retrieved June 2026 - platform.claude.com/docs/en/build-with-claude/prompt-caching
  10. Anthropic, "Model Pricing," retrieved June 2026 - platform.claude.com/docs/en/about-claude/models/overview