The moment I understood
I hit the rate limit in 45 minutes. Not on a complex task, but on CSS. Claude had loaded 2 MB of runtime JSON data, the 260-line CLAUDE.md, and the history of a refactoring that had nothing to do with it. I was paying tokens for context that nobody had asked for.
My project — a PHP portfolio with an automated monitoring system in Node.js — isn't large. A few dozen code files. But the working directory also contained hundreds of uploaded JSONs, Playwright screenshots, temporary session plans, binary assets. Claude Code doesn't distinguish between a source file and a runtime artifact. It indexes everything, and loads what seems relevant. Often, it gets it wrong.
That day, I decided to understand where the tokens were going. Not by reading the documentation first — by observing what Claude was actually doing in my sessions. Here's what I found, in the order I discovered it.
First discovery: the .claudeignore
Looking at a session's logs, I saw Claude exploring uploads/, a folder of runtime data with hundreds of JSON files. It was searching for context to understand the project structure. Except those files don't describe the project structure. They're data. There's nothing to learn from them for writing code.
The solution exists and it's trivial: a .claudeignore at the root, same syntax as .gitignore:
# Runtime data — useless for coding
uploads/
scripts/.retro-state.json
scripts/.veille.lock
# Playwright
.playwright-mcp/
# Generated plans/specs
docs/superpowers/
# Binary assets
assets/images/
assets/plugins/
# Blog comments
blog/comments/
Before: Claude would regularly load files from uploads/ at the start of a session, especially when the question was vague. Over 2 MB of useless JSON in the context.
After: these files no longer exist for Claude. Automatic project exploration is limited to source files. This was the lever with the biggest impact on its own — and the simplest to implement.
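To decide what belongs in the ignore file, it helps to see where the bytes actually sit. Here is a minimal sketch, assuming a plain directory walk is good enough; `top_level_sizes` is a hypothetical helper, not part of Claude Code, and it only measures disk size, not what Claude really loads.

```python
import os
from pathlib import Path


def top_level_sizes(root: str) -> dict[str, int]:
    """Return the total size in bytes of each top-level entry under root.

    Hypothetical helper: directories get a trailing slash in the result so
    the names can be pasted into an ignore file as-is.
    """
    sizes: dict[str, int] = {}
    for entry in Path(root).iterdir():
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir() and not entry.is_symlink():
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry):
                for name in filenames:
                    file_path = Path(dirpath) / name
                    if file_path.is_file():  # skips broken symlinks
                        total += file_path.stat().st_size
            sizes[entry.name + "/"] = total
    return sizes


if __name__ == "__main__":
    # Heaviest entries first: large entries that are not source code
    # are the candidates for the ignore file.
    ranked = sorted(top_level_sizes(".").items(), key=lambda kv: -kv[1])
    for name, size in ranked:
        print(f"{size / 1024:10.1f} KiB  {name}")
```

Run from the project root, a folder like uploads/ should show up at the top of the list long before any source directory does.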
Second discovery: the CLAUDE.md that grows without control
The .claudeignore had fixed the runtime data problem. But sessions were still expensive. Examining what was injected at startup, I realized my CLAUDE.md was 260 lines. All injected in every session, from the first exchange.
260 lines is the accumulation of weeks of work. Each time Claude made a recurring mistake, I'd add a rule. "Don't touch that file." "This service runs on port 3001." "The date format is ISO." Useful rules when I wrote them — but half concerned the Node.js monitoring system. When I worked on the blog's CSS, all that monitoring context was pure noise.
Claude Code supports @path syntax to import other Markdown files. But the import is automatic: the referenced file is always loaded. It's not conditional loading.
The real solution: split and don't import. I split it into two files:
- CLAUDE.md: stack, deployment, LinkedIn bot. 24 lines.
- CLAUDE.veille.md: the entire monitoring system. Loaded manually when needed.
In the main CLAUDE.md, just a note:
> Monitoring system: see `CLAUDE.veille.md`
When I work on monitoring, I tell Claude to "read CLAUDE.veille.md". The rest of the time, those 240 lines don't exist in the context.
The hard part isn't the splitting, it's knowing what to keep. I did a line-by-line audit: each rule classified as essential, deducible from the code, obsolete, or out of scope. Everything deducible or obsolete is deleted outright. The result: a 24-line CLAUDE.md that's sufficient for 80% of sessions.
The other lesson, less obvious: when a session reveals a real gotcha — a business constraint invisible in the code, a counter-intuitive behavior — encode it immediately in the right file. But with strict filtering: if it's deducible from existing files, or if it's an edge case that won't repeat, it has no place. CLAUDE.md doesn't grow because you add things to it — it grows because you don't apply filtering.
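Before a line-by-line audit, a quick count per section shows which topics dominate the file. This is a minimal sketch, assuming the file uses `#`-style Markdown headings; `section_line_counts` is a hypothetical helper, and it deliberately stays naive (a `#` comment inside a fenced code block would be mistaken for a heading).

```python
def section_line_counts(markdown: str) -> dict[str, int]:
    """Count non-blank lines under each Markdown heading.

    Lines before the first heading are counted under "(preamble)".
    Assumption: headings start with '#'; fenced code is not special-cased.
    """
    counts: dict[str, int] = {}
    current = "(preamble)"
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # A new heading opens a new section, even if it stays empty.
            current = stripped.lstrip("#").strip()
            counts.setdefault(current, 0)
        elif stripped:
            counts[current] = counts.get(current, 0) + 1
    return counts


if __name__ == "__main__":
    with open("CLAUDE.md", encoding="utf-8") as handle:
        ranked = sorted(section_line_counts(handle.read()).items(),
                        key=lambda kv: -kv[1])
    for heading, lines in ranked:
        print(f"{lines:4d}  {heading}")
```

A section that accounts for most of the lines but only matters for one module is the natural candidate for its own file.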
Third discovery: long sessions are expensive for nothing
At this point, I'd reduced what Claude loaded at startup. But I was still running 2-hour sessions where I'd chain unrelated tasks: a field in a form, a bug in the daemon, writing an article. The context accumulated everything.
The problem is arithmetic. Each exchange is billed on the total accumulated context. By the tenth exchange in a long session, the context contains files read for the nine previous exchanges — even if they have nothing to do with the current question.
The /clear command resets the context to zero. Not the session, just the context. CLAUDE.md is reinjected, but everything else disappears.
Before: a 2-hour session, 15 exchanges, billed on context that only kept growing.
After: /clear whenever I switch tasks. Five 20-minute sessions instead of one 2-hour session. The same work, for a fraction of the cost.
And when I want to keep a summary of what was done before continuing on the same subject: /compact. Claude compresses the context while keeping the essentials. Less radical than /clear, but effective for long tasks with real continuity.
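The arithmetic behind long sessions can be sketched in a few lines. This is a simplified model, not Claude Code's actual billing: it assumes every exchange resends the whole accumulated context, that each exchange adds the same amount of new material, and it ignores prompt caching and /compact. The token figures are hypothetical, chosen only to show the shape of the curve.

```python
def billed_input_tokens(base: int, per_exchange: int, exchanges: int) -> int:
    """Total input tokens billed over a session whose context only grows.

    Exchange k (1-based) sends the base context (for example CLAUDE.md)
    plus the material added by all k exchanges so far.
    """
    return sum(base + per_exchange * k for k in range(1, exchanges + 1))


if __name__ == "__main__":
    BASE = 1_000          # hypothetical: tokens injected at startup
    PER_EXCHANGE = 5_000  # hypothetical: tokens added by each exchange

    one_long = billed_input_tokens(BASE, PER_EXCHANGE, 15)
    five_short = 5 * billed_input_tokens(BASE, PER_EXCHANGE, 3)

    print(one_long)    # 615000
    print(five_short)  # 165000
```

Under these assumptions the same 15 exchanges cost roughly 3.7 times more in one session than in five, because the cost of a session grows with the square of its length while splitting it only repeats the small startup context.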
Fourth discovery: how you ask questions changes everything
Even with clean context, some questions cost 10x more than others for the same result. It took me a while to understand because the natural reflex is to ask broad questions.
"How does the monitoring system work?" — Claude will read 8 files, trace the architecture, and produce a 500-token explanation. If I only wanted to know how deduplication works, I paid for 7 useless files.
The efficient version: "in scripts/run-veille.js, how does SHA256 dedup work?" Claude reads one file, goes straight to the function, answers in 50 tokens.
Same logic for modifications. Instead of "I want to change the FTP loop behavior", give the file and line: "in scripts/run-veille.js line 180, I want the condition to also ignore .html files". Claude makes a minimal diff without rereading the entire file.
It's a habit change. You move from "Claude, understand my project" to "Claude, modify this line in that file". Less conversational, but radically cheaper.
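Pointing at a file and line means knowing the line number before asking. Any grep-like tool does this; here is a minimal sketch in Python, where `find_lines` is a hypothetical helper and the search pattern in the usage example is an assumption about what the code contains.

```python
import re


def find_lines(path: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching a regex.

    Line numbers are 1-based so they can be quoted directly in a prompt.
    """
    regex = re.compile(pattern)
    matches: list[tuple[int, str]] = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Hypothetical pattern: adjust it to whatever the code actually uses.
    for number, line in find_lines("scripts/run-veille.js", r"sha256"):
        print(f"{number}: {line.strip()}")
```

The output gives the exact line to cite in the question, which keeps Claude from scanning the whole file to locate it.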
Bonus: Haiku for mechanics, Sonnet for everything else
Once these habits were in place, I tested one last lever: switching models based on the task. Claude Haiku costs several times less than Sonnet per token (the exact ratio depends on the model generation). For mechanical tasks (variable renaming, adding cases in a switch, fixing a typo), it does the job.
But I learned the hard way that the savings evaporate quickly. On a project with implicit coupling (JSON registry that drives a daemon that syncs via FTP), Haiku regularly misses a constraint and proposes a superficial fix. Three Haiku exchanges to fix what one Sonnet exchange would have resolved — the total cost is the same, the lost time isn't.
My current usage: /model haiku for factual questions ("where is function X defined?", "what's the schema of this JSON?") and single-line edits in isolated context. Sonnet for everything else.
And launching Claude from the right folder
One last detail that took me a while to internalize. Claude Code indexes the directory it's launched from. From the project root, everything is available. From veille/, only that folder is in the exploration scope.
For focused work on a module (monitoring admin, Node.js scripts, blog), opening Claude from the right subdirectory mechanically reduces what it can load inadvertently. It's complementary to .claudeignore, not a replacement.
What it changed
None of these optimizations is dramatic on its own. The effect comes from their combination: a 24-line CLAUDE.md instead of 260, a .claudeignore that excludes 2 MB of runtime data, 20-minute sessions with /clear between each task, questions pointing to a file and line instead of asking for a global explanation.
The most counter-intuitive: long sessions feel productive because you're doing lots of things. In practice, five short sessions cost less and produce work that's just as clean — often better, because Claude doesn't have to manage context that's become incoherent as topics shift.
I haven't hit the rate limit on CSS again.