AI does exactly what you ask — that's the problem

I asked Claude to add pagination to a product list. Response in 30 seconds: clean, functional, complete. And completely disconnected from the rest of the app. Wrong pagination component (we already had one), invented styles, existing filters broken. Technically correct. Unusable as-is.

The problem wasn't the AI — it was my prompt. I wrote: "Add pagination to this list." That's exactly what it did. Nothing more, nothing less.

Current models (Claude 4.x, GPT-4o) have dropped the "infer intent" behavior. They take prompts literally. That's progress overall, but it fundamentally changes how you need to prompt for code. A good code prompt isn't about tricks — it's about giving the AI the same context you'd give a junior dev joining the project.

I tested and scored dozens of formulations across the four most common coding tasks. Here's what actually works.

The structure that applies to everything

Before getting into specific cases, there are four elements present in every good code prompt:

  1. Stack/context — language, version, framework, relevant files
  2. Precise task — what you want, phrased as an instruction, not a question
  3. Constraints — what NOT to touch, scope limits
  4. Expected output — diff, full code, explanation only, CVSS score…

Anthropic's golden rule sums it up: show your prompt to a colleague with no context. If they'd be confused, the AI will be too.

For larger prompts (multiple files, complex instructions), XML tags help separate sections. Tests show 30–39% improvement in response quality when prompts are structured with clear tags:


<context>PHP 8.1, Laravel 10, multi-tenant app</context>
<task>Add pagination to ProductList</task>
<constraints>Do not touch getFilteredProducts()</constraints>
<code>[paste files here]</code>

Second rule: code first, question last. Placing context and files at the top with the instruction at the bottom improves precision by ~30% on large contexts. The AI reads everything before acting.
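Concretely, with the same tags, a long prompt ordered this way looks like the following (file names are illustrative):

```
<context>PHP 8.1, Laravel 10, multi-tenant app</context>
<code>
[ProductList.php]
[ProductController.php]
[pagination.blade.php]
</code>
<task>
Add pagination to ProductList.
Do not touch getFilteredProducts().
</task>
```

Same sections as before, but the files come first and the instruction closes the prompt.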

Bug fix — describe the symptom, not the theory

The classic mistake: paste code and ask "this doesn't work." The AI doesn't know if it's crashing, returning a wrong value, or just too slow. It will invent a plausible problem and "fix" it.

❌ Vague prompt — score: 3/10


This code doesn't work, help me.

function getUserById($id) {
    $result = $db->query("SELECT * FROM users WHERE id = $id");
    return $result->fetch();
}

Typical result: the AI fixes the SQL injection (fair, real issue), ignores the actual error, and rewrites with PDO prepared statements. Technically sound, but not what you needed.

✅ Precise prompt — score: 9/10


Stack: PHP 8.1 + PDO, Laravel 10.

Current behavior:
  Call to a member function fetch() on bool
  → only when the ID doesn't exist in the table

Expected behavior:
  Return null if user doesn't exist (no exception thrown)

Already tried:
  isset($result) before fetch() — same error

Code:
function getUserById($id) {
    $result = $db->query("SELECT * FROM users WHERE id = $id");
    return $result->fetch();
}

The "already tried" line is critical. Without it, the AI will re-suggest exactly what you already attempted. With it, it looks elsewhere — and finds that PDO::query() returns false on failure, so the fix is $result !== false ? $result->fetch() : null.
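Applied to the snippet above, that fix looks like this. A sketch only: it keeps the one-line guard and also maps "no matching row" to null (typing $id as int sidesteps the injection for this sketch; a prepared statement would be the fuller fix):

```php
<?php
// Sketch of the fix. PDO::query() returns false on failure (in silent
// error mode), so guard before calling fetch(). fetch() itself returns
// false when no row matches; map both cases to null.
function getUserById(PDO $db, int $id): ?array
{
    $result = $db->query("SELECT * FROM users WHERE id = $id");
    if ($result === false) {
        return null;                        // query failed
    }
    $row = $result->fetch(PDO::FETCH_ASSOC);
    return $row !== false ? $row : null;    // no user: null, not false
}
```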

Bug fix template:


Stack: [language + version + framework]
Current behavior: [exact symptom — error message, wrong return value, observed behavior]
Expected behavior: [what the code should do]
Already tried: [previous attempts and why they didn't work]
Code: [minimal code that reproduces the bug]

New feature — acceptance criteria and out-of-scope

"Add a comment system." The AI will choose a stack (database, probably), a UI (modals, inline, dedicated page?), validation (client-side, server-side?). Each of those choices may be incompatible with your project.

❌ Without integration context — score: 3/10


Add a comment system to this blog.

Likely result: MySQL solution, jQuery form, a design that matches nothing in the existing project.

✅ With context, criteria, and out-of-scope — score: 9/10


Stack: PHP 8 + Bootstrap 3, no database (JSON file storage).

Task: Add comments to blog articles.

Acceptance criteria:
- Form: name + message (no email, no account required)
- Server-side validation only (no JS)
- Storage: one JSON file per article at /blog/comments/{slug}.json
- Email notification to author on each new comment (PHPMailer already installed)

Out of scope:
- No moderation for now
- No nested replies
- No changes to existing CSS

Files involved: [blog_footer.php, template.php]

The out-of-scope section is the key difference. Telling the AI what not to implement is just as important as what to build. Without it, the AI will either over-engineer (moderation, auth, nested threads) or invent constraints that don't exist.
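As a rough illustration of what those criteria pin down, here is a hypothetical sketch of the storage step. The function name, parameters, and directory layout are mine, not the project's:

```php
<?php
// Hypothetical sketch of the storage criterion: one JSON file per
// article slug, server-side validation only. All names are illustrative.
function addComment(string $dir, string $slug, string $name, string $message): bool
{
    $name = trim($name);
    $message = trim($message);
    if ($name === '' || $message === '') {
        return false;                         // reject empty fields server-side
    }
    $slug = basename($slug);                  // guard against path traversal
    $file = "$dir/$slug.json";
    $comments = is_file($file)
        ? (json_decode(file_get_contents($file), true) ?? [])
        : [];
    $comments[] = ['name' => $name, 'message' => $message, 'date' => date('c')];
    return file_put_contents($file, json_encode($comments, JSON_PRETTY_PRINT)) !== false;
}
```

The email notification and the form itself are out of this sketch; they would hang off the same function.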

New feature template:


Stack: [technical context]
Integration context: [what it must integrate with — existing components, project patterns]
Task: [precise feature description]
Acceptance criteria: [list of expected behaviors]
Out of scope: [what we do NOT want implemented now]
Files involved: [files to modify or create]

Refactoring — define the problem, not the solution

"Refactor this code" is the worst possible refactoring prompt. The AI will choose its own quality criteria, likely change method signatures, add unsolicited abstractions, and break the existing public API.

❌ Without goal or constraints — score: 2/10


Refactor this code to make it cleaner.

[450 lines of OrderService]

Typical result: 8 extracted private methods, renames, interfaces added, an abstract class "for future flexibility." It compiles. Tests break.

✅ Concrete problem + goal + constraints — score: 9/10


Concrete problem:
  OrderService::processOrder() (450 lines) is called from 8 places
  with different parameter combinations.
  Impossible to know which cases are covered by existing tests.

Goal:
  Extract pricing rules into immutable Value Objects.
  One class = one rule (e.g., DiscountRule, TaxRule, ShippingRule).

Hard constraints:
  - Observable behavior unchanged — existing tests must pass without modification
  - Do not modify public method signatures
  - Do not change return types

Scope: pricing calculation only, not persistence or validation.

Code: [OrderService.php]

The key phrase: "observable behavior unchanged — existing tests must pass without modification." Without this constraint, the AI optimizes by its own criteria. With it, it stays on track.

Refactoring template:


Concrete problem: [why this code is problematic — not "it's messy", but the real impact]
Goal: [what we want after the refactoring]
Hard constraints:
  - [what must not change: signatures, tests, observable behavior]
Scope: [what's IN and what's OUT]
Code: [the code to refactor]

Code review — choose your angle

Without a focus, the AI runs a generic style review. It will flag missing docstrings, suggest more descriptive variable names, and miss the SQL injection sitting at the bottom of the file. AI code review only pays off if you tell it where to focus.

❌ Generic review — score: 4/10


Review this code as a senior dev and give me improvement suggestions.

Typical result: 80% style suggestions, naming, comments. 20% real issues — buried in noise.

✅ Targeted angle + intentional decisions — score: 9/10


Context:
  Public endpoint POST /api/documents/upload
  Unfiltered incoming data, filesystem access, JWT auth verified upstream.

Review focus: security only
  - Injection (command, SQL, path)
  - Path traversal
  - Malicious file uploads (server-side execution)
  - IDOR

Intentional decisions — do not flag:
  - The (int) cast on ID is deliberate
  - Minimal error handling is intentional (internal app, centralized logs)
  - Variable name $tmp follows team convention

Output format:
  For each issue: severity (critical/high/medium) + description + fix with code.

Code: [upload-handler.php]

The "intentional decisions" section cuts noise in half. The AI doesn't flag what you already know about — it focuses on what you asked for.

Code review template:


Context: [endpoint type, who calls it, input data, auth in place]
Focus: [security / performance / business logic / architecture — one angle at a time]
Intentional decisions (do not flag): [deliberate choices in the code]
Output format: [severity + description + fix with code / simple list / …]
Code: [the code to review]

Two cross-cutting techniques that change everything

1. Two passes beat one

For complex tasks (large refactors, deep reviews), separating critique from implementation consistently outperforms asking for everything at once:

Pass 1: "What's wrong with this code? List the problems without fixing anything."
Pass 2: "Now fix problems 1, 2, and 4 you identified. Skip 3."

The separation forces the AI to analyze before acting. Second-pass results are significantly more precise.

2. "Suggest" vs "Change" — it matters now

With Claude 4.x, this distinction became literal:

  • Can you suggest some changes? → AI lists suggestions, touches nothing
  • Change this function to improve its performance. → AI modifies

If you want the AI to act: use action verbs. "Modify", "Fix", "Extract", "Rename". Not "Could you maybe look at whether...".

Going further: skills that auto-trigger

The templates above are still manual: you choose when to apply them. The next step is creating skills, reusable instruction files that load the right template automatically when the context matches.

In Claude Code, a skill is a markdown file with two parts: a description (trigger conditions) and a body (what Claude should do). On every message, Claude scans the list of available skills and invokes the relevant ones, without you typing any command.

Identify your patterns from git log

Before creating generic skills, audit what you actually do; your git log doesn't lie. If your commits use conventional prefixes (fix(scope): ...), counting those prefixes shows where the time goes:


git log --oneline -50 | cut -d' ' -f2- | cut -d: -f1 | sort | uniq -c | sort -rn | head -20

On a real project, this kind of analysis immediately reveals recurring tasks. For example, on this portfolio:


8  fix(veille): ...       → Node.js watch system — frequent bugs, specific architecture
5  feat(blog): ...        → article creation always in the same order (FR → EN → context → OG → deploy)
3  fix(blog): ...         → post-publication fixes (typos, slugs, PHP syntax)
2  refactor(veille): ...  → refactors of the same subsystem

These patterns tell you exactly what skills to create. Not generic "bug fix" skills — skills adapted to your project: the exact sequence for an article, which files to read first when debugging the watch system, the specific constraints for the refactor.

Structure of an auto-triggering skill

A Claude Code skill is a markdown file in ~/.claude/plugins/<name>/skills/. The key: a precise description of trigger conditions.


---
name: blog-article
description: >
  Use when the user asks to create, write or publish a blog article.
  Also trigger on: "new article", "write an article about X", "post on LinkedIn".
---

Execution order for an article (FR + EN):
1. Create blog/posts/<slug>.php (FR version)
2. Create blog/posts/<slug>.en.php (EN version)
3. Update blog/posts.json (FR + EN entry, first position)
4. npm run og <slug>
5. php -l blog/posts/<slug>.php && php -l blog/posts/<slug>.en.php
6. Commit + push
7. node scripts/publish-article.js <slug>

This skill auto-triggers whenever an article is mentioned. The description isn't documentation — it's a detection pattern. The more precise it is, the fewer false positives.

The real advantage: embedded project context

A generic "bug fix" skill asks the same questions every time. A project skill reads the right files first:


---
name: veille-debug
description: >
  Use for any bug in the automated watch system (Node.js).
  Trigger on: watch errors, jobs not running, articles not generated.
---

Before diagnosing, read in this order:
1. scripts/veille/registry.json (configured jobs)
2. logs/veille-daemon.log (last execution)
3. scripts/veille/runner.js (general architecture)

Known failure points:
- Claude API timeout → increase in job config
- Slug regex → test with scripts/veille/test-slug.js
- Corrupt updates.json → delete, system recreates it

That's the difference between a generic template and a project skill: the skill knows where to look first on your codebase. It encodes accumulated project experience, not an abstract procedure.

Summary: the 4 templates

| Task | Essential elements | Most common mistake |
|---|---|---|
| Bug fix | Exact symptom + expected behavior + "already tried" | Forgetting previous attempts |
| New feature | Integration context + criteria + out-of-scope | Not defining out-of-scope |
| Refactoring | Concrete problem + goal + behavioral constraints | No "tests must pass" constraint |
| Code review | Targeted angle + intentional decisions + output format | Generic review without focus |

Conclusion

The real value of these templates isn't about the AI — it's about you. Writing "current behavior vs expected behavior" forces you to precisely characterize the bug. Writing out-of-scope forces you to decide what isn't a priority. Writing refactoring constraints forces you to define what must not move.

Multiple times, while drafting a structured prompt, I realized the task itself was poorly defined from the start. The AI didn't need to run — the prompt had already done the work.

AI doesn't solve the problem for you. It solves the problem you described. You might as well make that description accurate.

