I started with a single crypto-veille.js script. 400 lines, everything hardcoded:
categories, the prompt, FTP logic. It worked. Then I wanted the same thing to track the Epstein
case, retro console releases, and tech/AI news. Copy-pasting a 400-line script four times with
variations? Not my idea of a good time.
The exercise took three days of iterative development (with Claude Code as a pair programmer) and produced something interesting: a generic architecture driven by a central configuration file, a reusable Node.js runner, and a PHP rendering pattern that cleanly separates data from presentation. What I learned along the way — especially the bugs that cost the most time — is worth documenting.
The architecture in one sentence
A Node.js daemon polls registry.json every minute, decides which topics are due,
launches run-veille.js --slug <slug>, which calls Claude CLI with WebSearch,
deduplicates results, persists them as JSON, then calls Claude again (without WebSearch) to patch
a structured article.json file. PHP reads these JSON files and generates HTML on the fly.
[veille-daemon.js] ← runs continuously (systemd service)
    ↓ every minute: compare frequency_hours
    ↓ if a cycle is due → launch run-veille.js --slug <slug>

[run-veille.js]
    1. load registry.json (central config)
    2. claude --print --allowedTools WebSearch,WebFetch → items[] JSON
    3. deduplication via SHA-256 of the normalized title
    4. merge + prune → updates.json (atomic write)
    5. if render_prompt → claude --print (no WebSearch) → patch article.json
    6. if summary_weekly_day → summaries.json

[PHP veille/<slug>/index.php]
    include _veille-page.php → reads updates.json + article.json → HTML
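The daemon's due-check is just a comparison between elapsed time and frequency_hours. A minimal sketch (isDue is my name, not from veille-daemon.js, and the last-run timestamp is assumed to be tracked per topic):

```javascript
// Sketch of the daemon's due-check. The real veille-daemon.js may
// track per-topic state differently; only frequency_hours comes
// from registry.json.
function isDue(topic, lastRunISO, now = Date.now()) {
  if (!lastRunISO) return true; // topic has never run
  const elapsedHours = (now - Date.parse(lastRunISO)) / 3_600_000;
  return elapsedHours >= topic.frequency_hours;
}
```

With frequency_hours: 168 (the retro topic's weekly cadence), a topic that last ran six days ago is not yet due; at seven days it is.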
The key element: everything that varies between topics is in registry.json.
The Node.js code is generic. Adding a new topic requires zero changes to the runner.
registry.json — the heart of the system
Each topic is an entry in uploads/veille/registry.json:
{
  "veilles": {
    "retro": {
      "slug": "retro",
      "label": "Retro Consoles",
      "frequency_hours": 168,
      "prune_days": 180,
      "categories": ["NOUVELLE_CONSOLE", "FIRMWARE", "BON_PLAN"],
      "ftp_files": [
        { "local": "uploads/veille/retro/updates.json", "remote": "/www/uploads/veille/retro/updates.json" },
        { "local": "uploads/veille/retro/article.json", "remote": "/www/uploads/veille/retro/article.json" }
      ],
      "prompt": "You are an expert on the retro console market...",
      "render_prompt": "Update the JSON guide. Return only a patch..."
    }
  }
}
ftp_files[0] is always the raw news feed (updates.json).
ftp_files[1+] are secondary files. render_prompt is optional —
only for topics that have a structured article in addition to the news feed.
The article.json pattern — two Claude passes
The system distinguishes between two types of output files:
- updates.json: a chronological feed of items (news, events). Append-only, with automatic pruning.
- article.json: a structured document enriched each cycle. It contains rankings, prices, biographies, analyses — everything that can't be reduced to "here are the latest news items".
For article.json, pass 1 (WebSearch) collects raw data. Pass 2
(without WebSearch) receives that data plus the current state of
article.json and returns a partial JSON patch — only the
changed fields. The runner merges this patch with the existing document.
// renderArticle() — simplified
function renderArticle(renderPrompt, inputData) {
  const currentArticle = readJSON('article.json');
  const claudeInput = renderPrompt
    + '\n\nNEW INFORMATION:\n' + JSON.stringify(inputData)
    + '\n\nCURRENT DATA (article.json):\n' + JSON.stringify(currentArticle);
  // Claude without WebSearch — works only on data passed in context
  const patch = callClaude(claudeInput, { noTools: true });
  // Merge patch → existing article (shallow merge: top-level fields only)
  const merged = { ...currentArticle, ...patch, last_updated: new Date().toISOString() };
  writeAtomically('article.json', merged);
}
The PHP side: 3 lines per topic
Each veille/<slug>/index.php does exactly this:
<?php
$veille_slug = 'retro';
include __DIR__ . '/../_veille-page.php';
_veille-page.php handles the shared structure (header, filters, news feed).
Each topic has its own _article.php that reads article.json and
displays the rich section: console rankings for retro, judicial timeline for
epstein, AI model benchmarks for techno.
The bugs that cost the most time
Bug 1 — The secondary files loop was overwriting article.json
After each cycle, run-veille.js loops over ftp_files[1+] to write
top-level fields from the current cycle (e.g. current_prices,
market_note). The problem: this loop ran before
renderArticle(). It was overwriting article.json with partial cycle
data — stripping the complete console ranking that renderArticle was about to
enrich 30 seconds later.
The symptom: the retro page displayed empty after every cycle. Diagnosis came from reading log timestamps:
[17:48:42] ✓ Secondary file written → uploads/veille/retro/article.json
[17:49:06] [render:NEW INFORMATION] ✓ Done → uploads/veille/retro/article.json
Secondary file at 17:48 → render at 17:49. The render writes based on what it received as context (the overwritten version), not the full version. Fix in one line:
// Line 220 of run-veille.js
if (f.local.endsWith('/article.json')) continue; // managed by renderArticle(), not here
Bug 2 — The daemon was overwriting registry.json
veille-daemon.js compares updated_at (local vs OVH). If the remote
version is more recent or equal, it overwrites the local file.
Consequence: I modify registry.json locally, deploy it to OVH, but both versions
have the same updated_at. At the next polling cycle, the daemon fetches the OVH
version and overwrites the local one — losing my changes.
Fix: always bump updated_at to new Date().toISOString() on every
local modification before deploying. This is an operational constraint that must never be
forgotten.
Bug 3 — The render_prompt wasn't updating prices
The retro topic collects current console prices each cycle (current_prices).
The render_prompt was supposed to apply them to the ranking. In practice, Claude returned
a short patch with only market_note and highlights — without the
consoles[] array.
Cause: the render_prompt was ambiguous. Claude optimized its output and omitted "unchanged" fields. Fix: add an explicit rule in the prompt:
ABSOLUTE RULE — PRICES: If current_prices is present in the NEW INFORMATION (even partially), you MUST MANDATORILY include the "consoles" field in the patch with the COMPLETE array of consoles and updated prices.
The render_prompt has no access to WebSearch. It works only with data passed in context. What Claude can't find in pass 1, it can't find in pass 2. This must be clearly stated in the prompt to avoid incorrect expectations.
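Prompt rules alone are brittle; the runner could also enforce the rule mechanically, rejecting any patch that omits consoles[] when current_prices was supplied. A sketch (not part of the original pipeline):

```javascript
// Guard against Bug 3 recurring: if this cycle collected
// current_prices, a valid patch must carry the full consoles[] array.
function validatePatch(patch, inputData) {
  if (inputData.current_prices && !Array.isArray(patch.consoles)) {
    throw new Error('patch missing consoles[] despite current_prices in input');
  }
  return patch;
}
```

Throwing here (instead of silently merging) would surface the truncated patch in the logs the same cycle it happens.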
Bug 4 — Invented image URLs
The render_prompt asked Claude to find "a direct URL to an official product photo". Claude fabricated plausible-looking URLs — all returned 404. No validation mechanism in the pipeline.
Manual fix: search on official sites (anbernic.com, trimui.com, goretroid.com), verify each
URL with curl -I, inject directly into article.json. The real lesson:
don't delegate asset lookup to the render pass (which has no WebSearch). If an image URL is
critical, it must be collected during pass 1 (with WebSearch).
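If validation were added to the pipeline, the manual curl -I check could be automated with Node 18+'s global fetch. A sketch (the function name and the injectable fetch parameter are mine):

```javascript
// HEAD-check an image URL: accept only a 2xx response whose
// Content-Type is image/*. fetchFn is injectable for testing;
// in production the global fetch (Node 18+) is used.
async function isValidImageUrl(url, fetchFn = fetch) {
  try {
    const res = await fetchFn(url, { method: 'HEAD', redirect: 'follow' });
    return res.ok && (res.headers.get('content-type') || '').startsWith('image/');
  } catch {
    return false; // DNS failure, timeout, invalid URL, etc.
  }
}
```

Run at merge time, this would have caught every fabricated 404 before it reached article.json.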
Planned improvements
- Image URL validation. Nothing currently verifies that URLs in article.json return a real image. An HTTP 200 check at merge time would eliminate silent 404s.
- Price history. For the retro topic, storing a {date, price} array per console would allow displaying a price evolution chart. The data is collected every cycle — it's just not persisted yet.
- Significant delta alerts. If a console drops more than 20% between cycles, trigger a notification. The data is there, the diff logic exists — only the trigger is missing.
- Admin UI for adding topics. Currently adding a topic means editing registry.json, creating three PHP files, and adding two routes. A simple form → file-generation UI would make this accessible without touching code.
- Retry on the render pass. Pass 1 has a retry. Pass 2 doesn't — if Claude returns invalid JSON, the article isn't updated and no alert is triggered.
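The price-history item reduces to a small append step at merge time. A sketch (one series per console with one point per day; both are my assumptions about the eventual schema):

```javascript
// Append today's price to a per-console {date, price} series,
// replacing any earlier point recorded the same day.
function appendPricePoint(history, price, date = new Date()) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  const out = history.filter(p => p.date !== day);
  out.push({ date: day, price });
  return out;
}
```

Calling this once per cycle inside the article.json merge would build the chart data for free from prices that are already collected.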
Conclusion
The final architecture fits in a few files: a JSON registry, a generic Node.js runner (~450 lines), a polling daemon, and minimal PHP templates. The power comes from the separation between configuration (registry), collection (pass 1 with WebSearch), structured enrichment (pass 2 without WebSearch), and rendering (pure PHP).
What started as a crypto-specific script became a platform where any monitoring topic can be
deployed in 30 minutes: create the registry entry, write a prompt, define the
article.json structure, and write the PHP template. The rest — deduplication,
retry, FTP sync, pruning, periodic summaries — is provided by the runner for free.
The dashboards are live at web-developpeur.com/veille/.