My first spam comment: how I secured comments in 10 minutes

My blog is 3 months old. 50+ articles published, a database-free comment system, a honeypot anti-spam, CSRF protection. This morning, first comment. From RobertqueRy. On a gRPC Go article. Promoting an online gaming site in Bangladesh.

Welcome to the internet.

Anatomy of a spam comment

Here's what appeared in my comment JSON file, faithfully reproduced (link redacted):

RobertqueRy — March 21, 2026
"Players in Bangladesh are increasingly choosing [url][redacted][/url] for online gaming and rewards. The platform provides access to popular games like slots, rummy and aviator with a welcome bonus for new users. Visit the site to download the APK and start playing today."

The red flags stack up fast:

  • Username pattern: RobertqueRy — a real first name with a randomly cased suffix. A classic bot pattern: it generates endless unique identities from a short list of names.
  • Copy-paste content: the text references nothing from the article — the same promotional blurb could land under any post on any topic. It's not even trying.
  • Off-target audience: "Players in Bangladesh" on a French technical blog about gRPC. The geographic targeting makes zero sense.
  • BBCode syntax: some spambots still try to inject [url=...]link[/url] even on sites that don't render BBCode.

And the stored JSON, exactly as it landed in the file:

{
    "id": "fb0c3b72",
    "author": "RobertqueRy",
    "date": "2026-03-21T15:45:52+01:00",
    "content": "Players in Bangladesh are increasingly choosing [url][redacted][/url]...",
    "ip_hash": "9b4be10615075cc7"
}

No gravatar. No previous interaction. Filed and gone in under a second — the timestamp shows the form was submitted at 3:45 PM; I noticed it an hour later.
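
A side note on the ip_hash field: storing raw IP addresses in a flat file is a GDPR liability, so only a short hash is kept. A minimal sketch of how such a value could be derived — the function name, key, and truncation length are assumptions for illustration, not the blog's actual code:

```php
<?php
// Hypothetical sketch: derive a short, non-reversible IP identifier.
// A keyed hash (HMAC) prevents a trivial dictionary attack on the small
// IPv4 space; truncating to 16 hex chars matches the stored format.
function ip_hash(string $ip, string $secret): string
{
    return substr(hash_hmac('sha256', $ip, $secret), 0, 16);
}

// Usage when storing a comment:
// $comment['ip_hash'] = ip_hash($_SERVER['REMOTE_ADDR'], $secret);
```

Without the secret key, reversing a 16-character hash back to an IP is impractical, yet the hash is still stable enough to rate-limit repeat offenders.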

Why honeypots aren't enough

The honeypot principle is simple: add a hidden field to the form. Humans don't see it (CSS: display: none), so they don't fill it in. Bots scan the DOM and fill in all fields. If the hidden field is populated, you reject the submission.

<!-- Honeypot field — hidden from real users -->
<div style="display:none" aria-hidden="true">
    <label for="website">Leave this field empty</label>
    <input type="text" id="website" name="website" tabindex="-1" autocomplete="off">
</div>

And the server-side check:

// Honeypot check
if (!empty($_POST['website'])) {
    http_response_code(400);
    exit(json_encode(['error' => 'Bot detected']));
}

Problem: modern bots don't blindly fill in all fields. They parse the DOM, detect hidden elements via CSS classes, inline styles, or aria-hidden attributes, and skip them. RobertqueRy left the honeypot field empty. Passed.

CSRF doesn't help either. The CSRF token proves the request came from my site — it doesn't prove it came from a human. A bot that visits the page first, extracts the token from the HTML, and replays it in the POST passes the CSRF check without issue.
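
To make that concrete, here is a minimal sketch of a session-backed CSRF check — an illustration of the mechanism, not necessarily the blog's exact implementation:

```php
<?php
// Hypothetical sketch of a session-backed CSRF token.
// Issued once per session, when the form renders:
$_SESSION['csrf_token'] = $_SESSION['csrf_token'] ?? bin2hex(random_bytes(32));

// Checked on POST. This proves the request originated from a page we
// served — it says nothing about whether a human or a bot submitted it.
if (!hash_equals($_SESSION['csrf_token'], (string) ($_POST['csrf_token'] ?? ''))) {
    http_response_code(403);
    exit(json_encode(['error' => 'Invalid CSRF token']));
}
```

A bot only has to fetch the page, read the hidden input, and echo it back — exactly what RobertqueRy's bot did.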

Rate limiting doesn't help for a single isolated submission. I cap at 2 comments per IP per hour. The bot posted once and moved on. It didn't trigger any threshold.
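
For context, that cap can be sketched as a small file-based counter — the storage path and helper name are assumptions, not the blog's actual implementation:

```php
<?php
// Hypothetical sketch: allow at most 2 comments per hashed IP per hour.
// Submission timestamps are kept in one JSON file per IP hash.
function rate_limit_ok(string $ipHash, string $dir = '/tmp/ratelimit'): bool
{
    $file   = $dir . '/' . $ipHash . '.json';
    $now    = time();
    $stamps = is_file($file)
        ? (json_decode(file_get_contents($file), true) ?: [])
        : [];

    // Keep only submissions from the last hour
    $stamps = array_values(array_filter($stamps, fn ($t) => $now - $t < 3600));

    if (count($stamps) >= 2) {
        return false; // over the cap — reject
    }

    $stamps[] = $now;
    @mkdir($dir, 0700, true);
    file_put_contents($file, json_encode($stamps), LOCK_EX);
    return true;
}
```

A single drive-by submission like this one sails straight through: the counter never reaches its threshold.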

The solution — server-side math captcha

The immediate reflex would be reCAPTCHA. I dismissed it for three reasons: external dependency on Google's infrastructure, tracking (the widget fingerprints the visitor), and GDPR headaches for a French site.

The alternative: a simple math challenge generated server-side. "What is 4 + 7?" The answer is stored in the session, not in the form. The bot can't find it by scanning the DOM.

Captcha generation — in the template, when the comment form is rendered:

<?php
// Generate a new captcha for each page load (session_start() must run first)
$num1 = rand(1, 9);
$num2 = rand(1, 9);
$_SESSION['captcha_answer'] = $num1 + $num2;
$_SESSION['captcha_token']  = bin2hex(random_bytes(16)); // single-use token
?>
<div class="form-group">
    <label for="captcha">
        Anti-spam: what is <?= $num1 ?> + <?= $num2 ?>?
    </label>
    <input type="number" id="captcha" name="captcha"
           class="form-control" required min="2" max="18">
    <input type="hidden" name="captcha_token"
           value="<?= htmlspecialchars($_SESSION['captcha_token']) ?>">
</div>

Verification — in the comment handler:

// Verify captcha (assumes session_start() has already been called)
$captcha_input = (int) ($_POST['captcha'] ?? -1);
$captcha_token = (string) ($_POST['captcha_token'] ?? '');

// Token must exist in session and match — hash_equals avoids timing leaks
if (empty($_SESSION['captcha_answer'])
    || empty($_SESSION['captcha_token'])
    || !hash_equals($_SESSION['captcha_token'], $captcha_token)
    || $captcha_input !== (int) $_SESSION['captcha_answer']
) {
    http_response_code(400);
    exit(json_encode(['error' => 'Invalid captcha']));
}

// Invalidate after use — prevents replay
unset($_SESSION['captcha_answer'], $_SESSION['captcha_token']);

Three key points in this implementation:

  • Session storage: the correct answer never appears in the HTML. The bot has no DOM element to extract it from.
  • Single-use token: the captcha is invalidated immediately after the first use. A bot can't replay a captured valid session.
  • Random per page load: even if a bot hardcodes "4 + 7 = 11", the next load generates a different question.

Remote spam cleanup

The comment was already stored in the JSON file on the server. Deleting it locally changes nothing — the production file needs to be overwritten.

The comment files live in blog/comments/{slug}.json. The cleanest approach: connect with lftp and overwrite the file with an empty array.

# Connect to OVH FTP and overwrite the spam comment file with an empty array
echo '[]' > /tmp/empty.json
lftp -u webdevelzo,MYPASSWORD ftp.cluster121.hosting.ovh.net <<'EOF'
put /tmp/empty.json -o /www/blog/comments/grpc-go-streaming-microservices.json
bye
EOF

Thirty seconds, done. The article's comment section is clean again. If I had a more elaborate moderation workflow, I'd build a small admin interface — but for now, one bot in three months doesn't justify the investment.

The 4 anti-spam layers

Layer          What it blocks                    Bypassed by
Honeypot       Naive bots that fill all fields   DOM-aware bots
CSRF token     Cross-site request forgery        Bots that visit the page first
Rate limiting  Mass flooding from one IP         Distributed bots, single posts
Math captcha   Automated form submission         OCR + AI (overkill for most spambots)

None of these is perfect in isolation. Together, they make spam unprofitable. A bot that navigates the DOM, extracts the CSRF token, solves a math captcha, and submits at a rate below the limit exists — but it won't bother for a personal tech blog with zero commercial value for the spammer.

The goal isn't to build an impenetrable fortress. It's to make the return on investment negative for automated attacks. That's enough.

Conclusion

RobertqueRy did me a favor. His Bangladesh gaming spam pushed me to add a captcha I should have included from day one. Ten minutes of work, zero external dependencies, no Google tracking injected into my visitors' browsers.

The lesson is embarrassingly simple: if you ship a public form, someone will use it to post garbage. The only question is when — not if. The blog had been live for three months. In internet time, that's patient.

The comment system was already documented in Adding comments to a PHP blog without a database and Email notifications for PHP comments via SMTP. The captcha is now part of both posts as well. It probably should have been there from the start.