SSE and PHP-FPM don't play well together

I was building a chatbox on my dedicated server — a shared space where members could exchange data and communicate in real time. On the server side, I went with Server-Sent Events. Clean, simple, native HTTP. In local development, everything worked perfectly. I pushed to production. For two days, not a single issue.

Then one evening, about ten people were using the chat at the same time. The site's homepage started responding slowly. At twenty simultaneous connected users, the server stopped responding to everything else entirely. 504 Gateway Timeout across the board. The server was running, PHP was running, but nothing was getting through. It took me an hour to understand what was happening.

The naive implementation that seems to work

The SSE server-side code is short and standard. Here's what I had written:

<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no');

set_time_limit(0); // without this, max_execution_time kills the loop

while (true) {
    $messages = getNewMessages();
    foreach ($messages as $msg) {
        echo "data: " . json_encode($msg) . "\n\n";
        if (ob_get_level() > 0) {
            ob_flush(); // only flush a buffer that actually exists
        }
        flush();
    }
    if (connection_aborted()) {
        exit; // the browser closed the EventSource
    }
    sleep(1);
}

This is exactly what the SSE documentation recommends. The Content-Type: text/event-stream tells the browser this is an event stream. The X-Accel-Buffering: no disables Nginx buffering so data is sent immediately. The infinite loop checks regularly for new messages and sends them as they arrive. In isolation, it works perfectly.

The problem isn't in the code. It's in what this code does at the server level when multiple people run it at the same time.

The diagnosis — the PHP-FPM worker pool

PHP-FPM (FastCGI Process Manager) is the PHP process manager sitting behind Apache or Nginx on the vast majority of production PHP servers. Its workings are simple: it maintains a pool of workers — PHP processes waiting for work. When an HTTP request arrives, Apache or Nginx passes it to an available worker. The worker runs it, sends back the response, then frees itself up for the next request.

The size of this pool is finite. Depending on configuration and available memory, you typically run between 20 and 50 workers. That's more than enough for normal PHP pages that execute in a few dozen milliseconds and free the worker immediately.
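That limit lives in the pool configuration; on many Debian/Ubuntu setups it's a file like /etc/php/8.2/fpm/pool.d/www.conf (the exact path depends on distro and PHP version). A typical dynamic pool, with illustrative values:

```ini
; www.conf — PHP-FPM pool configuration (values are illustrative)
pm = dynamic
pm.max_children = 20      ; hard cap: at most 20 PHP workers at once
pm.start_servers = 5      ; workers spawned at startup
pm.min_spare_servers = 3  ; keep at least this many idle
pm.max_spare_servers = 8  ; kill idle workers beyond this
```

pm.max_children is the number that matters here: it's the absolute ceiling on concurrent PHP requests, no matter how much traffic arrives.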

But an SSE endpoint is not a normal page. It's a connection that stays open for as long as the user is connected. Here's what actually happens:

[worker 1]  → user Alice  (infinite loop, blocked)
[worker 2]  → user Bob    (infinite loop, blocked)
[worker 3]  → user Carol  (infinite loop, blocked)
[worker 4]  → user David  (infinite loop, blocked)
[worker 5]  → user Emma   (infinite loop, blocked)
...
[worker 20] → user Theo   (infinite loop, blocked)

[homepage]   → 504 Gateway Timeout ← no workers left
[API /health] → 504 Gateway Timeout ← same
[anything]   → 504 Gateway Timeout ← same

Every user connected to the chat pins a worker for the entire duration of their session. At twenty simultaneous users on a twenty-worker pool, there are none left for anything else. The whole site is dead. This isn't a bug, not a memory leak, not a slow unoptimized query. It's structural.

The worker can't free itself: it's executing an infinite loop waiting for messages. It won't hand control back until the user disconnects — which means exactly when the browser closes the SSE connection, potentially hours later.

Why Node.js and Go don't have this problem

Node.js and Go handle long-lived connections effortlessly because they're built on a fundamentally different model. Node.js uses a non-blocking event loop: a single thread can manage thousands of open SSE connections simultaneously, because a connection waiting for data doesn't consume a thread — it's simply registered as a file descriptor to watch. Go works the same way with goroutines, lightweight threads multiplexed over a handful of OS threads. Holding a thousand open SSE connections in Go costs a few megabytes of memory, not a thousand workers. This isn't a PHP bug — it's a fundamental architectural difference. PHP-FPM = one blocked worker per active connection. Node.js/Go = one thread for all connections, non-blocking concurrency.

The canonical PHP solution: long-polling

Long-polling flips the logic. Instead of keeping a connection open indefinitely, the server responds as soon as it has something to say — or after a timeout if nothing happened. The connection closes. The client immediately loops back and opens a new request.

From the user's perspective, the chat is real-time: messages arrive within a second. From the server's perspective, each connection lasts at most 30 seconds before freeing up. The worker returns to the pool. Other requests get through.

Here's the chat-poll.php file:

<?php
header('Content-Type: application/json');
header('Cache-Control: no-cache');

$lastId  = (int)($_GET['last_id'] ?? 0);
$timeout = 30; // seconds before releasing the worker regardless
$start   = time();

$messagesFile = __DIR__ . '/messages.json';

while (time() - $start < $timeout) {
    clearstatcache(true, $messagesFile);

    if (file_exists($messagesFile)) {
        $all  = json_decode(file_get_contents($messagesFile), true) ?? [];
        $news = array_values(array_filter($all, fn($m) => $m['id'] > $lastId));

        if (!empty($news)) {
            echo json_encode($news);
            exit;
        }
    }

    usleep(300_000); // poll every 300ms
}

// Timeout: nothing new, release the worker, client will loop back
echo json_encode([]);

The principle is simple: we watch a JSON file every 300ms. As soon as new messages appear (identified by an id higher than the last one the client saw), we send them back and exit. If nothing arrives within 30 seconds, we return an empty array. Either way, the worker is freed.
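For completeness, here's what the write side might look like. This is a hypothetical chat-send.php (the filename and the text field are my assumptions, not part of the original setup). The flock() call matters: without it, two simultaneous sends can interleave their read-modify-write cycles and corrupt messages.json:

```php
<?php
header('Content-Type: application/json');

$messagesFile = __DIR__ . '/messages.json';
$text = trim($_POST['text'] ?? '');

if ($text === '') {
    http_response_code(400);
    echo json_encode(['error' => 'empty message']);
    exit;
}

// 'c+' opens read/write, creates the file if missing, doesn't truncate.
// Take an exclusive lock for the whole read-modify-write cycle.
$fp = fopen($messagesFile, 'c+');
flock($fp, LOCK_EX);

$all = json_decode(stream_get_contents($fp), true) ?? [];

$lastId = empty($all) ? 0 : end($all)['id'];
$all[]  = ['id' => $lastId + 1, 'text' => $text, 'ts' => time()];

ftruncate($fp, 0);
rewind($fp);
fwrite($fp, json_encode($all));

flock($fp, LOCK_UN);
fclose($fp);

echo json_encode(['ok' => true]);
```

The poll endpoint already tolerates a torn read through its json_decode(...) ?? [] fallback; a stricter version would take a shared lock (LOCK_SH) before reading too.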

On the client side, the JavaScript loop:

async function pollChat(lastId = 0) {
    try {
        const res = await fetch(`/chat-poll.php?last_id=${lastId}`);
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const messages = await res.json();

        if (messages.length > 0) {
            messages.forEach(appendMessage);
            lastId = messages.at(-1).id;
        }
    } catch (e) {
        // Network or HTTP error (e.g. a 504 while the pool recovers):
        // wait before looping back
        await new Promise(r => setTimeout(r, 2000));
    }

    // Loop back immediately — short connection, worker already freed
    pollChat(lastId);
}

pollChat();

The pollChat function is recursive but asynchronous: it awaits the server response, processes messages, then calls itself again. Because the recursive call happens after an await, the previous frame has already returned by the time the next call starts in a fresh microtask, so the stack never grows. This is the standard pattern for infinite polling in JS.

The gain is immediate. A worker is now held for at most 30 seconds when the chat is quiet, and often well under a second when messages are flowing. The crucial difference isn't raw capacity (an idle long-poll still occupies a worker while it waits): it's that every connection is bounded. Workers rotate back into the pool every 30 seconds at worst, so the homepage and the rest of the site get served in the gaps instead of timing out behind connections that never end. And if the chat still hogs the pool, the timeout is a single knob you can turn down.

Bonus: inotify to avoid active polling

Polling every 300ms works fine, but it wastes a little CPU when there's nothing to read. The PHP inotify extension lets you replace this waiting loop with a system notification: the process goes to sleep and is woken up as soon as the messages file is modified.

<?php
// With the inotify extension installed (pecl install inotify)
$fd = inotify_init();
// IN_CREATE only fires when watching a directory; on an existing file,
// IN_MODIFY is the event we care about
inotify_add_watch($fd, $messagesFile, IN_MODIFY);

// Wait until timeout or until a change happens
$read = [$fd];
$write = $except = [];
// stream_select() rejects negative timeouts, so clamp the remainder
$remaining = max(1, $timeout - (time() - $start));

if (stream_select($read, $write, $except, $remaining) > 0) {
    inotify_read($fd); // drain the event queue
    // read new messages...
}

fclose($fd);

Result: near-zero latency for the user, zero CPU while waiting. That said, inotify isn't available everywhere and adds an extension dependency. For a chatbox use case, 300ms polling is more than enough — the difference is imperceptible to the user.

Conclusion

SSE is a good protocol. Simple, native, well supported by browsers, no library needed. The problem isn't SSE — it's what PHP-FPM does with every active connection. One worker per connection is the PHP model. That model is perfectly suited to thousands of short pages. It's fundamentally ill-suited to persistent connections.

Long-polling isn't as elegant as true push. There are round trips, timeouts, client-side reconnection to handle. But it respects the PHP model, it scales, and it requires no external dependencies — no ReactPHP, no Ratchet, no Swoole. One PHP file and a JavaScript loop.

The lesson I take from this: knowing your stack's constraints is what separates "it works in dev" from "it holds in prod". The naive SSE code was correct. What was missing was an understanding of how PHP-FPM allocates its resources. Fifteen minutes reading the PHP-FPM documentation would have prevented the outage.
