The API runs on a 4 GB RAM VPS, Nginx in front, PHP-FPM configured with 50 workers. A traffic spike — nothing exceptional, a marketing campaign — and in 8 seconds the pool is saturated. CPU at 30%, server down. Monitoring showed 502s and 504s in bursts. The bottleneck wasn't the CPU. It was RAM and the exhausted pool.
Six months later, migration to Go. Not out of hype, but because we understood the model. The difference in behavior under load wasn't about raw speed — it was about how each runtime handles concurrency.
Note: if you're looking for a general language comparison, I wrote a dedicated article. This one goes one level deeper: the mechanics.
The PHP-FPM model in one picture
Nginx (or Apache) acts as a reverse proxy: it handles TLS, serves static files, buffers incoming requests. It doesn't run PHP. PHP-FPM maintains a pool of forked OS processes ready to execute PHP.
The flow is simple:
client → Nginx → PHP-FPM queue → [worker pool]
↑
if pool full: queue → timeout → 502/504
Each worker is an independent OS process. It consumes between 30 and 60 MB of RAM at minimum, depending on what the application loads into memory. This memory is not shared between workers — each has its own memory space. The pool is sized at configuration time, not based on actual load.
Saturation isn't a bug. It's the expected behavior of the model.
[www]
pm = dynamic
pm.max_children = 50 ; worst case: 50 workers × 50 MB = 2.5 GB RAM
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500 ; Recycle workers after 500 req (avoids memory leaks)
With this configuration on a 4 GB VPS, PHP-FPM can consume up to 2.5 GB of RAM before the application even starts actually processing data.
What happens when the pool is full
Exact sequence: a request arrives, Nginx buffers it, PHP-FPM tries to assign a worker.
If pm.max_children is reached, the request waits in queue.
If the queue is full, Nginx can't hand the request off and returns a 502; if fastcgi_read_timeout expires while waiting, it returns a 504.
Clients see an error. The CPU, meanwhile, is idle.
The math is brutal: 50 workers × 50 MB = 2.5 GB of RAM consumed just for the PHP pool, before logs, cache, Nginx itself. Concurrency is bounded by RAM, not by compute power.
This model has a direct consequence for persistent connections. An SSE or WebSocket connection keeps a worker occupied for its entire lifetime. 50 simultaneous SSE connections = 50 blocked workers = pool saturated for everything else.
ps --no-headers -o rss -C php-fpm | awk '{sum+=$1} END {print sum/1024 " MB"}'
This command gives the actual RAM consumption of all php-fpm processes in production. Useful to run before sizing the pool.
The Go model — goroutines and M:N scheduling
Go doesn't fork processes and doesn't expose a worker pool to size.
Its runtime implements an M:N scheduler: N goroutines multiplexed onto M OS threads,
where the number of threads executing Go code is capped by GOMAXPROCS (defaults to the number of cores).
A goroutine starts with a 2 KB stack, versus a default stack of around 8 MB for an OS thread. This stack grows dynamically if needed, but stays small while the goroutine is blocked on I/O: the runtime parks it and reuses the OS thread for something else.
With net/http, each incoming connection spawns a goroutine.
10,000 simultaneous connections ≈ 20-80 MB of goroutine stacks, depending on how much each stack has grown.
On a PHP-FPM setup with 50 workers, you'd be at 2.5 GB and drowning in 502s long before that.
There's no PHP-FPM, no worker pool. The Go binary is the server. Nginx can still sit in front for TLS, compression, static file caching, rate limiting — but not to manage application concurrency.
On goroutine lifecycle and leak risks, I detailed the patterns to avoid in the article Goroutine leaks in Go: detect, understand, fix.
package main
import (
"log"
"net/http"
"time"
)
func handler(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("OK"))
}
func main() {
mux := http.NewServeMux()
mux.HandleFunc("/", handler)
srv := &http.Server{
Addr: ":8080",
Handler: mux,
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 120 * time.Second,
}
log.Println("Listening on :8080")
log.Fatal(srv.ListenAndServe())
}
Server timeouts are non-negotiable in production. Without ReadTimeout,
a slow client can hold a connection open indefinitely,
and the associated goroutine never gets released.
Degradation under load — compared behavior
PHP-FPM: cliff-edge degradation
Below max_children, everything works normally.
Beyond it: queue, timeout, 502. Degradation is binary — the service responds or it doesn't.
No middle ground, no latency that gradually climbs.
The server goes from "operational" to "erroring" in a few seconds.
Go: progressive degradation
Goroutines accumulate in memory and latency climbs gradually.
There's no "pool full" — as long as RAM allows, Go keeps accepting connections.
With context.WithTimeout correctly propagated,
slow requests release their goroutines cleanly on expiry.
Without timeout — potentially orphaned goroutine:
func handler(w http.ResponseWriter, r *http.Request) {
result := fetchFromDB() // can take 30 seconds
w.Write(result)
// if the client disconnects, the goroutine keeps running
}
With context timeout — clean release:
func handler(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
defer cancel()
result, err := fetchFromDBCtx(ctx)
if err != nil {
http.Error(w, "timeout", http.StatusGatewayTimeout)
return
}
w.Write(result)
// when the client disconnects, r.Context() is cancelled → ctx too → fetchFromDBCtx exits cleanly
}
The difference in behavior under load between the two models often comes down to this context propagation. In Go, a goroutine that isn't properly anchored to a context becomes a goroutine leak — invisible until RAM saturates.
Operational consequences
The choice isn't "PHP is slow, Go is fast". It's a question of fit between the concurrency model and the project's constraints.
PHP + Nginx wins when:
- Shared hosting, CMS (WordPress, Drupal), Composer ecosystem already in place
- Traffic below ~1,000 req/min, existing PHP team, legacy code
- FTP or simple git push deployment without system access
Go wins when:
- High-frequency APIs (> 10,000 req/min), WebSockets, SSE at volume
- Constrained VPS budget: 512 MB of RAM can sustain thousands of lightweight Go connections
- Low-footprint microservices, single binary to deploy
For the concrete case of SSE with PHP and the necessary workarounds, I detailed the approach in the article SSE, PHP-FPM and chatbox: working with workers.
| Criterion | PHP + Nginx/FPM | Go (net/http) |
|---|---|---|
| Concurrency unit | OS process (~50 MB) | Goroutine (~2 KB) |
| Natural limit | Pool size (RAM) | Total RAM (progressive degradation) |
| Behavior at saturation | Queue + timeout + 502 | Latency climbs, connections held |
| Persistent connections (SSE, WS) | 1 blocked worker | 1 sleeping goroutine (~2 KB) |
| 2 GB RAM VPS | ~30-40 workers max | ~100k lightweight connections |
| Deployment | FTP, shared hosting, CMS ready | Single binary, systemd |
| Shared hosting | Yes (everywhere) | No (VPS minimum) |
| CMS/libs ecosystem | Huge | Minimal on the classic web side |
| Best for | Sites, CMS, API < 1k req/min | High-frequency API, real-time, microservices |
Conclusion
Most Go migrations I've seen — or done — start from a bad surprise with PHP-FPM under load. A surprise that was avoidable with a load test upfront, before going to production.
PHP-FPM is robust and predictable. Its only real flaw: it's opaque until the moment the pool is full. No gradual warning, no graceful degradation. Once you understand this mechanic, you choose with full awareness — and you often stay on PHP, just better sized.
The right tool isn't the one that holds up best. It's the one whose breaking point you understand.