The program has been running for 3 days. Memory is climbing slowly, requests are starting to slow down, and at some point the process gets killed by the OOM killer. No panic, no visible error log, nothing in Sentry. Just a gradual degradation. Nine times out of ten: a goroutine leak.
What makes leaks hard to catch is that they don't make noise. A blocked goroutine holds at least 2 KB of stack (the runtime's starting size), more if its stack has grown. Alone, it's nothing. At a few thousand, it's a problem.
What a leak is
A goroutine leak is a goroutine that was started and never terminates — because it's waiting for something that will never arrive, or because it's running in an infinite loop with no exit mechanism. The Go runtime does not collect blocked goroutines: they stay in memory until the process ends.
Unlike classic memory leaks (unreleased objects), goroutine leaks often have a precise, reproducible cause. Once you know what to look for, they're quick to find.
The 4 patterns that reliably leak
1. Channel send with no receiver
The most common case. You send on an unbuffered channel, but the receiver has stopped (timeout, error, early return):
func fetchResult() (string, error) {
    ch := make(chan string) // unbuffered
    go func() {
        result := doExpensiveWork()
        ch <- result // ← blocked if nobody is reading anymore
    }()
    select {
    case result := <-ch:
        return result, nil
    case <-time.After(2 * time.Second):
        // the goroutine is now blocked on ch <- forever
        return "", errors.New("timeout")
    }
}
Every call that times out leaves a goroutine blocked. Fix: buffered channel with size 1, which lets the goroutine write and exit even if nobody reads:
func fetchResult(ctx context.Context) (string, error) {
    ch := make(chan string, 1) // buffered: the goroutine can always write
    go func() {
        result := doExpensiveWork()
        select {
        case ch <- result:
        case <-ctx.Done(): // if the context is cancelled, exit cleanly
        }
    }()
    select {
    case result := <-ch:
        return result, nil
    case <-ctx.Done():
        return "", ctx.Err()
    }
}
2. Channel receive on a channel that is never closed
You range over a channel waiting for it to be closed to exit, but the producer stops without closing it:
func process(jobs <-chan Job) {
    go func() {
        for job := range jobs { // ← blocked if jobs is never closed
            handle(job)
        }
    }()
}

// On the caller side: the bug
func run() {
    jobs := make(chan Job)
    process(jobs)
    jobs <- Job{ID: 1}
    jobs <- Job{ID: 2}
    // forgot close(jobs) → the goroutine in process() waits indefinitely
}
Rule: whoever creates the channel is responsible for closing it, with defer as early as possible:
func run() {
    jobs := make(chan Job)
    defer close(jobs) // guaranteed close, even on panic
    process(jobs)
    jobs <- Job{ID: 1}
    jobs <- Job{ID: 2}
}
3. Goroutine in an infinite loop with no exit
A worker launched at application startup, running forever with no clean shutdown mechanism:
func startWorker() {
    go func() {
        for {
            processQueue()
            time.Sleep(5 * time.Second)
            // no way to stop this worker;
            // tests that create this worker will leak
        }
    }()
}
The problem mostly shows up in tests: every test that calls startWorker() adds a goroutine that never terminates, and goleak detects them immediately. Fix with context:
func startWorker(ctx context.Context) {
    go func() {
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                processQueue()
            case <-ctx.Done():
                return // clean shutdown
            }
        }
    }()
}
4. HTTP handler launching goroutines not attached to the context
A handler that launches background work, but the request ends (or the client disconnects) before the work is done:
func handleUpload(w http.ResponseWriter, r *http.Request) {
    data := parseBody(r)
    go func() {
        // if the client disconnects, r.Context() is cancelled,
        // but this goroutine continues and can stay blocked on I/O
        processAndStore(data)
        sendNotification(data.UserID)
    }()
    w.WriteHeader(http.StatusAccepted)
}
The goroutine is not tied to the request context. If processAndStore is waiting for a network response and the connection drops, it stays blocked. The fix is to pass a context and handle cancellation:
func handleUpload(w http.ResponseWriter, r *http.Request) {
    data := parseBody(r)
    // Detach from the HTTP context (which will be cancelled at the end
    // of the handler) while keeping its values for tracing and logging
    ctx := context.WithoutCancel(r.Context()) // Go 1.21+
    go func() {
        if err := processAndStore(ctx, data); err != nil {
            slog.Error("background processing failed",
                "user_id", data.UserID, "error", err)
            return
        }
        sendNotification(ctx, data.UserID)
    }()
    w.WriteHeader(http.StatusAccepted)
}
Detecting leaks
During development: goleak
goleak from Uber is the go-to tool for detecting leaks in tests. It verifies that no unexpected goroutine is running after the test ends:
func TestWorker(t *testing.T) {
    defer goleak.VerifyNone(t) // fails if a goroutine leaks after the test
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel() // guarantees worker shutdown
    startWorker(ctx)
    // ... test
}
Add goleak.VerifyNone(t) as a defer on tests that touch concurrency. It doesn't slow down tests and catches 90% of leaks before they reach production.
To enable it for all tests in a package at once:
func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m)
}
In production: pprof
If the leak is already in production, net/http/pprof lets you see all active goroutines and their stack traces:
import (
    "log"
    "net/http"

    _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

func main() {
    // Expose pprof on an internal port (never expose publicly)
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // ...
}
# See the goroutine count in real time (quote the URL: ? is a shell glob character)
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=1" | head -5
# Interactive profile in the browser
go tool pprof http://localhost:6060/debug/pprof/goroutine
In go tool pprof, the top command lists the functions with the most blocked goroutines. If you see hundreds of goroutines sharing the same stack trace, that's the leak.
Minimal monitoring with runtime
Without pprof, a periodic log of the goroutine count is enough to detect a drift:
func monitorGoroutines(ctx context.Context) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            n := runtime.NumGoroutine()
            slog.Info("goroutine count", "count", n)
            if n > 1000 {
                slog.Warn("high goroutine count, possible leak", "count", n)
            }
        case <-ctx.Done():
            return
        }
    }
}
The alert threshold depends on the application. What matters is the trend: a number that climbs steadily and never comes back down is a leak. A number that rises and falls with load is normal.
The rules that prevent 90% of leaks
Rather than an exhaustive checklist, three rules that cover the essentials:
- Every goroutine must have an explicit exit path. If you can't answer "how does this goroutine terminate?", it will leak. Cancelled context, closed channel, stop signal: whichever one, but there must be one.
- Whoever creates the channel closes it. Never the receiver. Same rule as memory in C: whoever allocates, frees. Put defer close(ch) right at channel creation on the producer side.
- Propagate context down to background goroutines. Every goroutine launched in an HTTP handler, a job queue, or a scheduler must receive a context and listen to it. It's the only way to get a clean cascading shutdown.
A word on "intentionally long-running" goroutines
Everything above is about unintentional leaks. Some goroutines are meant to run for the entire lifetime of the application (HTTP server, queue worker, scheduler). Those are not leaks — as long as they respond to the process shutdown signal.
The standard pattern: a root context cancelled on SIGTERM, passed to all long-running goroutines, with a WaitGroup to wait for them to finish before exiting:
func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        startWorker(ctx)
    }()

    <-ctx.Done() // wait for SIGTERM or Ctrl+C
    slog.Info("shutting down...")
    wg.Wait() // wait for all goroutines to finish
    slog.Info("done")
}
With this pattern, goleak in tests, and pprof in production, goroutine leaks become detectable before they cause incidents.