Parallelism in Go — Part 1: goroutines and WaitGroup

You have a Go program making 10 HTTP calls to an external API. Each call takes about 1 second. Result: your program takes 10 seconds to finish. With goroutines, those 10 calls run at the same time and the whole thing takes 1 second — the duration of the slowest one, not the sum of all of them.

That's the promise of concurrency in Go. And unlike most languages where it's a pain to set up (OS threads, manual locks, cascading callbacks), Go makes it accessible from day one. You still need to avoid the classic pitfalls, and there are a few of them.

This three-part series starts from scratch. Part 1 covers the basics: goroutines, waiting for them properly, and catching the classic mistakes.

What is a goroutine?

The most honest analogy: think of your browser tabs. Each tab loads its page "at the same time" — YouTube buffers your video while you read an article in another tab. Your CPU isn't truly doing everything at once (well, not entirely), but the system switches between tasks so fast it gives the illusion of parallelism.

A goroutine is the same thing: a function running "in the background" while the rest of the program continues. The difference from a classic OS thread is that goroutines are managed by the Go runtime, not the operating system. They're lightweight (a few kilobytes at startup versus several megabytes for an OS thread) and you can launch thousands of them without breaking a sweat.

To launch one, just put go in front of a function call:

package main

import (
    "fmt"
    "time"
)

func sayHello(name string) {
    fmt.Println("Hello", name)
}

func main() {
    go sayHello("Alice")
    go sayHello("Bob")
    time.Sleep(10 * time.Millisecond) // we'll come back to this
    fmt.Println("Program done")
}

That's all. go sayHello("Alice") launches the function in a new goroutine and doesn't wait for it to finish before moving to the next line. main() continues immediately.

The first trap — the program that finishes too soon

Run this code without the time.Sleep:

func main() {
    go sayHello("Alice")
    go sayHello("Bob")
    fmt.Println("Program done")
}

Likely result: you only see "Program done". The two goroutines never had time to print anything. Why?

Fundamental rule: when main() returns, the Go program terminates immediately — regardless of how many goroutines are still running. They all get killed at once, without warning, without finishing their work.

The time.Sleep(10 * time.Millisecond) in the previous example is a band-aid, not a solution. You're "hoping" that 10ms is enough for the goroutines to finish. That's fragile, non-deterministic, and doesn't scale. If the goroutines make a network call that takes 2 seconds, do you put Sleep(2s)? And what if sometimes it takes 3 seconds?

You need a mechanism to properly wait for all goroutines to finish.

sync.WaitGroup — the right solution

Imagine a head chef distributing tasks to the kitchen staff before service: "You chop the onions, you prepare the sauce, you get the plates out." Before opening the dining room, he waits for everyone to finish their prep. He doesn't check his watch and hope — he explicitly waits for confirmation from each person.

sync.WaitGroup is that synchronization mechanism:

  • wg.Add(n) — "I'm waiting for n more goroutines"
  • wg.Done() — "this goroutine is done" (decrements the counter)
  • wg.Wait() — "block until the counter reaches zero"
package main

import (
    "fmt"
    "sync"
)

func sayHello(name string, wg *sync.WaitGroup) {
    defer wg.Done() // called when the function returns, no matter what
    fmt.Println("Hello", name)
}

func main() {
    var wg sync.WaitGroup

    wg.Add(2)
    go sayHello("Alice", &wg)
    go sayHello("Bob", &wg)

    wg.Wait() // blocks here until both goroutines have called Done()
    fmt.Println("Everyone said hello")
}

The defer wg.Done() matters: by putting it in a defer, you ensure it runs no matter how the function returns, including early returns and recovered panics. Without it, any code path that skips Done() leaves the counter above zero and wg.Wait() blocks forever. (An unrecovered panic in any goroutine still crashes the whole program; defer doesn't change that.)

The full pattern with a loop, for example over a list of URLs:

urls := []string{
    "https://api.example.com/users/1",
    "https://api.example.com/users/2",
    "https://api.example.com/users/3",
}

var wg sync.WaitGroup

for _, url := range urls {
    wg.Add(1)
    go func(u string) {
        defer wg.Done()
        download(u)
    }(url)
}

wg.Wait()
fmt.Println("All downloads complete")

Note that we do wg.Add(1) before launching the goroutine, not inside it. If we did it inside the goroutine, wg.Wait() could trigger before the goroutine had time to register itself.

The classic loop trap

This is probably the most common bug for developers new to concurrent Go, and it's nasty because it "sometimes works" depending on execution conditions.

// BUG (with Go < 1.22): all goroutines share the same variable i
for i := 0; i < 5; i++ {
    go func() {
        fmt.Println(i) // i is captured by reference
    }()
}
// With Go < 1.22, often prints: 5 5 5 5 5

Why? Before Go 1.22, the goroutine didn't capture the value of i at the moment it was created: it captured a reference to the single variable i shared by every iteration. When the goroutines eventually execute (a few microseconds later), the loop has already finished and i equals 5. So all goroutines read 5.

The fix is trivial: pass i as a parameter to the anonymous function.

// FIX: passing i as a parameter creates a local copy
for i := 0; i < 5; i++ {
    go func(n int) {
        fmt.Println(n) // n is a copy belonging to this goroutine
    }(i) // i is evaluated here, at call time
}
// Prints: 0 1 2 3 4 (in some order, but always the right numbers)

In Go 1.22+, this problem has been fixed at the language level: the variable declared in any for loop (both for i := 0; i < n; i++ and for i := range ...) is now a fresh variable at each iteration instead of one shared across iterations, so the example above prints 0 through 4. The parameter pattern remains the most explicit form, though, and it works with every Go version.

The race detector — your best friend

A race condition is when two goroutines access the same data at the same time and at least one of them modifies it. The result is non-deterministic and often catastrophic.

Classic example: two goroutines incrementing a shared counter.

package main

import (
    "fmt"
    "sync"
)

func main() {
    counter := 0
    var wg sync.WaitGroup

    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            counter++ // DANGER: read + increment + write, not atomic
        }()
    }

    wg.Wait()
    fmt.Println("Final counter:", counter)
    // Expected: 1000
    // Actual: usually less than 1000, and different on every run
}

Without the right tool, this kind of bug is nearly impossible to reproduce reliably in tests. That's where Go's built-in race detector comes in:

go run -race main.go

The race detector instruments the code at compile time and flags unprotected concurrent accesses at runtime. Its output looks like this:

==================
WARNING: DATA RACE
Write at 0x00c000126010 by goroutine 8:
  main.main.func1()
      /home/odilon/main.go:15 +0x44

Previous write at 0x00c000126010 by goroutine 7:
  main.main.func1()
      /home/odilon/main.go:15 +0x44
==================

It tells you exactly which line is the problem and which goroutines are in conflict. The performance cost is real (typically 2 to 20x slower and 5 to 10x more memory, per the Go documentation), but you only pay it in development and tests, never in production.

Practical rule: always run your tests with -race. go test -race ./... should be part of your CI.

Concrete example — 10 URLs in parallel

Here's a complete example that illustrates the real gain. We simulate 10 HTTP calls each taking about 1 second (using time.Sleep to avoid depending on a real API).

package main

import (
    "fmt"
    "sync"
    "time"
)

// Simulates an HTTP call taking ~1 second
func fetchData(url string) string {
    time.Sleep(1 * time.Second)
    return "data from " + url
}

func main() {
    urls := []string{
        "https://api.weather.com/london",
        "https://api.weather.com/paris",
        "https://api.weather.com/berlin",
        "https://api.weather.com/madrid",
        "https://api.weather.com/rome",
        "https://api.weather.com/amsterdam",
        "https://api.weather.com/lisbon",
        "https://api.weather.com/vienna",
        "https://api.weather.com/prague",
        "https://api.weather.com/warsaw",
    }

    // --- Sequential version ---
    start := time.Now()
    for _, url := range urls {
        fetchData(url)
    }
    fmt.Printf("Sequential: %v\n", time.Since(start))
    // Sequential: ~10 seconds

    // --- Parallel version ---
    start = time.Now()
    var wg sync.WaitGroup

    for _, url := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            fetchData(u)
        }(url)
    }

    wg.Wait()
    fmt.Printf("Parallel: %v\n", time.Since(start))
    // Parallel: ~1 second
}

In practice, the parallel version takes the duration of the slowest call, plus a minimal goroutine management overhead. For network calls, this is often a 5 to 20x speedup depending on latencies and number of requests.

One important detail: in this example, we're ignoring the return values of the goroutines. In real conditions, you'd need to retrieve the returned data and any errors. That's exactly what channels allow you to do — the subject of part 2.

What we've learned

  • go myFunction() launches a goroutine — the following code executes immediately without waiting
  • main() returning kills all running goroutines, without exception
  • sync.WaitGroup is the right tool for waiting for a set of goroutines to complete
  • defer wg.Done() guarantees the counter is decremented even on error
  • In a loop, always pass the loop variable as a parameter to the goroutine, never capture it directly
  • go run -race detects race conditions — use it in development and CI

In part 2, we'll see how to retrieve results and errors from these goroutines using channels, and how to build a worker pool that limits the number of goroutines running in parallel — because launching 10,000 goroutines all at once is also a great way to bring a server down.
