When you write Go for a regulated financial platform, there's an unspoken rule everyone understands after the first incident: the code running on Friday evening must still be running on Monday morning, exactly the same way. Not "roughly the same." Not "after a restart." Exactly the same.
That changes how you write Go. You stop looking for the most elegant solution — you look for the one that won't break at 3 AM on a bank holiday. The patterns described here didn't come from tutorials or conference talks. They survived months of production, post-mortems, and code reviews with people who have very little patience for code that "should work."
Graceful shutdown — the non-negotiable pattern
First pattern, and by far the most critical. If your service can't stop cleanly, everything else is decoration.
The scenario: a deployment in progress. Kubernetes sends SIGTERM. Your service has 30 seconds to finish what it's doing. If you're in the middle of a financial transaction — a fund transfer, a reconciliation, a ledger entry — you can't just cut. You also can't take 5 minutes.
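func run(ctx context.Context) error {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: newRouter(),
	}

	// Buffered so the goroutine can always deliver its result and exit,
	// even if nobody is left to receive it.
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		// The listener failed on its own (port already in use, for example).
		return fmt.Errorf("server stopped: %w", err)
	case <-ctx.Done():
		// The parent context is already cancelled, so the drain deadline
		// needs a fresh one.
		shutCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
		defer cancel()
		return srv.Shutdown(shutCtx)
	}
}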
Three details that matter:
- context.Background() for shutdown, not the parent ctx: the parent is already cancelled, that's why we're here
- 25 seconds, not 30: keep a 5-second margin before Kubernetes sends SIGKILL (kill -9)
- The error channel is buffered: if shutdown arrives before ListenAndServe returns, the goroutine doesn't leak
The real difficulty isn't the HTTP server, it's everything else: Kafka consumers, background workers, gRPC connections, tickers. Every component with a lifecycle must stop in the right order. In practice, we use errgroup (golang.org/x/sync/errgroup) with a shared context: the first component to return an error cancels the context for all the others.
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error { return httpServer.Run(ctx) })
g.Go(func() error { return grpcServer.Run(ctx) })
g.Go(func() error { return kafkaConsumer.Run(ctx) })
g.Go(func() error { return metricsServer.Run(ctx) })
return g.Wait()
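Simple. Testable. Every component implements Run(ctx context.Context) error. When the context is cancelled, every component starts shutting down at the same time, and g.Wait() returns once the last one has finished. errgroup imposes no ordering of its own, so if one component must outlive another, that dependency has to be sequenced explicitly. It's boring, verbose, and has worked for two years without surprises.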
HTTP middleware — the production stack
Every HTTP request goes through the same middleware chain. The order is non-negotiable:
func newRouter() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /health", handleHealth)
	mux.HandleFunc("POST /api/v1/transfers", handleTransfer)

	var h http.Handler = mux
	h = withAuth(h)
	h = withRequestID(h)
	h = withRecovery(h)
	h = withLogging(h)
	h = withMetrics(h)
	return h
}
Read bottom to top (last wrapped = first executed):
- Metrics: Prometheus histogram, outermost so it captures total duration
- Logging: structured request log with status and duration. It runs outside the Request ID middleware, so it only sees the ID if that middleware also exposes it outside the request context (in a response header, for example)
- Recovery: catches panics, logs the stack trace, returns 500 instead of dropping the connection
- Request ID: UUID in context, propagated through all logs and downstream calls
- Auth: token verification, identity injected into context
The recovery middleware is the most underestimated. net/http already recovers a panic in a handler: the goroutine serving the request dies, the process survives, and the server closes the connection. The client gets an EOF instead of a response, and all that is left on your side is an unstructured stack trace on the server's error log, with no request ID to correlate it with. The recovery middleware turns that into a 500 plus a stack trace in your structured logs.
Circuit breaker — when downstream is dead
In fintech, your services talk to banking partners, KYC APIs, payment systems. They go down. Not often, but when they do, it's rarely for 5 seconds — it's for 45 minutes, on a Saturday, with no warning.
Without a circuit breaker, your service stacks pending requests, goroutines multiply, memory climbs, timeouts cascade, and your own healthcheck fails. The circuit breaker cuts the connection to the dead service before it contaminates the rest.
The key questions for configuration: how many failures before opening the circuit? How long before retrying? Does "failure" include timeouts or only 5xx? The answer depends on the downstream service. A banking partner that normally responds in 800ms and 30s when struggling? Timeout at 5s, circuit open after 3 failures, reset after 60s.
Structured logging — slog in production
We switched from log.Printf to slog (standard library since Go 1.21) a year and a half ago. The gain isn't aesthetic — it's operational. When an incident hits at 2 AM, the question is never "what happened?" but "what happened for this request ID, this user, this amount?"
slog.Info("transfer processed",
	"request_id", reqID,
	"user_id", userID,
	"amount_cents", amount,
	"duration_ms", time.Since(start).Milliseconds(),
	"partner", "bank_xyz",
)
Two rules we enforce:
- Never log personal data — no email, no name, no IBAN. User ID yes, everything else no. It's a GDPR reflex, but mostly it's the law when you handle funds.
- Request ID goes everywhere — from HTTP middleware to the last downstream gRPC call. Passed through context, included in every log. When a customer calls about a stuck transaction, support provides the request ID, and in 30 seconds you have the full trace.
What the code doesn't show
The patterns above are the technical bricks. What makes the difference in financial production is everything that isn't code:
- Blameless post-mortems. Every incident documented, every corrective action tracked.
- Shutdown tests. We test graceful shutdown as seriously as features. A deployment that drops requests is a P0 bug.
- "Boring code." The most reliable code is the code you don't need to re-read. No generics everywhere, no channels when a mutex will do, no abstraction for fun. Boring code is code that runs.
Conclusion
After several years of Go in financial production, the patterns that survive are never the most sophisticated. They're the most boring. Graceful shutdown, middleware in the right order, circuit breaker, structured logging. Nothing spectacular. But when the banking partner goes down at 11 PM on a Friday, it's this boring code that makes the difference between "the circuit breaker cut, zero lost transactions, we go home" and "we spend the weekend reconciling ledger entries."
Go "best practices 2026" isn't about language novelties. It's about the discipline of what you write — and especially what you don't.