ClaudeGate exposes Claude Code CLI as a REST API. Every POST /api/v1/jobs
request creates a job that spawns a CLI process — this isn't an HTTP handler that fires
a SQL query. It's a child process, RAM, CPU, time.
Without protection, an aggressive client can saturate the machine in seconds.
The obvious solution: a per-IP rate limiter. The
golang.org/x/time/rate package implements the token bucket algorithm,
which is exactly what's needed here. Here's how the integration was done,
the decisions made, and what was deliberately left out.
Token bucket: the mechanics in two sentences
A token bucket holds tokens. Each request consumes one token. Tokens replenish at a constant rate (the RPS). If the bucket is empty, the request is rejected.
golang.org/x/time/rate exposes this cleanly via
rate.NewLimiter(limit, burst):
import "golang.org/x/time/rate"

// 5 requests/second, burst of 5 (no accumulation beyond that)
limiter := rate.NewLimiter(rate.Limit(5), 5)
if limiter.Allow() {
	// process the request
} else {
	// 429 Too Many Requests
}
rate.Limit is a float64 representing events per second.
The burst defines the maximum number of tokens in the bucket — in other words,
the allowed burst size before the limit kicks in.
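In ClaudeGate, we set burst = rps: an idle client can never bank more than
one second's worth of requests. This caps the burst, not every one-second window:
a client that drains a full bucket and then keeps sending at the refill rate
can briefly get close to 2 × rps requests through within a single second.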
One limiter per IP, not a global limiter
A global limiter would throttle all clients simultaneously. A moderately active client would be penalized by an abusive one. That's not the intended behavior.
The RateLimiter struct maintains a map of IP → limiter:
type ipLimiter struct {
	limiter  *rate.Limiter
	lastSeen time.Time
}

type RateLimiter struct {
	mu    sync.Mutex
	ips   map[string]*ipLimiter
	rps   rate.Limit
	burst int
}

func NewRateLimiter(rps int) *RateLimiter {
	rl := &RateLimiter{
		ips:   make(map[string]*ipLimiter),
		rps:   rate.Limit(rps),
		burst: rps,
	}
	go rl.cleanup()
	return rl
}

func (rl *RateLimiter) allow(ip string) bool {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	l, ok := rl.ips[ip]
	if !ok {
		l = &ipLimiter{limiter: rate.NewLimiter(rl.rps, rl.burst)}
		rl.ips[ip] = l
	}
	l.lastSeen = time.Now()
	return l.limiter.Allow()
}
The sync.Mutex protects concurrent access to the map.
Each IP gets its own rate.Limiter on first request.
The lastSeen field is used exclusively for cleanup.
The cleanup goroutine — avoiding memory leaks
Without cleanup, the map grows indefinitely. Every IP that touches the API creates an entry that is never removed. On a publicly exposed API, this is a trivial memory leak vector to exploit.
The solution: a background goroutine that periodically evicts entries inactive for more than 5 minutes:
func (rl *RateLimiter) cleanup() {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		rl.mu.Lock()
		cutoff := time.Now().Add(-5 * time.Minute)
		for ip, l := range rl.ips {
			if l.lastSeen.Before(cutoff) {
				delete(rl.ips, ip)
			}
		}
		rl.mu.Unlock()
	}
}
Two important points. First, the ticker is created with defer ticker.Stop()
— if the goroutine were to terminate, the resource is properly released.
Second, the mutex is held for the entire map iteration:
you cannot modify a map while another goroutine is reading it.
The goroutine receives no context. In ClaudeGate, the rate limiter lives
as long as the server runs — there's no reason to shut it down gracefully.
If shutdown correctness matters in your case, pass a ctx context.Context
and add case <-ctx.Done(): return in a select.
X-Forwarded-For: the real IP behind a proxy
Behind a reverse proxy (nginx, Caddy, a load balancer), r.RemoteAddr
returns the proxy's IP — not the client's. All requests would share the same "IP",
effectively making the rate limiter global. You need to read X-Forwarded-For:
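func clientIP(r *http.Request) string {
	if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" {
		if idx := strings.Index(fwd, ","); idx != -1 {
			return strings.TrimSpace(fwd[:idx])
		}
		return strings.TrimSpace(fwd)
	}
	// net.SplitHostPort (package "net") also handles bracketed IPv6 addresses
	// such as "[::1]:8080", which splitting on the last ":" returns with the
	// brackets still attached.
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}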
X-Forwarded-For can contain a list of IPs if the request passes through
multiple proxies: client, proxy1, proxy2. We take the first value —
that's the originating client IP as seen by the first proxy in the chain.
The strings.TrimSpace handles any stray whitespace.
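Note: this header can be forged by the client if your infrastructure doesn't control which proxy sets it. In a controlled environment where you own the proxy, it's reliable, provided the proxy overwrites any client-supplied value rather than appending to it, because the function above reads the first entry. On a directly exposed API, it's a bypass vector: a client can send a different X-Forwarded-For value on every request and get a fresh bucket each time.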
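One way to narrow that bypass is to honor X-Forwarded-For only when the direct peer is a proxy you operate. The sketch below is an illustration, not ClaudeGate's behavior: the trustedProxies list, resolveClientIP, and trustedClientIP are hypothetical names. Unlike the function above, it walks the header from right to left, since the rightmost entries are the ones appended by the proxies closest to the server.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// trustedProxies is a hypothetical allow-list of proxies you operate.
var trustedProxies = []string{"127.0.0.1/32", "10.0.0.0/8"}

// isTrusted reports whether ip falls inside one of the trusted ranges.
func isTrusted(ip string) bool {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false
	}
	for _, cidr := range trustedProxies {
		_, block, err := net.ParseCIDR(cidr)
		if err == nil && block.Contains(parsed) {
			return true
		}
	}
	return false
}

// resolveClientIP picks the address to rate-limit on, given the TCP peer
// address and the raw X-Forwarded-For value.
func resolveClientIP(remoteAddr, forwardedFor string) string {
	peer, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		peer = remoteAddr
	}
	// A peer that is not one of our proxies may have forged the header.
	if !isTrusted(peer) || forwardedFor == "" {
		return peer
	}
	// Walk right to left and return the first address we do not operate.
	parts := strings.Split(forwardedFor, ",")
	for i := len(parts) - 1; i >= 0; i-- {
		ip := strings.TrimSpace(parts[i])
		if net.ParseIP(ip) == nil {
			break // malformed entry: stop trusting the rest of the list
		}
		if !isTrusted(ip) {
			return ip
		}
	}
	return peer
}

// trustedClientIP adapts resolveClientIP to an *http.Request.
func trustedClientIP(r *http.Request) string {
	return resolveClientIP(r.RemoteAddr, r.Header.Get("X-Forwarded-For"))
}

func main() {
	// Direct connection from the internet: the header is ignored.
	fmt.Println(resolveClientIP("203.0.113.7:4000", "1.1.1.1"))
	// Via a trusted proxy: the entry the proxy appended wins over the
	// client-supplied one on the left.
	fmt.Println(resolveClientIP("10.0.0.5:4000", "6.6.6.6, 198.51.100.9"))
}
```

The cost is one more piece of configuration to keep in sync with the deployment: if the proxy's address changes and the list does not, every request falls back to the proxy's IP and the limiter becomes global again.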
Targeted middleware: only POST /api/v1/jobs
The rate limiter only applies to requests that create jobs. GET endpoints (status polling, SSE stream) don't need it: they are lightweight reads.
func RateLimit(rps int) Middleware {
	if rps <= 0 {
		return func(next http.Handler) http.Handler { return next }
	}
	rl := NewRateLimiter(rps)
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.Method == http.MethodPost && r.URL.Path == "/api/v1/jobs" {
				ip := clientIP(r)
				if !rl.allow(ip) {
					writeError(w, http.StatusTooManyRequests, "rate limit exceeded, slow down")
					return
				}
			}
			next.ServeHTTP(w, r)
		})
	}
}
The rps <= 0 check lets you disable the rate limiter
with a RATE_LIMIT=0 config — useful for integration tests
or an internal deployment without public exposure.
When disabled, the middleware returns next unchanged: no limiter is created, no cleanup goroutine is started, and requests incur no extra overhead.
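How the RATE_LIMIT value reaches RateLimit is not shown here. As a minimal sketch, assuming the setting arrives as an environment variable, a hypothetical parseRateLimit helper could validate it before the middleware is built. The default of 5 is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseRateLimit converts a raw RATE_LIMIT value into an rps setting.
// An empty value falls back to def. Invalid or negative input is reported
// as an error, so a typo cannot silently disable the limiter.
func parseRateLimit(raw string, def int) (int, error) {
	if raw == "" {
		return def, nil
	}
	rps, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("RATE_LIMIT=%q is not an integer: %w", raw, err)
	}
	if rps < 0 {
		return 0, fmt.Errorf("RATE_LIMIT=%d must be >= 0", rps)
	}
	return rps, nil
}

func main() {
	rps, err := parseRateLimit(os.Getenv("RATE_LIMIT"), 5)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if rps == 0 {
		fmt.Println("rate limiting disabled")
		return
	}
	fmt.Println("rate limit:", rps, "requests/second per IP")
}
```

Rejecting malformed values at startup keeps 0 as the only way to turn the limiter off.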
The tests: rps=1 burst=1 to force rejection
Testing a rate limiter requires forcing the rejection case, which isn't trivial
with high burst values. The solution: rps=1, burst=1.
The second request within the same second is always blocked:
func TestRateLimit_BlocksOverLimit(t *testing.T) {
	handler := RateLimit(1)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodPost, "/api/v1/jobs", nil)
	req.RemoteAddr = "1.2.3.4:5678"

	// First request: passes
	rr := httptest.NewRecorder()
	handler.ServeHTTP(rr, req)
	if rr.Code != http.StatusOK {
		t.Fatalf("expected 200, got %d", rr.Code)
	}

	// Immediate second request: blocked
	rr = httptest.NewRecorder()
	handler.ServeHTTP(rr, req)
	if rr.Code != http.StatusTooManyRequests {
		t.Fatalf("expected 429, got %d", rr.Code)
	}
}
The test suite also covers:
- Disabled limiter (rps=0): the middleware must pass all requests through without creating a limiter
- Under the limit (rps=10): the first request must pass
- Non-POST methods: GET requests on /api/v1/jobs are never blocked, even with rps=1
httptest.NewRequest and httptest.NewRecorder are all you need.
No server, no port, no goroutine overhead. Tests run in a few milliseconds.
What we didn't do — and why
Several more sophisticated approaches were deliberately left out.
Distributed Redis. A Redis-backed rate limiter (or Valkey) is essential when multiple API instances run behind a load balancer — counters need to be shared. ClaudeGate runs as a single instance. Redis would add an external dependency, a network round-trip on every request, and an extra failure point. Zero benefit in this context.
Sliding window. With a token bucket, a client that has been idle can spend a full bucket at once and then continue at the refill rate. A sliding window counter enforces a strict limit over any time window, with no such burst allowance. More precise, but also more complex to implement correctly without Redis. For a personal gateway, the token bucket is sufficient.
Quota headers in the response. X-RateLimit-Limit,
X-RateLimit-Remaining, X-RateLimit-Reset — API clients
often expect these. rate.Limiter exposes methods like
Tokens() and Reserve() that would allow computing them,
but it wasn't judged necessary for this use case.
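If those headers were ever needed, the computation could look roughly like the sketch below. It takes the token count as a plain float64, such as the value rate.Limiter.Tokens() reports, so the example has no external dependency. The quotaValues and setQuotaHeaders helpers are hypothetical. Reading X-RateLimit-Reset as "seconds until the bucket is full" is also an assumption, since these headers are not standardized and conventions differ between APIs.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
)

// quotaValues derives the three header values from a bucket's state.
// tokens is the current, possibly fractional, token count.
func quotaValues(rps float64, burst int, tokens float64) (limit, remaining, reset string) {
	// Only whole tokens can be spent; clamp defensively to [0, burst].
	rem := math.Floor(tokens)
	if rem < 0 {
		rem = 0
	}
	if rem > float64(burst) {
		rem = float64(burst)
	}
	// Seconds until the bucket is full again at the refill rate, rounded up.
	var wait float64
	if rps > 0 {
		wait = math.Ceil((float64(burst) - tokens) / rps)
	}
	if wait < 0 {
		wait = 0
	}
	return strconv.Itoa(burst), strconv.Itoa(int(rem)), strconv.Itoa(int(wait))
}

// setQuotaHeaders writes the values onto a response's header map.
func setQuotaHeaders(h http.Header, rps float64, burst int, tokens float64) {
	limit, remaining, reset := quotaValues(rps, burst, tokens)
	h.Set("X-RateLimit-Limit", limit)
	h.Set("X-RateLimit-Remaining", remaining)
	h.Set("X-RateLimit-Reset", reset)
}

func main() {
	h := http.Header{}
	setQuotaHeaders(h, 5, 5, 2.4) // 5 rps, burst 5, 2.4 tokens left
	fmt.Println("limit:", h.Get("X-RateLimit-Limit"))
	fmt.Println("remaining:", h.Get("X-RateLimit-Remaining"))
	fmt.Println("reset:", h.Get("X-RateLimit-Reset"))
}
```

In the middleware, this would mean passing the per-IP limiter's state into the response path, which the current allow method, returning only a bool, does not expose.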
Conclusion
Under 80 lines. One cleanup goroutine. No external dependency. ClaudeGate's rate limiter covers the real use case without over-engineering.
golang.org/x/time/rate is the right abstraction for a token bucket in Go:
thread-safe, accurate, well-documented. The integration work is mostly about
what surrounds it — the per-IP map, the cleanup, extracting the real client IP,
and targeting the middleware only at the expensive endpoints.
If the context changes — multi-instance deployment, per-authenticated-user quotas, daily limits — the design will need to evolve. But we don't anticipate what doesn't exist yet.