In the previous articles of this series, we covered the network and security layers of the authentication service: PKCS#12, timing oracles, mTLS, CRLs. Now let's dive into the application architecture. The service is built on CQRS + Event Sourcing, and two non-obvious patterns deserve an article of their own.
The event-XOR-error invariant
In a well-disciplined CQRS/ES system, a command handler does exactly one thing: it emits an event OR returns an error. Never both. Never neither.
func (h *LoginHandler) Handle(cmd LoginCommand) ([]Event, error) {
	user, err := h.repo.Load(cmd.UserID)
	if err != nil {
		return nil, err // infra error = no event
	}
	if !user.VerifyPassword(cmd.Password) {
		return []Event{LoginFailed{UserID: cmd.UserID}}, nil // event, not error
	}
	return []Event{LoginSucceeded{UserID: cmd.UserID}}, nil // event, not error
}
This discipline is crucial: errors are infrastructure problems (DB down, network timeout). Business outcomes are events (login succeeded, login failed). Mixing the two breaks traceability and makes projectors unpredictable.
The problem: how to inform the caller?
The command handler emits a LoginSucceeded event. The event is persisted in the event store. A projector consumes it and updates the read model. All of this is asynchronous.
But the HTTP handler that dispatched the command needs a response now. The user is waiting. How do you tell them "your login succeeded, here's your session cookie"?
The temptation: put the result in a typed error.
// DO NOT DO THIS
if user.VerifyPassword(cmd.Password) {
	return nil, &LoginResult{Success: true, SessionID: "abc"}
	// Breaks the invariant: it's a business result, not an infra error
}
This breaks the "error = infra" vs "business result = event" separation. Error middlewares, retry policies, circuit breakers: everything is calibrated on the assumption that error != nil means a problem, not a success.
The pubsub bridge pattern
The solution: the projector republishes the event on a pubsub channel with the command's CorrelationID. The caller subscribes before dispatching the command, filters by CorrelationID, and translates the event into a return value.
func (h *HTTPLoginHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	cmd := LoginCommand{
		UserID:        extractUserID(r),
		Password:      r.FormValue("password"),
		CorrelationID: uuid.New().String(),
	}
	// Subscribe BEFORE dispatching the command
	sub := h.pubsub.Subscribe(cmd.CorrelationID)
	defer sub.Close()
	// Dispatch the command
	if err := h.bus.Dispatch(cmd); err != nil {
		http.Error(w, "Internal error", http.StatusInternalServerError)
		return
	}
	// Wait for the event with timeout
	select {
	case event := <-sub.Events():
		switch e := event.(type) {
		case LoginSucceeded:
			setSessionCookie(w, e.SessionID)
			http.Redirect(w, r, "/dashboard", http.StatusFound)
		case LoginFailed:
			renderLoginPage(w, "Invalid credentials")
		}
	case <-time.After(5 * time.Second):
		http.Error(w, "Timeout", http.StatusGatewayTimeout)
	}
}
The full flow:
- HTTP handler generates a CorrelationID and subscribes to pubsub
- HTTP handler dispatches the command
- Command handler emits an event (LoginSucceeded or LoginFailed)
- Event is persisted in the event store
- Projector consumes the event, republishes it on pubsub with the CorrelationID
- HTTP handler receives the event via pubsub, translates to HTTP response
The event-XOR-error invariant is preserved. The caller gets a synchronous response. The projector remains the single point for event → side-effect transformation.
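The flow above hinges on the bridge itself: a registry of waiters keyed by CorrelationID. Here is a minimal in-memory sketch, assuming a single process and one waiter per command; the names (Bridge, Subscription, Publish) are illustrative, not the service's actual types, though the Events()/Close() shape mirrors the handler code above.

```go
package main

import (
	"fmt"
	"sync"
)

// Event carries the projector's republished payload, trimmed to the
// fields the bridge needs. Illustrative, not the service's real type.
type Event struct {
	CorrelationID string
	Name          string
}

// Subscription exposes the Events()/Close() shape the HTTP handler uses.
type Subscription struct {
	bridge *Bridge
	id     string
	ch     chan Event
}

func (s *Subscription) Events() <-chan Event { return s.ch }

// Close deregisters the waiter so late events are dropped
// instead of blocking the projector.
func (s *Subscription) Close() {
	s.bridge.mu.Lock()
	delete(s.bridge.subs, s.id)
	s.bridge.mu.Unlock()
}

// Bridge is a minimal in-memory pubsub keyed by CorrelationID:
// one waiter per command, events routed by exact ID match.
type Bridge struct {
	mu   sync.Mutex
	subs map[string]chan Event
}

func NewBridge() *Bridge {
	return &Bridge{subs: make(map[string]chan Event)}
}

// Subscribe must run BEFORE the command is dispatched, or the event
// can arrive with no one listening.
func (b *Bridge) Subscribe(correlationID string) *Subscription {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan Event, 1) // buffered: the projector never blocks
	b.subs[correlationID] = ch
	return &Subscription{bridge: b, id: correlationID, ch: ch}
}

// Publish is what the projector calls after committing: route the event
// to the waiter with the same CorrelationID, or drop it if the waiter
// already timed out and closed its subscription.
func (b *Bridge) Publish(e Event) {
	b.mu.Lock()
	ch, ok := b.subs[e.CorrelationID]
	delete(b.subs, e.CorrelationID)
	b.mu.Unlock()
	if ok {
		ch <- e
	}
}

func main() {
	bridge := NewBridge()
	sub := bridge.Subscribe("corr-42") // subscribe before dispatch
	defer sub.Close()
	bridge.Publish(Event{CorrelationID: "corr-42", Name: "LoginSucceeded"})
	fmt.Println((<-sub.Events()).Name)
}
```

In a multi-instance deployment the map would be replaced by a real broker (Redis pubsub, NATS, pg_notify), but the contract stays the same: subscribe first, filter by CorrelationID, drop events nobody is waiting for.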
Atomic audit logging
Second pattern: audit logging. In a projector, you often do several things in response to an event: update the read model, write an audit entry, send a notification, sometimes trigger a logout.
The trap: if the business projection succeeds and the audit fails, you have a divergence. The user was connected, but the audit log doesn't know. Or worse: the audit says "login at 14:03" but the session was created at 14:02 because the audit was retried.
The pattern: run all critical side-effects in a single DB transaction; only post-commit actions (the security notification, a forced logout) are best-effort.
func (p *LoginProjector) Handle(event LoginSucceeded) error {
	tx, err := p.db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()
	// 1. Update read model
	if err := p.updateSession(tx, event); err != nil {
		return err
	}
	// 2. Write audit entry - IN the same transaction
	if err := p.writeAuditEntry(tx, AuditEntry{
		Action:    "login_succeeded",
		UserID:    event.UserID,
		Timestamp: event.Timestamp,
		IP:        event.IP,
	}); err != nil {
		return err
	}
	// 3. Publish to pubsub bridge - IN the same transaction
	// (uses pg_notify or outbox pattern)
	if err := p.publishOutbox(tx, event); err != nil {
		return err
	}
	// Atomic commit: all or nothing
	if err := tx.Commit(); err != nil {
		return err
	}
	// 4. Best-effort: notify, cleanup, etc.
	// If this fails, the transaction is already committed
	go p.notifySecurityTeam(event)
	return nil
}
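The publishOutbox step can be sketched as a plain INSERT into an outbox table, executed on the same transaction as the read model and audit writes; a relay (or a pg_notify listener) later forwards committed rows to the pubsub bridge. This is one possible shape, not the service's actual code: the table and column names are assumptions, and the fakeTx exists only so the sketch runs without a database.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
)

// execer is the slice of *sql.Tx that publishOutbox needs; database/sql's
// Tx.Exec has exactly this signature, so a real transaction satisfies it.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// publishOutbox serializes the event and inserts it into an outbox table
// inside the caller's transaction, so the event row commits or rolls back
// together with the read model update and the audit entry.
func publishOutbox(tx execer, correlationID string, event any) error {
	payload, err := json.Marshal(event)
	if err != nil {
		return err
	}
	_, err = tx.Exec(
		"INSERT INTO outbox (correlation_id, payload) VALUES ($1, $2)",
		correlationID, payload,
	)
	return err
}

// fakeTx records executed queries instead of hitting a database.
type fakeTx struct{ queries []string }

func (f *fakeTx) Exec(q string, args ...any) (sql.Result, error) {
	f.queries = append(f.queries, q)
	return nil, nil
}

func main() {
	tx := &fakeTx{}
	err := publishOutbox(tx, "corr-42", map[string]string{"type": "LoginSucceeded"})
	fmt.Println(err == nil, len(tx.queries))
}
```

The key property is that the outbox row shares the commit with everything else: if the transaction rolls back, no event ever reaches the bridge, and the caller simply times out.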
Login failures on unknown users: slog only
A corollary trap: login failures on users that don't exist. If you write an audit entry for every attempt, a brute-forcer trying 10 million random emails produces 10 million rows in audit_events.
The rule: login failures on existing accounts deserve an audit entry (they are a security signal). Login failures on non-existent accounts are noise: slog.Warn and that's it. The per-IP rate limiter upstream caps the log volume itself.
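The branch can be sketched as follows. Here userExists would come from the aggregate load and writeAudit stands in for the projector's transactional audit write; both names are illustrative, not from the service's codebase.

```go
package main

import (
	"fmt"
	"log/slog"
)

// recordFailedLogin routes a failed attempt: existing accounts get an
// audit row (a security signal), unknown accounts get a structured log
// line only, so brute-forced random emails never bloat audit_events.
func recordFailedLogin(userExists bool, email, ip string, writeAudit func(email, ip string)) string {
	if userExists {
		writeAudit(email, ip)
		return "audit"
	}
	slog.Warn("login attempt on unknown account", "email", email, "ip", ip)
	return "slog"
}

func main() {
	auditRows := 0
	audit := func(email, ip string) { auditRows++ }
	fmt.Println(recordFailedLogin(true, "alice@example.com", "10.0.0.1", audit))
	fmt.Println(recordFailedLogin(false, "ghost@example.com", "10.0.0.1", audit))
	fmt.Println(auditRows) // only the existing account produced an audit row
}
```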
Conclusion
The pubsub bridge solves the "synchronous return in an asynchronous system" problem without breaking the event-XOR-error invariant. Atomic audit logging enforces "all or nothing" on critical side-effects.
Both patterns share a common thread: they impose discipline on transactional boundaries. In an event-sourced system, these boundaries are the only safety net between "the system is consistent" and "we no longer know what happened."
The architecture is in place, security is audited. But how was the audit itself conducted? The methodology — iterative audit passes, self-generated false positives, and how to turn probes into regression tests — that's the subject of the next article.