Concurrent Counting and Resilience: Sharded Metrics and a Circuit Breaker
Under load, one global lock is the bottleneck. Gortex’s metrics collector shards the lock into 16 and counts with atomics; the circuit breaker is a textbook three-state machine. This post explains why channels are the wrong tool here.
Prerequisite: A4 goroutines/channels, A5 sync primitives.
Lock sharding: one lock into sixteen
Under load, every goroutine contending for one lock over one metrics map makes that lock the bottleneck. Gortex’s ShardedCollector splits business metrics into 16 independent shards, each with its own lock:
type ShardedCollector struct {
	httpRequestCount int64 // atomic counter
	shardCount       int   // fixed at 16
	shards           []*metricShard
}

type metricShard struct {
	mu      sync.RWMutex
	metrics map[string]float64
	lruList *list.List // per-shard LRU
	// ...
}
To record a metric, an FNV hash decides which shard it lands in, and only that shard is locked:
shardIndex := c.hashKey(metricKey) // fnv.New32a() % 16
shard := c.shards[shardIndex]
shard.mu.Lock()
defer shard.mu.Unlock()
// ...touch only this shard
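Sixteen is fixed (the source comment says “for predictable performance”), not tied to CPU count. Assuming the hash spreads keys evenly, two concurrent writers land on the same shard only about one time in sixteen, instead of every time as with one global lock.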
Atomic counters and per-shard LRU
Not everything needs a lock. The highest-frequency counters — HTTP request count, WebSocket connections — use atomic directly:
atomic.AddInt64(&c.httpRequestCount, 1) // lock-free
Only when a map must be updated alongside (breakdowns by status, by method) does httpMu come into play. That’s the A5 choice: a single counter is atomic, a block of state is a Mutex.
Each shard runs its own LRU eviction: when its metric count exceeds the per-shard cap (maxCardinality / 16), the least-recently-used entry is evicted from that shard’s list.List. The LRU is per-shard — a global LRU would itself become the new bottleneck, undoing the sharding.
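The eviction step inside a single shard might look like the sketch below. It is an illustration built on container/list, not the project's code: lruShard, entry, limit, Set and Has are made-up names, and locking is omitted on the assumption that the caller already holds the shard's lock.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element carries.
type entry struct {
	key   string
	value float64
}

// lruShard is a hypothetical single shard; the caller is assumed to hold its lock.
type lruShard struct {
	limit int                      // per-shard cap, e.g. maxCardinality / 16
	items map[string]*list.Element // key -> element in order
	order *list.List               // front = most recently used
}

func newLRUShard(limit int) *lruShard {
	return &lruShard{
		limit: limit,
		items: make(map[string]*list.Element),
		order: list.New(),
	}
}

// Set stores a value, marks the key as most recently used, and evicts the
// least-recently-used entry if the shard has grown past its cap.
func (s *lruShard) Set(key string, v float64) {
	if el, ok := s.items[key]; ok {
		el.Value.(*entry).value = v
		s.order.MoveToFront(el)
		return
	}
	s.items[key] = s.order.PushFront(&entry{key: key, value: v})
	if s.order.Len() > s.limit {
		oldest := s.order.Back()
		s.order.Remove(oldest)
		delete(s.items, oldest.Value.(*entry).key)
	}
}

// Has reports whether the key is still resident in this shard.
func (s *lruShard) Has(key string) bool {
	_, ok := s.items[key]
	return ok
}

func main() {
	s := newLRUShard(2)
	s.Set("a", 1)
	s.Set("b", 2)
	s.Set("a", 3) // touching "a" makes "b" the oldest entry
	s.Set("c", 4) // over the cap: "b" is evicted
	fmt.Println(s.Has("a"), s.Has("b"), s.Has("c"))
}
```

Because the list and map belong to one shard, eviction touches no state outside the lock that the write already holds.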
The circuit breaker’s three states
A circuit breaker stops requests hitting a downstream that’s already broken, giving it room to recover. Gortex’s is a textbook three-state machine:
- Closed: requests pass; failures accumulate, and once ReadyToTrip fires (default “more than 10 requests and a failure ratio above 0.5”) it trips to Open.
- Open: requests get ErrCircuitOpen immediately, sparing the downstream; after Timeout (default 60s) it moves to Half-Open.
- Half-Open: only MaxRequests probe requests are admitted; it returns to Closed only after that many succeed, and a single failure throws it back to Open.
Two things drive the concurrency control. First, state lives in an atomic.Value while counts and expiry are guarded by a sync.Mutex. Second, every request carries a generation: expiry.UnixNano() serves as the generation number, and afterRequest compares the generation a request started in with the current one, discarding results that belong to a stale generation.
func (cb *CircuitBreaker) onBeforeRequestHalfOpen() (uint64, error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if cb.halfOpen >= cb.config.MaxRequests {
		return 0, ErrTooManyRequests
	}
	cb.halfOpen++
	return uint64(cb.expiry.UnixNano()), nil
}
Note that the half-open admission counter halfOpen is guarded by mu, not an atomic — the gate has to stay consistent with the state/expiry transitions that run under the same lock, or concurrent probes would breach MaxRequests.
Why not channels here
This is where A4’s setup pays off. Follow the actor mindset and funnel every count through a channel into one serialising goroutine, and that goroutine becomes the new global bottleneck — as bad as one global lock, with extra latency on top.
High-frequency counting has the shape of “many goroutines each bumping a number”, best served by atomics (lock-free single value) plus lock sharding (contention spread out). The breaker has the shape of “one small shared state machine”, clearest when guarded by a mutex. Channels earn their place in B6’s “own a piece of state, serialise an event stream” shape. One project: B5 uses atomic/mutex, B6 uses a channel — pick the tool for the shape of the data, not reflexively reach for a channel.
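For comparison, the two shapes side by side in a small sketch. Both functions are hypothetical and produce the same total; the difference is that in countWithChannel every increment must pass through one receiving goroutine, while in countWithAtomic each worker updates the value directly. The sketch makes no timing claims; a benchmark on real hardware would be needed for those.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithChannel funnels every increment through one serialising goroutine.
func countWithChannel(workers, perWorker int) int64 {
	incs := make(chan struct{})
	done := make(chan int64)

	go func() {
		var total int64
		for range incs { // the single owner of total
			total++
		}
		done <- total
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				incs <- struct{}{} // blocks until the owner receives
			}
		}()
	}
	wg.Wait()
	close(incs)
	return <-done
}

// countWithAtomic lets every worker bump the shared value directly.
func countWithAtomic(workers, perWorker int) int64 {
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&total, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}

func main() {
	fmt.Println(countWithChannel(8, 1000))
	fmt.Println(countWithAtomic(8, 1000))
}
```

The channel version is the right shape when the owning goroutine does more than add one, for example when it owns a set of connections and reacts to events in order.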
Takeaways
- Lock sharding: an FNV hash spreads metrics across 16 shards, each with its own lock, cutting contention to about 1/16.
- The highest-frequency counters use atomics (lock-free); a Mutex only joins when a map must update too; the LRU is per-shard to avoid a global bottleneck.
- The breaker’s three states (Closed / Open / Half-Open): state in atomic.Value, counts under a mutex, a generation to discard stale requests; the half-open gate is deliberately mu-guarded for consistency.
- Why not channels: high-frequency counting and a shared state machine are faster and more direct with atomics/mutexes; channels are for B6’s hub.
- It’s A4’s “don’t fetishise channels” and A5’s “pick the right primitive” made concrete (B5 locks/atomic ↔ B6 channel, gortex-websocket-actor-hub).
Source: yshengliao/gortex.
Outline by Sheng, drafted with Claude · Go 1.25 (gortex go.mod) · compiled retroactively · part of the 2026-06-13 blog renovation, paint still drying.