Reliability & Fault Tolerance
Reliability at a glance
Most incidents originate in upstream dependencies (RPC providers, vendors, networks). Acceso is built to keep behavior predictable under partial failure.
Where failures usually happen
RPC provider degradation.
Protocol/API vendor downtime.
Network partitions and latency spikes.
Hot keys and burst traffic.
Platform reliability layers
Gateway protections (edge safety)
Auth + key validation, quotas, and rate limits.
Parameter validation and request shaping.
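As an illustration of how edge rate limits are often enforced, here is a minimal token-bucket sketch. It is not Acceso's actual gateway code: the class name `TokenBucket`, its parameters, and the choice of the token-bucket algorithm itself are assumptions made for this example.

```python
import time


class TokenBucket:
    """Hypothetical per-key limiter: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a gateway would typically answer 429 here
```

A gateway built this way would keep one bucket per API key, so a single hot key exhausts only its own budget.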
Service isolation (fault containment)
Domain services isolate protocol failures and hot spots.
One failing venue should not degrade unrelated endpoints.
Upstream resilience
Timeouts, bounded retries, circuit breakers, fallbacks.
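To make the circuit-breaker and fallback idea concrete, here is a small sketch under stated assumptions. The class `CircuitBreaker`, its thresholds, and the half-open behavior are illustrative and do not describe Acceso's internals. The wrapped callable `fn` is assumed to enforce its own timeout.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: do not touch the upstream at all
            # Cooldown elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get the fallback immediately instead of waiting on a degraded upstream, which keeps latency bounded during an incident.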
Caching as resilience
Cache-first reads when safe.
Derived views for expensive analytics.
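One way to picture caching as a resilience layer is a cache-first read that serves a stale value when the upstream fails. The sketch below is an assumption-laden illustration, not Acceso's cache: `StaleOnErrorCache`, its TTL policy, and the `"fresh"`/`"live"`/`"stale"` labels are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical cache: serve fresh entries, refetch expired ones, fall back to stale on error."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get(self, key: str, fetch: Callable[[], Any]) -> Tuple[Any, str]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], "fresh"  # cache-first: no upstream call needed
        try:
            value = fetch()
        except Exception:
            if entry is not None:
                return entry[1], "stale"  # upstream failed: degrade to the last known value
            raise  # nothing cached, so the failure must surface
        self._entries[key] = (now, value)
        return value, "live"
```

Returning the freshness label alongside the value lets a caller decide whether a stale answer is acceptable, which matters because this pattern is only safe for data that tolerates some staleness.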
Observability
Request identifiers for tracing.
Metrics/logs to spot upstream degradation quickly.
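A client can mirror this by tagging each call with its own identifier and recording latency and outcome. The following is a rough sketch: the helper `traced_call`, the `X-Request-Id` header name, and the log record fields are assumptions for illustration, not a documented Acceso contract.

```python
import time
import uuid
from typing import Any, Callable, Dict


def traced_call(send: Callable[[Dict[str, str]], Any],
                log: Callable[[Dict[str, Any]], None],
                clock=time.monotonic) -> Any:
    """Call send(headers) with a fresh request ID and log latency plus outcome."""
    request_id = uuid.uuid4().hex
    started = clock()
    outcome = "error"
    try:
        # "X-Request-Id" is a common convention, assumed here rather than documented.
        status = send({"X-Request-Id": request_id})
        outcome = str(status)
        return status
    except Exception as exc:
        outcome = type(exc).__name__
        raise
    finally:
        # Runs on both success and failure, so every call leaves one record.
        log({
            "request_id": request_id,
            "outcome": outcome,
            "latency_ms": (clock() - started) * 1000.0,
        })
```

Grouping these records by outcome and latency is usually enough to tell a slow upstream apart from a failing one.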
Client defaults that work well
Do not retry indefinitely on the client. Use bounded retries with backoff, or switch to streams/webhooks.
Set client-side timeouts.
Retry only on transient failures (5xx, timeouts).
Back off hard on 429 (and batch requests when possible).
Prefer WebSockets or webhooks for high-frequency updates.
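The retry guidance above can be sketched as a small helper. This is a minimal sketch, not an official client: `call_with_retries`, its default values, and the 4x multiplier for 429 are assumptions. The `send` callable is hypothetical too; it is assumed to enforce a client-side timeout itself, return `(status, body)`, and raise `TimeoutError` when the timeout expires.

```python
import random
import time
from typing import Callable, Optional, Tuple


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 4,
                      base_delay: float = 0.5,
                      max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> Tuple[int, str]:
    """Bounded retries: retry 5xx, 429, and timeouts; return anything else immediately."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        status: Optional[int] = None
        try:
            status, body = send()
        except TimeoutError:
            if last:
                raise  # out of attempts: surface the timeout
        else:
            if status < 500 and status != 429:
                return status, body  # success or a non-retryable client error
            if last:
                return status, body  # out of attempts: surface the last response
        # Exponential backoff with jitter; 429 starts from a larger base (assumed 4x).
        factor = 4 if status == 429 else 1
        delay = min(max_delay, base_delay * factor * (2 ** attempt))
        sleep(delay * rng())
    raise AssertionError("unreachable")
```

The jitter spreads retries from many clients over time, and the hard cap on attempts means a persistent outage fails fast instead of piling more load onto a degraded upstream.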
Deep dive
Read the full failure model and degraded-mode behavior in Reliability, Fault Tolerance & Resilience.