Reliability & Fault Tolerance

Reliability at a glance

Most incidents originate in upstream dependencies: RPC providers, protocol/API vendors, and networks. Acceso is built to keep behavior predictable under partial failure.

Where failures usually happen

  • RPC provider degradation.

  • Protocol/API vendor downtime.

  • Network partitions and latency spikes.

  • Hot keys and burst traffic.

Platform reliability layers

  1. Gateway protections (edge safety)

    • Auth + key validation, quotas, and rate limits.

    • Parameter validation and request shaping.

  2. Service isolation (fault containment)

    • Domain services isolate protocol failures and hot spots.

    • One failing venue should not degrade unrelated endpoints.

  3. Upstream resilience

    • Timeouts, bounded retries, circuit breakers, fallbacks.

  4. Caching as resilience

    • Cache-first reads where slightly stale data is acceptable.

    • Derived views for expensive analytics.

  5. Observability

    • Request identifiers for tracing.

    • Metrics/logs to spot upstream degradation quickly.
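
To make layer 3 (upstream resilience) more concrete, here is a minimal, single-threaded sketch of a circuit breaker with a fallback. It is an illustration only, not Acceso's implementation: the class name CircuitBreaker, the parameters failure_threshold and reset_after, and the primary/fallback callables are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and stays open
    for `reset_after` seconds before letting a trial call through.

    Hypothetical sketch: names and defaults are assumptions, not Acceso's API.
    """

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested without waiting
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, report "not open" so that the next
        # call acts as a trial request (the half-open state).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Fail fast: serve the fallback without touching the upstream.
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the breaker
            return fallback()
        # A success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch, primary would wrap an upstream call that already enforces its own timeout, and fallback would return a cached or degraded response. While the breaker is open, a struggling provider receives no traffic from this caller, which is the containment property layers 2 and 3 describe.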

Client defaults that work well

  • Set client-side timeouts.

  • Retry only on transient failures (5xx, timeouts).

  • Back off aggressively on 429 (rate limited), and batch requests where possible.

  • Prefer WebSockets or webhooks for high-frequency updates.
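
The retry guidance above can be sketched as a small transport-agnostic helper. This is a hedged illustration under stated assumptions, not an official Acceso client: call_with_retries, its parameters, and the (status, headers, body) tuple returned by send are hypothetical. It also assumes send applies its own client-side timeout and raises TimeoutError when that timeout expires, and that a 429 response may carry a Retry-After header in seconds.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retries(send: Callable[[], Response],
                      max_attempts: int = 4,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying only on 5xx, 429, and timeouts, at most
    `max_attempts` times in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        last = attempt == max_attempts
        try:
            response = send()
        except TimeoutError:
            if last:
                raise
            response = None
        if response is not None:
            status = response[0]
            retryable = status == 429 or 500 <= status < 600
            # Other 4xx responses are caller errors; retrying will not help.
            if not retryable or last:
                return response
        # Exponential backoff window, capped at max_delay.
        window = min(max_delay, base_delay * 2 ** (attempt - 1))
        delay = random.uniform(0, window)  # full jitter for 5xx and timeouts
        if response is not None and response[0] == 429:
            # Back off hard on rate limits: use Retry-After when it is given
            # in whole seconds (the HTTP-date form is not handled here),
            # otherwise wait the full window with no jitter.
            retry_after = response[1].get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else window
        sleep(delay)
    raise RuntimeError("unreachable")
```

The sleep parameter is injectable so that the helper can be exercised in tests without real delays. Jitter spreads out retries from many clients so that they do not hit a recovering upstream at the same instant.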

Deep dive

Read the full failure model and degraded-mode behavior in Reliability, Fault Tolerance & Resilience.
