Reliability, Fault Tolerance & Resilience

Failure containment

Acceso is designed to keep serving traffic under partial failures. Most failures originate upstream (RPCs, vendors, networks), so containment relies on three mechanisms:

  • Service isolation keeps failures inside domain boundaries.

  • Gateway controls prevent overload from reaching the core.

  • Caching reduces dependency on unstable upstreams.

Gateway controls (edge safety)

The gateway is the choke point. It keeps abusive or accidental overload away from domain services.

  • Auth and key validation.

  • Quotas and rate limits.

  • Parameter validation and sanitization.

  • Request identifiers propagated end-to-end.
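
The quota and request-identifier controls above can be illustrated with a small token-bucket sketch. This is not Acceso's gateway code: `TokenBucket`, `admit`, the rate of 5 requests per second, and the burst of 10 are all hypothetical, chosen only to show the shape of an admission check.

```python
import time
import uuid


class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def admit(api_key, valid_keys, buckets, request_id=None):
    """Return (status, request_id) for one incoming request.

    Hypothetical admission check: key validation first, then the
    per-key rate limit. The identifier is reused if the caller sent
    one and minted otherwise, so it can be propagated downstream.
    """
    request_id = request_id or uuid.uuid4().hex
    if api_key not in valid_keys:
        return 401, request_id
    bucket = buckets.get(api_key)
    if bucket is None:
        # Assumed limits: 5 requests/second, bursts of 10.
        bucket = buckets[api_key] = TokenBucket(rate=5.0, capacity=10.0)
    if not bucket.allow():
        return 429, request_id
    return 200, request_id
```

Rejecting at this point means a domain service never sees the request, which is the purpose of putting the limit at the edge.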

Service isolation (blast-radius control)

Each domain service owns its upstreams, parsers, cache keys, and failure modes.

  • A failing upstream should not affect other domains.

  • Rollouts can be domain-scoped.

  • Hot spots scale independently.

Upstream resilience patterns

Treat upstreams as unreliable by default.

  • Timeouts to bound tail latency.

  • Retries with exponential backoff for transient failures.

  • Circuit breakers to stop hammering degraded dependencies.

  • Fallback strategies when multiple upstreams exist.
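
A minimal sketch of the circuit-breaker and fallback patterns listed above, assuming each upstream is wrapped in a plain callable that applies its own timeout. `CircuitBreaker`, `CircuitOpenError`, `first_available`, and the thresholds are hypothetical names and values, not Acceso's implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream circuit is open")
            # Cool-down elapsed: half-open, so one trial call goes through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def first_available(upstreams):
    """Try (breaker, fn) pairs in order and return the first success."""
    last_error = None
    for breaker, fn in upstreams:
        try:
            return breaker.call(fn)
        except Exception as exc:  # includes CircuitOpenError
            last_error = exc
    raise last_error or RuntimeError("no upstreams configured")
```

An open breaker fails in microseconds instead of waiting for a timeout, so a degraded primary costs little before the call moves on to the next upstream.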

Degraded-mode behavior (explicit, structured)

When safe, Acceso may serve cached normalized data to preserve availability.

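  • Prefer "stale but correct shape" over "random upstream shape": a cached response in the normalized format is safer for clients than a raw upstream payload whose format may vary.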

  • Degradation should be visible via consistent errors and telemetry.
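
One way to sketch this degraded mode is a cache that keeps the last normalized value and serves it, explicitly flagged, when a refresh fails. `StaleCache`, the `stale` and `age_seconds` fields, and the time limits are assumptions for illustration; the source does not specify how Acceso marks degraded responses.

```python
import time


class StaleCache:
    """Keeps the last normalized value per key, with its fetch time."""

    def __init__(self, ttl=30.0, max_stale=600.0, clock=time.monotonic):
        self.ttl = ttl              # serve from cache without refetching
        self.max_stale = max_stale  # oldest value allowed in degraded mode
        self.clock = clock
        self.entries = {}           # key -> (value, fetched_at)

    def get(self, key, fetch):
        """`fetch` must return data already normalized to the public shape."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry and now - entry[1] <= self.ttl:
            return {"data": entry[0], "stale": False}
        try:
            value = fetch()
        except Exception:
            if entry and now - entry[1] <= self.max_stale:
                # Degraded mode: same shape as a fresh response, flagged.
                return {
                    "data": entry[0],
                    "stale": True,
                    "age_seconds": now - entry[1],
                }
            raise  # nothing safe to serve; surface the failure
        self.entries[key] = (value, now)
        return {"data": value, "stale": False}
```

Because only normalized values are ever stored, a degraded response has the same structure as a fresh one, and the flag keeps the degradation visible to clients and telemetry.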

Observability (make incidents diagnosable)

  • Clients receive consistent error formats.

  • Responses avoid leaking internal details.

  • Request identifiers make incident diagnosis fast.
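
The three points above can be combined in a single error-building helper. The envelope fields (`code`, `message`, `request_id`), the logger name, and `error_response` itself are hypothetical; the actual Acceso error format is not shown in this page.

```python
import json
import logging
import uuid

logger = logging.getLogger("acceso.errors")  # hypothetical logger name


def error_response(status, code, message, request_id=None, internal=None):
    """Build a (status, json_body) pair in one consistent error shape.

    `internal` (an exception, upstream URL, etc.) is written to the
    server log only. The client sees the stable code, a safe message,
    and the request identifier needed to find that log entry.
    """
    request_id = request_id or uuid.uuid4().hex
    if internal is not None:
        logger.error(
            "request_id=%s code=%s detail=%r", request_id, code, internal
        )
    body = {
        "error": {
            "code": code,
            "message": message,
            "request_id": request_id,
        }
    }
    return status, json.dumps(body)
```

A client that reports the `request_id` gives operators a direct key into the logs, without the response ever exposing hostnames or stack traces.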

Client-side guidance

  • Set timeouts shorter than your overall workflow SLA.

  • Retry only on transient failures (5xx, timeouts).

  • Back off hard on 429.

  • Use WebSockets/webhooks for high-frequency updates.
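
A client-side retry loop following the first three points might look like the sketch below. It assumes a caller-supplied `send()` that performs one request with its own timeout and returns `(status, body)` or raises `TimeoutError`. The function name, the delays, and the choice to retry 429 with a four times longer wait are illustrative assumptions rather than documented Acceso behavior.

```python
import random
import time


def call_with_retries(send, max_attempts=4, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Retry `send()` on 5xx, 429, and timeouts with exponential backoff."""
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        try:
            status, body = send()
        except TimeoutError:
            if last:
                raise
            status, body = None, None
        if status is not None and status < 500 and status != 429:
            return status, body  # success, or a non-retryable client error
        if last:
            return status, body  # out of attempts; hand back the last reply
        delay = min(max_delay, base_delay * 2 ** attempt)
        if status == 429:
            # Rate limited: wait noticeably longer than for a 5xx.
            delay = min(max_delay, delay * 4)
        # Jitter spreads retries out so clients do not retry in lockstep.
        sleep(delay * random.uniform(0.5, 1.0))
```

Other 4xx responses are returned immediately, since repeating a malformed or unauthorized request only adds load. The sum of per-attempt timeouts and delays should still fit inside the overall workflow SLA.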
