
Retry and Fallback

Call a flaky backend up to three times (one initial attempt plus two retries) on transient failure, then serve a deterministic fallback response if every attempt fails. The client always receives a structured response, even when the upstream is down.

Typical scenario: Your inventory service occasionally returns 503 Service Unavailable during peak load. You want the gateway to make up to three attempts with a short delay between them, and if the service is still unreachable, return a cached or static fallback response instead of surfacing a raw 5xx to the client.


Prerequisites

  • You are signed in to the Management UI as an admin or editor
  • A collection and proxy exist (e.g. inventory-api, GET /inventory/{itemId})
  • The upstream service URL is known (e.g. https://inventory.internal/items/{itemId})

Workflow overview
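
The node graph built in the steps below behaves roughly like the following loop. This is a minimal Python sketch for orientation only, not how the gateway executes workflows: `run_workflow`, `fetch`, and the `(status, body)` tuple are hypothetical names introduced here, and a status of `None` stands in for a timed-out attempt.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

# (status, body); status is None when the attempt timed out (assumption).
Attempt = Tuple[Optional[int], str]

FALLBACK: Attempt = (503, '{"error": "service_unavailable"}')


def run_workflow(
    fetch: Callable[[], Attempt],
    delays_s: Sequence[float] = (0.5, 1.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Attempt:
    """Call the upstream once, then once more after each delay; fall back if all fail."""
    for i in range(len(delays_s) + 1):
        if i > 0:
            sleep(delays_s[i - 1])  # wait-1, wait-2
        status, body = fetch()  # attempt-1, attempt-2, attempt-3
        if status is not None and status < 500:  # false branch of the check
            return status, body  # success-response
    return FALLBACK  # fallback-response


# Two failures (a 503 and a timeout) followed by a success.
responses = iter([(503, ""), (None, ""), (200, '{"itemId": "item-123"}')])
print(run_workflow(lambda: next(responses), sleep=lambda s: None))
# (200, '{"itemId": "item-123"}')
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; on the canvas the same role is played by the Delay nodes.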


Step 1 — Open the workflow canvas

  1. Open Collections → inventory-api.
  2. Select the GET /inventory/{itemId} proxy.
  3. Click the Workflow tab.

Step 2 — Add the first attempt node

  1. Drag an http_request_node from the palette and connect it from http_trigger.
  2. Configure:
    • Name: attempt-1
    • Method: GET
    • URL: https://inventory.internal/items/{{trigger.params.itemId}}
    • Timeout: 3000ms

Step 3 — Add condition after attempt 1

  1. Drag a Condition node and connect from attempt-1.
  2. Name it check-1.
  3. Condition expression:
{{attempt-1.response.status}} >= 500 || {{attempt-1.timedOut}} == true
  • True path (failure): continue to retry
  • False path (success): go directly to the success response node
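
To make the branching concrete, here is a small Python mirror of the condition expression. `is_failure` is a hypothetical helper written for illustration; it is not part of the gateway. Note that a 4xx status does not satisfy the expression, so it takes the success path and is returned to the client without a retry.

```python
from typing import Optional


def is_failure(status: Optional[int], timed_out: bool) -> bool:
    """Return True when the attempt should take the retry (true) branch."""
    return timed_out or (status is not None and status >= 500)


for status, timed_out in [(200, False), (404, False), (503, False), (None, True)]:
    branch = "retry" if is_failure(status, timed_out) else "success"
    print(status, timed_out, "->", branch)
```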

Step 4 — Add delay and retry nodes

After check-1 true branch (first retry):

  1. Drag a Delay node, connect from check-1 (true).
  2. Set delay to 500ms. Name it wait-1.
  3. Add an http_request_node after wait-1:
    • Name: attempt-2
    • Same method, URL, and timeout as attempt-1

Add condition after attempt 2:

  1. Drag another Condition node from attempt-2.
  2. Name it check-2.
  3. Expression: {{attempt-2.response.status}} >= 500 || {{attempt-2.timedOut}} == true
    • True → second retry
    • False → success response

After check-2 true branch (second retry):

  1. Add a Delay node, connect from check-2 (true), and set the delay to 1000ms. Name it wait-2.
  2. Add an http_request_node after wait-2:
    • Name: attempt-3
    • Same method, URL, and timeout as attempt-1

Step 5 — Add the fallback response node

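An http_request_node has no true/false branches of its own, so add a third Condition node after attempt-3 (name it check-3) with the same expression as check-1 and check-2, substituting attempt-3. Connect its true branch (status ≥ 500 or timeout) to the fallback response.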

  1. Drag a Response node. Name it fallback-response.
  2. Configure:
    • Status: 503
    • Body:
      {
      "error": "service_unavailable",
      "message": "The inventory service is temporarily unavailable. Please try again shortly.",
      "retryAfter": 30
      }
    • Headers: Retry-After: 30, Content-Type: application/json
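
The Retry-After header and the retryAfter field in the body carry the same value, so it helps to derive both from one number. A minimal Python sketch of the response this node produces; `build_fallback` is a hypothetical helper used only for illustration.

```python
import json
from typing import Dict, Tuple


def build_fallback(retry_after_s: int = 30) -> Tuple[int, Dict[str, str], str]:
    """Build the status, headers, and JSON body of the fallback response."""
    body = {
        "error": "service_unavailable",
        "message": "The inventory service is temporarily unavailable. "
                   "Please try again shortly.",
        "retryAfter": retry_after_s,
    }
    headers = {
        "Retry-After": str(retry_after_s),  # same value as body["retryAfter"]
        "Content-Type": "application/json",
    }
    return 503, headers, json.dumps(body)


status, headers, body = build_fallback()
print(status, headers["Retry-After"], body)
```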

Step 6 — Add the success response node

  1. Drag a Response node. Name it success-response.
  2. Connect it from:
    • check-1 false branch (attempt-1 succeeded)
    • check-2 false branch (attempt-2 succeeded)
    • the false branch of the Condition node after attempt-3 (attempt-3 succeeded)
  3. Configure:
    • Status: {{lastSuccessfulAttempt.response.status}}
    • Body: {{lastSuccessfulAttempt.response.body}}
    • Content-Type: application/json
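Tip

Use a set_node before this response node to capture whichever attempt succeeded, and reference the variable it sets in the response node's Body field:

successBody = {{attempt-1.response.body || attempt-2.response.body || attempt-3.response.body}}

A failed attempt that returned a 5xx with an error body also satisfies this || chain, so check the status code as well if failed attempts can carry a body.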
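
The selection can also be made on the status code instead of on body truthiness, which matters when a failed attempt returns a 5xx with an error body. The following Python sketch shows the idea; `first_successful` and the dictionary shape are assumptions made for illustration, not the gateway's data model. An attempt that never ran is represented as `None`.

```python
from typing import Dict, List, Optional


def first_successful(attempts: List[Optional[Dict]]) -> Optional[Dict]:
    """Return the first attempt that ran and answered with a status below 500."""
    for attempt in attempts:
        if attempt is not None and attempt["status"] < 500:
            return attempt
    return None


attempts = [
    {"status": 503, "body": "upstream overloaded"},  # failed, but has a body
    {"status": 200, "body": '{"itemId": "item-123"}'},
    None,  # attempt-3 never ran
]
print(first_successful(attempts))
```

Picking the first non-empty body from the same list would return "upstream overloaded", which is the case the status check avoids.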

Step 7 — Save and publish

  1. Click Save on the canvas.
  2. Click Publish on the proxy.

Verify

Happy path (service is healthy):

curl -i https://gateway.example.com/inventory-api/inventory/item-123 \
-H "X-Client-ID: <client-id>" \
-H "Authorization: Bearer <token>" \
-H "X-Profile-ID: <profile-id>"

Expected: 200 with inventory body. Latency should be approximately equal to one upstream call.

Simulated failure (temporarily block upstream):

Temporarily misconfigure the upstream URL in attempt-1, attempt-2, and attempt-3 to force timeouts.

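Expected: 503 response with the fallback body after about 10.5 seconds if every attempt runs to its full timeout (3 × 3000ms timeout + 500ms + 1000ms delay). The response arrives sooner if the upstream fails fast, for example with a refused connection.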
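
With the values configured in Steps 2 and 4 (3000ms timeout per attempt, delays of 500ms and 1000ms), the worst-case time before the fallback is returned can be worked out as follows. `worst_case_latency_ms` is a hypothetical helper, and the calculation ignores gateway overhead.

```python
from typing import Sequence


def worst_case_latency_ms(timeout_ms: int, delays_ms: Sequence[int]) -> int:
    """Total time when every attempt runs to its full timeout."""
    attempts = len(delays_ms) + 1  # one initial attempt plus one per delay
    return attempts * timeout_ms + sum(delays_ms)


print(worst_case_latency_ms(3000, [500, 1000]))  # 10500
```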

Check logs:

Open Logs → Request Logs and select the failed request. You should see a workflow step for each of the three attempts, followed by the fallback response node.


Tuning retry parameters

Parameter | Guidance
Attempt timeout | Set to p99 upstream latency + 20%. Start with 2–3 seconds.
Delay between retries | Use exponential backoff: 500ms → 1s for the two retries in this recipe, then 2s if you add a third.
Number of retries | Two or three retries are sufficient for transient errors. More retries increase client-perceived latency.
Fallback body | Use the RFC 9457 problem details format (which obsoletes RFC 7807) or your own error envelope for consistency.
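
The timeout and backoff guidance can be turned into numbers with a couple of small helpers. This is a sketch under the assumptions stated in the table (20% headroom over p99, doubling delays starting at 500ms); the function names are hypothetical.

```python
from typing import List


def attempt_timeout_ms(p99_latency_ms: float, headroom: float = 0.20) -> int:
    """Per-attempt timeout: p99 upstream latency plus a headroom fraction."""
    return round(p99_latency_ms * (1 + headroom))


def backoff_delays_ms(retries: int, base_ms: int = 500, factor: int = 2) -> List[int]:
    """Delay before each retry, growing by `factor` every time."""
    return [base_ms * factor ** i for i in range(retries)]


print(attempt_timeout_ms(2500))  # 3000
print(backoff_delays_ms(2))      # [500, 1000]
print(backoff_delays_ms(3))      # [500, 1000, 2000]
```

Each extra retry adds one more timeout plus a doubled delay to the worst case the client can see, which is why the table recommends keeping the count low.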

Extending this recipe

Cache-based fallback: Replace the static fallback response with a redis_node lookup that returns the last known good response for this item ID.
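
A minimal sketch of the cache-based idea, using an in-memory dictionary as a stand-in for Redis. `LastGoodCache` and its methods are hypothetical, and returning the cached body with status 200 is a design assumption; you may prefer to mark stale responses explicitly.

```python
from typing import Dict, Tuple

Response = Tuple[int, str]  # (status, body)


class LastGoodCache:
    """Remembers the last successful body per item ID (dict stands in for Redis)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def remember(self, item_id: str, body: str) -> None:
        """Call on the success path to record the latest good response."""
        self._store[item_id] = body

    def fallback_for(self, item_id: str, static_fallback: Response) -> Response:
        """Call on the failure path: cached body if present, else the static fallback."""
        cached = self._store.get(item_id)
        if cached is not None:
            return 200, cached
        return static_fallback


cache = LastGoodCache()
cache.remember("item-123", '{"itemId": "item-123", "stock": 4}')
static = (503, '{"error": "service_unavailable"}')
print(cache.fallback_for("item-123", static))
print(cache.fallback_for("item-999", static))
```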

Circuit breaker pattern: Track consecutive failures in a code_node using a counter stored in Redis. After N failures in a window, route directly to fallback without attempting upstream.
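
The circuit breaker logic could look roughly like the following Python sketch. It keeps failure timestamps in memory rather than in Redis, and `CircuitBreaker`, `max_failures`, and `window_s` are hypothetical names; a code_node implementation would need to adapt this to the runtime and storage actually available.

```python
import time
from collections import deque
from typing import Callable, Deque


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures within `window_s` seconds."""

    def __init__(
        self,
        max_failures: int = 5,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self._clock = clock
        self._failures: Deque[float] = deque()

    def _prune(self) -> None:
        # Drop failures that have aged out of the window.
        cutoff = self._clock() - self.window_s
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self) -> None:
        self._prune()
        self._failures.append(self._clock())

    def record_success(self) -> None:
        # A success breaks the run of consecutive failures.
        self._failures.clear()

    def is_open(self) -> bool:
        """True means: skip the upstream and route straight to the fallback."""
        self._prune()
        return len(self._failures) >= self.max_failures
```

Check `is_open()` before attempt-1, call `record_failure()` when the final attempt fails, and call `record_success()` on the success path. Because old failures age out of the window, the breaker closes again on its own once the upstream has been quiet for `window_s` seconds.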

