Retry and Fallback
Call a flaky backend up to three times (one initial attempt plus two retries) on transient failure, then serve a deterministic fallback response if every attempt fails. The client always receives a structured response, even when the upstream is down.
Typical scenario: Your inventory service occasionally returns 503 Service Unavailable during peak load. You want the gateway to make up to three attempts with a short delay between them, and if the service is still unreachable, return a cached or static fallback response instead of surfacing a raw 5xx to the client.
Prerequisites
- You are signed in to the Management UI as an admin or editor
- A collection and proxy exist (e.g. inventory-api, GET /inventory/{itemId})
- The upstream service URL is known (e.g. https://inventory.internal/items/{itemId})
Workflow overview
Step 1 — Open the workflow canvas
- Open Collections → inventory-api.
- Select the GET /inventory/{itemId} proxy.
- Click the Workflow tab.
Step 2 — Add the first attempt node
- Drag an http_request_node from the palette and connect it from http_trigger.
- Configure:
  - Name: attempt-1
  - Method: GET
  - URL: https://inventory.internal/items/{{trigger.params.itemId}}
  - Timeout: 3000ms
Step 3 — Add condition after attempt 1
- Drag a Condition node and connect from attempt-1.
- Name it check-1.
- Condition expression: {{attempt-1.response.status}} >= 500 || {{attempt-1.timedOut}} == true
- True path (failure): continue to retry
- False path (success): go directly to the success response node
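The condition above can be read as a small predicate. The following is a minimal Python sketch of that logic, not the platform's expression engine; the function name and the handling of a missing status are assumptions made for illustration.

```python
from typing import Optional


def is_transient_failure(status: Optional[int], timed_out: bool) -> bool:
    """Mirror of the check-1 expression: status >= 500 OR the attempt timed out."""
    if timed_out:
        return True
    # Assumption: an attempt with no status at all is treated as a failure.
    if status is None:
        return True
    return status >= 500
```

Under this reading, 4xx responses such as 404 take the false branch and are passed through to the client rather than retried.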
Step 4 — Add delay and retry nodes
After check-1 true branch (first retry):
- Drag a Delay node and connect it from check-1 (true).
- Set delay to 500ms. Name it wait-1.
- Add an http_request_node after wait-1:
  - Name: attempt-2
  - Same URL and timeout as attempt-1
Add condition after attempt 2:
- Drag another Condition node from attempt-2.
- Name it check-2.
- Expression: {{attempt-2.response.status}} >= 500 || {{attempt-2.timedOut}} == true
  - True → second retry
  - False → success response
After check-2 true branch (second retry):
- Add a Delay node with 1000ms. Name it wait-2.
- Add an http_request_node after wait-2:
  - Name: attempt-3
  - Same URL and timeout as attempt-1
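The chain of attempt, condition, and delay nodes built in this step behaves like a bounded retry loop. The sketch below models that control flow in Python under stated assumptions: `Attempt`, `failed`, and `call_with_retries` are hypothetical names, and `fetch` stands in for the upstream call; none of them are part of the gateway.

```python
import time
from typing import Callable, List, NamedTuple, Optional, Sequence


class Attempt(NamedTuple):
    status: Optional[int]  # None when the call timed out
    timed_out: bool
    body: str = ""


def failed(attempt: Attempt) -> bool:
    """Same rule as the condition nodes: timeout or a 5xx status."""
    return attempt.timed_out or attempt.status is None or attempt.status >= 500


def call_with_retries(
    fetch: Callable[[], Attempt],
    delays_s: Sequence[float] = (0.5, 1.0),  # wait-1 and wait-2
    sleep: Callable[[float], None] = time.sleep,
) -> List[Attempt]:
    """Run one initial attempt, then one retry per delay while the last attempt failed."""
    attempts = [fetch()]
    for delay in delays_s:
        if not failed(attempts[-1]):
            break
        sleep(delay)
        attempts.append(fetch())
    return attempts
```

With the two delays from this recipe, the loop makes at most three calls, matching attempt-1, attempt-2, and attempt-3.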
Step 5 — Add the fallback response node
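The fallback is reached when attempt-3 also fails (status ≥ 500 or timeout). If attempt-3 has no failure branch of its own, add a third Condition node after it (for example, check-3) with the same expression as check-1 and check-2, applied to attempt-3, and connect its true branch to the fallback response.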
- Drag a Response node. Name it fallback-response.
- Configure:
  - Status: 503
  - Body:
    {
      "error": "service_unavailable",
      "message": "The inventory service is temporarily unavailable. Please try again shortly.",
      "retryAfter": 30
    }
  - Headers: Retry-After: 30, Content-Type: application/json
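The fallback carries the retry hint twice, once in the Retry-After header and once in the body. A minimal Python sketch of building that response from a single value, so the two cannot drift apart, might look like this; `build_fallback_response` is a hypothetical helper, not a gateway API.

```python
import json


def build_fallback_response(retry_after_s: int = 30) -> dict:
    """Return the static fallback as a plain dict (status, headers, JSON body)."""
    body = {
        "error": "service_unavailable",
        "message": "The inventory service is temporarily unavailable. Please try again shortly.",
        "retryAfter": retry_after_s,
    }
    return {
        "status": 503,
        "headers": {
            "Retry-After": str(retry_after_s),  # header values are strings
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```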
Step 6 — Add the success response node
Connect from:
- check-1 false branch (attempt-1 succeeded)
- check-2 false branch (attempt-2 succeeded)
- the false branch of the condition that follows attempt-3 (attempt-3 succeeded)

- Drag a Response node. Name it success-response.
- Configure:
  - Status: {{lastSuccessfulAttempt.response.status}}
  - Body: {{lastSuccessfulAttempt.response.body}}
  - Content-Type: application/json
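Use a set_node before this response node to capture whichever attempt succeeded, and make sure the variable it sets is the one the Status and Body fields reference:
successBody = {{attempt-1.response.body || attempt-2.response.body || attempt-3.response.body}}
If || returns the first non-empty value, a failed attempt that came back with an error body would win over a later successful attempt. Placing a separate set_node on each success branch avoids that ambiguity.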
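To illustrate why the selection should look at status rather than at which body is non-empty, here is a small Python sketch. It assumes attempts are available as (status, body) pairs, and that the workflow's `||` behaves like Python's `or`; both are assumptions for illustration, and `pick_successful` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def pick_successful(
    attempts: Sequence[Tuple[Optional[int], str]]
) -> Optional[Tuple[int, str]]:
    """Return (status, body) of the first attempt that did not fail, else None."""
    for status, body in attempts:
        # A missing status models a timeout; 5xx models an upstream failure.
        if status is not None and status < 500:
            return status, body
    return None


attempts = [(503, '{"error":"overloaded"}'), (200, '{"itemId":"item-123"}')]

# Chaining bodies with `or` picks the first non-empty one: the 503 error body.
naive_body = attempts[0][1] or attempts[1][1]

# Selecting by status picks the attempt that actually succeeded.
chosen = pick_successful(attempts)
```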
Step 7 — Save and publish
- Click Save on the canvas.
- Click Publish on the proxy.
Verify
Happy path (service is healthy):
curl -i https://gateway.example.com/inventory-api/inventory/item-123 \
-H "X-Client-ID: <client-id>" \
-H "Authorization: Bearer <token>" \
-H "X-Profile-ID: <profile-id>"
Expected: 200 with inventory body. Latency should be approximately equal to one upstream call.
Simulated failure (temporarily block upstream):
Temporarily misconfigure the upstream URL in attempt-1, attempt-2, and attempt-3 to force timeouts.
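Expected: 503 response with the fallback body after roughly 10.5 seconds (3 × 3000 ms timeout + 500 ms + 1000 ms delay). If the misconfigured URL fails fast instead of timing out, the fallback arrives sooner.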
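As a quick check on the timing, the worst case can be computed from the values configured in this recipe, assuming every attempt runs to its full timeout. The helper name below is hypothetical.

```python
from typing import Sequence


def worst_case_latency_ms(timeout_ms: int, delays_ms: Sequence[int]) -> int:
    """Each delay precedes one retry, so attempts = number of delays + 1."""
    attempts = len(delays_ms) + 1
    return attempts * timeout_ms + sum(delays_ms)


# Values from this recipe: 3000 ms timeout, wait-1 = 500 ms, wait-2 = 1000 ms.
total_ms = worst_case_latency_ms(3000, [500, 1000])
```

Here `total_ms` is 10500, that is, 10.5 seconds before the client sees the fallback.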
Check logs:
Open Logs → Request Logs and select the failed request. You should see a workflow step for each of the three attempts, followed by the fallback response node.
Tuning retry parameters
| Parameter | Guidance |
|---|---|
| Attempt timeout | Set to p99 upstream latency + 20%. Start with 2–3 seconds. |
| Delay between retries | Use exponential backoff: 500ms → 1s → 2s for three retries. This recipe makes two retries and uses the first two values. |
| Number of retries | Two or three retries are sufficient for transient errors. More retries increase client-perceived latency. |
| Fallback body | Use the RFC 9457 problem details format (which obsoletes RFC 7807) or your own error envelope for consistency. |
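The backoff guidance in the table amounts to doubling a base delay for each retry. A minimal Python sketch, with a hypothetical function name and no jitter applied:

```python
from typing import List


def backoff_delays_ms(base_ms: int, retries: int, factor: int = 2) -> List[int]:
    """Delay before retry i (0-based) is base_ms * factor**i."""
    return [base_ms * factor**i for i in range(retries)]


two_retries = backoff_delays_ms(500, 2)    # the wait-1 and wait-2 values
three_retries = backoff_delays_ms(500, 3)  # adds a 2000 ms delay for a third retry
```

Each extra retry adds both its delay and another full timeout to the worst-case latency, which is why the table advises keeping the retry count low.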
Extending this recipe
Cache-based fallback: Replace the static fallback response with a redis_node lookup that returns the last known good response for this item ID.
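The cache-based variant can be sketched as "serve the last known good body if one exists, otherwise fall back to the static 503". The Python below uses an in-memory dict as a stand-in for Redis; the class, the key format, and the X-Served-From header are all hypothetical and only illustrate the lookup order.

```python
from typing import Dict, Optional


class LastKnownGoodCache:
    """In-memory stand-in for the Redis store (assumption for illustration)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def remember(self, item_id: str, body: str) -> None:
        # Call this on every successful upstream response.
        self._store[f"inventory:{item_id}"] = body

    def lookup(self, item_id: str) -> Optional[str]:
        return self._store.get(f"inventory:{item_id}")


def fallback_for(item_id: str, cache: LastKnownGoodCache, static_body: str) -> dict:
    """Prefer a cached body; otherwise return the static fallback."""
    cached = cache.lookup(item_id)
    if cached is not None:
        # Hypothetical marker header so clients can tell the data may be stale.
        return {"status": 200, "body": cached, "headers": {"X-Served-From": "fallback-cache"}}
    return {"status": 503, "body": static_body, "headers": {"Retry-After": "30"}}
```

Whether a stale body should be returned with 200 or with a distinct status is a design decision for your API; the sketch picks 200 only as one option.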
Circuit breaker pattern: Track consecutive failures in a code_node using a counter stored in Redis. After N failures in a window, route directly to fallback without attempting upstream.
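One way to picture the circuit breaker is a counter of recent consecutive failures that opens the circuit once it reaches N inside the window. The Python sketch below keeps the state in memory instead of Redis and takes an injectable clock; the class and method names are hypothetical, and the code inside a code_node would differ.

```python
import time
from collections import deque
from typing import Callable, Deque


class FailureWindowBreaker:
    """Open after `threshold` consecutive failures within `window_s` seconds."""

    def __init__(
        self,
        threshold: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._clock = clock
        self._failures: Deque[float] = deque()

    def _prune(self) -> None:
        # Drop failures that have aged out of the window.
        cutoff = self._clock() - self.window_s
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self) -> None:
        self._prune()
        self._failures.append(self._clock())

    def record_success(self) -> None:
        # A success breaks the run of consecutive failures.
        self._failures.clear()

    def is_open(self) -> bool:
        """True means: skip the upstream and route straight to the fallback."""
        self._prune()
        return len(self._failures) >= self.threshold
```

In the workflow, `is_open()` would be checked before attempt-1, and `record_failure()` or `record_success()` called once the attempts finish.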
Next steps
- Workflow Nodes — Delay — delay node reference
- Workflow Nodes — Condition — condition expression reference
- Workflow Templates — Resiliency — resiliency template
- Request Logs — debugging retry behavior