Retry and Fallback

Call a flaky backend up to three times (one initial attempt plus two retries) on transient failure, then serve a deterministic fallback response if every attempt fails. The client always receives a structured response, even when the upstream is down.

Typical scenario: Your inventory service occasionally returns 503 Service Unavailable during peak load. You want the gateway to make up to three attempts with a short delay between them and, if the service is still unreachable, return a cached or static fallback response instead of surfacing a raw 5xx to the client.

Prerequisites

You are signed in to the Management UI as an admin or editor
A collection and proxy exist (e.g. inventory-api, GET /inventory/{itemId})
The upstream service URL is known (e.g. https://inventory.internal/items/{itemId})

Workflow overview

http_trigger → attempt-1 → check-1 → wait-1 (500ms) → attempt-2 → check-2 → wait-2 (1000ms) → attempt-3 → final check → fallback-response. Any attempt that succeeds goes straight to success-response.

Step 1 — Open the workflow canvas

Open Collections → inventory-api.
Select the GET /inventory/{itemId} proxy.
Click the Workflow tab.

Step 2 — Add the first attempt node

Drag an http_request_node from the palette and connect it from http_trigger.
Configure:
- Name: attempt-1
- Method: GET
- URL: https://inventory.internal/items/{{trigger.params.itemId}}
- Timeout: 3000ms

Step 3 — Add condition after attempt 1

Drag a Condition node and connect from attempt-1.
Name it check-1.
Condition expression:

{{attempt-1.response.status}} >= 500 || {{attempt-1.timedOut}} == true

True path (failure): continue to retry
False path (success): go directly to the success response node
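
The check-1 expression can be read as a small predicate. The sketch below restates it in Python purely for illustration: the function name `is_transient_failure` is hypothetical, and the gateway evaluates the template expression above, not this code.

```python
from typing import Optional


def is_transient_failure(status: Optional[int], timed_out: bool) -> bool:
    """Mirror of the check-1 expression: retry on a timeout or any 5xx status."""
    if timed_out:
        return True  # no response arrived within the node timeout
    return status is not None and status >= 500


# A 503 or a timeout takes the true (retry) path; 200 and 404 take the false path.
for status, timed_out in [(503, False), (None, True), (200, False), (404, False)]:
    print(status, timed_out, is_transient_failure(status, timed_out))
```

Note that 4xx responses take the false branch under this expression, so they are passed through to the client rather than retried.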

Step 4 — Add delay and retry nodes

After check-1 true branch (first retry):

Drag a Delay node, connect from check-1 (true).
Set delay to 500ms. Name it wait-1.
Add an http_request_node after wait-1:
- Name: attempt-2
- Same URL and timeout as attempt-1

Add condition after attempt 2:

Drag another Condition node from attempt-2.
Name it check-2.
Expression: {{attempt-2.response.status}} >= 500 || {{attempt-2.timedOut}} == true
- True → second retry
- False → success response

After check-2 true branch (second retry):

Add Delay node with 1000ms. Name it wait-2.
Add http_request_node after wait-2:
- Name: attempt-3
- Same URL and timeout
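
Taken together, the attempt, Condition, and Delay nodes form an unrolled loop. The Python sketch below models that chain under stated assumptions: `fetch` is a hypothetical callable standing in for an http_request_node and returns a `(status, timed_out)` pair, and `sleep` is injectable so the sketch can be exercised without real waiting. It is not gateway code.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Attempt = Tuple[Optional[int], bool]  # (HTTP status or None, timed_out)


def run_attempts(
    fetch: Callable[[], Attempt],
    delays_ms: Sequence[int] = (500, 1000),  # wait-1 and wait-2 from the steps above
    sleep: Callable[[float], None] = time.sleep,
) -> List[Attempt]:
    """Call fetch up to len(delays_ms) + 1 times, stopping at the first non-failure."""
    results: List[Attempt] = []
    for i in range(len(delays_ms) + 1):
        status, timed_out = fetch()
        results.append((status, timed_out))
        failed = timed_out or (status is not None and status >= 500)
        if not failed:
            break  # false branch of the Condition node: stop retrying
        if i < len(delays_ms):
            sleep(delays_ms[i] / 1000)  # Delay node before the next attempt
    return results


# Example: two 503s followed by a 200, with sleeping stubbed out.
responses = iter([(503, False), (503, False), (200, False)])
waits: List[float] = []
history = run_attempts(lambda: next(responses), sleep=waits.append)
print(history)  # [(503, False), (503, False), (200, False)]
print(waits)    # [0.5, 1.0]
```

The canvas version spells out each iteration as separate nodes, which is why every attempt needs its own name.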

Step 5 — Add the fallback response node

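Add a third Condition node after attempt-3 (called check-3 here) with the same expression as check-2, substituting attempt-3. Connect its true branch (status ≥ 500 or timeout) to the fallback response.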

Drag a Response node. Name it fallback-response.

Configure:

Status: 503

Body:

{
  "error": "service_unavailable",
  "message": "The inventory service is temporarily unavailable. Please try again shortly.",
  "retryAfter": 30
}

Headers: Retry-After: 30, Content-Type: application/json
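
The retryAfter field in the body and the Retry-After header carry the same value, so it can help to derive both from one number. A minimal Python sketch of this response follows; the helper name `build_fallback` is hypothetical and only illustrates the shape configured above.

```python
import json
from typing import Dict, Tuple


def build_fallback(retry_after_s: int = 30) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for the static fallback response."""
    body = {
        "error": "service_unavailable",
        "message": "The inventory service is temporarily unavailable. "
                   "Please try again shortly.",
        "retryAfter": retry_after_s,
    }
    headers = {
        "Retry-After": str(retry_after_s),  # header values are strings
        "Content-Type": "application/json",
    }
    return 503, headers, json.dumps(body)


status, headers, body = build_fallback()
print(status, headers["Retry-After"], json.loads(body)["retryAfter"])
```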

Step 6 — Add the success response node

Connect from:

check-1 false branch (attempt-1 succeeded)
check-2 false branch (attempt-2 succeeded)
false branch of the Condition node after attempt-3 (attempt-3 succeeded)

Drag a Response node. Name it success-response.
Configure:
- Status: {{lastSuccessfulAttempt.response.status}}
- Body: {{lastSuccessfulAttempt.response.body}}
- Content-Type: application/json

Tip

lastSuccessfulAttempt above is a placeholder for whichever attempt succeeded. Use a set_node before this response node to capture that attempt's output, for example its body:

successBody = {{attempt-1.response.body || attempt-2.response.body || attempt-3.response.body}}
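
One caveat, assuming `||` behaves like a JavaScript-style truthiness operator (this page does not specify the expression semantics): a failed attempt that returned a 5xx with a non-empty body would be picked ahead of a later success. The Python sketch below shows the intended selection, choosing by status instead of by body. The response shape and the name `successful_attempt` are hypothetical, and the bodies are made-up sample data.

```python
from typing import Any, Dict, List, Optional

Response = Dict[str, Any]  # hypothetical shape: {"status": int, "body": str}


def successful_attempt(attempts: List[Optional[Response]]) -> Optional[Response]:
    """Return the first attempt that produced a response with status < 500."""
    for attempt in attempts:
        if attempt is not None and attempt["status"] < 500:
            return attempt
    return None  # every attempt failed, timed out, or never ran


attempts = [
    {"status": 503, "body": '{"error": "overloaded"}'},  # attempt-1 failed with a body
    {"status": 200, "body": '{"itemId": "item-123"}'},   # attempt-2 succeeded
    None,                                                # attempt-3 never ran
]
chosen = successful_attempt(attempts)
print(chosen["status"], chosen["body"])
```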

Step 7 — Save and publish

Click Save on the canvas.
Click Publish on the proxy.

Verify

Happy path (service is healthy):

curl -i https://gateway.example.com/inventory-api/inventory/item-123 \
  -H "X-Client-ID: <client-id>" \
  -H "Authorization: Bearer <token>" \
  -H "X-Profile-ID: <profile-id>"

Expected: 200 with inventory body. Latency should be approximately equal to one upstream call.

Simulated failure (temporarily block upstream):

Temporarily misconfigure the upstream URL in attempt-1, attempt-2, and attempt-3 to force timeouts.

Expected: 503 response with the fallback body after roughly 10.5 seconds (3 × 3000ms timeout + 500ms + 1000ms delay), assuming every attempt runs to its full timeout.

Check logs:

Open Logs → Request Logs and select the failed request. You should see one workflow step for each of the three attempts, followed by the fallback response node.

Tuning retry parameters

Parameter	Guidance
Attempt timeout	Set to p99 upstream latency + 20%. Start with 2–3 seconds.
Delay between retries	Use exponential backoff: 500ms → 1s for three attempts; a fourth attempt would wait 2s.
Number of retries	Two or three retries are sufficient for transient errors. More retries increase client-perceived latency.
Fallback body	Use the RFC 9457 problem details format (which obsoletes RFC 7807) or your own error envelope for consistency.
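
These parameters interact: the worst-case time before the fallback is sent is roughly the sum of every attempt timeout plus every delay. The Python sketch below works through that arithmetic with the values from this recipe (3000ms timeout, 500ms base delay, three attempts). Both helper names are hypothetical.

```python
from typing import List


def backoff_delays_ms(base_ms: int, attempts: int, factor: int = 2) -> List[int]:
    """Delays between consecutive attempts: base, base*factor, ... (attempts - 1 values)."""
    return [base_ms * factor ** i for i in range(attempts - 1)]


def worst_case_ms(timeout_ms: int, delays_ms: List[int]) -> int:
    """Upper bound when every attempt runs to its timeout before the fallback."""
    attempts = len(delays_ms) + 1
    return attempts * timeout_ms + sum(delays_ms)


delays = backoff_delays_ms(base_ms=500, attempts=3)
print(delays)                       # [500, 1000]
print(worst_case_ms(3000, delays))  # 10500
```

Adding a fourth attempt extends the schedule to 500ms, 1s, and 2s and raises the worst case to 15.5 seconds, which is the latency cost the table warns about.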

Extending this recipe

Cache-based fallback: Replace the static fallback response with a redis_node lookup that returns the last known good response for this item ID.
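
A rough Python sketch of the cache-based variant follows. A plain dictionary stands in for Redis, and the key format `inventory:{item_id}`, both function names, and the sample body are assumptions for illustration; the redis_node's actual configuration is not shown on this page.

```python
import json
from typing import Dict, Optional, Tuple

cache: Dict[str, str] = {}  # stand-in for Redis: key -> last good JSON body


def remember(item_id: str, body: str) -> None:
    """Call on the success path to store the last known good response."""
    cache[f"inventory:{item_id}"] = body


def fallback_for(item_id: str) -> Tuple[int, str]:
    """Serve the cached body if one exists, otherwise the static 503 body."""
    cached: Optional[str] = cache.get(f"inventory:{item_id}")
    if cached is not None:
        return 200, cached  # stale but usable data
    return 503, json.dumps({"error": "service_unavailable", "retryAfter": 30})


remember("item-123", '{"itemId": "item-123", "stock": 7}')
print(fallback_for("item-123"))     # cached body with status 200
print(fallback_for("item-999")[0])  # 503, nothing cached for this item
```

Returning 200 for stale data is a design choice in this sketch; if clients need to tell fresh data from cached data, a marker header or field would be one way to signal it.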

Circuit breaker pattern: Track consecutive failures in a code_node using a counter stored in Redis. After N failures in a window, route directly to fallback without attempting upstream.
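
The counting logic for such a breaker can be sketched in Python as below. This is an in-memory illustration under stated assumptions: the class name, the threshold and window parameters, and the injectable clock are all hypothetical, and a real code_node would keep the failure timestamps in Redis so that every gateway instance shares them.

```python
import time
from collections import deque
from typing import Callable, Deque


class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.clock = clock
        self.failures: Deque[float] = deque()  # timestamps of recent failures

    def _prune(self) -> None:
        # Drop failures that have aged out of the window.
        cutoff = self.clock() - self.window_s
        while self.failures and self.failures[0] <= cutoff:
            self.failures.popleft()

    def record_failure(self) -> None:
        self._prune()
        self.failures.append(self.clock())

    def record_success(self) -> None:
        self.failures.clear()  # a success resets the consecutive count

    def is_open(self) -> bool:
        self._prune()
        return len(self.failures) >= self.threshold


# Example with a fake clock: three failures open the circuit, the window closes it.
now = [0.0]
breaker = CircuitBreaker(threshold=3, window_s=10, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.is_open())  # True
now[0] = 11.0
print(breaker.is_open())  # False
```

When `is_open()` is true, the workflow would route straight to the fallback response and skip all three attempts.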

Next steps

Workflow Nodes — Delay — delay node reference
Workflow Nodes — Condition — condition expression reference
Workflow Templates — Resiliency — resiliency template
Request Logs — debugging retry behavior
