What is a retry policy?
A retry policy is the source's rule for re-sending a webhook after a delivery fails, typically on a backoff schedule with a cap on total attempts or duration. The shape is three numbers: initial delay, backoff multiplier, and max duration or attempts. Stripe retries for up to 3 days with exponential backoff; GitHub retries 8 times across ~24 hours; Shopify retries up to 19 times over 48 hours. What counts as a failure varies: HTTP 5xx, timeouts, TLS errors, and DNS failures universally; HTTP 4xx on some providers. Retries arrive with the same event ID, so dedupe on it. Plan for 3-10x normal volume while recovering from an outage.
Webhooks fail. Receivers go down, certificates expire, networks blip, deploys take 30 seconds. The retry policy is what the source does when a delivery doesn't get a 2xx back. Every major provider has one; they vary in aggressiveness and duration.
The shape of a policy is three numbers: initial delay (how soon to retry after the first failure), backoff multiplier (how the delay grows between attempts), and max duration / max attempts (when to give up). Stripe retries up to 3 days with exponential backoff. GitHub retries 8 times across ~24 hours. Shopify retries up to 19 times over 48 hours. Slack retries up to 3 times in the first ~30 minutes, then stops.
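To see how those three numbers expand into an actual schedule, here is a minimal sketch. The function and its parameters are illustrative, not any provider's real policy; the example numbers (5-second initial delay, doubling, 8 attempts) are made up to roughly resemble a GitHub-style schedule:

```python
def backoff_schedule(initial_delay, multiplier, max_attempts):
    """Expand (initial delay, multiplier, max attempts) into a list of
    per-attempt delays in seconds. Purely illustrative."""
    delays = []
    delay = initial_delay
    for _ in range(max_attempts):
        delays.append(delay)
        delay *= multiplier
    return delays

# Hypothetical policy: 5s initial delay, doubling each time, 8 attempts.
schedule = backoff_schedule(5, 2, 8)
print(schedule)       # → [5, 10, 20, 40, 80, 160, 320, 640]
print(sum(schedule))  # → 1275 seconds (~21 minutes of retrying)
```

Note how the multiplier dominates: most of the total retry window is spent waiting for the last couple of attempts, which is why long policies (Stripe's 3 days) reach hours-long gaps between deliveries.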
What counts as a failure that triggers a retry varies. Universally: HTTP 5xx, connection timeouts, TLS errors, DNS failures. Often: HTTP 4xx (some providers retry on 4xx, others treat it as terminal). Sometimes: a slow 2xx (if your handler takes 30 seconds to respond, the source may have already counted the delivery as a timeout and started retrying).
What the receiver must understand about retries:
- Retries arrive with the same event ID. This is the core idempotency guarantee. Your handler must dedupe on event ID; every event will be delivered at least once, possibly more.
- Retries can interleave with new events. Don't assume sequential delivery. Event A's first attempt may interleave with event B's first attempt and event A's second attempt.
- Retry storms compound load. If your handler fails for an hour, a flood of retries arrives when you recover. Plan for 3-10x normal volume during recovery. Queue ingest; don't process synchronously.
- You can't ask for fewer retries. The policy is set by the source. Some providers expose dashboards to disable retries per endpoint; most don't. Design for the worst-case retry pattern.
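The first and third points combine into one receiver pattern: acknowledge fast, dedupe on event ID, and defer the real work to a queue. A minimal sketch, with an in-memory set and queue standing in for what would be a durable store (e.g. a database unique index) and a real job queue in production; all names here are hypothetical:

```python
import queue

processed_ids = set()          # stand-in for a durable dedupe store
ingest_queue = queue.Queue()   # stand-in for a real job queue

def receive_webhook(event_id, payload):
    """Ack quickly, dedupe on event ID, process asynchronously."""
    if event_id in processed_ids:
        return 200                 # duplicate delivery: ack, don't reprocess
    processed_ids.add(event_id)
    ingest_queue.put((event_id, payload))  # real work happens off this queue
    return 200                     # respond before processing, within timeout

# A retry carries the same event ID — only one copy is enqueued.
receive_webhook("evt_123", {"type": "order.created"})
receive_webhook("evt_123", {"type": "order.created"})  # retry
print(ingest_queue.qsize())  # → 1
```

Returning 200 for a duplicate is deliberate: it tells the source the delivery succeeded and stops the retry cycle, while your dedupe check prevents double processing.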
When retries finally exhaust, the event is dead — the source gives up. Most provider dashboards show "failed deliveries" so you can manually trigger a redelivery later. Some support automatic dead-letter handling or webhook-replay APIs (Stripe, GitHub). For mission-critical events, build alerting on "stale events in the source dashboard" so you don't miss exhaustions.
Don't try to "be polite" by intentionally returning 500 to slow retries. Sources interpret 500 as "your endpoint is broken," not "I'm busy" — they'll retry harder, not less, and may eventually disable the endpoint.
See Retry Policy in real traffic
WebhookWhisper captures every webhook with full headers, body, signature, and timing — so concepts like retry policy stop being abstract and become something you can inspect.
Start Free