Webhook 503 Service Unavailable — Causes & Fixes

A webhook 503 Service Unavailable response means your service, or a proxy in front of it, is reporting that it cannot handle the request right now. Unlike 502, where a proxy did not get a valid response from the upstream, 503 is usually a deliberate signal: maintenance mode, an unavailable dependency, or load shedding.

Root Causes

1. Maintenance mode page

Your team enabled a "maintenance" toggle that returns 503 globally. Webhook routes shouldn't go through that — they should keep accepting traffic so events queue up and you process them after maintenance ends. Either exempt webhook routes from maintenance mode, or accept that you'll rely on retries to recover.
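
One way to exempt webhook routes is to check the request path before the maintenance toggle applies. The following is a minimal sketch of an Express-style middleware, not a drop-in implementation: the `/webhooks/` prefix, the `isMaintenance` callback, and the 300-second `Retry-After` value are all assumptions made for illustration.

```javascript
// Hypothetical maintenance middleware: returns 503 for every route
// except paths under the (assumed) /webhooks/ prefix.
function maintenanceMode(isMaintenance, exemptPrefix = '/webhooks/') {
  return function (req, res, next) {
    // Webhook routes bypass the toggle so events keep being accepted.
    if (!isMaintenance() || req.path.startsWith(exemptPrefix)) {
      return next()
    }
    res.status(503)
       .set('Retry-After', '300') // assumed maintenance window, in seconds
       .send('maintenance')
  }
}

// Possible usage with an Express app:
// app.use(maintenanceMode(() => process.env.MAINTENANCE === '1'))
```

Registering this before the route handlers keeps the toggle global for user-facing pages while the webhook receive path stays open.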

2. Dependency check returning 503

Your handler verifies that the database, Redis, and a downstream service are all reachable. One is down. You return 503 to fail fast. The webhook source sees 503, retries on schedule. This is the right behavior — but only if the dependency outage is short. For long outages, queueing is better than rejecting.
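
A dependency check of this kind can be sketched as a small helper that runs every ping and reports which ones failed. The helper name `checkDependencies` and the shape of its result are assumptions for illustration; the ping functions are whatever your application already provides.

```javascript
// checks: object mapping a dependency name to an async ping function.
// Both the names and the ping functions are supplied by the caller.
async function checkDependencies(checks) {
  const names = Object.keys(checks)
  // Wrap each call so a synchronous throw also counts as a failure.
  const results = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => checks[name]()))
  )
  const failed = names.filter((_, i) => results[i].status === 'rejected')
  return { ok: failed.length === 0, failed }
}

// Possible usage inside a handler:
// const { ok, failed } = await checkDependencies({ db: dbPing, redis: redisPing })
// if (!ok) return res.status(503).set('Retry-After', '30').send('dependency down')
```

Knowing which dependency failed makes it easier to decide whether to reject with 503 or to fall back to queueing.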

3. Circuit breaker open

You implemented a circuit breaker (Hystrix-style) on a downstream call. The breaker opened after 5 consecutive failures, and now your handler returns 503 for the duration of the open window. Webhook deliveries during that window are retried by the source. If the sender's retry delay is longer than the open window, the breaker has usually half-opened by the time retries arrive, and traffic flows again.
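
The open and half-open behavior can be illustrated with a minimal breaker. This is a simplified sketch, not Hystrix or any specific library: the class name, the 30-second open window, and the injectable `now` clock are assumptions, and only the threshold of 5 consecutive failures comes from the scenario above.

```javascript
// Minimal circuit breaker sketch.
// threshold: consecutive failures before opening; openMs: assumed open window.
class CircuitBreaker {
  constructor({ threshold = 5, openMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold
    this.openMs = openMs
    this.now = now
    this.failures = 0
    this.openedAt = null
  }

  // 'closed' = normal, 'open' = rejecting, 'half-open' = a trial call is allowed
  state() {
    if (this.openedAt === null) return 'closed'
    return this.now() - this.openedAt >= this.openMs ? 'half-open' : 'open'
  }

  // Seconds until the breaker half-opens; usable as a Retry-After value.
  retryAfterSeconds() {
    if (this.openedAt === null) return 0
    const remaining = this.openMs - (this.now() - this.openedAt)
    return Math.max(0, Math.ceil(remaining / 1000))
  }

  recordSuccess() {
    this.failures = 0
    this.openedAt = null
  }

  recordFailure() {
    this.failures += 1
    // At or past the threshold, (re)start the open window. A failed
    // trial call in half-open state therefore re-opens immediately.
    if (this.failures >= this.threshold) this.openedAt = this.now()
  }
}
```

A handler could return 503 with `Retry-After` set to `retryAfterSeconds()` while `state()` is `'open'`, so senders that honor the header come back around the time the breaker allows a trial call.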

4. Load-shedding by reverse proxy

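nginx's limit_req rejects over-limit requests with 503 by default, not 429; set limit_req_status 429 if you want rate limiting to be distinguishable from outages. Some load balancers also return 503 when all upstream replicas are unhealthy. Both are valid responses, and both lead to source-side retries from senders that retry on 5xx.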

Fix It

Always include Retry-After

res.status(503)
   .set('Retry-After', '60')   // seconds, or HTTP-date
   .send('temporary outage')

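Retry-After tells the sender how long to wait before retrying. Not every webhook sender honors it: many follow their own fixed backoff schedule regardless of the header, so check your provider's documentation. Sending it is still cheap, and for senders that do honor it, it keeps retries from arriving while the outage is still in progress.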
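
Retry-After accepts either a number of seconds or an HTTP-date. A small helper can produce both forms; the function name `retryAfterValue` is hypothetical, and the sketch relies only on the standard `Date.prototype.toUTCString`, which emits the HTTP-date format.

```javascript
// Builds a Retry-After header value in either allowed form:
// a non-negative integer number of seconds, or an HTTP-date.
function retryAfterValue(when) {
  if (when instanceof Date) {
    return when.toUTCString() // e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  }
  // Round up so the sender never retries earlier than intended.
  return String(Math.max(0, Math.ceil(when)))
}

// Possible usage:
// res.status(503).set('Retry-After', retryAfterValue(60)).send('temporary outage')
```

The seconds form is usually the safer choice because it does not depend on the sender's and receiver's clocks agreeing.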

Health-check-aware webhook routing

// Express: health endpoint reflects degraded mode
import express from 'express'
import { dbPing } from './lib/db'
import { redisPing } from './lib/redis'

const app = express()

app.get('/health', async (req, res) => {
  try {
    await Promise.all([dbPing(), redisPing()])
    res.status(200).json({ status: 'ok' })
  } catch (err) {
    res.status(503).set('Retry-After', '30').json({ status: 'degraded', err: err.message })
  }
})

// Webhook handler stays available even when /health 503s — it queues
// to durable storage that's separate from your hot dependencies.
app.post('/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  webhookHandler  // writes to a queue that's resilient
)

503 vs 5xx Strategy

The key principle: webhook receivers should be the most resilient component in your stack. The receive path should not depend on your application's hot dependencies. Verify, queue, return 200 — even when half of production is down, the queue write should still work, because the queue is the cheapest, simplest dependency you have.
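
The verify, queue, return 200 flow might look like the sketch below. Several parts are assumptions for illustration: the `X-Signature` header name, the generic hex-encoded HMAC-SHA256 scheme, the injected `enqueue` function, and the 400 and 30-second values. Real providers define their own signing schemes, so use the provider's documented verification in practice.

```javascript
const crypto = require('node:crypto')

// Generic HMAC-SHA256 check over the raw request body. This shows the
// shape only; it is not any specific provider's signing scheme.
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest()
  const given = Buffer.from(signatureHex || '', 'hex')
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected)
}

// enqueue is a hypothetical async function that writes to durable storage.
function makeWebhookHandler({ secret, enqueue }) {
  return async function webhookHandler(req, res) {
    const signature = req.get('X-Signature') // assumed header name
    if (!verifySignature(req.body, signature, secret)) {
      return res.status(400).send('invalid signature')
    }
    try {
      await enqueue(req.body) // the only dependency on the receive path
      return res.status(200).send('ok')
    } catch (err) {
      // The queue itself is unavailable, so 503 is the honest answer.
      return res.status(503).set('Retry-After', '30').send('temporary outage')
    }
  }
}
```

With `express.raw({ type: 'application/json' })` in front of it, `req.body` is a Buffer, which is what the signature check needs. The database, Redis, and downstream calls all happen later, in whatever consumes the queue.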

How to Reproduce

Stop your database (don't do this in prod). Fire a test webhook. If your handler returns 503 (because dependency check failed), good — that's defensive. Now route the webhook to a queue that doesn't need the DB to accept writes. Re-fire the test. The handler should return 200 even with the DB down. That's resilience.

Frequently Asked Questions

Should webhook routes return 503 during a deploy?

A few seconds of 503s during a rolling restart is acceptable, because the sender's retry mechanism will redeliver the affected events. Better: use blue-green deploys so there is no 503 window at all.

Is 503 better than 500 for transient failures?

Yes. 503 communicates 'temporary, please retry' explicitly; 500 is ambiguous. 503 + Retry-After is the right shape for any transient receive-side failure.

My circuit breaker keeps tripping during webhook bursts. What now?

Either raise the threshold or move the dependency call out of the receive path. The webhook handler itself should never trip a breaker; it should always succeed in writing to the queue.

