A webhook 500 Internal Server Error almost always means your handler threw an exception it did not catch. The provider sees the 5xx and retries on its schedule, so duplicate deliveries pile up while your application stays broken in production and may cascade the failure to other services.
Root Causes
1. Unhandled exception in business logic
Your handler runs charge.refund(orderId) synchronously and the orders service throws "order not found." The exception bubbles up, Express returns 500. The provider retries. The same exception fires. Loop.
2. Database connection failure
Your handler writes to Postgres for idempotency. Postgres is overloaded or restarting. The connection pool times out, your INSERT throws, and your handler returns 500. The retry arrives at the worst time: when the database is recovering and least able to absorb extra load.
3. Downstream API failure
Your handler synchronously calls a third-party service (analytics, email, fulfillment) inside the request flow. That service times out. Your handler 500s. The provider retries. You re-hammer the slow service.
4. Race condition on rapid retries
The provider retries before your idempotency write commits. Both deliveries try to INSERT the same event ID. Postgres throws a unique violation, and your handler returns 500 if you didn't use ON CONFLICT.
Fix It — Receive Fast, Process Async
The single highest-leverage fix is structural: separate receive from process. The webhook handler does only the work that must happen synchronously — signature verification, idempotency check, queue write, return 200. Everything else (refund, email, analytics) runs in a worker that pulls from the queue.
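// Webhook handler: fast receive, no business logic
app.post('/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    let event
    try {
      // Throws if the signature is missing or does not match the raw body
      event = stripe.webhooks.constructEvent(
        req.body,
        req.headers['stripe-signature'],
        process.env.STRIPE_WEBHOOK_SECRET
      )
    } catch (err) {
      // Permanent failure: a retry of the same payload fails the same way, so no 500
      log.error({ err }, 'webhook signature verification failed')
      return res.status(401).send('invalid signature')
    }
    try {
      // Idempotent insert: duplicates are no-ops, not errors
      const { rowCount } = await db.query(
        'INSERT INTO webhook_events (event_id, type, body) VALUES ($1, $2, $3) ON CONFLICT (event_id) DO NOTHING',
        [event.id, event.type, req.body]
      )
      if (rowCount === 0) {
        return res.status(200).send('duplicate')
      }
      // Queue the work; do not run it inline.
      // Caveat: if this send fails after the INSERT commits, the retry is treated
      // as a duplicate, so something must re-queue rows that were never processed.
      await queue.send('webhook.process', { eventId: event.id })
      res.status(200).send('queued')
    } catch (err) {
      // Only 500 on truly transient errors (DB unavailable, queue down)
      log.error({ err, eventId: event.id }, 'webhook receive failure')
      res.status(500).send('retryable')
    }
  }
)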
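The handler above covers only the receive half. As a rough sketch of the processing half, the worker step might look like the following. The `store` object (with `load`, `markProcessed`, and `markFailed`) and the `handlers` map keyed by event type are hypothetical stand-ins invented for illustration, not part of any real queue or database library.

```javascript
// Hypothetical worker step: process one queued event by ID.
// `store` and `handlers` are illustrative stand-ins, not a real library API.
async function processWebhookEvent(eventId, store, handlers) {
  const event = await store.load(eventId)
  if (!event) return 'missing'              // no stored row for this ID
  if (event.processedAt) return 'skipped'   // already handled; keeps the worker idempotent

  const handler = handlers[event.type]
  if (!handler) {
    // Event types we do not care about are marked done so they leave the queue.
    await store.markProcessed(eventId)
    return 'ignored'
  }

  try {
    await handler(event)                    // refund, email, analytics, ...
    await store.markProcessed(eventId)
    return 'processed'
  } catch (err) {
    // Record the failure, then rethrow so the queue's own retry policy applies.
    await store.markFailed(eventId, err.message)
    throw err
  }
}
```

Because failures here happen inside the worker, they never turn into a 500 for the provider; retries are governed by your queue rather than by the provider's schedule.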
When 500 Is Actually The Right Answer
You should return 500 if the failure is genuinely transient — DB connection lost, queue unreachable, your service is mid-deploy. Don't return 500 for permanent failures (signature mismatch → 401, malformed body → 400, idempotent duplicate → 200).
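One way to make that distinction explicit is a small classifier that maps an error to a status code. This is a minimal sketch under stated assumptions: the function name and the `reason` labels are hypothetical, and it assumes signature failures carry `type === 'StripeSignatureVerificationError'` and that database and socket errors expose a `code` property (Postgres SQLSTATE values or Node error codes). Check both assumptions against the libraries you actually use.

```javascript
// Codes that usually mean "try again later" (assumed to appear on err.code).
const TRANSIENT_CODES = new Set([
  'ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT', // Node socket errors
  '57P01', '57P03', '53300'                  // Postgres: admin shutdown, cannot connect now, too many connections
])

// Hypothetical helper: decide which status a failed webhook receive should return.
function classifyWebhookError(err) {
  if (err && err.type === 'StripeSignatureVerificationError') {
    return { status: 401, reason: 'bad_signature' }   // permanent: retrying cannot help
  }
  if (err instanceof SyntaxError) {
    return { status: 400, reason: 'malformed_body' }  // permanent: body will not parse on retry
  }
  if (err && TRANSIENT_CODES.has(err.code)) {
    return { status: 500, reason: 'transient' }       // retry is expected to succeed later
  }
  // Unknown failures default to 500 so the event is retried rather than lost.
  return { status: 500, reason: 'unknown' }
}
```

Defaulting unknown errors to 500 is a deliberate choice in this sketch: an unnecessary retry is cheaper than a silently dropped event.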
Observability for 500s
Every 500 needs a structured log line with the event ID, the error class, and a stack trace fingerprint. When you see a spike of 500s on the dashboard, you should be able to find the matching log line in seconds — without that, you're guessing why retries are arriving.
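As an illustration, the log line could be built like this. It is a sketch, not a prescribed format: the field names and helper functions are hypothetical, the fingerprint uses a simple non-cryptographic FNV-1a hash chosen only to keep the example dependency-free, and the stack parsing assumes V8-style stack traces as produced by Node.js.

```javascript
// 32-bit FNV-1a over UTF-16 code units: cheap and non-cryptographic, used only for grouping.
function fnv1a(text) {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16).padStart(8, '0')
}

// Reduce a stack trace to a stable fingerprint: keep the top frames and strip
// line:column numbers so small code shifts do not change the hash.
function stackFingerprint(err, frames = 3) {
  const lines = String(err.stack || '').split('\n').slice(1, 1 + frames)
  const normalized = lines.map((l) => l.trim().replace(/:\d+:\d+/g, '')).join('|')
  return fnv1a(`${err.name}|${normalized}`)
}

// Build the structured fields for one failed webhook receive.
function webhookErrorLog(eventId, err) {
  return {
    msg: 'webhook receive failure',
    eventId,
    errorClass: err.name,
    errorMessage: err.message,
    stackFingerprint: stackFingerprint(err)
  }
}
```

Passing the result to your logger (for example `log.error(webhookErrorLog(event.id, err))`) lets you group a spike by `stackFingerprint` and then drill into individual `eventId` values.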
How to Reproduce
Deliberately simulate a slow downstream call (for example, delay the response with setTimeout(..., 30000) in your dev handler) and fire a test webhook. The provider's delivery timeout fires (the limit varies by provider and is often well under 30 seconds), retries arrive, and you can verify that your handler doesn't compound the failure with bad retry behavior.
Frequently Asked Questions
How long do providers retry 500s?
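It varies widely, and schedules change, so confirm against each provider's current documentation. Stripe retries live-mode events for up to 3 days with exponential backoff. Slack's Events API retries up to 3 times over a few minutes (almost immediately, then after about 1 minute, then after about 5 minutes). Shopify retries a fixed number of times over a window of hours; the figure of 19 attempts over 48 hours comes from older documentation, so check the current schedule. GitHub does not automatically redeliver failed deliveries; you redeliver them manually or through its API. Where the retry window is long, as with Stripe, a stuck 500 means days of duplicate deliveries.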
Should I email myself on every webhook 500?
No — alert on the rate (e.g., '>10 500s/min') or on a stuck 500 ('same event ID failing for >5 minutes'). Per-event alerting drowns you in noise during incidents.
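Both alert conditions can be sketched in a few lines. The class below is a hypothetical in-memory illustration whose default thresholds mirror the examples in the answer above; in practice this logic usually lives in your metrics or alerting system, and in-memory state like this only sees failures from a single process.

```javascript
// Hypothetical in-memory monitor for webhook 500s (single process only).
class FailureMonitor {
  constructor({ maxPerMinute = 10, stuckAfterMs = 5 * 60 * 1000 } = {}) {
    this.maxPerMinute = maxPerMinute
    this.stuckAfterMs = stuckAfterMs
    this.timestamps = []            // times of recent 500s
    this.firstFailure = new Map()   // eventId -> time of its first 500
  }

  // Record one 500 and return the alerts it triggers (possibly none).
  recordFailure(eventId, now = Date.now()) {
    this.timestamps.push(now)
    // Keep only failures from the last 60 seconds.
    this.timestamps = this.timestamps.filter((t) => now - t < 60000)
    if (!this.firstFailure.has(eventId)) this.firstFailure.set(eventId, now)

    const alerts = []
    if (this.timestamps.length > this.maxPerMinute) alerts.push('rate')
    if (now - this.firstFailure.get(eventId) > this.stuckAfterMs) alerts.push('stuck')
    return alerts
  }

  // Call when an event finally succeeds so it is no longer tracked as stuck.
  recordSuccess(eventId) {
    this.firstFailure.delete(eventId)
  }
}
```

A caller would invoke `recordFailure(event.id)` wherever the handler returns 500 and page someone only when the returned array is non-empty.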
My handler returns 500 once and the event never retries. Why?
Either the provider doesn't retry that event type, or your monitoring is missing the retries. Check the provider's dashboard: Stripe shows 'Recent webhook attempts' per event with timestamps.