Best Practices · 18 min read · April 21, 2026

Webhook Best Practices: The Definitive Guide (2026)

Best-practices lists are usually marketing copy. This is not one. These are the patterns I have watched companies adopt only after an incident. Most are easy on day one of an integration and impossible to retrofit cleanly after a year of accumulated handler code.

Founder, WebhookWhisper · April 21, 2026

"Best practices" lists are usually marketing copy disguised as advice. This isn't one. These are the patterns I've watched companies adopt only after an incident — duplicate fulfilment, silent data loss, a 3am page about a webhook queue that backed up because one slow handler downstream blocked everything else. Most of them are easy to implement on day one of a webhook integration, and impossible to retrofit cleanly after a year of accumulated handler code.

I run WebhookWhisper, which means I've designed for receivers and as of 2026 also for senders (we sign every event we forward — see our HMAC signing docs). I've also reviewed code at three startups in the last year that ran into one or more of these problems in production. The order below is roughly the order I'd implement them on a new integration today.

1. Verify signatures, every request, no exceptions

An unverified webhook endpoint is a public POST that triggers business logic. Anyone who guesses or leaks the URL can fire payloads at it. The cost of skipping verification is concrete: forged payment_intent.succeeded shipping a product, forged customer.subscription.deleted revoking access to a paying customer, forged checkout.session.completed granting digital goods that were never paid for. None of these are theoretical — all three have happened to companies I've consulted with.

Most major providers sign with HMAC-SHA256. The shape of the signature header varies (Stripe-Signature, X-Hub-Signature-256, X-Shopify-Hmac-Sha256), but the algorithm is the same: HMAC of the raw bytes with a per-endpoint secret, hex- or base64-encoded (Shopify uses base64), sometimes with a timestamp prefix. Twilio's X-Twilio-Signature is the outlier: HMAC-SHA1 over the request URL plus the sorted POST parameters, base64-encoded.

The canonical generic verifier in Node.js, with the two non-obvious pieces in comments:

import crypto from 'crypto'

function verifyHmac(rawBody, receivedSig, secret) {
  // A missing header arrives as undefined; reject instead of throwing
  if (typeof receivedSig !== 'string') return false

  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)               // Buffer or string of EXACT bytes
    .digest('hex')

  // Constant-time comparison: a naive == leaks the expected signature
  // through response timing over many requests
  const expectedBuf = Buffer.from(expected, 'hex')
  const receivedBuf = Buffer.from(receivedSig, 'hex')
  if (expectedBuf.length !== receivedBuf.length) return false
  return crypto.timingSafeEqual(expectedBuf, receivedBuf)
}

Two mistakes I see constantly. First: using === instead of timingSafeEqual. The first version of every signature verifier I've ever written, including in WebhookWhisper itself, did this. The naive comparison returns false at the first differing character, and the time it takes to get there is observable on the wire. A determined attacker can recover the valid signature for a forged payload byte by byte over millions of requests, without ever learning the secret. Always use the timing-safe primitive (crypto.timingSafeEqual in Node, hmac.compare_digest in Python, hmac.Equal in Go).

Second: passing a re-serialized JSON object instead of the raw request body. Covered in §2.
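Some providers prefix the digest inside the header. As a minimal sketch, assuming a GitHub-style header of the form sha256=<hex digest>, the generic verifier can be wrapped like this. The name verifyPrefixedSignature is mine for illustration, not a library function, and the block uses require so it runs as a standalone script:

```javascript
const crypto = require('node:crypto')

// Constant-time check of a hex-encoded HMAC-SHA256 digest.
function verifyHmac(rawBody, receivedHex, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest()
  const received = Buffer.from(receivedHex, 'hex') // invalid hex yields a short buffer
  if (received.length !== expected.length) return false
  return crypto.timingSafeEqual(expected, received)
}

// Hypothetical wrapper for headers shaped like "sha256=<hex digest>".
function verifyPrefixedSignature(rawBody, headerValue, secret) {
  if (typeof headerValue !== 'string') return false // header missing
  const prefix = 'sha256='
  if (!headerValue.startsWith(prefix)) return false // unexpected algorithm tag
  return verifyHmac(rawBody, headerValue.slice(prefix.length), secret)
}
```

Rejecting a missing or oddly prefixed header before any comparison keeps the failure path boring: it returns false rather than throwing.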

There is a provider-specific deep guide for Stripe, and the free signature playground covers Stripe / GitHub / Shopify / Slack / generic HMAC interactively.

2. Pass the raw bytes, never a parsed object

This is the single most common cause of "signature verification fails" tickets. HMAC is a checksum over bytes. If your HTTP framework parses the JSON body before your handler reads it, the bytes are gone — what's left is a JavaScript object (or Python dict, or Go struct). Re-stringifying that object will produce different bytes than what the provider signed: keys may be in a different order, whitespace will be normalized, Unicode escapes may be expanded, trailing newlines may be stripped.
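To make the failure concrete, here is a small sketch that signs a body the way a provider would and then signs the re-serialized version. The secret and payload are made up for illustration:

```javascript
const crypto = require('node:crypto')

const secret = 'whsec_example' // hypothetical secret
// What the provider actually sent: note the spaces and newlines.
const rawBody = '{"id": "evt_1",\n  "amount": 1000}\n'

const sign = (body) =>
  crypto.createHmac('sha256', secret).update(body).digest('hex')

const signedByProvider = sign(rawBody)

// What you get back after the framework parsed the JSON for you.
const reserialized = JSON.stringify(JSON.parse(rawBody))

// Plain === is fine here: this is a demo of byte differences, not a verifier.
const matchesRaw = sign(rawBody) === signedByProvider
const matchesReserialized = sign(reserialized) === signedByProvider

console.log({ matchesRaw, matchesReserialized })
```

The parsed object is semantically identical, but the whitespace is gone, so the digest over the re-serialized string no longer matches.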

The fix is framework-specific:

  • Express: express.raw({ type: 'application/json' }) on the route, before any global express.json().
  • Next.js (App Router): const body = await req.text() in the route handler.
  • Next.js (Pages Router): export const config = { api: { bodyParser: false } }.
  • Django: request.body for raw bytes; mark the view @csrf_exempt.
  • Flask: request.get_data(); do not call request.get_json() first.
  • FastAPI: await request.body(), not await request.json().
  • Rails: request.body.read, with skip_before_action :verify_authenticity_token.
  • Go: io.ReadAll(r.Body) — the standard library cooperates here.
  • PHP: file_get_contents('php://input').
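If you are on a bare Node.js http server rather than one of the frameworks above, a minimal sketch of collecting the exact bytes looks like this. The helper name readRawBody and the 1 MiB cap are my assumptions, not part of any library:

```javascript
// Collect the exact request bytes from a readable stream
// (for example an http.IncomingMessage) without decoding or parsing them.
async function readRawBody(stream, maxBytes = 1024 * 1024) {
  const chunks = []
  let total = 0
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk)
    total += buf.length
    // Hypothetical size cap so an oversized POST cannot exhaust memory.
    if (total > maxBytes) throw new Error('payload too large')
    chunks.push(buf)
  }
  return Buffer.concat(chunks)
}
```

Pass the returned Buffer to the verifier first, and only call JSON.parse on it after the signature checks out.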

If you're operating behind a CDN or WAF (Cloudflare, API Gateway, NGINX with body buffering), you have a second body-mutation surface to worry about. Disable any "minify response" or "JSON pretty-print" feature on the webhook route. If you can't, verify on the edge — closer to ingress — before the body goes through the proxy.

3. Acknowledge fast, do the work asynchronously

Every major provider times out webhook delivery if you don't respond in time. Stripe gives you 30 seconds, GitHub 10, Shopify 5. If your handler does heavy synchronous work — database writes, email sends, external API calls — you'll hit the timeout under load and the provider will retry on exponential backoff, which means your slow handler will run again, which means more retries.

The canonical pattern: verify, persist the raw event durably, enqueue, return 200. The handler does no business work — it only enqueues durable work for a worker to do.

app.post('/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    let event
    try {
      event = stripe.webhooks.constructEvent(
        req.body, req.headers['stripe-signature'], secret
      )
    } catch (err) {
      return res.status(400).send(`Webhook Error: ${err.message}`)
    }

    // Persist the raw event before acknowledging.
    // If the worker dies, we can replay from this row.
    try {
      await db.webhookEvents.insert({
        provider: 'stripe',
        eventId: event.id,
        type: event.type,
        payload: req.body,
        receivedAt: new Date(),
      })

      // Enqueue the actual work
      await queue.add('process-stripe-event', { eventId: event.id })
    } catch (err) {
      // Could not store or enqueue: answer 5xx so the provider retries
      return res.status(500).end()
    }

    // Acknowledge immediately
    res.json({ received: true })
  }
)

The "persist before acknowledge" line matters. If you only enqueue, and your queue is effectively in-memory (Redis without persistence, or an in-process queue), a server crash between queue.add and the worker pickup loses the event. The provider thinks delivery succeeded; you have nothing. A persisted row in Postgres is your durable record. The worker reads from Postgres, processes the event, and marks the row complete.
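The worker half of that loop can be sketched as follows. The store object and its three methods (fetchUnprocessed, markComplete, recordFailure) are a hypothetical interface over the events table, not a real library:

```javascript
// Drain events that were persisted by the receiver but not yet processed.
// `store` is a hypothetical wrapper around the webhook events table.
async function drainPending(store, processEvent) {
  const rows = await store.fetchUnprocessed()
  let completed = 0
  for (const row of rows) {
    try {
      await processEvent(row)
      await store.markComplete(row.eventId)
      completed += 1
    } catch (err) {
      // Leave the row unprocessed so a later pass can retry it.
      await store.recordFailure(row.eventId, String(err))
    }
  }
  return completed
}
```

Because the table is the source of truth, the queue becomes a wake-up signal: if a queue message is lost, a periodic drain still finds the row.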

4. Make every handler idempotent

Every major provider uses at-least-once delivery semantics. That phrase has a specific meaning: your handler will receive duplicate events under normal operation. It is not a misconfiguration; it is the contract. If your code is not idempotent, you fulfill orders twice, charge customers twice, send the same email twice. These are user-visible bugs that require manual cleanup.

The minimal idempotency pattern: deduplicate on the provider's event ID before doing any work, using an atomic insert.

CREATE TABLE processed_webhook_events (
  provider     TEXT NOT NULL,
  event_id     TEXT NOT NULL,
  started_at   TIMESTAMPTZ NOT NULL,
  completed_at TIMESTAMPTZ,
  PRIMARY KEY (provider, event_id)
);

// In your handler (JavaScript):
async function handleEvent(provider, eventId, payload) {
  const result = await db.query(`
    INSERT INTO processed_webhook_events (provider, event_id, started_at)
    VALUES ($1, $2, NOW())
    ON CONFLICT (provider, event_id) DO NOTHING
    RETURNING event_id
  `, [provider, eventId])

  if (result.rowCount === 0) {
    // Another worker is already processing this, or already done.
    return { deduplicated: true }
  }

  try {
    await processEvent(payload)
  } catch (err) {
    // Release the claim so the next retry is not deduplicated away, then rethrow.
    await db.query(`
      DELETE FROM processed_webhook_events
      WHERE provider = $1 AND event_id = $2 AND completed_at IS NULL
    `, [provider, eventId])
    throw err
  }

  await db.query(`
    UPDATE processed_webhook_events SET completed_at = NOW()
    WHERE provider = $1 AND event_id = $2
  `, [provider, eventId])
}

The atomic INSERT ... ON CONFLICT DO NOTHING is what makes this safe under concurrent retries. Don't write "check if exists, then insert" in two queries — there's a race window between them where two workers can both pass the check and both do the work.

The provider's event ID is your idempotency key: event.id for Stripe, X-GitHub-Delivery header for GitHub, X-Shopify-Webhook-Id header for Shopify (the id in the payload identifies the resource, not the delivery), EventSid header for Twilio. The full picture of how this composes with retries lives in our retry-and-idempotency reference.

5. Validate timestamps where the provider supports them

Stripe and a few others include a timestamp in the signature header. Without timestamp validation, a captured signed webhook (e.g. from a leaked log file) stays valid forever — an attacker who gains read access to your logs can replay any past event and your handler will accept it.

Stripe's constructEvent defaults to a 300-second tolerance. Don't disable it in production. In development, if you need to replay an old captured event, pass a longer tolerance to constructEvent for that specific test, never globally.

For providers that don't include a timestamp (GitHub, Shopify), there's no signature-level replay protection. Your defense is rate limiting and the idempotency check from §4 — the same event won't be processed twice even if replayed, but you can still be flooded with replay-attempt traffic. See §7 for rate limiting.
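If you ever have to check a timestamped signature by hand instead of through the provider SDK, a minimal sketch looks like the following. It assumes a Stripe-style header of the form t=<unix seconds>,v1=<hex digest>, where the digest covers the timestamp, a dot, and the raw body. The function name and the injectable nowSec parameter are mine, and it rejects timestamps too far in either direction, which may be stricter than a given SDK:

```javascript
const crypto = require('node:crypto')

function verifyTimestamped(rawBody, header, secret, toleranceSec = 300,
                           nowSec = Math.floor(Date.now() / 1000)) {
  if (typeof header !== 'string') return false

  // Split "t=...,v1=...,v1=..." into a timestamp and candidate digests.
  let tsRaw = null
  const candidates = []
  for (const part of header.split(',')) {
    const idx = part.indexOf('=')
    if (idx === -1) continue
    const key = part.slice(0, idx).trim()
    const value = part.slice(idx + 1).trim()
    if (key === 't') tsRaw = value
    if (key === 'v1') candidates.push(value)
  }
  if (tsRaw === null || !/^\d+$/.test(tsRaw) || candidates.length === 0) return false

  // Replay protection: refuse anything outside the tolerance window.
  if (Math.abs(nowSec - Number(tsRaw)) > toleranceSec) return false

  // The signed payload is "<timestamp>.<raw body>".
  const expected = crypto.createHmac('sha256', secret)
    .update(`${tsRaw}.`)
    .update(rawBody)
    .digest()

  return candidates.some((hex) => {
    const received = Buffer.from(hex, 'hex')
    return received.length === expected.length &&
      crypto.timingSafeEqual(expected, received)
  })
}
```

Because the timestamp is part of the signed payload, an attacker cannot simply rewrite t to a fresh value without invalidating the digest.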

6. Log structurally, with the fields that actually help in incidents

Webhook bugs are forensic. By the time you start debugging, the failed delivery already happened. The only artifact is what you logged at the time. The fields that matter, in priority order:

  • Provider — Stripe, GitHub, Shopify, etc.
  • Event ID — the provider's unique identifier so you can cross-reference their delivery log.
  • Event type — payment_intent.succeeded, customer.subscription.deleted, etc.
  • Body hash — SHA-256 of the raw body. Lets you confirm whether middleware mutated bytes between ingress and your handler.
  • Signature header presence — true/false. Catches the case where a reverse proxy is silently stripping it.
  • Verification status — pass / fail with the specific error.
  • Time-to-200 (ms) — how long the handler took to acknowledge. Spikes here are the early signal of upcoming timeouts.
  • Worker outcome — separate log line from the async worker, joined to the receiver log by event ID.

// One structured log line per webhook receive
log.info({
  provider: 'stripe',
  eventId: event.id,
  eventType: event.type,
  bodyHash: sha256(req.body),
  hasSignature: !!req.headers['stripe-signature'],
  verificationStatus: 'ok',
  timeToAckMs: Date.now() - startTime,
}, 'webhook_received')

Avoid logging the raw body in plaintext at info level: webhook payloads contain customer data, sometimes payment metadata, sometimes secrets. Log the hash. If you need the body for forensics, log it only at debug level, with bounded retention.
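The sha256 helper used in the log line is not shown above. One plausible version, together with a small function that assembles the log fields, might look like this. The name buildReceiveLog and its argument shape are assumptions for illustration:

```javascript
const crypto = require('node:crypto')

// Hex SHA-256 of the raw request bytes (Buffer or string).
function sha256(rawBody) {
  return crypto.createHash('sha256').update(rawBody).digest('hex')
}

// Hypothetical helper: builds the structured fields for one receive.
function buildReceiveLog({ provider, eventId, eventType, rawBody,
                           signatureHeader, verified, startTime }) {
  return {
    provider,
    eventId,
    eventType,
    bodyHash: sha256(rawBody),               // the hash, never the body itself
    hasSignature: Boolean(signatureHeader),  // false if a proxy stripped it
    verificationStatus: verified ? 'ok' : 'failed',
    timeToAckMs: Date.now() - startTime,
  }
}
```

Logging the same hash at ingress and again in the handler is what lets you tell whether something in between changed the bytes.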

7. Rate-limit your webhook endpoints

Even with signature verification, a flood of unverified requests is still a problem: the verifier has to run on each one, which costs CPU, so attackers spraying invalid signatures can DoS your endpoint. Apply rate limiting at the network edge before your application code runs.

The dimensions to rate-limit on:

  • Per provider IP range — many providers publish their source IPs (Stripe and GitHub do; check your provider's docs). Where a list exists, allow only those at the firewall.
  • Per endpoint — even legitimate provider traffic has an upper bound. A burst above that bound is either a bug on their side or an attack from someone who spoofed their IP.
  • Per event type — if your handler has different downstream cost for different event types (subscription.created is cheap, charge.dispute.created triggers a multi-step workflow), rate-limit them separately.

// Express + express-rate-limit
import rateLimit from 'express-rate-limit'

const webhookLimiter = rateLimit({
  windowMs: 60_000,             // 1 minute
  max: 1000,                    // generous for legitimate provider traffic
  keyGenerator: req => req.ip,  // per source IP
  message: { error: 'rate_limited' }
})

app.post('/webhooks/stripe', webhookLimiter, /* ...rest of stack */)

Don't set the limit so low that legitimate provider traffic gets rejected. Stripe can deliver hundreds of events per minute during a payment surge. Tune to your real peak plus headroom. (When the limit does fire, the right HTTP response is 429 Too Many Requests with a Retry-After header, not a 5xx.)
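To show where the Retry-After value comes from, here is a dependency-free sketch of a fixed-window limiter. It is an illustration of the idea, not how express-rate-limit is implemented, and the function name is made up:

```javascript
// Fixed-window counter per key (for example, per source IP).
// Note: the Map is never pruned here; a real deployment would expire old keys.
function createFixedWindowLimiter({ windowMs, max }) {
  const windows = new Map() // key -> { start, count }

  return function check(key, now = Date.now()) {
    let win = windows.get(key)
    if (!win || now - win.start >= windowMs) {
      win = { start: now, count: 0 } // start a fresh window
      windows.set(key, win)
    }
    win.count += 1
    if (win.count <= max) return { allowed: true, retryAfterSec: 0 }

    // Seconds until the current window ends: the value for Retry-After.
    const retryAfterSec = Math.ceil((win.start + windowMs - now) / 1000)
    return { allowed: false, retryAfterSec }
  }
}
```

When check returns allowed: false, the handler would respond with status 429 and set the Retry-After header to retryAfterSec.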

8. Use a dead-letter queue for events that exhaust retries

Even with idempotent, fast, signature-verifying handlers, things will fail. A downstream service is down. A schema change breaks a handler built against the old version. A bug in the worker code crashes on a specific event subtype.

The provider will retry up to its limit and then give up. If you don't have a dead-letter queue, the event is gone. For payment events, subscription state changes, and order fulfilment triggers, this is unacceptable.

The minimal DLQ pattern: when the worker exhausts its own retries on an event, write the event to a separate webhook_dlq table along with the error and timestamp. Page on inserts to that table. Build a small admin page that lets you inspect, requeue, or manually resolve DLQ entries.

CREATE TABLE webhook_dlq (
  id          BIGSERIAL PRIMARY KEY,
  provider    TEXT NOT NULL,
  event_id    TEXT NOT NULL,
  event_type  TEXT,
  payload     BYTEA NOT NULL,
  error       TEXT NOT NULL,
  attempt     INT NOT NULL,
  failed_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  resolved_at TIMESTAMPTZ
);

CREATE INDEX idx_webhook_dlq_unresolved
  ON webhook_dlq (failed_at) WHERE resolved_at IS NULL;

The DLQ is the safety net. The forwarding inspector (§9) is the offensive line. Both matter.
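The worker side of the DLQ pattern might look roughly like this. The dlq object with an insert method is a hypothetical stand-in for a write to the webhook_dlq table, and the sketch retries immediately where a real worker would back off between attempts:

```javascript
// Try the handler up to maxAttempts times; on exhaustion, dead-letter the event.
async function processWithDlq(event, handler, dlq, maxAttempts = 3) {
  let lastError
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(event)
      return { ok: true, attempts: attempt }
    } catch (err) {
      lastError = err // remember the most recent failure for the DLQ row
    }
  }

  // Field names mirror the webhook_dlq columns in camelCase.
  await dlq.insert({
    provider: event.provider,
    eventId: event.eventId,
    eventType: event.eventType,
    payload: event.payload,
    error: lastError instanceof Error ? lastError.message : String(lastError),
    attempt: maxAttempts,
  })
  return { ok: false, attempts: maxAttempts }
}
```

The return value lets the caller decide whether to mark the source row complete or leave it flagged for the admin page.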

9. Put a capture-and-forward inspector in front of every production webhook

This is the single highest-leverage piece of webhook infrastructure I know, and it's the most often skipped. The setup: provider POSTs to a public inspector URL (WebhookWhisper, Hookdeck, Svix, whoever); the inspector stores every event durably; the inspector forwards to your real handler. Your handler still does all the work. The inspector adds two superpowers:

  1. Forensics. Every event is captured with full headers and raw body. When something breaks at 3am, you have the exact bytes the provider sent. Not a parsed JSON guess — bytes.
  2. Replay. When you fix a bug that's been live for two weeks, you don't have to ask the provider to resend events one at a time. You batch-replay from the inspector. The exact original payloads, fired at your fixed handler, in minutes.

The "we'll add this when we need it" trap: by the time you need it (the day a bug ships and a hundred events failed), it's too late. The events you needed to replay weren't captured. Add the inspector before you need it.
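As a vendor-neutral sketch of what batch replay amounts to, assume the inspector can hand you captured events as objects with id, receivedAt, headers, and rawBody, and that deliver is whatever function POSTs to your handler and resolves to the HTTP status. Both shapes are assumptions for illustration, not any product's API:

```javascript
// Re-send captured events in their original arrival order, one at a time,
// and collect the ones the handler did not accept.
async function replayCaptured(captured, deliver) {
  const ordered = [...captured].sort((a, b) => a.receivedAt - b.receivedAt)
  const failures = []
  for (const evt of ordered) {
    try {
      // Send the exact captured bytes and headers, not a re-serialized copy.
      const status = await deliver({ headers: evt.headers, body: evt.rawBody })
      if (status < 200 || status >= 300) failures.push({ id: evt.id, status })
    } catch (err) {
      failures.push({ id: evt.id, error: String(err) })
    }
  }
  return { replayed: ordered.length, failures }
}
```

One consequence worth checking in your own setup: a handler that enforces the timestamp tolerance from §5 will reject an old event replayed with its original signature header, so the replay path has to account for that.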

This is the wedge product I built WebhookWhisper around — persistent URL, forwarding to localhost or production, durable retention (7 days on Free, 14 on Starter, 30 on Pro), replay built in, no CLI, free tier covers the majority of teams. Hookdeck and Svix do similar things at the upper end of the market. Use whichever fits your scale; the pattern matters more than the vendor.

10. Rotate signing secrets carefully, with a grace period

Secrets leak. A teammate offboards with the dev .env on their laptop, a secret accidentally lands in a public commit (the GitHub secret-scanning bot will catch this within minutes — assume the secret is compromised the moment it's committed), or a CI log captures it. When you rotate, do it without losing in-flight events.

The grace-period rotation pattern, supported natively by Stripe and most other providers:

  1. Generate a new signing secret in the provider dashboard. The provider keeps both the old and the new secret valid for a configurable grace period (Stripe lets you delay expiry of the old secret by up to 24 hours).
  2. Deploy your handler to accept either the old or the new secret. Your verifier tries the new one first, falls back to the old one. Log which one matched — you should see traffic shift from "old" to "new" over the grace window.
  3. After the grace period, expire the old secret in the dashboard. Deploy your handler to only accept the new one. The "old" log line should be at zero by now.

function verifyWithGrace(rawBody, sig, secrets) {
  for (const [label, secret] of Object.entries(secrets)) {
    if (verifyHmac(rawBody, sig, secret)) {
      log.info({ matched: label }, 'webhook_verified')
      return true
    }
  }
  return false
}

// Usage during grace window:
verifyWithGrace(req.body, sig, {
  current: process.env.WEBHOOK_SECRET_NEW,
  legacy:  process.env.WEBHOOK_SECRET_OLD,
})

Without a grace period, you drop every event in flight when the rotation hits. Don't do that. The mechanics here have a glossary entry of their own — signing secret rotation. (And if a rotation goes wrong and starts producing signature mismatch errors, that's the runbook to walk back through.)

11. Test the failure paths, not just the happy path

Most webhook integrations get tested for payment_intent.succeeded and shipped. The hard bugs live in payment_intent.payment_failed (does your retry-payment flow work?), customer.subscription.deleted (does access actually get revoked?), charge.dispute.created (does someone get paged?), invoice.payment_failed (does dunning kick in?). The cost of skipping these is not "a bug" but "a class of bug that only fires when something else has gone wrong" — i.e. the worst time.

The minimum bar for an integration to be considered tested:

  • Every event type you've subscribed to has been fired at your handler at least once with a real payload, in a test environment.
  • Your handler has been tested with a malformed JSON body (it should reject with a 400, not crash).
  • Your handler has been tested with the wrong secret (the verifier should reject with a 401, or a 400 as in the Stripe example in §3).
  • Your handler has been tested with the same event ID twice in a row (idempotency check should kick in on the second).
  • Your handler has been tested with a deliberately slow downstream (the timeout-and-retry path should produce one DLQ entry, not duplicate processing).

The Stripe webhook testing tool and similar provider-specific pages let you fire each event type without burning a real charge. The browser-side HMAC tester generates valid headers for arbitrary payloads so you can synthesize the malformed-body and wrong-secret cases.

The 11-point production checklist

Compressed for the day you're writing your runbook:

  • ☑ Verify HMAC signatures on every request, with timing-safe comparison.
  • ☑ Pass raw bytes to the verifier; never a re-serialized object.
  • ☑ Acknowledge in under 5 seconds and do all work in a queue; persist the raw event before acknowledging, because the queue alone isn't durable enough.
  • ☑ Idempotency table keyed on (provider, event_id) with atomic insert.
  • ☑ Validate timestamps where the provider supports them; keep the default tolerance.
  • ☑ Structured log per event with provider, event ID, body hash, verification status, time-to-ack.
  • ☑ Rate-limit at the edge by source IP, with limits tuned to real provider peak plus headroom.
  • ☑ Dead-letter queue for events that exhaust retries; alert on inserts.
  • ☑ Capture-and-forward inspector in the path from day one, with retention long enough to cover the longest provider retry window plus your bug-discovery time (a week minimum; two weeks is comfortable).
  • ☑ Grace-period rotation for signing secrets; never hard-cutover.
  • ☑ Test every event type, plus malformed body, wrong secret, duplicate event, slow downstream.

Failure modes this checklist prevents

| Failure mode | What happens without the practice | Practice that prevents it |
| --- | --- | --- |
| Forged event triggers fulfilment | Attacker ships products, grants access, or triggers refunds | #1 signature verification |
| "Signature mismatch" on every request | Real events get rejected; production looks down | #2 raw body, not parsed |
| Provider marks endpoint as failing under load | Cascading retries, eventual events lost when retry budget exhausts | #3 acknowledge fast, work async |
| Customer charged twice / order shipped twice | Manual cleanup, refunds, customer-support load | #4 idempotency table |
| Replayed signed webhook from leaked logs | Old events re-execute weeks later | #5 timestamp validation |
| 3am incident with no forensic trail | Debugging by guess; root cause never found | #6 structured logs, #9 inspector retention |
| Event silently lost after retry exhaustion | Customer paid, never got the thing | #8 dead-letter queue |
| Bug fixed but events from the bug window are gone | Manual customer outreach, refunds, churn | #9 inspector replay |
| Secret rotation drops in-flight events | Brief outage during a routine maintenance task | #10 grace-period rotation |

Frequently asked questions

Is "at-least-once delivery" the same as "exactly-once delivery"?

No, and the distinction matters operationally. Exactly-once delivery is impossible over an unreliable network — the standard distributed-systems result. Every major webhook provider explicitly documents at-least-once: your handler will receive duplicates under normal conditions (after a transient failure, retry, network blip). Idempotency on your side is what turns at-least-once into a behaviorally exactly-once outcome. Don't ask the provider for exactly-once; build idempotency into your receiver.

Should I run my webhook handler synchronously if it's "fast enough"?

Probably not. "Fast enough" today is "marginal under load" tomorrow. The async pattern (verify, persist, enqueue, ack) costs an extra database row per event and zero ongoing complexity. The synchronous pattern saves nothing and breaks the day a downstream service slows down. Always async.

Where should I store webhook secrets?

Environment variables loaded from a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler, 1Password Connect, Google Secret Manager) — not in your code repository, not in .env.example with real values, and not in your CI logs. Rotate on offboarding, on suspected compromise, and on any commit that accidentally captured them. The GitHub secret-scanning bot catches public commits within minutes; assume the secret is burned the moment it lands.

Do I need a dead-letter queue from day one?

If you're processing payment, subscription, or fulfilment events: yes. The cost of a DLQ is one table and 30 lines of worker code. The cost of not having one, the first time a downstream service is down for 4 hours, is permanently lost events. Add it on day one.

How is best-practice webhook design different in 2026 vs 2020?

Three things matter more now than they did five years ago. First, provider AI-Overview consumption — the LLM-cited results for "webhook best practices" pull from pages with structured schema and named-author E-E-A-T anchors, which is why blog posts for the production-operator audience now ship with FAQPage JSON-LD and named bylines. Second, capture-and-forward inspectors as a first-class part of the stack — it used to be exotic; now it's table-stakes for any team running production webhooks. Third, HMAC signing on the receiver side, where you forward webhooks back out — webhook chains (provider → forwarder → your handler → downstream service) are common enough now that signing your own forwards is a real practice, not just an inbound concern.

Should I block webhook traffic by IP, by signature, or both?

Both, where the provider supports it. IP allowlisting at the edge stops invalid traffic before your application code runs (saving CPU and reducing the load on your verifier). Signature verification proves the request is authentic regardless of source. The two are complementary: a misbehaving allowlisted source still gets rejected by signature verification, and a forged-IP attack still gets rejected by signature verification. Layer them rather than pick one.

Closing

None of this is exotic. It's what production webhook setups at companies that take webhooks seriously look like. The cheap version of a webhook handler is 30 lines of Express. The production version is the same 30 lines plus signature verification, plus an idempotency table, plus a queue, plus a structured log, plus a DLQ, plus an inspector in the path. Maybe 200 lines and one extra table. The companion deep-dives for the security and retry sides of this list are webhook security best practices and our retry mechanics reference.

If you want the inspector + forwarding + replay layer for free, that's exactly what WebhookWhisper does — paste an endpoint URL into your provider, point forwarding at your handler, and every event is captured durably (7-day retention on the free tier, longer on paid), replayable on demand. The bugs you can't reproduce on demand become bugs you can replay until they're fixed. The signature playground at /webhook-signature-playground generates valid headers for synthetic test payloads if you need them for unit tests.

#webhooks #best-practices #security #reliability
