"Connection reset by peer" (ECONNRESET) means the TCP connection was established and then abruptly torn down by your side mid-request. Different from "connection refused" — the source got past TCP setup, then your server severed the connection without completing the response. The webhook delivery is treated as failed and retried.
Root Causes
1. Application crashed mid-request
Your handler started processing the webhook, hit an unhandled exception, and the Node or Python process crashed. The kernel resets the connection because no process is left to finish the response, so the provider sees ECONNRESET and retries.
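In Express 4, for example, a rejection inside an async route handler is not passed to the error middleware; it surfaces as an unhandledRejection and, on current Node versions, takes the process down. A minimal sketch (route path, body shape, and the handleOrder helper are illustrative, not your actual handler):

const express = require('express')
const app = express()
app.use(express.json())

app.post('/webhooks/orders', async (req, res) => {
  try {
    await handleOrder(req.body) // your processing; any throw is caught here
    res.sendStatus(200)
  } catch (err) {
    console.error('webhook processing failed', err)
    res.sendStatus(500) // provider retries, but the connection closes cleanly, no RST
  }
})

// hypothetical processing function, stands in for your own logic
async function handleOrder(order) { /* ... */ }

app.listen(3000)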
2. OOM kill
Your container is running near its memory limit. A spike in webhook payload size pushes it over, the OOM killer terminates the process, and every in-flight connection is reset. Check `dmesg | grep -i oom` on the host.
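One low-effort way to confirm this is to log resident memory and payload size on every delivery and compare against the container limit. A sketch, assuming an Express app:

const express = require('express')
const app = express()

// Log RSS and payload size per webhook delivery, so OOM-sized spikes are
// visible in the logs before the killer fires.
app.use('/webhooks', (req, res, next) => {
  const rssMb = Math.round(process.memoryUsage().rss / 1024 / 1024)
  console.log(`rss=${rssMb}MB content-length=${req.headers['content-length'] ?? 'unknown'}`)
  next()
})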
3. Idle timeout mismatch
The source uses HTTP keep-alive and reuses connections across deliveries. Your reverse proxy or app closes the idle connection after, say, 60 seconds, and the next delivery attempted on that connection sees the close as a reset. Order the keep-alive timeouts so each hop holds the connection longer than the hop in front of it: source < proxy < app.
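If you are not sure which hop closes idle connections first, a quick probe is to make one keep-alive request and measure how long the socket survives. A sketch; host, port, and path are placeholders, so point it at your proxy and then directly at your app and compare:

const http = require('http')

// Open one keep-alive connection, let it sit idle, and report when the
// server (or proxy) closes it; that is the effective idle timeout for that hop.
const agent = new http.Agent({ keepAlive: true, maxSockets: 1 })

http.get({ host: 'localhost', port: 8080, path: '/health', agent }, (res) => {
  res.resume() // drain the body so the socket returns to the agent's idle pool
  const idleSince = Date.now()
  setInterval(() => {}, 1000) // keep the process alive while we wait
  res.socket.once('close', () => {
    console.log(`idle connection closed after ${((Date.now() - idleSince) / 1000).toFixed(1)}s`)
    process.exit(0)
  })
})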
4. Reverse proxy buffer overflow
With proxy_request_buffering on (the default), nginx reads the entire request body before forwarding it to your app. Very large bodies (big Shopify orders) can blow past the buffer and body-size limits, and nginx aborts the request. Disable request buffering for webhook routes so the body streams straight through.
5. TLS handshake failure mid-data
Rare but real: TLS renegotiation or session ticket invalidation can RST a connection mid-stream. Modern TLS 1.3 mostly eliminates this, but TLS 1.2 with renegotiation enabled is still vulnerable.
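If your Node process terminates TLS itself (many setups terminate at the proxy instead), pinning the minimum protocol version sidesteps renegotiation entirely. A sketch; the certificate paths and request handler are placeholders:

const fs = require('fs')
const https = require('https')

// TLS 1.3 removed renegotiation, so mid-stream renegotiation resets cannot
// happen on these connections.
const server = https.createServer({
  key: fs.readFileSync('/path/to/key.pem'),   // placeholder paths
  cert: fs.readFileSync('/path/to/cert.pem'),
  minVersion: 'TLSv1.3',
}, (req, res) => res.end('ok'))               // stand-in handler

server.listen(8443)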
Fix It
Add error handlers and process supervision
// Top-level error handlers: catch what would otherwise crash the process.
// `log` is assumed to be a structured logger with a fatal level (e.g. pino).
process.on('uncaughtException', (err) => {
  log.fatal({ err }, 'uncaught exception')
  // give the log a moment to flush, then exit so the supervisor restarts us
  setTimeout(() => process.exit(1), 1000).unref()
})
process.on('unhandledRejection', (err) => {
  log.fatal({ err }, 'unhandled rejection')
  setTimeout(() => process.exit(1), 1000).unref()
})
Tune keep-alive timeouts
// Express — set keep-alive timeout longer than your reverse proxy's
const server = app.listen(3000)
server.keepAliveTimeout = 65_000 // 65s > nginx's default 60s
server.headersTimeout = 70_000 // must be > keepAliveTimeout
# nginx: match upstream keep-alive
upstream backend {
    server localhost:3000;
    keepalive 32;
    keepalive_timeout 60s;
}

server {
    location /webhooks/ {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_request_buffering off;  # don't buffer large bodies
        proxy_pass http://backend;
    }
}
Memory limits
# Docker Compose: explicit memory limit
services:
  webhook-handler:
    deploy:
      resources:
        limits:
          memory: 1G
    # Plus a healthcheck so the orchestrator restarts on lockup
    healthcheck:
      test: ["CMD", "curl", "-fs", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
Diagnostic Steps
# 1. Check for OOM kills
dmesg | grep -iE "killed|oom"
# 2. Check process restart history
docker compose ps # "Restarting" status indicates crash loop
# 3. Check keep-alive config in proxy logs
# nginx 499 status code = client closed connection before response
# 4. tcpdump for TCP RST flags during a failing delivery
sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-rst != 0'
How to Reproduce
Set your container's memory limit to 64 MB and fire a Shopify-sized webhook with WebhookWhisper. The handler OOMs and the connection is reset. Bump the limit and try again. A pattern of "first request succeeds, second is reset" usually means memory is accumulating across requests rather than spiking on a single payload.
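If you'd rather script the delivery than use the dashboard, a rough equivalent is to POST an oversized JSON body at your handler and watch how the connection ends. A sketch; the URL, path, and payload shape are placeholders:

const http = require('http')

// Fire one large, Shopify-order-shaped payload at the handler and report
// how the connection ends. ECONNRESET here reproduces what the provider sees.
const body = JSON.stringify({
  id: 1,
  line_items: Array.from({ length: 50_000 }, (_, i) => ({ sku: `SKU-${i}`, qty: 1 })),
})

const req = http.request({
  host: 'localhost',
  port: 3000,
  path: '/webhooks/orders',
  method: 'POST',
  headers: { 'content-type': 'application/json', 'content-length': Buffer.byteLength(body) },
}, (res) => console.log('status', res.statusCode))

req.on('error', (err) => console.error(err.code)) // expect ECONNRESET at a low memory limit
req.end(body)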
Frequently Asked Questions
Why does my app work for hours then start RSTing all connections?
Usually a memory leak. Watch RSS over time with `docker stats` or `top`: if it grows without bound until the OOM killer fires after a few hours, the resets are the kill, and fixing the leak fixes them.
Source retries the connection-reset events successfully — is this a real problem?
Yes. Each RST is a wasted retry, extra log noise, and a customer-facing delay. Fix the root cause; don't rely on retries to mask infra fragility.
Can keep-alive be the cause without explicit timeout mismatches?
Yes. OS-level TCP keepalive (SO_KEEPALIVE) only starts probing after the system default idle time (~2 hours on Linux), so a NAT or stateful firewall in the path can silently drop the connection long before then, and the next write on it is answered with a reset. Tune /proc/sys/net/ipv4/tcp_keepalive_time, or enable keepalive per socket, if you need long-lived connections.
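If you do need connections to stay open for long stretches, you can enable TCP keepalive per socket from Node instead of changing system-wide defaults. A sketch, attached to the `server` instance from the keep-alive example above:

// Start sending TCP keepalive probes after 30 seconds of idle on every
// accepted connection, instead of waiting for the ~2 hour system default.
server.on('connection', (socket) => {
  socket.setKeepAlive(true, 30_000)
})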