How to Implement Graceful Shutdown in Node.js for Kubernetes Deployments?
During a routine Kubernetes deployment, I noticed something troubling in our monitoring dashboards: a spike of 502 errors every time we pushed new code. Users were getting errors mid-request. What was going on?
The culprit? Our Node.js containers were being killed instantly during rolling updates, with no chance to finish in-flight requests.
The Problem: Instant Container Death
Here’s what was happening during each deployment:
- Kubernetes sends SIGTERM to the old pod
- Our Node.js process immediately dies (no signal handler)
- In-flight requests fail with connection errors
- New pod starts up
- Traffic resumes
The users caught in the middle? They got 502 errors.
```text
# What users saw during deploys
POST /api/checkout    -> 502 Bad Gateway
GET  /api/orders/123  -> Connection refused
```
First Attempt: Basic SIGTERM Handler
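Before writing a handler, it helps to see the default behaviour. The sketch below runs locally, outside Kubernetes. It spawns two throwaway Node.js child processes and sends each one SIGTERM, which is the signal the kubelet sends during a rolling update. The sigtermChild helper and both child scripts are hypothetical and exist only for this demonstration.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: start a Node.js child running `source`, send it
// SIGTERM as soon as it prints its first line, and report how it exited.
function sigtermChild(source) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', source], {
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    child.on('error', reject);
    child.stdout.once('data', () => child.kill('SIGTERM'));
    child.on('exit', (code, signal) => resolve({ code, signal }));
  });
}

// No SIGTERM listener: Node's default handling ends the process at once,
// whatever work is still in flight.
const noHandler = `
  setInterval(() => {}, 1000); // stands in for a busy HTTP server
  console.log('ready');
`;

// With a listener: the process decides when (and how) to exit.
const withHandler = `
  const timer = setInterval(() => {}, 1000);
  process.on('SIGTERM', () => {
    clearInterval(timer);
    process.exit(0); // a real service would drain requests here first
  });
  console.log('ready');
`;
```

The child without a listener is terminated by the signal itself, so its exit event reports signal 'SIGTERM' and no exit code. The child with a listener exits with code 0 on its own terms.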
I added a signal handler to catch the termination signal:
```javascript
const express = require('express');
const app = express();

const server = app.listen(3000);

// My first attempt at graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, closing server...');
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
});
```
This was better. But there were still problems:
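- New requests kept arriving during shutdown: server.close() only stops new TCP connections, so clients on existing keep-alive connections could still send requests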
- Database connections weren’t being closed
- Some pods just… hung forever
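The first problem comes from how server.close() works. It stops the listening socket from accepting new connections, but keep-alive connections that are already open stay usable until they close. Below is a minimal sketch of a promise-returning close helper. It assumes Node.js 18.2 or later, where http.Server has closeIdleConnections(); Node.js 19 and later already do this inside close(), so the extra call is harmless there. closeServer is a hypothetical name.

```javascript
const http = require('http');

// Hypothetical helper: resolve once the server has fully closed.
// server.close() stops accepting new connections and waits for open ones.
// closeIdleConnections() (Node.js 18.2+) also drops keep-alive sockets that
// are not serving a request, so idle clients cannot hold close() open.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  });
}

// Example: a throwaway server on a random local port.
const server = http.createServer((req, res) => res.end('ok'));
```

Connections that are actively serving a request are left alone. Those in-flight requests are exactly what a graceful shutdown wants to let finish.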
The Zombie Process Problem
The third issue was the most troubling. Some pods refused to die and hung in the shutdown sequence until Kubernetes force-killed them (SIGKILL) after terminationGracePeriodSeconds. This made each deployment take much longer than expected.
The root cause? A slow database query that never completed:
```javascript
app.get('/api/report', async (req, res) => {
  // This query can take 60+ seconds for large datasets
  const report = await db.query('SELECT * FROM orders WHERE ...');
  res.json(report.rows); // pg returns a Result object; the rows are in .rows
});
```
If a SIGTERM arrived while this query was running, server.close() would wait for the request to finish. But the query never finished in time.
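One way to keep a single slow operation from eating the whole shutdown budget is to bound it on the application side. Below is a minimal sketch using Promise.race. withTimeout and slowQuery are hypothetical names, and slowQuery only simulates db.query(). Note that this only stops *waiting*: the query keeps running on the database server. Actually cancelling it needs something on the database side, such as PostgreSQL's statement_timeout, which node-postgres also accepts as a pool/client config option.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
// It only stops waiting; the underlying work (e.g. a SQL query) keeps
// running unless it is cancelled separately.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated slow query that resolves with a pg-like result after `ms`.
const slowQuery = (ms) =>
  new Promise((resolve) => setTimeout(() => resolve({ rows: [] }), ms));
```

In the report route, the query call would be wrapped as withTimeout(db.query(...), someLimitMs, 'report query'), and the route would turn the timeout error into an error response instead of holding the connection open.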
The Hard Deadline Pattern
I learned an important lesson from production experience: always wrap shutdown in a hard deadline.
```javascript
const HARD_DEADLINE_MS = 30000; // 30 seconds

async function shutdown(signal) {
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  // Hard deadline: force exit after 30s
  const hardDeadline = setTimeout(() => {
    console.error('Hard deadline reached. Force exiting.');
    process.exit(1);
  }, HARD_DEADLINE_MS);

  // Stop accepting new connections and wait for in-flight requests to finish
  await new Promise((resolve) => server.close(resolve));
  console.log('HTTP server closed');

  // Close database connections once no request needs them anymore
  try {
    await db.end();
    console.log('Database connections closed');
  } catch (err) {
    console.error('Error closing database:', err);
  }

  clearTimeout(hardDeadline);
  console.log('Graceful shutdown complete');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```
This ensures the process exits even if something is stuck. The hard deadline saved me from zombie processes that hung on stuck database queries during rolling deploys.
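Hard-deadline logic is easier to trust once it has been tested. Below is a minimal sketch of the same shape; createShutdown is a hypothetical name, not the code above. It puts one deadline timer around the whole sequence and runs the steps in order, so the HTTP server can drain before the database pool is closed. It also adds a guard so a second signal does not start a second run. exit and log are injected only so the logic can be exercised without killing the current process.

```javascript
// Hypothetical helper (not the code above): one hard deadline around an
// ordered list of cleanup steps. `exit` and `log` are injectable so the
// logic can be exercised without terminating the current process.
function createShutdown({
  steps,
  deadlineMs,
  exit = (code) => process.exit(code),
  log = console,
}) {
  let started = false;

  return async function shutdown(signal) {
    if (started) return; // a second SIGTERM/SIGINT must not start a second run
    started = true;
    log.log(`Received ${signal}. Starting graceful shutdown...`);

    // Fires only if the steps below have not finished in time.
    const hardDeadline = setTimeout(() => {
      log.error('Hard deadline reached. Force exiting.');
      exit(1);
    }, deadlineMs);

    // Steps run in order, e.g. drain HTTP first, then close the DB pool.
    for (const [name, step] of steps) {
      try {
        await step();
        log.log(`${name} closed`);
      } catch (err) {
        log.error(`Error closing ${name}:`, err);
      }
    }

    clearTimeout(hardDeadline);
    log.log('Graceful shutdown complete');
    exit(0);
  };
}
```

The returned function can be passed directly to process.on('SIGTERM', ...), because Node passes the signal name as the first argument to signal listeners.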
Rejecting New Requests During Shutdown
Another issue: the load balancer might still send traffic to a pod that’s shutting down. I needed to:
- Signal that the pod is unhealthy (via health check)
- Reject new requests immediately
```javascript
let isShuttingDown = false;

// Health check endpoint - returns 503 during shutdown
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting down' });
  }
  res.json({ status: 'healthy' });
});

// Middleware to reject new requests during shutdown (register before your routes)
app.use((req, res, next) => {
  if (isShuttingDown) {
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});
```
When isShuttingDown is true, the readiness probe fails and Kubernetes stops routing new traffic to this pod. Kubernetes also starts removing a terminating pod from the Service endpoints as soon as its deletion begins, so the failing probe acts as a second safety net.
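A client on a keep-alive connection can keep hitting the same pod even after the pod has left the Service. Below is a minimal sketch that rejects such requests and also sends "Connection: close", so the client drops the socket and opens a new one next time. It uses plain node:http so it runs without Express; in the Express middleware above, the equivalent would be res.set('Connection', 'close') before the 503. state and rejectDuringShutdown are hypothetical names.

```javascript
const http = require('http');

// Shared flag; in the real app the SIGTERM handler flips it.
const state = { shuttingDown: false };

// Hypothetical wrapper: during shutdown, answer 503 and send
// "Connection: close" so a keep-alive client drops this socket and opens a
// fresh connection (which the Service can route to another pod) next time.
function rejectDuringShutdown(handler) {
  return (req, res) => {
    if (state.shuttingDown) {
      res.statusCode = 503;
      res.setHeader('Connection', 'close');
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify({ error: 'Server is shutting down' }));
      return;
    }
    handler(req, res);
  };
}

const server = http.createServer(
  rejectDuringShutdown((req, res) => res.end('ok'))
);
```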
The Complete Solution
Here’s the full pattern I ended up with:
```javascript
const express = require('express');
const { Pool } = require('pg');

const app = express();
const db = new Pool({ /* config */ });

let server;
let isShuttingDown = false;

// Health check endpoint - returns 503 during shutdown
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting down' });
  }
  res.json({ status: 'healthy' });
});

// Middleware to reject new requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

// Your routes here
app.get('/api/users', async (req, res, next) => {
  try {
    const users = await db.query('SELECT * FROM users');
    res.json(users.rows);
  } catch (err) {
    next(err); // Express 4 does not catch rejected promises on its own
  }
});

// Graceful shutdown handler
async function shutdown(signal) {
  if (isShuttingDown) return; // ignore a second SIGTERM/SIGINT
  isShuttingDown = true;
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  // Hard deadline: force exit after 30s
  const hardDeadline = setTimeout(() => {
    console.error('Hard deadline reached. Force exiting.');
    process.exit(1);
  }, 30000);

  // Stop accepting new connections and wait for in-flight requests to finish
  await new Promise((resolve) => server.close(resolve));
  console.log('HTTP server closed');

  // Close database connections only after in-flight requests are done with them
  try {
    await db.end();
    console.log('Database connections closed');
  } catch (err) {
    console.error('Error closing database:', err);
  }

  clearTimeout(hardDeadline);
  console.log('Graceful shutdown complete');
  process.exit(0);
}

// Start server
server = app.listen(3000, () => {
  console.log('Server listening on port 3000');
});

// Register signal handlers
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```
Kubernetes Configuration
On the Kubernetes side, I needed to configure the pod termination properly:
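```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nodejs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-nodejs-app   # required by apps/v1; the label name is illustrative
  template:
    metadata:
      labels:
        app: my-nodejs-app
    spec:
      # Covers the 5s preStop sleep + the app's 30s hard deadline + a small margin
      terminationGracePeriodSeconds: 40
      containers:
        - name: app
          image: my-nodejs-app:latest
          ports:
            - containerPort: 3000
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # Allow time for service mesh to update
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
```
Key points:
- terminationGracePeriodSeconds: 40 covers the 5-second preStop sleep plus the app's 30-second deadline, with a few seconds to spare. Kubernetes counts the preStop hook against the grace period, so 35 (5 + 30) would leave no margin at all.
- The preStop hook adds a 5-second delay for service mesh updates.
- The readinessProbe points to /health, which returns 503 during shutdown.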
Why the PreStop Hook?
The preStop hook deserves explanation. Removing a pod from a Service is asynchronous: kube-proxy, ingress controllers, and (with a service mesh like Istio or Linkerd) every sidecar proxy need a moment to learn that the pod is going away. The sleep gives them time to update their routing tables before the app starts shutting down.
Without it, traffic might still be routed to the pod after the SIGTERM is sent.
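The exec preStop hook needs /bin/sh and a sleep binary inside the image, and minimal images (distroless ones, for example) may not ship them. In that case, one alternative is to do the wait inside the process. This is a swapped-in technique, not what our deployment used. Below is a minimal sketch; drainThenClose, closeHttp, and closeDb are hypothetical names. Two caveats apply: the wait then counts against the app's own hard deadline, and the 503-rejecting middleware should not kick in until the wait is over, or the wait buys nothing.

```javascript
// Hypothetical sketch: do the "preStop" wait inside the app, for images
// that have no /bin/sh or sleep binary for an exec hook.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function drainThenClose({ delayMs, closeHttp, closeDb, log = () => {} }) {
  // Keep serving normally while endpoint removal propagates.
  log(`SIGTERM received; serving for ${delayMs}ms more`);
  await sleep(delayMs);
  // Then stop accepting connections and let in-flight requests finish...
  await closeHttp();
  // ...and only then release the database pool.
  await closeDb();
}
```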
TypeScript Version with Cleanup Tracking
For larger applications, I created a reusable shutdown manager:
```typescript
interface ShutdownHandler {
  name: string;
  handler: () => Promise<void>;
}

class GracefulShutdown {
  private handlers: ShutdownHandler[] = [];
  private isShuttingDown = false;
  private readonly deadline: number;

  constructor(deadlineMs: number = 30000) {
    this.deadline = deadlineMs;
  }

  // Handlers run in registration order
  register(name: string, handler: () => Promise<void>) {
    this.handlers.push({ name, handler });
  }

  // Lets health checks and middleware ask whether shutdown has started
  check(): boolean {
    return this.isShuttingDown;
  }

  async execute(signal: string): Promise<void> {
    if (this.isShuttingDown) return; // ignore repeated signals
    this.isShuttingDown = true;

    console.log(`Received ${signal}. Starting graceful shutdown...`);

    const forceExit = setTimeout(() => {
      console.error('Hard deadline reached. Force exiting.');
      process.exit(1);
    }, this.deadline);

    for (const { name, handler } of this.handlers) {
      try {
        await handler();
        console.log(`${name} cleanup complete`);
      } catch (err) {
        console.error(`${name} cleanup failed:`, err);
      }
    }

    clearTimeout(forceExit);
    console.log('Graceful shutdown complete');
    process.exit(0);
  }
}

// Usage (server, db and redis are the app's existing instances)
const shutdown = new GracefulShutdown(30000);

shutdown.register('http-server', () => {
  return new Promise<void>((resolve) => server.close(() => resolve()));
});

shutdown.register('database', async () => {
  await db.end();
});

shutdown.register('redis', async () => {
  await redis.quit();
});

process.on('SIGTERM', () => shutdown.execute('SIGTERM'));
process.on('SIGINT', () => shutdown.execute('SIGINT'));
```
This makes it easy to register multiple cleanup handlers, each with a name for logging.
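The manager runs handlers strictly one after another. That order matters for the HTTP server, which has to drain before the database goes away, but the database and Redis do not depend on each other. Below is a hedged JavaScript sketch of a phased variant; runPhases is a hypothetical name and not part of the class. It would still run inside the manager's hard deadline.

```javascript
// Hypothetical phased variant of the cleanup loop.
// Phases run one after another; handlers inside a phase run concurrently,
// and a failing handler does not stop the others in its phase.
async function runPhases(phases, log = console) {
  for (const phase of phases) {
    const results = await Promise.allSettled(
      // Promise.resolve().then(...) also captures synchronous throws
      phase.map(({ handler }) => Promise.resolve().then(handler))
    );
    results.forEach((result, i) => {
      const { name } = phase[i];
      if (result.status === 'fulfilled') {
        log.log(`${name} cleanup complete`);
      } else {
        log.error(`${name} cleanup failed:`, result.reason);
      }
    });
  }
}
```

With the handlers registered above, the first phase would contain only http-server, and the second phase would contain database and redis.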
The Shutdown Sequence
To summarize, here’s the complete sequence:
1. Kubernetes decides to terminate the pod and starts removing it from the Service endpoints
2. preStop hook runs (sleep 5)
3. SIGTERM sent to Node.js process
4. App sets isShuttingDown = true
5. Health check starts returning 503
6. Readiness probe fails, so the pod stays out of the Service
7. server.close() stops accepting new connections
8. In-flight requests run to completion
9. Database connections closed
10. Process exits with code 0
11. OR: hard deadline forces exit after 30s
12. Kubernetes removes the pod
What I Learned
Graceful shutdown is not optional in Kubernetes environments. Without it:
- Every rolling update drops requests
- Users see 502 errors
- Zombie processes consume resources
- Deployments stall until Kubernetes force-kills the stuck pods
The key takeaways:
- Always handle SIGTERM: by default, Kubernetes sends SIGKILL if the pod is still running 30s after termination starts (terminationGracePeriodSeconds)
- Stop new requests immediately, let existing ones finish
- Set a hard deadline to prevent zombie processes
- Clean up all resources (DB, Redis, queues) before exit
- Configure Kubernetes terminationGracePeriodSeconds to cover your preStop delay plus your app's deadline, with a little margin
This pattern has been battle-tested in production and prevents the common failure modes: dropped requests, connection leaks, and zombie processes during deployments.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are the most important links from this article, along with some further resources on this topic:
- 👨‍💻 Kubernetes Pods Lifecycle
- 👨‍💻 Node.js Process Signals
- 👨‍💻 Express Best Practices
- 👨‍💻 Termination Grace Period
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!