How to Implement Graceful Shutdown in Node.js for Kubernetes Deployments?

During a routine Kubernetes deployment, I noticed something troubling in our monitoring dashboards: a spike of 502 errors every time we pushed new code. Users were getting errors mid-request. What was going on?

The culprit? Our Node.js containers were being killed instantly during rolling updates, with no chance to finish in-flight requests.

The Problem: Instant Container Death

Here’s what was happening during each deployment:

  1. Kubernetes sends SIGTERM to the old pod
  2. Our Node.js process immediately dies (no signal handler)
  3. In-flight requests fail with connection errors
  4. New pod starts up
  5. Traffic resumes

The users caught in the middle? They got 502 errors.

terminal
# What users saw during deploys
POST /api/checkout -> 502 Bad Gateway
GET /api/orders/123 -> Connection refused

First Attempt: Basic SIGTERM Handler

I added a signal handler to catch the termination signal:

server.js
const express = require('express');

const app = express();
const server = app.listen(3000);

// My first attempt at graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, closing server...');
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
});

This was better. But there were still problems:

  1. New requests were still arriving over existing keep-alive connections during shutdown
  2. Database connections weren’t being closed
  3. Some pods just… hung forever
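
One common reason for the first and third problems is idle keep-alive connections: server.close() stops listening, but on older Node versions it leaves already-open sockets alone. A minimal sketch of a close helper that also drops idle sockets follows. The name closeGracefully is hypothetical; server.closeIdleConnections() exists from Node 18.2 on, and Node 19 and later appear to do the same thing inside close() already, so the extra call should be harmless there.

```javascript
// Stop accepting connections and drop idle keep-alive sockets, so that
// server.close() is not held open by clients that are merely connected.
function closeGracefully(server) {
  return new Promise((resolve, reject) => {
    // The callback fires once every remaining connection has ended.
    server.close((err) => (err ? reject(err) : resolve()));
    // Guarded because the method is missing before Node 18.2.
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  });
}
```

Connections that are in the middle of a request are not touched, so in-flight work still gets to finish.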

The Zombie Process Problem

The third issue was the most troubling. Some pods refused to die, hanging in the shutdown sequence until Kubernetes force-killed them after terminationGracePeriodSeconds. As a result, each deployment took much longer than expected.

The root cause? A slow database query that never completed:

example-hang.js
app.get('/api/report', async (req, res) => {
  // This query can take 60+ seconds for large datasets
  const report = await db.query('SELECT * FROM orders WHERE ...');
  res.json(report);
});

If a SIGTERM arrived while this query was running, server.close() would wait for it to finish. But it never finished in time.
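
The waiting behaviour is easy to reproduce without a database. The sketch below uses only Node’s built-in http module. The timer stands in for the slow report query, and demoCloseWaitsForInFlight is a hypothetical name chosen for illustration.

```javascript
const http = require('node:http');

// Resolves with the order in which things happened when server.close()
// is called while a slow request is still being handled.
async function demoCloseWaitsForInFlight(delayMs = 100) {
  const events = [];
  let markStarted;
  const started = new Promise((resolve) => { markStarted = resolve; });

  // The timer stands in for the slow report query.
  const server = http.createServer((req, res) => {
    markStarted();
    setTimeout(() => {
      events.push('response sent');
      res.setHeader('Connection', 'close'); // do not keep the socket alive
      res.end('done');
    }, delayMs);
  });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();

  const request = new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      res.resume();
      res.on('end', resolve);
    }).on('error', reject);
  });

  // Ask the server to close only once the request is in flight.
  await started;
  const closed = new Promise((resolve) => server.close(resolve))
    .then(() => events.push('server closed'));

  await Promise.all([request, closed]);
  return events;
}
```

Calling demoCloseWaitsForInFlight() resolves to ['response sent', 'server closed']: the close callback cannot fire before the handler has answered. With a 60-second query in place of the timer, that is 60 seconds of waiting.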

The Hard Deadline Pattern

I learned an important lesson from production experience: always wrap shutdown in a hard deadline.

shutdown-with-deadline.js
const HARD_DEADLINE_MS = 30000; // 30 seconds
let shuttingDown = false;

async function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  // Hard deadline: force exit after 30s, even if the cleanup below is stuck
  const hardDeadline = setTimeout(() => {
    console.error('Hard deadline reached. Force exiting.');
    process.exit(1);
  }, HARD_DEADLINE_MS);

  // Stop accepting new connections and wait for in-flight requests,
  // so the database stays available until they have finished
  await new Promise((resolve) => server.close(resolve));
  console.log('HTTP server closed');

  // Close database connections
  try {
    await db.end();
    console.log('Database connections closed');
  } catch (err) {
    console.error('Error closing database:', err);
  }

  clearTimeout(hardDeadline);
  console.log('Graceful shutdown complete');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

This ensures the process exits even if something is stuck. The hard deadline saved me from zombie processes that hung on stuck database queries during rolling deploys.
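
The deadline idea also works for a single cleanup step. A minimal sketch, assuming a hypothetical helper named withDeadline that rejects instead of exiting, so the caller decides what happens next:

```javascript
// Race a promise against a timer. Rejects if the timer wins.
function withDeadline(task, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`deadline of ${ms}ms exceeded`)),
      ms
    );
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([task, deadline]).finally(() => clearTimeout(timer));
}
```

A call such as withDeadline(db.end(), 5000) would then give the database five seconds before the shutdown code moves on. The process-wide deadline with process.exit(1) remains the last line of defence, because a rejected promise does not stop whatever is stuck.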

Rejecting New Requests During Shutdown

Another issue: the load balancer might still send traffic to a pod that’s shutting down. I needed to:

  1. Signal that the pod is unhealthy (via health check)
  2. Reject new requests immediately

health-check.js
let isShuttingDown = false;

// Health check endpoint - returns 503 during shutdown
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting down' });
  }
  res.json({ status: 'healthy' });
});

// Middleware to reject new requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

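When isShuttingDown is true, the readiness probe starts failing, which tells Kubernetes to stop routing traffic to this pod. Kubernetes also takes a terminating pod out of the Service endpoints by itself, so the failing probe acts as a backstop, and the 503 middleware covers requests that slip through while that change propagates.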
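
One possible refinement, which is not part of the setup described above: send Connection: close with the 503 so that keep-alive clients and load balancers open their next connection elsewhere. The sketch below is written against res.statusCode, res.setHeader, and res.end only, so it should work both as Express middleware and with plain http. createShutdownGate is a hypothetical name.

```javascript
// Small state holder plus an Express-style middleware built on it.
function createShutdownGate() {
  let shuttingDown = false;
  return {
    begin() { shuttingDown = true; },
    isShuttingDown() { return shuttingDown; },
    // Uses the closure rather than `this`, so it can be passed around freely.
    middleware(req, res, next) {
      if (!shuttingDown) return next();
      res.statusCode = 503;
      res.setHeader('Connection', 'close'); // ask the client to drop the socket
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify({ error: 'Server is shutting down' }));
    },
  };
}
```

Usage would look like app.use(gate.middleware) at startup and gate.begin() as the first step of the shutdown handler.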

The Complete Solution

Here’s the full pattern I ended up with:

server.js
const express = require('express');
const { Pool } = require('pg');

const app = express();
const db = new Pool({ /* config */ });
let server;
let isShuttingDown = false;

// Health check endpoint - returns 503 during shutdown
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting down' });
  }
  res.json({ status: 'healthy' });
});

// Middleware to reject new requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

// Your routes here
app.get('/api/users', async (req, res, next) => {
  try {
    const users = await db.query('SELECT * FROM users');
    res.json(users.rows);
  } catch (err) {
    next(err); // Express 4 does not catch rejected promises on its own
  }
});

// Graceful shutdown handler
async function shutdown(signal) {
  if (isShuttingDown) return; // ignore repeated signals
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;

  // Hard deadline: force exit after 30s, even if the cleanup below is stuck
  const hardDeadline = setTimeout(() => {
    console.error('Hard deadline reached. Force exiting.');
    process.exit(1);
  }, 30000);

  // Stop accepting new connections and wait for in-flight requests,
  // so the database stays available until they have finished
  await new Promise((resolve) => server.close(resolve));
  console.log('HTTP server closed');

  // Close database connections
  try {
    await db.end();
    console.log('Database connections closed');
  } catch (err) {
    console.error('Error closing database:', err);
  }

  clearTimeout(hardDeadline);
  console.log('Graceful shutdown complete');
  process.exit(0);
}

// Start server
server = app.listen(3000, () => {
  console.log('Server listening on port 3000');
});

// Register signal handlers
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

Kubernetes Configuration

On the Kubernetes side, I needed to configure the pod termination properly:

deployment.yaml
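apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nodejs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-nodejs-app # placeholder label, must match the template labels
  template:
    metadata:
      labels:
        app: my-nodejs-app
    spec:
      terminationGracePeriodSeconds: 40 # preStop sleep (5s) + app's 30s deadline + margin
      containers:
        - name: app
          image: my-nodejs-app:latest
          ports:
            - containerPort: 3000
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"] # Allow time for service mesh to update
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

Key points:

  • terminationGracePeriodSeconds: 40 covers the 5-second preStop sleep plus the app’s 30-second deadline, with a few seconds to spare. The grace period starts counting when termination begins, so the preStop time comes out of it
  • The preStop hook adds a 5-second delay for service mesh updates
  • The readinessProbe points to /health which returns 503 during shutdown
  • selector and the pod template labels are required by apps/v1; the app: my-nodejs-app label is a placeholder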
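
The timing budget is simple arithmetic, but it is easy to get wrong because Kubernetes counts the preStop hook against terminationGracePeriodSeconds. A small sketch, with a hypothetical helper name and an assumed 5-second safety margin:

```javascript
// Smallest terminationGracePeriodSeconds that still lets the app's own
// hard deadline fire before Kubernetes sends SIGKILL.
function minGracePeriodSeconds({ preStopSeconds, appDeadlineMs, marginSeconds = 5 }) {
  return preStopSeconds + Math.ceil(appDeadlineMs / 1000) + marginSeconds;
}

console.log(minGracePeriodSeconds({ preStopSeconds: 5, appDeadlineMs: 30000 })); // 40
```

With a 5-second preStop sleep and a 30-second app deadline, anything below 35 seconds means the pod can be killed before the app’s own deadline fires, and exactly 35 leaves no margin.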

Why the PreStop Hook?

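The preStop hook deserves explanation. When using a service mesh like Istio or Linkerd, there’s a propagation delay before all proxies learn that a pod is being removed. The same is true for kube-proxy and ingress controllers, because Kubernetes removes the pod from Service endpoints in parallel with starting termination, not before it. The sleep gives them time to update their routing tables.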

Without it, traffic might still be routed to the pod after the SIGTERM is sent.
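
If the container image has no shell (distroless images, for example, cannot run /bin/sh), the same delay can live inside the process instead. A minimal sketch, assuming a hypothetical drainThenClose helper that the SIGTERM handler would call in place of a bare server.close():

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

// Keep serving for drainDelayMs while routing tables catch up,
// then stop accepting connections and wait for open ones to end.
async function drainThenClose(server, drainDelayMs) {
  await sleep(drainDelayMs);
  await new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}
```

The delay counts against the app’s hard deadline in this variant, so the deadline has to be sized with it in mind.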

TypeScript Version with Cleanup Tracking

For larger applications, I created a reusable shutdown manager:

shutdown.ts
interface ShutdownHandler {
  name: string;
  handler: () => Promise<void>;
}

class GracefulShutdown {
  private handlers: ShutdownHandler[] = [];
  private isShuttingDown = false;
  private readonly deadline: number;

  constructor(deadlineMs: number = 30000) {
    this.deadline = deadlineMs;
  }

  register(name: string, handler: () => Promise<void>): void {
    this.handlers.push({ name, handler });
  }

  check(): boolean {
    return this.isShuttingDown;
  }

  async execute(signal: string): Promise<void> {
    if (this.isShuttingDown) return; // ignore repeated signals
    this.isShuttingDown = true;
    console.log(`Received ${signal}. Starting graceful shutdown...`);

    const forceExit = setTimeout(() => {
      console.error('Hard deadline reached. Force exiting.');
      process.exit(1);
    }, this.deadline);

    // Handlers run one after another, in registration order
    for (const { name, handler } of this.handlers) {
      try {
        await handler();
        console.log(`${name} cleanup complete`);
      } catch (err) {
        console.error(`${name} cleanup failed:`, err);
      }
    }

    clearTimeout(forceExit);
    console.log('Graceful shutdown complete');
    process.exit(0);
  }
}

// Usage (server, db, and redis are created elsewhere in the app)
const shutdown = new GracefulShutdown(30000);

shutdown.register('http-server', () => {
  // Wrap the callback so the promise resolves with no value
  return new Promise<void>((resolve) => server.close(() => resolve()));
});

shutdown.register('database', async () => {
  await db.end();
});

shutdown.register('redis', async () => {
  await redis.quit();
});

process.on('SIGTERM', () => shutdown.execute('SIGTERM'));
process.on('SIGINT', () => shutdown.execute('SIGINT'));

This makes it easy to register multiple cleanup handlers, each with a name for logging.
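
A class that calls process.exit() directly is awkward to unit-test, because the test runner dies with it. One way around that is to inject the exit function. The sketch below is a plain JavaScript port of the same idea, not the manager shown above: createShutdown and its exit and log options are hypothetical names.

```javascript
// Same sequencing as the class above, with exit and logging injectable.
function createShutdown({
  deadlineMs = 30000,
  exit = (code) => process.exit(code),
  log = console,
} = {}) {
  const handlers = [];
  let shuttingDown = false;

  return {
    register(name, handler) {
      handlers.push({ name, handler });
    },
    check() {
      return shuttingDown;
    },
    async execute(signal) {
      if (shuttingDown) return; // ignore repeated signals
      shuttingDown = true;
      log.log(`Received ${signal}. Starting graceful shutdown...`);

      const forceExit = setTimeout(() => {
        log.error('Hard deadline reached. Force exiting.');
        exit(1);
      }, deadlineMs);

      // A failing handler is logged and does not stop the ones after it.
      for (const { name, handler } of handlers) {
        try {
          await handler();
          log.log(`${name} cleanup complete`);
        } catch (err) {
          log.error(`${name} cleanup failed:`, err);
        }
      }

      clearTimeout(forceExit);
      exit(0);
    },
  };
}
```

In production the defaults apply and the behaviour matches the class. In a test, passing a fake exit records the exit code instead of ending the process.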

The Shutdown Sequence

To summarize, here’s the complete sequence:

1. Kubernetes marks the pod as terminating and begins removing it from Service endpoints
2. preStop hook runs (sleep 5) while that change propagates
3. SIGTERM sent to Node.js process
4. App sets isShuttingDown = true
5. Health check starts returning 503
6. Readiness probe fails, confirming the pod should receive no traffic
7. server.close() stops accepting new connections
8. In-flight requests run to completion
9. Database connections closed
10. Process exits with code 0
11. OR: hard deadline forces exit after 30s
12. Kubernetes removes the pod

What I Learned

Graceful shutdown is not optional in Kubernetes environments. Without it:

  • Every rolling update drops requests
  • Users see 502 errors
  • Zombie processes consume resources
  • Deployments can hang indefinitely

The key takeaways:

  1. Always handle SIGTERM. Kubernetes sends it first and follows up with SIGKILL once terminationGracePeriodSeconds (30s by default) runs out
  2. Stop new requests immediately, let existing ones finish
  3. Set a hard deadline to prevent zombie processes
  4. Clean up all resources (DB, Redis, queues) before exit
  5. Configure Kubernetes terminationGracePeriodSeconds to be longer than your preStop delay plus your app’s deadline

This pattern has been battle-tested in production and prevents the common failure modes: dropped requests, connection leaks, and zombie processes during deployments.
