How to Load Test Node.js Microservices in a Production Environment?

My Node.js API handled 5,000 requests per second on my laptop. I was confident it would easily handle our production traffic of 500 req/s. Then we deployed, and everything fell apart.

Response times spiked to 3 seconds. The database connection pool was exhausted. Pods started crashing under load. What went wrong?

The problem wasn’t my code. It was my testing environment.

The False Confidence of Localhost Testing

On my development machine, I ran a simple load test:

terminal
# Using autocannon on localhost
npx autocannon -c 100 -d 30 http://localhost:3000/api/users
# Results looked great
# Requests per second: 5,234
# Latency p95: 12ms
# Errors: 0

Fantastic results. I told my team we were ready for production.

But production has several layers my laptop doesn’t:

  • Load balancer (NGINX, ALB, or Istio Gateway)
  • Container networking (CNI plugins, overlay networks)
  • Service mesh sidecar proxies (Istio, Linkerd)
  • DNS resolution for service discovery
  • SSL/TLS termination
  • Network policies and firewalls

Each layer adds latency and potential bottlenecks. Testing only the application layer is like testing a car engine on a bench and assuming it will perform the same on a muddy road.

First Attempt: Port-Forwarding to a Pod

My first attempt at production-like testing was naive. I port-forwarded to a single pod:

terminal
# Port-forward to a single pod in staging
kubectl port-forward pod/my-api-abc123 3000:3000
# Run load test against it
k6 run load-test.js

The results were misleading. This is the script I ran:

load-test.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 500 },
  ],
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  const res = http.get(`${BASE_URL}/api/users`);
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}

This test bypassed the load balancer entirely. It tested one pod in isolation, not how the system behaves under realistic conditions.

When we deployed to production, the load balancer became a bottleneck. Connection limits were hit. SSL termination added latency. The service mesh injected sidecar proxies that added 2-5ms per request.

The Real Bottlenecks

I discovered these issues only appeared in full-stack testing:

Load Balancer Connection Limits

nginx-config.yaml
# Our NGINX upstream settings caught us off guard
upstream nodejs_backend {
    server 10.0.0.1:3000;
    keepalive 32; # Only 32 idle keepalive connections cached per worker!
}
# Under load, connections queued up
# Response times spiked as clients waited

Service Mesh Overhead

Each request went through the Istio sidecar proxy:

terminal
# Latency breakdown from Istio metrics
# App latency: 15ms
# Sidecar proxy (client): 2ms
# Sidecar proxy (server): 2ms
# Total: 19ms (27% overhead for fast requests)
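The overhead percentage in that breakdown is just the two sidecar hops divided by the application latency. Here is a minimal sketch of the arithmetic; `meshOverheadPercent` is a hypothetical helper name, and the inputs are the numbers from the breakdown above:

```javascript
// Hypothetical helper: relative overhead the mesh adds on top of app latency.
function meshOverheadPercent(appMs, sidecarHopsMs) {
  const proxyMs = sidecarHopsMs.reduce((sum, ms) => sum + ms, 0);
  return (proxyMs / appMs) * 100;
}

// Numbers from the Istio breakdown: 15ms app latency, 2ms per sidecar hop.
const fastRequest = meshOverheadPercent(15, [2, 2]);
// Same 4ms of proxying on a slower, hypothetical 150ms request.
const slowRequest = meshOverheadPercent(150, [2, 2]);

console.log(`fast request: ${fastRequest.toFixed(0)}% overhead`);
console.log(`slow request: ${slowRequest.toFixed(0)}% overhead`);
```

The same 4ms costs roughly 27% on a 15ms request but only about 3% on a 150ms one, which is why the mesh hurts fast endpoints the most.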

DNS Resolution Delays

Service discovery in Kubernetes added unpredictable delays:

dns-issue.js
// This looked fine locally, but in production...
const response = await fetch('http://user-service:3000/api/users');
// DNS resolution could take 5-50ms depending on cache state
// And under high load, CoreDNS became a bottleneck
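To see what resolution actually costs from inside a pod, it helps to time a lookup directly. This is a rough sketch, not a benchmark: `timeLookup` and the `LOOKUP_HOST` environment variable are names I made up for illustration, and it uses `dns.promises.lookup`, which goes through the OS resolver like Node's sockets do by default.

```javascript
const dns = require('node:dns');

// Times a single OS-level lookup (getaddrinfo) for the given hostname.
async function timeLookup(hostname) {
  const start = process.hrtime.bigint();
  const { address } = await dns.promises.lookup(hostname);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { hostname, address, elapsedMs };
}

// 'user-service' only resolves inside the cluster, so the default here is
// 'localhost' to keep the sketch runnable anywhere. LOOKUP_HOST is hypothetical.
const target = process.env.LOOKUP_HOST || 'localhost';

timeLookup(target)
  .then(({ address, elapsedMs }) => {
    console.log(`${target} -> ${address} in ${elapsedMs.toFixed(2)}ms`);
  })
  .catch((err) => {
    console.error(`lookup failed for ${target}: ${err.code}`);
  });
```

Running it a few times in a row from a pod (with `LOOKUP_HOST=user-service`) shows the difference between a cold and a warm cache.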

Connection Pool Exhaustion

With many concurrent clients, the database connection pool behavior changed:

pool-config.js
// Local testing: single client, connections reused
// Production: many clients, each opening connections
const pool = new Pool({
max: 20, // 20 connections seemed plenty
// But with 100 concurrent test users, queue formed
});
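A rough way to see why 20 connections can run out is Little's Law: connections in use is approximately the request rate multiplied by how long each request holds a connection. The sketch below assumes a 50ms hold time, which is a made-up value (I didn't measure one for this article), together with the 500 req/s production figure:

```javascript
// Little's Law estimate: average busy connections = arrival rate * hold time.
function connectionsInUse(requestsPerSecond, holdTimeMs) {
  return (requestsPerSecond * holdTimeMs) / 1000;
}

const poolMax = 20;                       // pool size from the config above
const demand = connectionsInUse(500, 50); // 50ms hold time is an assumed value

console.log(`need ~${demand} connections, pool has ${poolMax}`);
console.log(demand > poolMax ? 'requests will queue' : 'pool is sufficient');
```

With those assumptions the service needs about 25 connections on average, so a pool of 20 queues requests even before any traffic spike.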

Testing Through the Load Balancer

The key insight: always test through the entire stack.

terminal
# Run k6 against the staging load balancer (full stack)
k6 run -e BASE_URL=https://staging-api.example.com load-test.js
# This includes:
# - SSL termination
# - Load balancer routing
# - Service mesh proxies
# - Container networking
# - DNS resolution

The results were dramatically different:

terminal
# Localhost test
Requests/sec: 5,234
Latency p95: 12ms
# Full stack test
Requests/sec: 1,847 # 65% reduction!
Latency p95: 89ms # 7x higher
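The percentages in that comparison come from simple ratios. A small sketch of the calculation, where `compareRuns` and the field names are illustrative rather than anything k6 or autocannon emits:

```javascript
// Compare two load-test summaries; field names here are illustrative.
function compareRuns(baseline, candidate) {
  const throughputDropPct =
    ((baseline.rps - candidate.rps) / baseline.rps) * 100;
  const latencyMultiple = candidate.p95Ms / baseline.p95Ms;
  return { throughputDropPct, latencyMultiple };
}

const localhostRun = { rps: 5234, p95Ms: 12 };
const fullStackRun = { rps: 1847, p95Ms: 89 };
const { throughputDropPct, latencyMultiple } = compareRuns(localhostRun, fullStackRun);

console.log(`throughput down ${throughputDropPct.toFixed(0)}%`);  // 65%
console.log(`p95 latency ${latencyMultiple.toFixed(1)}x higher`); // 7.4x
```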

A Proper k6 Load Test Script

Here’s the k6 script I ended up using for realistic production testing:

load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

// Custom error rate metric
const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Ramp up to 100 users
    { duration: '5m', target: 100 },  // Stay at 100 users
    { duration: '2m', target: 500 },  // Ramp up to 500 users
    { duration: '5m', target: 500 },  // Stay at 500 users
    { duration: '2m', target: 1000 }, // Ramp up to 1000 users
    { duration: '5m', target: 1000 }, // Stay at 1000 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests < 500ms
    errors: ['rate<0.01'],            // Error rate < 1%
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';

export default function () {
  // Simulate realistic user behavior
  const userId = Math.floor(Math.random() * 10000);

  // Get user profile
  const profileRes = http.get(`${BASE_URL}/api/users/${userId}`);
  check(profileRes, {
    'profile status 200': (r) => r.status === 200,
    'profile < 300ms': (r) => r.timings.duration < 300,
  });
  errorRate.add(profileRes.status !== 200);
  sleep(1);

  // Get user orders
  const ordersRes = http.get(`${BASE_URL}/api/users/${userId}/orders`);
  check(ordersRes, {
    'orders status 200': (r) => r.status === 200,
  });
  errorRate.add(ordersRes.status !== 200);
  sleep(2);
}

The thresholds section defines pass/fail criteria. If p95 latency exceeds 500ms or error rate exceeds 1%, the test fails.
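To make the pass/fail logic concrete, here is a plain Node.js sketch of what those two thresholds check. It uses a nearest-rank percentile, which is my simplification and not necessarily how k6 computes `p(95)` internally; the sample data is invented.

```javascript
// Nearest-rank percentile; an illustration, not k6's exact method.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Mirrors the two thresholds in the script: p(95) < 500ms and error rate < 1%.
function passesThresholds(durationsMs, failedCount) {
  const p95 = percentile(durationsMs, 95);
  const errorRate = failedCount / durationsMs.length;
  return { p95, errorRate, passed: p95 < 500 && errorRate < 0.01 };
}

// 100 hypothetical samples: 94 fast requests and 6 slow ones.
const samples = [...Array(94).fill(120), ...Array(6).fill(900)];
console.log(passesThresholds(samples, 0));
```

Even with zero errors this run fails, because more than 5% of the requests were slow and the p95 lands on a 900ms sample. Averages would have hidden that.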

Running Distributed Load Tests

A single machine often can’t generate enough load to stress a production system. The k6 Operator lets you run distributed tests inside Kubernetes:

k6-test-job.yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: load-test
spec:
  parallelism: 4 # Run 4 pods generating load
  script:
    configMap:
      name: k6-test-script
      file: load-test.js # File name inside the ConfigMap
  arguments: --out influxdb=http://influxdb:8086/k6

Apply it:

terminal
# Create the config map with your test script
kubectl create configmap k6-test-script --from-file=load-test.js
# Run the distributed load test
kubectl apply -f k6-test-job.yaml
# Watch the test progress (the operator names runner jobs <name>-1 ... <name>-N)
kubectl logs -f job/load-test-1
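With `parallelism: 4`, each runner pod executes a slice of the test rather than the whole thing. The sketch below shows an even split of a VU target across pods. It is only an illustration of the idea: the operator relies on k6's own execution segments, and I'm not claiming this matches its exact rounding.

```javascript
// Illustrative even split of a VU target across runner pods.
function splitVus(totalVus, parallelism) {
  const base = Math.floor(totalVus / parallelism);
  const remainder = totalVus % parallelism;
  // The first `remainder` pods take one extra VU so the total is preserved.
  return Array.from({ length: parallelism }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(splitVus(1000, 4)); // [ 250, 250, 250, 250 ]
console.log(splitVus(10, 4));   // [ 3, 3, 2, 2 ]
```

The practical consequence is that each pod only needs the CPU and network capacity for its share of the load, about 250 VUs per pod at the 1000-user stage.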

Testing Internal Services

Sometimes you need to test services that aren’t exposed externally. Run the load generator inside the cluster:

internal-load-test.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
  namespace: load-testing
spec:
  template:
    spec:
      containers:
        - name: k6
          image: grafana/k6:latest
          command: ['k6', 'run', '/scripts/load-test.js']
          volumeMounts:
            - name: test-scripts
              mountPath: /scripts
          env:
            - name: BASE_URL
              value: 'http://my-service.default.svc.cluster.local:3000'
      volumes:
        - name: test-scripts
          configMap:
            name: k6-scripts
      restartPolicy: Never

This tests the service directly, bypassing the ingress but still going through cluster networking.
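The `BASE_URL` in that Job follows the standard Kubernetes service DNS pattern, `<service>.<namespace>.svc.<cluster-domain>`. A tiny helper makes the pattern explicit; `clusterServiceUrl` is a hypothetical name, and `cluster.local` is the common default domain that your cluster may override:

```javascript
// Builds an in-cluster URL. 'cluster.local' is the usual default cluster
// domain, but it is configurable, so treat it as an assumption.
function clusterServiceUrl({ service, namespace = 'default', port, clusterDomain = 'cluster.local' }) {
  return `http://${service}.${namespace}.svc.${clusterDomain}:${port}`;
}

console.log(clusterServiceUrl({ service: 'my-service', port: 3000 }));
// http://my-service.default.svc.cluster.local:3000
```

Using the fully qualified name matters here because the load generator runs in the `load-testing` namespace, where the short name `my-service` would not resolve to a service in `default`.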

Monitoring During Load Tests

You need visibility into what’s happening. Here are the key metrics to watch:

Application Metrics

metrics-to-watch.js
const metricsToWatch = {
  // k6 output
  'http_req_duration': 'Response time (p95 should be < 500ms)',
  'http_reqs': 'Requests per second',
  'iterations': 'Total test iterations',
  // Node.js specific
  'nodejs_eventloop_lag_seconds': 'Event loop lag (should be < 100ms)',
  'nodejs_active_requests': 'Active HTTP requests',
  'nodejs_heap_size_used_bytes': 'Heap memory usage',
};

Prometheus Queries for Real-Time Monitoring

prometheus-queries.promql
# Request rate
sum(rate(http_requests_total[1m]))
# Error rate (aggregate both sides so the label sets match)
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))
# P95 latency
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[1m])))
# Event loop lag (critical for Node.js)
nodejs_eventloop_lag_seconds
# Container resource usage (CPU is a counter, so take its rate)
rate(container_cpu_usage_seconds_total{pod=~"my-api-.*"}[1m])
container_memory_working_set_bytes{pod=~"my-api-.*"}
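If the service doesn't export event loop metrics yet, Node's built-in `perf_hooks` module can sample the lag directly. This is a minimal sketch to drop into the service during a test; `lagSnapshot` is a name I made up, and the 20ms resolution and 5 second logging interval are arbitrary choices.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Samples event loop delay every 20ms using Node's built-in histogram.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// The histogram reports nanoseconds; convert to milliseconds for logging.
function lagSnapshot(h) {
  return {
    meanMs: h.mean / 1e6,
    p95Ms: h.percentile(95) / 1e6,
    maxMs: h.max / 1e6,
  };
}

// Log a snapshot every 5 seconds, then reset so each window stands alone.
// unref() keeps this timer from holding the process open on shutdown.
const timer = setInterval(() => {
  console.log('event loop lag', lagSnapshot(histogram));
  histogram.reset();
}, 5000);
timer.unref();
```

I'd treat the readings as relative to an idle baseline rather than as absolute numbers: capture a snapshot before the test starts, then watch how far p95 and max drift as the load ramps up.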

Using Artillery for More Complex Scenarios

For complex user flows, I sometimes prefer Artillery:

artillery-config.yml
config:
  target: 'https://api.example.com'
  phases:
    - duration: 60
      arrivalRate: 10
      name: Warm up
    - duration: 120
      arrivalRate: 10
      rampTo: 100
      name: Ramp up
    - duration: 300
      arrivalRate: 100
      name: Sustained load
  processor: './processor.js' # Custom JS for dynamic data
scenarios:
  - name: 'User API Flow'
    flow:
      - function: 'generateUserData' # Sets userId and items before the requests
      - get:
          url: '/api/users/{{ userId }}'
          capture:
            - json: '$.id'
              as: 'profileId'
      - think: 2
      - get:
          url: '/api/users/{{ profileId }}/orders'
      - think: 1
      - post:
          url: '/api/orders'
          json:
            userId: '{{ userId }}'
            items: '{{ items }}'

The processor.js file generates dynamic test data:

processor.js
module.exports = {
  generateUserData,
};

function generateUserData(userContext, events, done) {
  userContext.vars.userId = Math.floor(Math.random() * 10000);
  userContext.vars.items = [
    { productId: Math.floor(Math.random() * 100), quantity: 1 },
  ];
  return done();
}

Production Testing Safety

Testing against production carries risk. Here’s how I mitigate it:

  1. Test during low-traffic windows - Sunday 3 AM, not Monday morning
  2. Start small - Ramp up gradually, watch for degradation
  3. Have a kill switch - Be ready to stop immediately
  4. Monitor everything - CPU, memory, database connections, error rates
  5. Set conservative thresholds - 500ms p95, not 50ms
  6. Use feature flags - Route test traffic to specific instances

safe-production-test.js
export const options = {
  stages: [
    { duration: '5m', target: 10 },   // Start very small
    { duration: '5m', target: 50 },   // Gradual increase
    { duration: '10m', target: 100 }, // Watch carefully
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    errors: ['rate<0.005'], // Very strict error threshold
  },
};
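For the kill switch, k6 thresholds can also be written in object form with `abortOnFail`, which stops the run as soon as a threshold is crossed instead of only reporting the failure at the end. A sketch of that variant follows. It is written as a plain object named `safeOptions` (my name) so it can be inspected on its own; in a real k6 script you would export it as `options`, and the `errors` metric still has to be defined as a custom `Rate` in that script. The 30 second `delayAbortEval` is an arbitrary choice.

```javascript
// Thresholds with abortOnFail act as an automatic kill switch in k6.
// In a k6 script, export this object as `options` to use it.
const safeOptions = {
  stages: [
    { duration: '5m', target: 10 },   // Start very small
    { duration: '5m', target: 50 },   // Gradual increase
    { duration: '10m', target: 100 }, // Watch carefully
    { duration: '5m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: [
      // Abort the run if p95 crosses 500ms, but give the test 30s to
      // collect samples before the first evaluation.
      { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '30s' },
    ],
    errors: [{ threshold: 'rate<0.005', abortOnFail: true }],
  },
};

console.log(JSON.stringify(safeOptions.thresholds, null, 2));
```

This doesn't replace a human watching the dashboards, but it means a bad run against production stops itself even if nobody reacts in time.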

The Difference is Dramatic

After implementing proper full-stack testing, I found:

Metric                 Localhost      Production-Like
Max Throughput         5,000 req/s    1,800 req/s
P95 Latency            12ms           89ms
P99 Latency            25ms           340ms
Error Rate at Limit    0%             2.3%

The 65% throughput reduction was a wake-up call. Without full-stack testing, I would have deployed an underprovisioned system.
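One way to read those numbers is as headroom over expected traffic: measured capacity divided by peak load. A quick sketch using the rounded throughput figures from the table and the 500 req/s production traffic mentioned at the start (`headroom` is just an illustrative helper):

```javascript
// Headroom = measured capacity divided by expected peak traffic.
function headroom(capacityRps, peakRps) {
  return capacityRps / peakRps;
}

const peak = 500; // production traffic from the start of the article

console.log(`localhost suggests ${headroom(5000, peak).toFixed(1)}x headroom`); // 10.0x
console.log(`full stack shows ${headroom(1800, peak).toFixed(1)}x headroom`);   // 3.6x
```

A 10x margin invites you to scale down; a 3.6x margin measured at a 2.3% error rate says the safe operating point is lower still.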

What I Learned

  1. Localhost testing gives false confidence - Production infrastructure adds significant overhead
  2. Test through the load balancer - This is how real traffic reaches your service
  3. Include service mesh overhead - Istio/Linkerd adds 2-10ms per hop
  4. Use distributed load generators - One machine often can’t generate enough load on its own
  5. Monitor Node.js-specific metrics - Event loop lag is the silent killer
  6. Test with production-like data - Database size affects query performance
  7. Start small and ramp up - Don’t crash production with your first test

The gap between localhost and production can be 2-10x in throughput. Only full-stack testing reveals your system’s true capacity.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
