How to Load Test Node.js Microservices in a Production Environment
My Node.js API handled 5,000 requests per second on my laptop. I was confident it would easily handle our production traffic of 500 req/s. Then we deployed, and everything fell apart.
Response times spiked to 3 seconds. The database connection pool was exhausted. Pods started crashing under load. What went wrong?
The problem wasn’t my code. It was my testing environment.
The False Confidence of Localhost Testing
On my development machine, I ran a simple load test:
```bash
# Using autocannon on localhost
npx autocannon -c 100 -d 30 http://localhost:3000/api/users

# Results looked great
# Requests per second: 5,234
# Latency p95: 12ms
# Errors: 0
```

Fantastic results. I told my team we were ready for production.
But production has something my laptop doesn’t:
- Load balancer (NGINX, ALB, or Istio Gateway)
- Container networking (CNI plugins, overlay networks)
- Service mesh sidecar proxies (Istio, Linkerd)
- DNS resolution for service discovery
- SSL/TLS termination
- Network policies and firewalls
Each layer adds latency and potential bottlenecks. Testing only the application layer is like testing a car engine on a bench and assuming it will perform the same on a muddy road.
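To see where that latency goes, it can help to time a single request phase by phase. Below is a minimal Node.js sketch using only built-in modules; it is not part of my original setup, and `timeRequest` and `phaseBreakdown` are hypothetical helpers. They record when the socket finishes its DNS lookup, TCP connect, and TLS handshake, and when the response headers arrive. From the client side, the load balancer, sidecar proxies, and the application all land in the time-to-first-byte phase, so the sketch cannot separate them, but it does isolate the DNS and TLS costs that localhost never pays.

```javascript
const http = require('node:http');
const https = require('node:https');
const { performance } = require('node:perf_hooks');

// Turn raw event timestamps (ms) into per-phase durations.
// A phase that did not happen (no DNS lookup for an IP literal, no TLS on
// plain HTTP) is reported as 0.
function phaseBreakdown(t) {
  const dnsDone = t.lookup ?? t.start;
  const tcpDone = t.connect ?? dnsDone;
  const tlsDone = t.secureConnect ?? tcpDone;
  return {
    dnsMs: dnsDone - t.start,
    tcpMs: tcpDone - dnsDone,
    tlsMs: tlsDone - tcpDone,
    // Everything behind the client's socket: load balancer, sidecars, app.
    firstByteMs: t.firstByte - tlsDone,
    downloadMs: t.end - t.firstByte,
    totalMs: t.end - t.start,
  };
}

// Time one request on a fresh connection (agent: false), recording when the
// socket finishes DNS lookup, TCP connect, TLS handshake, and response headers.
function timeRequest(url) {
  const client = new URL(url).protocol === 'https:' ? https : http;
  return new Promise((resolve, reject) => {
    const t = { start: performance.now() };
    const req = client.get(url, { agent: false }, (res) => {
      t.firstByte = performance.now(); // response headers received
      res.resume(); // drain the body without keeping it
      res.on('end', () => {
        t.end = performance.now();
        resolve({ status: res.statusCode, ...phaseBreakdown(t) });
      });
      res.on('error', reject);
    });
    req.on('socket', (socket) => {
      socket.once('lookup', () => { t.lookup = performance.now(); });
      socket.once('connect', () => { t.connect = performance.now(); });
      socket.once('secureConnect', () => { t.secureConnect = performance.now(); });
    });
    req.on('error', reject);
  });
}

module.exports = { phaseBreakdown, timeRequest };
```

Running `timeRequest('https://staging-api.example.com/api/users').then(console.log)` next to the same call against `http://localhost:3000/api/users` shows which phases exist only in the full stack. Because every call opens a fresh connection, this is worst-case connection setup; clients that reuse keep-alive connections pay the DNS, TCP, and TLS costs less often.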
First Attempt: Port-Forwarding to a Pod
My first attempt at production-like testing was naive. I port-forwarded to a single pod:
```bash
# Port-forward to a single pod in staging
kubectl port-forward pod/my-api-abc123 3000:3000

# Run the load test against it
k6 run load-test.js
```

The results were misleading. This is the script I ran:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 500 },
  ],
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  const res = http.get(`${BASE_URL}/api/users`);
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```

This test bypassed the load balancer entirely. It tested one pod in isolation, not how the system behaves under realistic conditions. Port-forwarding also tunnels every request through kubectl and the Kubernetes API server, which are not built for high-throughput traffic and can become a bottleneck of their own at high concurrency.
When we deployed to production, the load balancer became a bottleneck. Connection limits were hit. SSL termination added latency. The service mesh injected sidecar proxies that added 2-5ms per request.
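The connection limits we hit can be estimated before any test with Little's Law: the average number of requests (or database connections) in use equals the arrival rate multiplied by how long each one is held. The sketch below is a back-of-the-envelope check, not a measurement. `inFlight` and `checkLimit` are hypothetical helpers, and the 30 ms and 50 ms hold times are made-up inputs applied to our 500 req/s production rate and a pool capped at `max: 20`.

```javascript
// Little's Law: average number in use = arrival rate × average time each one is held.
function inFlight(ratePerSecond, holdTimeMs) {
  return (ratePerSecond * holdTimeMs) / 1000;
}

// Compare the average concurrency a resource must absorb with its configured limit.
function checkLimit(name, ratePerSecond, holdTimeMs, limit) {
  const needed = inFlight(ratePerSecond, holdTimeMs);
  return { name, needed, limit, saturated: needed >= limit };
}

// Hypothetical: 500 req/s, each request holding a database connection for 30 ms vs 50 ms.
for (const holdMs of [30, 50]) {
  const c = checkLimit('database pool (max: 20)', 500, holdMs, 20);
  const verdict = c.saturated ? 'pool saturated, requests queue' : 'within limit';
  console.log(`${holdMs} ms hold: ~${c.needed} connections busy on average, ${verdict}`);
}
```

Averages hide bursts, so a pool that looks 75% busy on paper can still queue during traffic spikes. The estimate tells you where to look; the load test confirms it.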
The Real Bottlenecks
I discovered these issues only appeared in full-stack testing:
Load Balancer Connection Limits
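```nginx
# Our NGINX upstream settings caught us off guard
upstream nodejs_backend {
    server 10.0.0.1:3000;
    keepalive 32; # Only 32 idle keepalive connections cached per worker!
}

# keepalive caps the idle upstream connections each worker keeps cached;
# beyond that, NGINX opens new connections and closes the extras after use.
# (Upstream keepalive also needs proxy_http_version 1.1 and an empty
# Connection header in the proxying location block.)

# Under load, connections queued up
# Response times spiked as clients waited
```

Service Mesh Overhead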
Each request went through the Istio sidecar proxy:
```text
# Latency breakdown from Istio metrics
# App latency:            15ms
# Sidecar proxy (client):  2ms
# Sidecar proxy (server):  2ms
# Total:                  19ms (4ms of proxy time, ~27% on top of a 15ms request)
```

DNS Resolution Delays
Service discovery in Kubernetes added unpredictable delays:
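```javascript
// This looked fine locally, but in production...
// (runs inside an async function, or an ES module with top-level await)
const response = await fetch('http://user-service:3000/api/users');

// DNS resolution could take 5-50ms depending on cache state.
// Node.js does not cache dns.lookup() results itself, so every new connection
// triggers a fresh lookup on libuv's threadpool (4 threads by default).
// And under high load, CoreDNS became a bottleneck.
```

Connection Pool Exhaustion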
With many concurrent clients, the database connection pool behavior changed:
```javascript
// Local testing: single client, connections reused
// Production: many clients, each opening connections
const { Pool } = require('pg'); // assuming node-postgres

const pool = new Pool({
  max: 20, // 20 connections seemed plenty
  // But with 100 concurrent test users, a queue formed
});
```

Testing Through the Load Balancer
The key insight: always test through the entire stack.
```bash
# Run k6 against the staging load balancer (full stack)
k6 run -e BASE_URL=https://staging-api.example.com load-test.js

# This includes:
# - SSL termination
# - Load balancer routing
# - Service mesh proxies
# - Container networking
# - DNS resolution
```

The results were dramatically different:
```text
# Localhost test
Requests/sec: 5,234
Latency p95:  12ms

# Full stack test
Requests/sec: 1,847   # 65% reduction!
Latency p95:  89ms    # 7x higher
```

A Proper k6 Load Test Script
Here’s the k6 script I ended up using for realistic production testing:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

// Custom error rate metric
const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up to 100 users
    { duration: '5m', target: 100 }, // Stay at 100 users
    { duration: '2m', target: 500 }, // Ramp up to 500 users
    { duration: '5m', target: 500 }, // Stay at 500 users
    { duration: '2m', target: 1000 }, // Ramp up to 1000 users
    { duration: '5m', target: 1000 }, // Stay at 1000 users
    { duration: '2m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests < 500ms
    errors: ['rate<0.01'], // Error rate < 1%
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';

export default function () {
  // Simulate realistic user behavior
  const userId = Math.floor(Math.random() * 10000);

  // Get user profile
  const profileRes = http.get(`${BASE_URL}/api/users/${userId}`);
  check(profileRes, {
    'profile status 200': (r) => r.status === 200,
    'profile < 300ms': (r) => r.timings.duration < 300,
  });
  errorRate.add(profileRes.status !== 200);

  sleep(1);

  // Get user orders
  const ordersRes = http.get(`${BASE_URL}/api/users/${userId}/orders`);
  check(ordersRes, {
    'orders status 200': (r) => r.status === 200,
  });
  errorRate.add(ordersRes.status !== 200);

  sleep(2);
}
```

The thresholds section defines pass/fail criteria. If p95 latency reaches 500ms or the error rate reaches 1%, the test fails and k6 exits with a non-zero status code, which makes it easy to gate a CI pipeline on the result.
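To make that pass/fail logic concrete, here is a plain Node.js sketch (not k6 code) of how the two thresholds judge a set of results. It uses nearest-rank percentiles, which may differ slightly from k6's own percentile calculation, and the `{ status, durationMs }` result shape, `percentile`, and `evaluateThresholds` are all hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Mirrors the script's thresholds: http_req_duration p(95)<500 and errors rate<0.01.
// Each result is assumed to look like { status, durationMs }.
function evaluateThresholds(results, { p95LimitMs = 500, errorRateLimit = 0.01 } = {}) {
  const p95 = percentile(results.map((r) => r.durationMs), 95);
  const failures = results.filter((r) => r.status !== 200).length;
  const errorRate = results.length > 0 ? failures / results.length : 0;
  const p95Ok = p95 < p95LimitMs; // strict "<", as in 'p(95)<500'
  const errorRateOk = errorRate < errorRateLimit; // strict "<", as in 'rate<0.01'
  return { p95, errorRate, p95Ok, errorRateOk, passed: p95Ok && errorRateOk };
}
```

Note the boundary: exactly 1 failed request out of 100 gives an error rate of 0.01, which does not satisfy `rate<0.01`, so that run fails.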
Running Distributed Load Tests
A single machine often can’t generate enough load to stress a production system, and the load generator itself can become the bottleneck. The k6 Operator lets you run distributed tests inside Kubernetes:
```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: load-test
spec:
  parallelism: 4 # Run 4 pods generating load
  script:
    configMap:
      name: k6-test-script
      file: load-test.js # Key of the script inside the ConfigMap
  arguments: --out influxdb=http://influxdb:8086/k6
```

Apply it:
```bash
# Create the config map with your test script
kubectl create configmap k6-test-script --from-file=load-test.js

# Run the distributed load test
kubectl apply -f k6-test-job.yaml

# Watch the test progress (the operator starts one runner job per parallel pod)
kubectl get jobs -w
kubectl logs -f job/<runner-job-name>
```

Testing Internal Services
Sometimes you need to test services that aren’t exposed externally. Run the load generator inside the cluster:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: k6-load-test
  namespace: load-testing
spec:
  backoffLimit: 0 # Don't re-run the test if k6 exits non-zero (e.g. failed thresholds)
  template:
    spec:
      containers:
        - name: k6
          image: grafana/k6:latest
          command: ['k6', 'run', '/scripts/load-test.js']
          volumeMounts:
            - name: test-scripts
              mountPath: /scripts
          env:
            - name: BASE_URL
              value: 'http://my-service.default.svc.cluster.local:3000'
      volumes:
        - name: test-scripts
          configMap:
            name: k6-scripts # Must exist in the load-testing namespace with a load-test.js key
      restartPolicy: Never
```

This tests the service directly, bypassing the ingress but still going through cluster networking.
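The fully qualified name in `BASE_URL` also ties back to the DNS delays described earlier. Kubernetes pods typically get an `/etc/resolv.conf` with `options ndots:5` and search domains for their namespace, so a name with fewer than five dots is tried against each search domain before being tried as-is. The sketch below models that lookup order under the assumption of glibc-style resolver behaviour (other resolvers differ in details). `candidateQueries` is a hypothetical helper, and the search list assumes the Job runs in the `load-testing` namespace with the default ClusterFirst DNS policy.

```javascript
// Sketch of how a glibc-style stub resolver expands a name using the search
// list and ndots option from /etc/resolv.conf (assumed behaviour).
function candidateQueries(name, searchDomains, ndots) {
  if (name.endsWith('.')) return [name]; // trailing dot: already absolute, no search
  const dots = name.split('.').length - 1;
  const absolute = `${name}.`;
  const expanded = searchDomains.map((domain) => `${name}.${domain}.`);
  // Names with at least `ndots` dots are tried as-is first;
  // shorter names go through the search list first.
  return dots >= ndots ? [absolute, ...expanded] : [...expanded, absolute];
}

// Typical search list for a pod in the load-testing namespace (assumed).
const search = ['load-testing.svc.cluster.local', 'svc.cluster.local', 'cluster.local'];

for (const host of ['my-service.default.svc.cluster.local', 'my-service.default.svc.cluster.local.']) {
  console.log(host, '->', candidateQueries(host, search, 5));
}
```

Under these assumptions, even `my-service.default.svc.cluster.local` has only four dots, so it triggers three failing lookups before the real one, and each can be repeated for A and AAAA records. A trailing dot, or a lower `ndots` set through the pod's `dnsConfig`, avoids that; whether a trailing-dot hostname works end to end (for example with service mesh host matching) is worth checking before relying on it.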
Monitoring During Load Tests
You need visibility into what’s happening. Here are the key metrics to watch:
Application Metrics
```javascript
const metricsToWatch = {
  // k6 output
  'http_req_duration': 'Response time (p95 should be < 500ms)',
  'http_reqs': 'Requests per second',
  'iterations': 'Total test iterations',

  // Node.js specific (these names match prom-client's default metrics)
  'nodejs_eventloop_lag_seconds': 'Event loop lag (should be < 100ms)',
  'nodejs_active_requests': 'Active libuv requests (file system, DNS, etc.), not HTTP requests',
  'nodejs_heap_size_used_bytes': 'Heap memory usage',
};
```

Prometheus Queries for Real-Time Monitoring
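The P95 query below never sees individual request latencies: `histogram_quantile` estimates the quantile from cumulative bucket counts, assuming observations are spread evenly inside each bucket. As a way to build intuition, here is a simplified JavaScript model of that interpolation for a single histogram. It follows Prometheus's documented behaviour but is not its actual source, and `histogramQuantile` and the bucket values are hypothetical.

```javascript
// Simplified model of PromQL's histogram_quantile over one set of buckets.
// buckets: [{ le, count }] sorted by le, cumulative counts, final le = Infinity.
function histogramQuantile(q, buckets) {
  if (q < 0) return -Infinity;
  if (q > 1) return Infinity;
  const n = buckets.length;
  if (n < 2 || buckets[n - 1].le !== Infinity) return NaN;
  const total = buckets[n - 1].count;
  if (total === 0) return NaN;

  let rank = q * total;
  // First finite bucket whose cumulative count reaches the rank.
  const b = buckets.findIndex((bucket, i) => i < n - 1 && bucket.count >= rank);
  if (b === -1) return buckets[n - 2].le; // rank lands in +Inf: highest finite bound
  if (b === 0 && buckets[0].le <= 0) return buckets[0].le;

  let lower = 0;
  let inBucket = buckets[b].count;
  if (b > 0) {
    lower = buckets[b - 1].le;
    inBucket -= buckets[b - 1].count;
    rank -= buckets[b - 1].count;
  }
  // Linear interpolation inside the bucket.
  return lower + (buckets[b].le - lower) * (rank / inBucket);
}

// Hypothetical latency buckets (seconds) after rate() has been applied.
const latencyBuckets = [
  { le: 0.05, count: 50 },
  { le: 0.1, count: 80 },
  { le: 0.25, count: 95 },
  { le: 0.5, count: 99 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, latencyBuckets)); // 0.25
```

The estimate is only as precise as the bucket boundaries: with these buckets, a reported p95 of 250ms just means the 95th percentile falls somewhere in the 100-250ms bucket. Choosing boundaries around the thresholds you care about, such as 500ms, keeps the query meaningful.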
```promql
# Request rate
sum(rate(http_requests_total[1m]))

# Error rate (sum both sides so the differing status labels don't break the division)
sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))

# P95 latency (aggregate buckets by le before computing the quantile)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[1m])))

# Event loop lag (critical for Node.js)
nodejs_eventloop_lag_seconds

# Container resource usage (CPU is a counter, so take its rate)
rate(container_cpu_usage_seconds_total{pod=~"my-api-.*"}[1m])
container_memory_working_set_bytes{pod=~"my-api-.*"}
```

Using Artillery for More Complex Scenarios
For complex user flows, I sometimes prefer Artillery:
```yaml
config:
  target: 'https://api.example.com'
  phases:
    - duration: 60
      arrivalRate: 10
      name: Warm up
    - duration: 120
      arrivalRate: 10
      rampTo: 100
      name: Ramp up
    - duration: 300
      arrivalRate: 100
      name: Sustained load
  processor: './processor.js' # Custom JS for dynamic data

scenarios:
  - name: 'User API Flow'
    flow:
      - function: 'generateUserData' # Sets userId and items before the requests
      - get:
          url: '/api/users/{{ userId }}'
          capture:
            - json: '$.id'
              as: 'profileId'
      - think: 2
      - get:
          url: '/api/users/{{ profileId }}/orders'
      - think: 1
      - post:
          url: '/api/orders'
          json:
            userId: '{{ userId }}'
            items: '{{ items }}'
```

The processor.js file generates dynamic test data:
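```javascript
// processor.js
module.exports = {
  generateUserData,
};

// Artillery calls this when the scenario references it (e.g. a `function` step);
// it sets the variables used in the request templates.
function generateUserData(userContext, events, done) {
  userContext.vars.userId = Math.floor(Math.random() * 10000);
  userContext.vars.items = [
    { productId: Math.floor(Math.random() * 100), quantity: 1 },
  ];
  return done();
}
```

Production Testing Safety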
Testing against production carries risk. Here’s how I mitigate it:
- Test during low-traffic windows - Sunday 3 AM, not Monday morning
- Start small - Ramp up gradually, watch for degradation
- Have a kill switch - Be ready to stop immediately
- Monitor everything - CPU, memory, database connections, error rates
- Set conservative thresholds - 500ms p95, not 50ms
- Use feature flags - Route test traffic to specific instances
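```javascript
export const options = {
  stages: [
    { duration: '5m', target: 10 }, // Start very small
    { duration: '5m', target: 50 }, // Gradual increase
    { duration: '10m', target: 100 }, // Watch carefully
    { duration: '5m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    // Assumes the custom `errors` Rate metric from the earlier script.
    // abortOnFail stops the test once the threshold is crossed (an automatic kill switch);
    // delayAbortEval waits 1m so a few early errors don't abort the run.
    errors: [{ threshold: 'rate<0.005', abortOnFail: true, delayAbortEval: '1m' }], // Very strict
  },
};
```

The Difference is Dramatic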
After implementing proper full-stack testing, I found:
| Metric | Localhost | Production-Like |
|---|---|---|
| Max Throughput | 5,000 req/s | 1,800 req/s |
| P95 Latency | 12ms | 89ms |
| P99 Latency | 25ms | 340ms |
| Error Rate at Limit | 0% | 2.3% |
The 65% throughput reduction was a wake-up call. Without full-stack testing, I would have deployed an underprovisioned system.
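The ratios in the table are worth computing rather than eyeballing, because the tail degrades faster than p95 suggests. A small sketch using the table's own numbers; `compareRuns` is a hypothetical helper.

```javascript
// Express the gap between two runs as a throughput drop and latency multipliers.
function compareRuns(baseline, fullStack) {
  return {
    throughputDrop: 1 - fullStack.rps / baseline.rps,
    p95Factor: fullStack.p95Ms / baseline.p95Ms,
    p99Factor: fullStack.p99Ms / baseline.p99Ms,
  };
}

const gap = compareRuns(
  { rps: 5000, p95Ms: 12, p99Ms: 25 }, // Localhost column
  { rps: 1800, p95Ms: 89, p99Ms: 340 }, // Production-like column
);
console.log(`${(gap.throughputDrop * 100).toFixed(0)}% less throughput`);
console.log(`p95 ${gap.p95Factor.toFixed(1)}x, p99 ${gap.p99Factor.toFixed(1)}x`);
```

With the table's rounded 5,000 and 1,800 req/s the drop comes out at 64%; the unrounded 5,234 and 1,847 from the earlier runs give the 65% quoted above. The p99 grows by roughly 13.6x against 7.4x for p95, which is why the tail deserves its own threshold.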
What I Learned
- Localhost testing gives false confidence - Production infrastructure adds significant overhead
- Test through the load balancer - This is how real traffic reaches your service
- Include service mesh overhead - Istio/Linkerd adds 2-10ms per hop
- Use distributed load generators - A single machine may not generate enough load to stress a real system
- Monitor Node.js-specific metrics - Event loop lag is the silent killer
- Test with production-like data - Database size affects query performance
- Start small and ramp up - Don’t crash production with your first test
The gap between localhost and production can be 2-10x in throughput. Only full-stack testing reveals your system’s true capacity.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 k6 Documentation
- 👨‍💻 Artillery Documentation
- 👨‍💻 Kubernetes Load Testing Best Practices
- 👨‍💻 Node.js Performance Monitoring
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!