
What Monitoring Tools Do Spring Boot Developers Use in Production?

I deployed my first Spring Boot application to production without any monitoring. Three weeks later, it crashed at 3 AM on a Sunday. I had no idea why. No logs, no metrics, no clue. After that incident, I promised myself: never again.

When I started researching monitoring tools, I found dozens of options. Datadog looked slick but expensive. New Relic seemed comprehensive but complex. Then I kept seeing the same combination mentioned in forums and Reddit threads: Prometheus and Grafana.

Let me share what I learned setting up production monitoring for Spring Boot applications.

The Problem: Flying Blind in Production

Before adding monitoring, I was essentially flying blind. When users reported slowness, I could only guess. Was it the database? The API? Memory issues? CPU spikes? I had no data to diagnose problems.

What I needed was:

  • Metrics: Numbers I could track over time (request latency, error rates, memory usage)
  • Visualization: Graphs and dashboards to spot trends
  • Alerts: Notifications when something goes wrong

The Standard Stack: Prometheus + Grafana + Micrometer

As I read through the discussions, one combination came up again and again. The de facto standard for Spring Boot monitoring is:

┌─────────────────────────────────────────────────────────────────┐
│                     Spring Boot Application                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                   Spring Boot Actuator                    │  │
│  │  ┌─────────────────────────────────────────────────────┐  │  │
│  │  │                     Micrometer                      │  │  │
│  │  │  - Counters, Timers, Gauges                         │  │  │
│  │  │  - Vendor-neutral metrics facade                    │  │  │
│  │  │  - Prometheus registry integration                  │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ↓                                │
│                      /actuator/prometheus                       │
└─────────────────────────────────────────────────────────────────┘
                                 ↓ Scrape
┌─────────────────────────────────────────────────────────────────┐
│                           Prometheus                            │
│  - Time-series database                                         │
│  - Pull-based metrics collection                                │
│  - PromQL query language                                        │
│  - Alertmanager integration                                     │
└─────────────────────────────────────────────────────────────────┘
                                 ↓ Query
┌─────────────────────────────────────────────────────────────────┐
│                             Grafana                             │
│  - Visualization dashboards                                     │
│  - Alerting and notifications                                   │
│  - Multi-data source support                                    │
│  - Community dashboard sharing                                  │
└─────────────────────────────────────────────────────────────────┘

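Why this combination? Because Spring Boot supports it natively: Actuator auto-configures Micrometer, and adding the Prometheus registry is enough to publish metrics in a format Prometheus can scrape. The integration requires minimal code.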

Step 1: Add Dependencies

First, I added the necessary dependencies to my pom.xml:

pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

That’s it. Two dependencies. No code changes yet.

Step 2: Expose the Prometheus Endpoint

By default, Actuator exposes almost nothing over HTTP (in recent Spring Boot versions, only the health endpoint). I configured application.yml to expose the Prometheus endpoint:

application.yml
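management:
  endpoints:
    web:
      exposure:
        include: prometheus,health,info,metrics
  # Spring Boot 2.x property name; on Spring Boot 3.x the equivalent is
  # management.prometheus.metrics.export.enabled
  metrics:
    export:
      prometheus:
        enabled: true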

After restarting the application, I navigated to http://localhost:8080/actuator/prometheus. A wall of text appeared:

# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="PS Eden Space",} 1.2345678E7
jvm_memory_used_bytes{area="heap",id="PS Survivor Space",} 123456.0
...
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/users",} 1523.0
...

This was exciting. Without writing any instrumentation code, I had access to JVM memory metrics, HTTP request timings, and more.
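That wall of text is the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, followed by samples of the form `name{labels} value`. To make the structure concrete, here is a minimal sketch of a parser for a single sample line. The class `ExpositionLine` is hypothetical and deliberately simplified: it assumes label values contain no commas or escaped quotes, and it does not handle special values such as `+Inf`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses one sample line of the Prometheus text format. */
final class ExpositionLine {
    final String name;
    final Map<String, String> labels;
    final double value;

    private ExpositionLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    /** Parses lines such as: metric_name{key="value",} 12.5 */
    static ExpositionLine parse(String line) {
        Map<String, String> labels = new LinkedHashMap<>();
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        String name;
        String rest;
        if (open >= 0 && close > open) {
            name = line.substring(0, open);
            // Simplification: assumes label values contain no commas or escaped quotes.
            for (String pair : line.substring(open + 1, close).split(",")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // empty fragment, e.g. when there are no labels
                }
                String key = pair.substring(0, eq).trim();
                String quoted = pair.substring(eq + 1).trim();
                labels.put(key, quoted.substring(1, quoted.length() - 1));
            }
            rest = line.substring(close + 1).trim();
        } else {
            int space = line.indexOf(' ');
            name = line.substring(0, space);
            rest = line.substring(space + 1).trim();
        }
        // The value is the first token after name/labels; a timestamp may follow.
        double value = Double.parseDouble(rest.split("\\s+")[0]);
        return new ExpositionLine(name, labels, value);
    }

    public static void main(String[] args) {
        String[] scrape = {
            "# TYPE jvm_memory_used_bytes gauge",
            "jvm_memory_used_bytes{area=\"heap\",id=\"PS Eden Space\",} 1.2345678E7",
            "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/api/users\",} 1523.0"
        };
        for (String line : scrape) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // HELP/TYPE comments carry metadata, not samples
            }
            ExpositionLine sample = parse(line);
            System.out.println(sample.name + " " + sample.labels + " = " + sample.value);
        }
    }
}
```

In practice you never parse this yourself, since Prometheus does it on every scrape. The point is only that each line is one time series: a metric name, a set of labels, and the current value.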

Step 3: Add Common Tags

To identify metrics from different applications, I added common tags:

Application.java
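import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // Adds application="my-spring-boot-app" to every meter in the registry
    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return registry -> registry.config()
                .commonTags("application", "my-spring-boot-app");
    }
}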

This tag appears on every metric, making it easy to filter in Grafana.

Step 4: Custom Business Metrics

Out-of-the-box metrics are useful, but I also needed to track business-specific data. For example, tracking orders created:

OrderService.java
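import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderProcessingTime;

    public OrderService(MeterRegistry registry) {
        // Exposed to Prometheus as orders_created_total
        this.orderCounter = Counter.builder("orders.created")
                .description("Total orders created")
                .tag("type", "online")
                .register(registry);
        // Exposed as orders_processing_time_seconds, plus quantile series for p50/p95/p99
        this.orderProcessingTime = Timer.builder("orders.processing.time")
                .description("Order processing duration")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        // Time the whole operation; count only orders that were processed without an exception
        return orderProcessingTime.record(() -> {
            Order order = processOrder(request);
            orderCounter.increment();
            return order;
        });
    }

    private Order processOrder(OrderRequest request) {
        // ... order processing logic
        throw new UnsupportedOperationException("order processing logic goes here");
    }
}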

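The Timer automatically tracks how long order processing takes and publishes percentiles (p50, p95, p99). The Counter tracks the total number of orders. In the Prometheus output the dots become underscores, so these show up as orders_processing_time_seconds and orders_created_total. One caveat: percentiles published this way are computed inside each application instance and cannot be meaningfully averaged across instances.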
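If percentiles are new to you, a small sketch may help. It uses the simple nearest-rank definition on a handful of made-up durations. This is only an illustration of what p50/p95/p99 mean, not how Micrometer computes them internally, and the class name `PercentileSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of what p50/p95/p99 mean; not Micrometer's algorithm. */
final class PercentileSketch {

    /** Nearest-rank percentile: the smallest sample with at least a share p of samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 1");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank into the sorted samples
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Ten made-up order processing durations in milliseconds, with one slow outlier.
        double[] durationsMs = {12, 15, 11, 14, 13, 16, 12, 18, 17, 950};
        System.out.println("p50 = " + percentile(durationsMs, 0.5));
        System.out.println("p95 = " + percentile(durationsMs, 0.95));
        System.out.println("p99 = " + percentile(durationsMs, 0.99));
    }
}
```

With these samples the median is 14 ms while p95 and p99 both land on the 950 ms outlier. The mean is about 108 ms, which describes neither the typical request nor the slow one, and that is why I track percentiles instead of averages.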

Step 5: Configure Prometheus

I created a prometheus.yml configuration file:

prometheus.yml
scrape_configs:
  - job_name: 'spring-boot-apps'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets:
          - 'app1:8080'
          - 'app2:8080'
    scrape_interval: 15s

Prometheus “scrapes” metrics every 15 seconds by hitting the /actuator/prometheus endpoint. This pull model means my application doesn’t need to know about Prometheus.
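To see the pull model in isolation, here is a rough sketch that uses only the JDK: a tiny HTTP server plays the application, and a plain HTTP GET plays Prometheus. Everything in it (the class `PullModelSketch`, the single `orders_created_total` line, the loopback address) is an assumption for illustration. A real Spring Boot app gets the endpoint from Actuator, and real Prometheus does far more than one GET.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of pull-based collection using only JDK classes. */
final class PullModelSketch {

    // The "application" side only updates its own counter; it never contacts the collector.
    static final AtomicLong ordersCreated = new AtomicLong();

    /** Starts a stand-in for the app, serving current values on an ephemeral port. */
    static HttpServer startApp() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/actuator/prometheus", exchange -> {
            byte[] body = ("orders_created_total " + ordersCreated.get() + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** The "Prometheus" side: one scrape is just an HTTP GET of the metrics path. */
    static String scrape(int port) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://127.0.0.1:" + port + "/actuator/prometheus"))
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer app = startApp();
        try {
            ordersCreated.addAndGet(3);
            System.out.print(scrape(app.getAddress().getPort()));
        } finally {
            app.stop(0);
        }
    }
}
```

The application code never references the collector's address. That is the property I care about: I can add, move, or replace Prometheus by editing prometheus.yml, without redeploying the app.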

Step 6: Run Everything with Docker Compose

For local development, I created a docker-compose.yml:

docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana
volumes:
  grafana-storage:

Running docker-compose up -d started both services. Prometheus was available at http://localhost:9090 and Grafana at http://localhost:3000.

Step 7: Import a Grafana Dashboard

I didn’t want to build dashboards from scratch. Thankfully, the community has created dozens of pre-built dashboards. I imported dashboard ID 4701 (JVM Micrometer) into Grafana. It filters on the application tag, so the common tag from Step 3 is what makes it work.

The result was immediate. I had graphs for:

  • JVM heap memory usage
  • HTTP request rates and latency
  • Garbage collection pauses
  • Thread counts
  • CPU usage
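The heap graphs are built from per-pool gauges like the jvm_memory_used_bytes lines shown earlier. As far as I understand, Micrometer derives them from the JVM's standard management beans, so you can look at the same raw numbers with plain JDK code. The sketch below is my own illustration (the class name `HeapPoolsSketch` is hypothetical), not Micrometer's implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch: reads heap pool usage straight from the JVM's management beans. */
final class HeapPoolsSketch {

    /** Sums the used bytes across all heap memory pools. */
    static long usedHeapBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue; // skip non-heap pools and pools that are no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            System.out.println(pool.getName()
                    + ": used=" + usage.getUsed()
                    + " max=" + usage.getMax());
        }
        System.out.println("total heap used = " + usedHeapBytes());
    }
}
```

The pool names and the number of pools depend on the garbage collector in use, which is why the `id` label values differ between JVM configurations.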

Setting Up Alerts

Dashboards are nice, but I needed alerts for when things go wrong. I added alerting rules to Prometheus:

alerts.yml
groups:
  - name: spring-boot-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          /
          sum(rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
      - alert: HighMemoryUsage
        # Sum over heap pools per instance; a single pool can report max = -1
        expr: |
          sum by (instance) (jvm_memory_used_bytes{area="heap"})
          /
          sum by (instance) (jvm_memory_max_bytes{area="heap"}) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap memory usage above 90%"

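The first alert fires when the 5xx error rate exceeds 5% for 5 minutes. The second warns when heap usage exceeds 90%. For Prometheus to load these rules, alerts.yml has to be listed under rule_files in prometheus.yml and, with the Docker Compose setup above, mounted into the Prometheus container alongside prometheus.yml.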
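The error-rate expression is easier to trust once you see the arithmetic behind it. The sketch below is a simplified model with hypothetical names: `rate` here is just the increase between two counter samples divided by the window length, whereas the real PromQL `rate()` also handles counter resets and extrapolation. The counter values are made up.

```java
/** Hypothetical sketch of the arithmetic behind the HighErrorRate expression. */
final class ErrorRateSketch {

    /** Simplified rate(): per-second increase between two counter samples. */
    static double rate(double first, double last, double windowSeconds) {
        return (last - first) / windowSeconds;
    }

    /** Share of requests that ended in a 5xx status during the window. */
    static double errorRatio(double errFirst, double errLast,
                             double totalFirst, double totalLast,
                             double windowSeconds) {
        double totalRate = rate(totalFirst, totalLast, windowSeconds);
        if (totalRate == 0.0) {
            return 0.0; // assumption of this sketch: no traffic counts as "no errors"
        }
        return rate(errFirst, errLast, windowSeconds) / totalRate;
    }

    /** Mirrors "for: 5m": fire only if every evaluation in the hold period breached the threshold. */
    static boolean shouldFire(double[] ratiosDuringHoldPeriod, double threshold) {
        if (ratiosDuringHoldPeriod.length == 0) {
            return false;
        }
        for (double ratio : ratiosDuringHoldPeriod) {
            if (!(ratio > threshold)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up counters over a 300 s window: 5xx went 10 -> 40, all requests 1000 -> 1500.
        double ratio = errorRatio(10, 40, 1000, 1500, 300);
        System.out.println("error ratio = " + ratio);
        System.out.println("sustained breach fires: "
                + shouldFire(new double[] {0.06, 0.07, 0.08}, 0.05));
        System.out.println("brief dip resets it: "
                + shouldFire(new double[] {0.06, 0.04, 0.08}, 0.05));
    }
}
```

In the example, 30 of 500 new requests failed, so the ratio is roughly 0.06 and above the 5% threshold. The `for` clause is what keeps a single bad evaluation from paging anyone.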

What About OpenTelemetry?

While setting this up, I also explored OpenTelemetry (OTel). It’s newer and provides unified observability across metrics, traces, and logs.

For new projects requiring distributed tracing, OTel might be the better choice:

pom.xml
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>

However, for my needs—primarily metrics—the Prometheus + Grafana stack was simpler and more mature. The community support is extensive, and I found answers to every question I had.

Lessons Learned

After running this stack in production for several months, here’s what I learned:

  1. Start simple. Actuator + Micrometer gives you 80% of what you need with zero code.

  2. Be careful with cardinality. High-cardinality tags (like user IDs) can explode your metrics storage. I made this mistake initially and Prometheus ran out of memory.

  3. Tune scrape intervals. 15 seconds works for most cases. Scrape too frequently and you’ll stress Prometheus.

  4. Use recording rules for complex queries. Pre-compute expensive PromQL queries rather than running them on every dashboard refresh.

  5. Import community dashboards first. Don’t reinvent the wheel. Customize only after you understand what’s available.
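The cardinality lesson is worth a quick back-of-the-envelope calculation. Each distinct combination of tag values is its own time series, so the worst case is the product of the distinct values per tag. The sketch below uses made-up counts and a hypothetical class name. It gives an upper bound, since only combinations that actually occur are stored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: worst-case number of time series for one metric. */
final class CardinalitySketch {

    /** Upper bound on series count: the product of distinct values per tag. */
    static long maxSeries(Map<String, Integer> distinctValuesPerTag) {
        long series = 1;
        for (int distinct : distinctValuesPerTag.values()) {
            series = Math.multiplyExact(series, (long) distinct);
        }
        return series;
    }

    public static void main(String[] args) {
        // Made-up tag counts for an HTTP request metric.
        Map<String, Integer> tags = new LinkedHashMap<>();
        tags.put("method", 4);
        tags.put("status", 5);
        tags.put("uri", 20);
        System.out.println("without userId: " + maxSeries(tags));

        // Adding a per-user tag multiplies everything by the number of users.
        tags.put("userId", 10_000);
        System.out.println("with userId:    " + maxSeries(tags));
    }
}
```

With these numbers, one extra tag takes the bound from 400 series to 4,000,000. This is the mistake I made: the tag looked harmless in code, and the multiplication only showed up in Prometheus's memory usage.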

When to Choose What

| Stack | Best For |
|---|---|
| Prometheus + Grafana | Battle-tested metrics monitoring, Kubernetes environments, teams familiar with PromQL |
| OpenTelemetry | New projects needing unified traces/metrics/logs, vendor flexibility, distributed systems |
| Commercial (Datadog, New Relic) | Teams wanting managed solutions, all-in-one observability, budget available |

Summary

In this post, I shared my journey from zero monitoring to a complete production monitoring setup for Spring Boot applications. The Prometheus + Grafana + Micrometer stack has become the de facto standard for good reasons: native Spring Boot integration, extensive community support, and proven reliability.

If you’re starting from scratch, here’s the quickest path:

  1. Add spring-boot-starter-actuator and micrometer-registry-prometheus dependencies
  2. Configure application.yml to expose the Prometheus endpoint
  3. Run Prometheus and Grafana via Docker Compose
  4. Import a community dashboard (ID: 4701)
  5. Set up alerts for critical metrics

The whole setup took me about an hour the first time. Now I can deploy any Spring Boot application with confidence, knowing I’ll have visibility into its behavior.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
