Spring Boot Micrometer Monitoring Best Practices - A Practical Guide

The Problem

I had set up Prometheus and Grafana for my Spring Boot application, and I could see JVM memory usage and HTTP request metrics. But when my product manager asked specific questions, I couldn’t answer them:

  • “How many orders are being processed per minute?”
  • “What’s the failure rate for payment transactions?”
  • “How long does it take to sync data from the external API?”

The built-in metrics weren’t enough. I needed custom metrics tailored to my business logic. I tried adding metrics using System.currentTimeMillis() and log statements, but that approach was messy and didn’t integrate with my monitoring stack.

I needed a better way to track business-specific metrics that would work seamlessly with Prometheus and Grafana.

Why Micrometer?

I discovered Micrometer, and it’s the standard solution for Spring Boot metrics. Think of it as “SLF4J for metrics” - you instrument your code once, and Micrometer handles translation to your monitoring backend.

Here’s what makes Micrometer the right choice:

+-------------------+
| Your Application  |
| (Business Logic)  |
+-------------------+
          |
          v
+-------------------+     +------------+     +---------+
| Micrometer        |     |            |     |         |
| (MeterRegistry)   |---->| Prometheus |---->| Grafana |
+-------------------+     +------------+     +---------+
          |                      ^
          |                      |
          +-----> Datadog -------+
          |                      |
          +-----> InfluxDB ------+
          |                      |
          +-----> New Relic -----+

Key benefits I found:

  • Vendor neutrality: I can switch from Prometheus to Datadog by changing one dependency
  • Spring Boot integration: Auto-configured via Spring Boot Actuator
  • Dimensional metrics: Tags enable filtering and drill-down in dashboards
  • Battle-tested: Used in production at scale by major companies

Step 1: Understand Meter Types

Micrometer provides four main meter types. I needed to understand when to use each one:

Meter Type          | Behavior                  | Use Case         | Example
--------------------+---------------------------+------------------+----------------------------------
Counter             | Only increases            | Counting events  | Orders placed, errors occurred
Gauge               | Can go up or down         | Current state    | Active connections, queue size
Timer               | Measures duration + count | Latency tracking | API response time, DB query time
DistributionSummary | Tracks distribution       | Size metrics     | Request payload size, batch size

Counter - For Counting Events

Counters only go up. I use them for monotonically increasing values like request counts or completed tasks.

OrderService.java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final OrderRepository orderRepository;
    private final Counter ordersCreated;
    private final Counter ordersFailed;

    // OrderRepository, Order, and OrderRequest are application types not shown here
    public OrderService(OrderRepository orderRepository, MeterRegistry registry) {
        this.orderRepository = orderRepository;
        this.ordersCreated = Counter.builder("orders.created")
                .description("Number of orders created")
                .tag("type", "online")
                .baseUnit("orders")
                .register(registry);
        this.ordersFailed = Counter.builder("orders.failed")
                .description("Number of failed orders")
                .tag("type", "online")
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        try {
            Order order = orderRepository.save(request);
            ordersCreated.increment();
            return order;
        } catch (Exception e) {
            // Count the failure, then let the caller handle the exception
            ordersFailed.increment();
            throw e;
        }
    }
}

Important: Never use Counter for values that can decrease. If you need to track current value (like active users), use Gauge instead.
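
A counter that only goes up is still enough to answer the “orders per minute” question, because the monitoring backend works with the difference between two samples. Here is a minimal sketch of that arithmetic. CounterRate and perMinute are names I made up for illustration; this is not how Prometheus implements rate(), only the basic idea, including the restart case where the counter starts again from zero.

```java
// Illustrative sketch only: CounterRate and perMinute are hypothetical names,
// not part of Micrometer or Prometheus.
final class CounterRate {

    /**
     * Estimates events per minute from two samples of a monotonically
     * increasing counter taken elapsedSeconds apart.
     */
    static double perMinute(double earlier, double later, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // A counter never decreases, so a smaller later value means the process
        // restarted and the counter began again from zero.
        double increase = later >= earlier ? later - earlier : later;
        return increase * 60.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 1,200 orders at the first scrape, 1,290 thirty seconds later.
        System.out.println(CounterRate.perMinute(1200, 1290, 30)); // 180.0
    }
}
```

This is also why a decreasing value breaks a counter: any drop would be read as a restart.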

Timer - For Measuring Duration

Timers measure both duration and count. They can also publish percentiles (p50, p95, p99), but only when configured with publishPercentiles, as in the example below.

PaymentService.java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

@Service
public class PaymentService {

    private final PaymentGateway paymentGateway;
    private final Timer paymentTimer;

    // PaymentGateway, PaymentRequest, and PaymentResult are application types not shown here
    public PaymentService(PaymentGateway paymentGateway, MeterRegistry registry) {
        this.paymentGateway = paymentGateway;
        this.paymentTimer = Timer.builder("payment.processing.time")
                .description("Time spent processing payments")
                .tag("gateway", "stripe")
                .publishPercentiles(0.5, 0.95, 0.99)
                .publishPercentileHistogram()
                .minimumExpectedValue(Duration.ofMillis(1))
                .maximumExpectedValue(Duration.ofSeconds(30))
                .register(registry);
    }

    // Option 1: Record with a supplier
    public PaymentResult processPayment(PaymentRequest request) {
        return paymentTimer.record(() -> paymentGateway.charge(request));
    }

    // Option 2: Manual timing (a separate method; code placed after the
    // return in Option 1 would be unreachable)
    public PaymentResult processPaymentManually(PaymentRequest request) {
        long start = System.nanoTime();
        try {
            return paymentGateway.charge(request);
        } finally {
            paymentTimer.record(System.nanoTime() - start, TimeUnit.NANOSECONDS);
        }
    }
}

Timers give me both the count of operations and duration statistics in one meter.
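
To make “count plus duration in one meter” concrete, here is a small sketch of the bookkeeping a timer does. MiniTimer is a hypothetical class I wrote for illustration using only the JDK; it is not Micrometer’s implementation and it leaves out percentiles and histograms entirely.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative sketch only: MiniTimer is a hypothetical class that mimics the
// idea behind a timer. It is not Micrometer's implementation.
final class MiniTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Records one already-measured duration. */
    void record(long amount, TimeUnit unit) {
        long nanos = unit.toNanos(amount);
        count.incrementAndGet();
        totalNanos.addAndGet(nanos);
        maxNanos.accumulateAndGet(nanos, Math::max);
    }

    /** Times the supplied work and records it even if the work throws. */
    <T> T record(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            record(System.nanoTime() - start, TimeUnit.NANOSECONDS);
        }
    }

    long count() {
        return count.get();
    }

    double totalSeconds() {
        return totalNanos.get() / 1_000_000_000.0;
    }

    double maxSeconds() {
        return maxNanos.get() / 1_000_000_000.0;
    }

    /** Mean duration, derived from the two values a backend would scrape. */
    double meanSeconds() {
        long n = count.get();
        return n == 0 ? 0.0 : totalSeconds() / n;
    }
}
```

The count and the total are the two numbers that end up as the _count and _sum series, so the average latency is always total divided by count.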

Gauge - For Current Values

Gauges sample a value at observation time. They’re perfect for current state metrics.

ConnectionPoolService.java
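import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.sql.Connection;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.commons.pool2.ObjectPool;
import org.springframework.stereotype.Service;

@Service
public class ConnectionPoolService {

    // Assumption: the pool is an Apache Commons Pool of JDBC connections;
    // the original snippet does not show how connectionPool is declared
    private final ObjectPool<Connection> connectionPool;
    private final AtomicLong activeConnections = new AtomicLong(0);

    public ConnectionPoolService(ObjectPool<Connection> connectionPool, MeterRegistry registry) {
        this.connectionPool = connectionPool;
        // Gauge samples the value from AtomicLong
        Gauge.builder("connections.active", activeConnections, AtomicLong::get)
                .description("Number of active database connections")
                .tag("pool", "main")
                .register(registry);
    }

    public Connection acquireConnection() throws Exception {
        Connection conn = connectionPool.borrowObject();
        // Count only after the borrow succeeds, so a failed borrow cannot inflate the gauge
        activeConnections.incrementAndGet();
        return conn;
    }

    public void releaseConnection(Connection conn) throws Exception {
        try {
            connectionPool.returnObject(conn);
        } finally {
            activeConnections.decrementAndGet();
        }
    }
}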

Gotcha I learned: Gauges are sampled, not pushed. Prometheus scrapes the current value when it queries the endpoint.
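
The practical consequence of sampling is that anything happening between two scrapes is invisible. The sketch below shows this with a hypothetical SampledGauge class (my own name, JDK only, not a Micrometer type) that reads its source only when sample() is called.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only: SampledGauge is a hypothetical class showing the
// "read on demand" idea. It is not part of Micrometer.
final class SampledGauge<T> {
    private final T source;
    private final ToDoubleFunction<T> reader;

    SampledGauge(T source, ToDoubleFunction<T> reader) {
        this.source = source;
        this.reader = reader;
    }

    /** Reads the current value; nothing is stored between calls. */
    double sample() {
        return reader.applyAsDouble(source);
    }

    public static void main(String[] args) {
        AtomicLong active = new AtomicLong(3);
        SampledGauge<AtomicLong> gauge = new SampledGauge<>(active, AtomicLong::get);

        System.out.println(gauge.sample()); // 3.0

        // A short spike between two scrapes...
        active.set(50);
        active.set(3);

        // ...is never observed, because only the value at sampling time counts.
        System.out.println(gauge.sample()); // 3.0
    }
}
```

If short spikes matter, a Counter or Timer on the events themselves is usually the better fit, since those accumulate between scrapes.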

Step 2: Follow Naming Conventions

I made mistakes with metric naming initially. Here’s what I learned:

Use Lowercase Dot Notation

NamingConvention.java
// GOOD - Consistent, readable
Counter.builder("http.server.requests")
        .tag("method", "GET")
        .tag("uri", "/api/users");
Counter.builder("database.queries")
        .tag("table", "users")
        .tag("operation", "select");

// BAD - Inconsistent, hard to filter
Counter.builder("HTTP_Requests");
Counter.builder("databaseQueries");
Counter.builder("user-query-count");
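
One reason dot notation works well is that the registry can translate it mechanically into whatever the backend expects. The sketch below is a simplified approximation of the renaming that shows up in Prometheus output: dots become underscores, counters get a _total suffix, and timers get a _seconds suffix. PrometheusNameSketch, MeterKind, and toPrometheusName are hypothetical names of my own, and the real naming convention handles more cases (base units, for example) than this does.

```java
// Illustrative sketch only: a simplified approximation of how dotted meter
// names appear in Prometheus output. All names here are hypothetical.
final class PrometheusNameSketch {

    enum MeterKind { COUNTER, GAUGE, TIMER }

    static String toPrometheusName(String dottedName, MeterKind kind) {
        // Dots become underscores; any other unsupported character is replaced too.
        String base = dottedName.replace('.', '_').replaceAll("[^a-zA-Z0-9_:]", "_");
        if (kind == MeterKind.COUNTER && !base.endsWith("_total")) {
            return base + "_total";
        }
        if (kind == MeterKind.TIMER && !base.endsWith("_seconds")) {
            return base + "_seconds";
        }
        return base;
    }

    public static void main(String[] args) {
        System.out.println(toPrometheusName("orders.created", MeterKind.COUNTER));
        System.out.println(toPrometheusName("payments.processing.time", MeterKind.TIMER));
        System.out.println(toPrometheusName("payments.pending", MeterKind.GAUGE));
    }
}
```

A name like "user-query-count" would come out as user_query_count_total here, which is why mixing styles makes metrics hard to find later.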

Name the Thing Being Measured

The metric name should answer “what is being measured?”:

GoodNaming.java
// Names only: Gauge.builder also needs a value source, omitted here for brevity
// Clear what's being measured
Timer.builder("http.server.requests");
Counter.builder("orders.created");
Gauge.builder("jvm.memory.used");

// Unclear - what is this?
Timer.builder("timing");
Counter.builder("count");
Gauge.builder("value");

Step 3: Use Tags Effectively

Tags (also called labels in Prometheus) enable dimensional metrics. They let me filter and group metrics in Grafana.

Tags for Granularity

MetricsWithTags.java
// Track HTTP requests by method, endpoint, and status
registry.counter("http.server.requests",
        "method", "GET",
        "uri", "/api/users",
        "status", "200"
);

// Track database calls by operation
registry.counter("database.queries",
        "database", "users",
        "operation", "select",
        "table", "orders"
);

// Track business metrics by region
registry.counter("orders.processed",
        "region", "us-east-1",
        "service", "order-service"
);

Now in Grafana, I can query (note that counters are exported to Prometheus with a _total suffix):

  • Total orders: sum(orders_processed_total)
  • Orders by region: sum by (region) (orders_processed_total)
  • US-east orders only: orders_processed_total{region="us-east-1"}

Common Tags for All Metrics

I set application-wide tags so every metric includes context:

MetricsConfig.java
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsConfig {

    @Bean
    public MeterRegistryCustomizer<MeterRegistry> commonTags() {
        return registry -> registry.config()
                .commonTags("application", "order-service")
                .commonTags("environment", "production")
                .commonTags("region", "us-east-1");
    }
}

Or via application properties:

application.yml
spring:
  application:
    name: order-service
management:
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${DEPLOY_ENV:development}
      region: ${AWS_REGION:us-east-1}

Step 4: Avoid Cardinality Explosion

I learned this the hard way. High-cardinality tags can crash Prometheus.

The Problem: Too Many Time Series

BadCardinality.java
// DANGEROUS - Creates a time series for each user
Counter.builder("api.requests")
        .tag("userId", userId) // Could be millions of users!
        .register(registry)
        .increment();

// DANGEROUS - Creates a time series for each request ID
Counter.builder("http.requests")
        .tag("requestId", UUID.randomUUID().toString()) // Unique per request!
        .register(registry)
        .increment();

With 100,000 users, this creates 100,000 time series. With 1 million requests, that’s 1 million time series. Prometheus will run out of memory.
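
The numbers grow this fast because the worst-case series count for one metric is the product of the distinct values of each tag. Here is a minimal sketch of that calculation. CardinalityEstimate and worstCaseSeries are hypothetical names, and the tag counts in main are made-up figures for illustration, not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: CardinalityEstimate is a hypothetical helper,
// and the tag counts below are invented example figures.
final class CardinalityEstimate {

    /** Worst-case series count: the product of distinct values per tag. */
    static long worstCaseSeries(Map<String, Integer> distinctValuesPerTag) {
        long series = 1;
        for (int distinct : distinctValuesPerTag.values()) {
            // multiplyExact throws instead of silently overflowing
            series = Math.multiplyExact(series, (long) distinct);
        }
        return series;
    }

    public static void main(String[] args) {
        Map<String, Integer> tags = new LinkedHashMap<>();
        tags.put("endpoint", 20);
        tags.put("method", 4);
        tags.put("status", 6);
        System.out.println(worstCaseSeries(tags)); // 480

        // Adding one unbounded tag multiplies everything that was already there.
        tags.put("userId", 100_000);
        System.out.println(worstCaseSeries(tags)); // 48000000
    }
}
```

In practice not every combination occurs, so the real count is lower, but the product is the ceiling I plan against.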

The Solution: Low-Cardinality Tags

GoodCardinality.java
// GOOD - Low cardinality (3-4 values)
Counter.builder("api.requests")
        .tag("userType", getUserType(userId)) // "free", "premium", "enterprise"
        .tag("tier", getUserTier(userId)) // "tier1", "tier2", "tier3"
        .register(registry)
        .increment();

// GOOD - Bounded values
Counter.builder("http.requests")
        .tag("endpoint", getEndpointPattern(uri)) // "/api/users/{id}"
        .tag("method", method) // "GET", "POST", etc.
        .tag("status", String.valueOf(status)) // "200", "404", "500"
        .register(registry)
        .increment();

Rule of thumb: Keep cardinality under 10 for each tag, and total time series under 100,000.
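
The getEndpointPattern helper in the snippet above is not shown, so here is one way it could look. This is a sketch under my own assumptions: it only replaces numeric IDs and UUIDs in the path and drops the query string. The EndpointPatterns class name and the {uuid} placeholder are hypothetical. In a Spring MVC application the framework already knows the matched route template, which is a better source when it is available.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: one possible implementation of the
// getEndpointPattern helper. The class name and rules are assumptions.
final class EndpointPatterns {

    // A path segment made only of digits, e.g. /123
    private static final Pattern NUMERIC_ID = Pattern.compile("/\\d+(?=/|$)");

    // A path segment that is a UUID
    private static final Pattern UUID_SEGMENT = Pattern.compile(
            "/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)");

    static String getEndpointPattern(String uri) {
        // Query parameters are unbounded, so drop them first.
        int queryStart = uri.indexOf('?');
        String path = queryStart >= 0 ? uri.substring(0, queryStart) : uri;

        String withoutUuids = UUID_SEGMENT.matcher(path).replaceAll("/{uuid}");
        return NUMERIC_ID.matcher(withoutUuids).replaceAll("/{id}");
    }

    public static void main(String[] args) {
        System.out.println(getEndpointPattern("/api/users/123"));
        System.out.println(getEndpointPattern("/api/users/123/orders/456?page=2"));
    }
}
```

Whatever rules are used, the goal is the same: the number of distinct tag values should depend on the number of routes, not on the amount of traffic.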

Step 5: A Complete Example

Here’s a real-world example I use in my payment service:

PaymentService.java
import io.micrometer.core.instrument.*;
import org.springframework.stereotype.Service;
import java.util.concurrent.atomic.AtomicLong;

@Service
public class PaymentService {

    // StripeGateway, PaymentRequest, PaymentResult, and PaymentException
    // are application types not shown here
    private final StripeGateway stripeGateway;
    private final Counter paymentsProcessed;
    private final Counter paymentsFailed;
    private final Timer paymentTimer;
    private final AtomicLong pendingPayments = new AtomicLong(0);

    public PaymentService(StripeGateway stripeGateway, MeterRegistry registry) {
        this.stripeGateway = stripeGateway;

        // Counter for successful payments
        this.paymentsProcessed = Counter.builder("payments.processed")
                .description("Total payments processed successfully")
                .tag("service", "payment")
                .tag("gateway", "stripe")
                .register(registry);

        // Counter for failed payments
        this.paymentsFailed = Counter.builder("payments.failed")
                .description("Total failed payments")
                .tag("service", "payment")
                .tag("gateway", "stripe")
                .register(registry);

        // Timer for payment processing duration
        this.paymentTimer = Timer.builder("payments.processing.time")
                .description("Time to process payment")
                .tag("service", "payment")
                .tag("gateway", "stripe")
                .publishPercentiles(0.5, 0.95, 0.99)
                .publishPercentileHistogram()
                .register(registry);

        // Gauge for pending payments
        Gauge.builder("payments.pending", pendingPayments, AtomicLong::get)
                .description("Number of payments currently being processed")
                .tag("service", "payment")
                .register(registry);
    }

    public PaymentResult processPayment(PaymentRequest request) {
        pendingPayments.incrementAndGet();
        try {
            PaymentResult result = paymentTimer.record(() -> stripeGateway.charge(request));
            paymentsProcessed.increment();
            return result;
        } catch (Exception e) {
            paymentsFailed.increment();
            throw new PaymentException("Payment failed", e);
        } finally {
            pendingPayments.decrementAndGet();
        }
    }
}

This gives me:

  • payments_processed_total: Total successful payments
  • payments_failed_total: Total failed payments
  • payments_processing_time_seconds: Duration statistics (p50, p95, p99)
  • payments_pending: Current number of payments in flight
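
With the two counters in place, the “failure rate for payment transactions” question becomes simple arithmetic on how much each counter increased over a time window. The sketch below shows that calculation in plain Java. FailureRate and failureRatio are hypothetical names of mine; in a dashboard the same ratio would be computed by the monitoring backend rather than in application code.

```java
// Illustrative sketch only: FailureRate is a hypothetical helper showing the
// arithmetic behind a failure-rate panel.
final class FailureRate {

    /**
     * Ratio of failed attempts to all attempts in a window.
     *
     * @param processed increase of the success counter over the window
     * @param failed    increase of the failure counter over the window
     */
    static double failureRatio(double processed, double failed) {
        double total = processed + failed;
        // No traffic in the window means there is nothing to report as failed.
        return total == 0 ? 0.0 : failed / total;
    }

    public static void main(String[] args) {
        // 95 successful and 5 failed payments in the last five minutes.
        System.out.println(failureRatio(95, 5)); // 0.05
    }
}
```

Keeping successes and failures as separate counters, as in the service above, is what makes the denominator available.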

Step 6: Test Metrics Locally

I always test metrics before deploying. Here’s my workflow:

Start the Application

Terminal window
./mvnw spring-boot:run

Check Available Metrics

Terminal window
# List all available metrics
curl http://localhost:8080/actuator/metrics
# Get specific metric
curl http://localhost:8080/actuator/metrics/payments.processed
# Check Prometheus format
curl http://localhost:8080/actuator/prometheus | grep payments

Expected Output

Sample Prometheus output
# HELP payments_processed_total Total payments processed successfully
# TYPE payments_processed_total counter
payments_processed_total{application="order-service",environment="production",gateway="stripe",service="payment"} 42.0
# HELP payments_processing_time_seconds Time to process payment
# TYPE payments_processing_time_seconds summary
payments_processing_time_seconds_count{application="order-service",gateway="stripe",service="payment"} 42.0
payments_processing_time_seconds_sum{application="order-service",gateway="stripe",service="payment"} 12.34
payments_processing_time_seconds{application="order-service",gateway="stripe",service="payment",quantile="0.5"} 0.25
payments_processing_time_seconds{application="order-service",gateway="stripe",service="payment",quantile="0.95"} 0.89
payments_processing_time_seconds{application="order-service",gateway="stripe",service="payment",quantile="0.99"} 1.23

Common Issues I Encountered

Issue 1: Metric Not Showing Up

Symptoms: Metric doesn’t appear in /actuator/prometheus

Causes I found:

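  • Meter not registered: Call .register(registry)
  • Meter not created yet: A meter built inside a method, for example with registry.counter(...), only exists after that code path has run once
  • Filtered by management settings: Check that prometheus is listed in management.endpoints.web.exposure.include and that the meter is not disabled under management.metrics.enable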

Fix:

RegisteringMeters.java
// WRONG - Builder configured but never registered, so no meter exists
Counter.Builder unregistered = Counter.builder("my.metric")
        .tag("key", "value");

// CORRECT - Registered
Counter counter = Counter.builder("my.metric")
        .tag("key", "value")
        .register(registry); // <-- Important!

Issue 2: Wrong Metric Type

Symptoms: Using Counter for values that go down

Wrong approach:

WrongMetricType.java
// WRONG - Counter only increases
Counter userCount = registry.counter("users.active");
userCount.increment(); // When user logs in
// Can't decrement when user logs out!

Correct approach:

CorrectMetricType.java
// CORRECT - Gauge can go up and down
AtomicLong activeUsers = new AtomicLong(0);
Gauge.builder("users.active", activeUsers, AtomicLong::get)
.register(registry);
// Increment/decrement as needed
activeUsers.incrementAndGet(); // User logs in
activeUsers.decrementAndGet(); // User logs out

Issue 3: Prometheus Memory Issues

Symptoms: Prometheus runs out of memory, slow queries

Cause: High-cardinality tags

How to find the problem:

Find high cardinality metrics
topk(10, count by (__name__) ({__name__=~".+"}))

This shows the 10 metric names with the most time series. Look for names whose series count is far higher than the rest; they usually carry an unbounded tag such as a user or request ID.
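
The same check can be done locally before anything reaches Prometheus, by counting sample lines per metric name in the text returned by /actuator/prometheus. The sketch below does that with plain string handling. SeriesCounter and countSeriesByName are hypothetical names, and the parsing is deliberately simple: it skips comment lines and treats everything up to the first “{” or space as the name.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: SeriesCounter is a hypothetical helper that counts
// sample lines per metric name in Prometheus text output.
final class SeriesCounter {

    static Map<String, Integer> countSeriesByName(String expositionText) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : expositionText.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String name = trimmed.substring(0, indexOfNameEnd(trimmed));
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    // The metric name ends at the label block or at the value separator.
    private static int indexOfNameEnd(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '{' || c == ' ') {
                return i;
            }
        }
        return line.length();
    }
}
```

Feeding it the saved output of the curl command from the testing step gives a per-metric count I can compare between builds.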

Issue 4: Percentiles Missing

Symptoms: No p95/p99 values in Timer metrics

Fix: Enable percentile publishing

TimerWithPercentiles.java
Timer.builder("my.timer")
.publishPercentiles(0.5, 0.95, 0.99) // Enable percentiles
.publishPercentileHistogram() // For Prometheus histogram
.register(registry);
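
If it is unclear what p50 or p95 actually mean, the sketch below computes them exactly with the nearest-rank method on a small sample. PercentileSketch is a hypothetical class of mine and the latencies are invented example values. Micrometer and Prometheus use approximations rather than sorting every observation, so treat this only as a definition of the numbers, not as their algorithm.

```java
import java.util.Arrays;

// Illustrative sketch only: exact nearest-rank percentiles on a small sample.
// Real metric libraries approximate this instead of sorting all observations.
final class PercentileSketch {

    /** Smallest sample such that at least a fraction q of samples are at or below it. */
    static double percentile(double[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("samples must not be empty");
        }
        if (q <= 0 || q > 1) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Invented latencies in milliseconds; one request is very slow.
        double[] latencies = {120, 80, 95, 300, 110, 105, 90, 2000, 100, 115};
        System.out.println(percentile(latencies, 0.5));  // 105.0
        System.out.println(percentile(latencies, 0.95)); // 2000.0
    }
}
```

The mean of these samples is 311.5 ms, which describes none of the requests well; that gap is the reason I look at percentiles for latency.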

Production Considerations

Separate Management Port

I expose metrics on a different port for security:

application-production.yml
management:
  server:
    port: 9090
    address: 0.0.0.0
server:
  port: 8080

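Binding to 0.0.0.0 makes the management server listen on every network interface, so the metrics endpoint is only private if port 9090 is blocked from outside, for example by a firewall rule or a Kubernetes NetworkPolicy. With that in place, only internal monitoring systems can access it.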

Disable Unnecessary Metrics

Spring Boot enables many metrics by default. I disable what I don’t need:

application.yml
management:
  metrics:
    enable:
      jvm: true
      process: true
      tomcat: true
      http: true
      logback: false
      uptime: false

Resource Limits

In Kubernetes, I point the health probes at the management port and set memory requests and limits on the container:

kubernetes-deployment.yml
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 9090
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 9090
  initialDelaySeconds: 10
resources:
  limits:
    memory: "512Mi"
  requests:
    memory: "256Mi"

Summary

In this post, I covered how to set up and use Micrometer for Spring Boot monitoring with best practices. I started with the problem of needing business-specific metrics beyond the defaults, then walked through the solution using Micrometer’s meter types.

The key practices I follow:

  1. Choose the right meter type: Counter for totals, Gauge for current values, Timer for durations
  2. Follow naming conventions: Use lowercase dot notation and descriptive names
  3. Use tags wisely: Enable filtering without creating cardinality explosion
  4. Set common tags: Add application and environment context to all metrics
  5. Test locally: Verify metrics appear correctly before deploying
  6. Configure for production: Separate management port and resource limits

Micrometer’s integration with Spring Boot Actuator means I can focus on instrumenting my business logic while Micrometer handles translation to Prometheus, Datadog, or any other monitoring backend.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
