How to Implement Tool Calling with Confidence Levels in Spring AI

Mar 26, 2026

The Problem

My AI agent was calling tools, but I couldn’t tell how certain the model was about its choices. The same log entry appeared whether the LLM was highly confident or just guessing:

2026-03-26 09:15:22 INFO  Tool called: retrievePatientHealthStatus
2026-03-26 09:15:22 INFO  Arguments: {patientId=PAT-001}

When the agent made a wrong decision, I had no signal to detect it. I needed to know: was this a confident choice or an uncertain guess?

The Solution

Spring AI’s Tool Argument Augmenter lets you add a confidence field to every tool call. The LLM evaluates its own certainty and reports “low”, “medium”, or “high” for each tool selection.

Adding Confidence Scoring

Step 1: Create the Thinking DTO

Define a record with a required confidence field:

import org.springframework.ai.tool.annotation.ToolParam;

public record AgentThinking(
    @ToolParam(description = """
        Your step-by-step reasoning for why you're calling this tool.
        """, required = true)
    String innerThought,

    @ToolParam(description = "Confidence level (low, medium, high) in this tool choice", required = true)
    String confidence
) {}

Setting required = true is critical. Without it, the LLM may omit the confidence field on some calls, leaving gaps in your data.
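Even when the field is always present, the value arrives as free-form text, so it can help to normalize it before acting on it. The following is a minimal sketch, assuming a hypothetical ConfidenceLevel enum (my own helper, not part of Spring AI) that maps anything unrecognized to UNKNOWN:

```java
import java.util.Locale;

// Hypothetical helper, not part of Spring AI: maps the model's free-form
// confidence string onto a fixed set of values.
enum ConfidenceLevel {
    LOW, MEDIUM, HIGH, UNKNOWN;

    static ConfidenceLevel parse(String raw) {
        if (raw == null) {
            return UNKNOWN;
        }
        // Trim and lower-case so "High" and " high " are treated the same.
        return switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "low" -> LOW;
            case "medium" -> MEDIUM;
            case "high" -> HIGH;
            default -> UNKNOWN;
        };
    }
}
```

Comparing enum values instead of raw strings avoids silent mismatches when the model answers "High" or "HIGH".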

Step 2: Configure the Augmented Provider

Wire up the augmenter with confidence-aware handling:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.tool.augment.AugmentedToolCallbackProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ToolConfig {

    private static final Logger log = LoggerFactory.getLogger(ToolConfig.class);
    private final AlertService alertService;
    private final MetricsService metricsService;

    public ToolConfig(AlertService alertService, MetricsService metricsService) {
        this.alertService = alertService;
        this.metricsService = metricsService;
    }

    @Bean
    public AugmentedToolCallbackProvider<AgentThinking> augmentedToolProvider(
            HealthTools healthTools) {

        return AugmentedToolCallbackProvider
            .<AgentThinking>builder()
            .toolObject(healthTools)
            .argumentType(AgentThinking.class)
            .argumentConsumer(event -> {
                AgentThinking thinking = event.arguments();

                log.info("Tool: {}", event.toolDefinition().name());
                log.info("Reasoning: {}", thinking.innerThought());
                log.info("Confidence: {}", thinking.confidence());

                // Track confidence metrics
                metricsService.recordConfidence(
                    event.toolDefinition().name(),
                    thinking.confidence()
                );

                // Alert on low confidence (case-insensitive: the model may answer "Low")
                if ("low".equalsIgnoreCase(thinking.confidence())) {
                    log.warn("Low confidence tool selection detected");
                    alertService.notifyLowConfidence(event, thinking);
                }
            })
            .build();
    }
}
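To picture what the consumer receives, it helps to think of each tool call as carrying the tool's own arguments plus the extra AgentThinking fields. The sketch below is an illustration of that idea in plain Java, not Spring AI's internal code; the AugmentedArgs class and the map-based representation are assumptions made for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: separates the extra "thinking" fields from the
// arguments the tool itself declared. Spring AI's real handling may differ.
final class AugmentedArgs {

    // Field names taken from the AgentThinking record.
    static final Set<String> THINKING_FIELDS = Set.of("innerThought", "confidence");

    record Split(Map<String, Object> thinking, Map<String, Object> toolArgs) {}

    static Split split(Map<String, Object> rawArgs) {
        Map<String, Object> thinking = new LinkedHashMap<>();
        Map<String, Object> toolArgs = new LinkedHashMap<>();
        // Route each entry to the map it belongs to.
        rawArgs.forEach((key, value) ->
            (THINKING_FIELDS.contains(key) ? thinking : toolArgs).put(key, value));
        return new Split(thinking, toolArgs);
    }
}
```

With a call such as {patientId=PAT-001, innerThought=..., confidence=high}, only patientId would end up in toolArgs, which is why the tool method itself does not need to know about the confidence field.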

Step 3: Conditional Execution Based on Confidence

The real value comes from acting on confidence levels. Here’s the skeleton of a service that tracks low-confidence calls awaiting review and reports whether a response completed without any:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.augment.AugmentedToolCallbackProvider;
import org.springframework.stereotype.Service;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class ConfidenceAwareAgentService {

    private final ChatClient chatClient;
    // callId -> description of each low-confidence call awaiting review.
    // Nothing in this class fills the map; populate it from your low-confidence handler.
    private final Map<String, String> pendingLowConfidenceCalls = new ConcurrentHashMap<>();

    public ConfidenceAwareAgentService(ChatClient.Builder builder,
                                       AugmentedToolCallbackProvider<AgentThinking> provider) {
        this.chatClient = builder
            .defaultToolCallbacks(provider)
            .build();
    }

    public AgentResponse process(String userInput) {
        String response = chatClient.prompt()
            .user(userInput)
            .call()
            .content();

        // Second argument is true when no low-confidence calls are waiting for review
        return new AgentResponse(response, pendingLowConfidenceCalls.isEmpty());
    }

    public Map<String, String> getPendingReviews() {
        return Map.copyOf(pendingLowConfidenceCalls);
    }

    public void approveLowConfidenceCall(String callId) {
        pendingLowConfidenceCalls.remove(callId);
        // Execute the approved call
    }
}
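One way to actually hold back uncertain calls is to put a gate in front of tool execution. The sketch below is plain Java with hypothetical names (ConfidenceGate, Decision); each tool call is represented as a Supplier, and how you hook the gate into your tool callbacks depends on your setup and is not shown:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical gate: runs a tool call right away unless the reported
// confidence is "low", in which case the call is parked for review.
final class ConfidenceGate {

    record Decision(boolean executed, String result, String pendingCallId) {}

    private final Map<String, Supplier<String>> pending = new ConcurrentHashMap<>();

    Decision submit(String confidence, Supplier<String> toolCall) {
        if ("low".equalsIgnoreCase(confidence)) {
            String callId = UUID.randomUUID().toString();
            pending.put(callId, toolCall);
            return new Decision(false, null, callId);
        }
        return new Decision(true, toolCall.get(), null);
    }

    // Runs a parked call once a reviewer approves it; returns null if the id is unknown.
    String approve(String callId) {
        Supplier<String> toolCall = pending.remove(callId);
        return toolCall == null ? null : toolCall.get();
    }

    // Drops a parked call without running it; returns false if the id is unknown.
    boolean reject(String callId) {
        return pending.remove(callId) != null;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Parking the call as a Supplier means the approved call runs exactly as it was originally requested, without asking the model again.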

And here’s an alert service for human-in-the-loop review:

import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class AlertService {

    // Shape inferred from how the event is built and read below
    public record LowConfidenceEvent(String callId, String toolName,
                                     String reasoning, Instant timestamp) {}

    private final NotificationService notificationService;
    private final Map<String, LowConfidenceEvent> pendingReviews = new ConcurrentHashMap<>();

    public AlertService(NotificationService notificationService) {
        this.notificationService = notificationService;
    }

    // ToolCallEvent is the event type passed to argumentConsumer in ToolConfig
    public void notifyLowConfidence(ToolCallEvent event, AgentThinking thinking) {
        String callId = UUID.randomUUID().toString();

        LowConfidenceEvent lowConfEvent = new LowConfidenceEvent(
            callId,
            event.toolDefinition().name(),
            thinking.innerThought(),
            Instant.now()
        );

        // Keep the event so an operator can look it up by call ID
        pendingReviews.put(callId, lowConfEvent);

        // Alert ops team
        notificationService.sendAlert("""
            Low Confidence Tool Call Detected

            Tool: %s
            Reasoning: %s
            Call ID: %s
            Time: %s

            Review and approve or reject.
            """.formatted(
                lowConfEvent.toolName(),
                lowConfEvent.reasoning(),
                callId,
                lowConfEvent.timestamp()
            ));
    }

    public Map<String, LowConfidenceEvent> getPendingReviews() {
        return Map.copyOf(pendingReviews);
    }
}

The Result

Now when I run my agent, I see confidence in the logs:

2026-03-26 09:20:15 INFO  Tool: retrievePatientHealthStatus
2026-03-26 09:20:15 INFO  Reasoning: The user asked about patient PAT-001's health. I need to retrieve their current status to provide accurate information.
2026-03-26 09:20:15 INFO  Confidence: high

When confidence is low, my alert service fires:

2026-03-26 09:22:31 WARN  Low confidence tool selection detected
2026-03-26 09:22:31 INFO  Alert sent to ops team for review

Why This Matters

Confidence levels give you a signal to build production safeguards on:

Conditional Execution: Skip or queue low-confidence calls for review
Human-in-the-Loop: Route uncertain decisions to operators
Quality Metrics: Track confidence distribution over time
Fallback Mechanisms: Use alternative approaches when confidence drops

Common Mistake: Making Confidence Optional

I initially made the confidence field optional:

// DON'T DO THIS
@ToolParam(description = "Confidence level (optional)")
String confidence  // May be null, inconsistent data

This broke my monitoring. The LLM sometimes skipped the field, giving me inconsistent data. Always mark it required = true:

// ALWAYS DO THIS
@ToolParam(description = "Confidence level (low, medium, high)", required = true)
String confidence  // Always populated, consistent monitoring

Tracking Confidence Over Time

Here’s a simple metrics service to track confidence patterns:

import org.springframework.stereotype.Service;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

@Service
public class MetricsService {

    private final Map<String, AtomicLong> highConfidenceCount = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> mediumConfidenceCount = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> lowConfidenceCount = new ConcurrentHashMap<>();

    public void recordConfidence(String toolName, String confidence) {
        if (confidence == null) {
            return; // nothing to count if the model omitted the field
        }
        switch (confidence.trim().toLowerCase(Locale.ROOT)) {
            case "high" -> increment(highConfidenceCount, toolName);
            case "medium" -> increment(mediumConfidenceCount, toolName);
            case "low" -> increment(lowConfidenceCount, toolName);
            default -> { } // ignore values outside low/medium/high
        }
    }

    private static void increment(Map<String, AtomicLong> counts, String toolName) {
        counts.computeIfAbsent(toolName, k -> new AtomicLong()).incrementAndGet();
    }

    // Copies the maps, but the AtomicLong values are still the live counters
    public ConfidenceReport getReport() {
        return new ConfidenceReport(
            Map.copyOf(highConfidenceCount),
            Map.copyOf(mediumConfidenceCount),
            Map.copyOf(lowConfidenceCount)
        );
    }

    public record ConfidenceReport(
        Map<String, AtomicLong> high,
        Map<String, AtomicLong> medium,
        Map<String, AtomicLong> low
    ) {}
}
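Raw counts are easier to compare once they are turned into a share per tool. Here is a small sketch, assuming the counters have already been read out into plain Long values (for example by calling get() on each AtomicLong); the ConfidenceShare class is a hypothetical helper, not something the service above provides:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns per-tool counters into the share of calls
// that were reported as low confidence.
final class ConfidenceShare {

    static Map<String, Double> lowShare(Map<String, Long> high,
                                        Map<String, Long> medium,
                                        Map<String, Long> low) {
        // Total calls per tool across all three buckets.
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> bucket : List.of(high, medium, low)) {
            bucket.forEach((tool, count) -> totals.merge(tool, count, Long::sum));
        }

        Map<String, Double> shares = new TreeMap<>();
        totals.forEach((tool, total) -> {
            long lowCount = low.getOrDefault(tool, 0L);
            shares.put(tool, total == 0 ? 0.0 : (double) lowCount / total);
        });
        return shares;
    }
}
```

A tool with 3 high, 1 medium, and 1 low call gets a low-confidence share of 0.2, which is easier to alert on than three separate counters.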

Confidence Patterns to Watch

When analyzing confidence metrics, look for:

Consistently low confidence on a tool: Description may be unclear
Confidence drops after model changes: Test thoroughly before deploying
Low confidence on specific inputs: Edge cases needing special handling
High variance in confidence: User prompts may be ambiguous
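
The second pattern, a drop after a model change, can be checked mechanically if you keep the low-confidence share per tool for the period before and after the change. This is a rough sketch with hypothetical names (ConfidenceRegression, maxIncrease); the threshold is something you would tune for your own traffic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical check: lists tools whose low-confidence share rose by more
// than maxIncrease between two measurement windows.
final class ConfidenceRegression {

    static List<String> regressedTools(Map<String, Double> lowShareBefore,
                                       Map<String, Double> lowShareAfter,
                                       double maxIncrease) {
        List<String> regressed = new ArrayList<>();
        lowShareAfter.forEach((tool, after) -> {
            // A tool with no earlier data is compared against a baseline of 0.0.
            double before = lowShareBefore.getOrDefault(tool, 0.0);
            if (after - before > maxIncrease) {
                regressed.add(tool);
            }
        });
        regressed.sort(String::compareTo);
        return regressed;
    }
}
```

Because missing tools default to a baseline of 0.0, a newly added tool that starts out with many low-confidence calls is flagged as well.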

Environment

Spring Boot 3.3.x
Spring AI 1.0.0
Java 21

Summary

Add confidence scoring to Spring AI tool calls by defining a DTO with a required confidence field and registering it with AugmentedToolCallbackProvider. The LLM populates “low”, “medium”, or “high” for each tool selection. Use this signal for conditional execution, human-in-the-loop triggers, quality metrics, and fallback mechanisms. Never make confidence optional, or your monitoring data becomes inconsistent.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Spring AI Documentation
👨‍💻 Tool Argument Augmenter

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!