How to Implement Tool Calling with Confidence Levels in Spring AI
The Problem
My AI agent was calling tools, but I couldn’t tell how certain the model was about its choices. The same log entry appeared whether the LLM was highly confident or just guessing:
2026-03-26 09:15:22 INFO Tool called: retrievePatientHealthStatus
2026-03-26 09:15:22 INFO Arguments: {patientId=PAT-001}
When the agent made a wrong decision, I had no signal to detect it. I needed to know: was this a confident choice or an uncertain guess?
The Solution
Spring AI’s Tool Argument Augmenter lets you add a confidence field to every tool call. The LLM evaluates its own certainty and reports “low”, “medium”, or “high” for each tool selection.
Adding Confidence Scoring
Step 1: Create the Thinking DTO
Define a record with a required confidence field:
import org.springframework.ai.tool.annotation.ToolParam;

public record AgentThinking(
        @ToolParam(description = """
                Your step-by-step reasoning for why you're calling this tool.
                """, required = true)
        String innerThought,

        @ToolParam(description = "Confidence level (low, medium, high) in this tool choice", required = true)
        String confidence) {}
Marking the field required = true is critical. If the field is optional, the LLM may omit the confidence value on some calls and not others.
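Because the model reports confidence as free text, it can help to normalize the value before acting on it. The following is a minimal sketch, not part of Spring AI: the `Confidence` enum and its `parse` method are hypothetical names, and treating unrecognized values as `LOW` is an assumption chosen so that unexpected output errs toward review.

```java
import java.util.Locale;

// Hypothetical helper (not part of Spring AI): normalizes the free-text
// confidence value reported by the model into a fixed set of levels.
enum Confidence {
    LOW, MEDIUM, HIGH;

    // Assumption: null, blank, or unrecognized values are treated as LOW,
    // so malformed output is routed toward review rather than trusted.
    static Confidence parse(String raw) {
        if (raw == null) {
            return LOW;
        }
        return switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "high" -> HIGH;
            case "medium" -> MEDIUM;
            default -> LOW;
        };
    }
}
```

A call such as `Confidence.parse(thinking.confidence())` could then stand in for direct string comparisons wherever the confidence value is inspected.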
Step 2: Configure the Augmented Provider
Wire up the augmenter with confidence-aware handling:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.tool.augment.AugmentedToolCallbackProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ToolConfig {

    private static final Logger log = LoggerFactory.getLogger(ToolConfig.class);
    private final AlertService alertService;
    private final MetricsService metricsService;

    public ToolConfig(AlertService alertService, MetricsService metricsService) {
        this.alertService = alertService;
        this.metricsService = metricsService;
    }

    @Bean
    public AugmentedToolCallbackProvider<AgentThinking> augmentedToolProvider(HealthTools healthTools) {
        return AugmentedToolCallbackProvider
                .<AgentThinking>builder()
                .toolObject(healthTools)
                .argumentType(AgentThinking.class)
                .argumentConsumer(event -> {
                    AgentThinking thinking = event.arguments();

                    log.info("Tool: {}", event.toolDefinition().name());
                    log.info("Reasoning: {}", thinking.innerThought());
                    log.info("Confidence: {}", thinking.confidence());

                    // Track confidence metrics
                    metricsService.recordConfidence(
                            event.toolDefinition().name(),
                            thinking.confidence());

                    // Alert on low confidence (case-insensitive, like MetricsService)
                    if ("low".equalsIgnoreCase(thinking.confidence())) {
                        log.warn("Low confidence tool selection detected");
                        alertService.notifyLowConfidence(event, thinking);
                    }
                })
                .build();
    }
}
Step 3: Conditional Execution Based on Confidence
The real power comes from acting on confidence levels. Here’s a service that tracks low-confidence calls awaiting review and reports whether a response completed without any pending:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.augment.AugmentedToolCallbackProvider;
import org.springframework.stereotype.Service;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class ConfidenceAwareAgentService {

    private final ChatClient chatClient;
    // Low-confidence calls awaiting human review, keyed by call ID
    private final Map<String, String> pendingLowConfidenceCalls = new ConcurrentHashMap<>();

    public ConfidenceAwareAgentService(ChatClient.Builder builder,
            AugmentedToolCallbackProvider<AgentThinking> provider) {
        this.chatClient = builder
                .defaultToolCallbacks(provider)
                .build();
    }

    public AgentResponse process(String userInput) {
        String response = chatClient.prompt()
                .user(userInput)
                .call()
                .content();

        // The second argument is true when no calls are waiting for review
        return new AgentResponse(response, pendingLowConfidenceCalls.isEmpty());
    }

    public Map<String, String> getPendingReviews() {
        return Map.copyOf(pendingLowConfidenceCalls);
    }

    public void approveLowConfidenceCall(String callId) {
        pendingLowConfidenceCalls.remove(callId);
        // Execute the approved call
    }
}
And here’s an alert service for human-in-the-loop:
import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class AlertService {

    private final NotificationService notificationService;
    private final Map<String, LowConfidenceEvent> pendingReviews = new ConcurrentHashMap<>();

    public AlertService(NotificationService notificationService) {
        this.notificationService = notificationService;
    }

    public void notifyLowConfidence(ToolCallEvent event, AgentThinking thinking) {
        String callId = UUID.randomUUID().toString();

        LowConfidenceEvent lowConfEvent = new LowConfidenceEvent(
                callId,
                event.toolDefinition().name(),
                thinking.innerThought(),
                Instant.now());

        pendingReviews.put(callId, lowConfEvent);

        // Alert ops team
        notificationService.sendAlert("""
                Low Confidence Tool Call Detected

                Tool: %s
                Reasoning: %s
                Call ID: %s
                Time: %s

                Review and approve or reject.
                """.formatted(
                lowConfEvent.toolName(),
                lowConfEvent.reasoning(),
                callId,
                lowConfEvent.timestamp()));
    }

    public Map<String, LowConfidenceEvent> getPendingReviews() {
        return Map.copyOf(pendingReviews);
    }
}
The Result
Now when I run my agent, I see confidence in the logs:
2026-03-26 09:20:15 INFO Tool: retrievePatientHealthStatus
2026-03-26 09:20:15 INFO Reasoning: The user asked about patient PAT-001's health. I need to retrieve their current status to provide accurate information.
2026-03-26 09:20:15 INFO Confidence: high
When confidence is low, my alert service fires:
2026-03-26 09:22:31 WARN Low confidence tool selection detected
2026-03-26 09:22:31 INFO Alert sent to ops team for review
Why This Matters
A self-reported confidence level gives a production AI system a signal it can act on:
- Conditional Execution: Skip or queue low-confidence calls for review
- Human-in-the-Loop: Route uncertain decisions to operators
- Quality Metrics: Track confidence distribution over time
- Fallback Mechanisms: Use alternative approaches when confidence drops
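The conditional-execution and human-in-the-loop ideas above can be sketched in plain Java. This is a hypothetical `ConfidenceGate`, not a Spring AI API: it runs an action immediately for "medium" or "high" confidence and parks it for review otherwise. Treating missing or unrecognized values like "low" is an assumption, and where the gate would hook into tool invocation depends on your setup and is left out here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical gate (not a Spring AI API): runs a tool action immediately
// unless the reported confidence is low, in which case the action is
// parked until a human approves or rejects it.
class ConfidenceGate<T> {

    private final Map<String, Supplier<T>> pending = new ConcurrentHashMap<>();

    // Returns the result for medium/high confidence; returns empty and
    // queues the action when confidence is low, missing, or unrecognized.
    Optional<T> submit(String confidence, Supplier<T> action) {
        boolean trusted = "high".equalsIgnoreCase(confidence)
                || "medium".equalsIgnoreCase(confidence);
        if (trusted) {
            return Optional.ofNullable(action.get());
        }
        pending.put(UUID.randomUUID().toString(), action);
        return Optional.empty();
    }

    // Executes a queued action after approval; empty if the ID is unknown.
    Optional<T> approve(String callId) {
        Supplier<T> action = pending.remove(callId);
        return action == null ? Optional.empty() : Optional.ofNullable(action.get());
    }

    // Discards a queued action; returns true if something was removed.
    boolean reject(String callId) {
        return pending.remove(callId) != null;
    }

    // Snapshot of the call IDs still waiting for review.
    Set<String> pendingIds() {
        return Set.copyOf(pending.keySet());
    }
}
```

Queuing a `Supplier` rather than a result keeps the side effect from happening until someone approves it, which is the point of a review step.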
Common Mistake: Making Confidence Optional
I initially made the confidence field optional:
// DON'T DO THIS
@ToolParam(description = "Confidence level (optional)", required = false)
String confidence // May be null, inconsistent data
This broke my monitoring. The LLM sometimes skipped the field, giving me inconsistent data. Always mark it required = true:
// ALWAYS DO THIS
@ToolParam(description = "Confidence level (low, medium, high)", required = true)
String confidence // Always populated, consistent monitoring
Tracking Confidence Over Time
Here’s a simple metrics service to track confidence patterns:
import org.springframework.stereotype.Service;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

@Service
public class MetricsService {

    private final Map<String, AtomicLong> highConfidenceCount = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> mediumConfidenceCount = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> lowConfidenceCount = new ConcurrentHashMap<>();

    public void recordConfidence(String toolName, String confidence) {
        if (confidence == null) {
            return; // Nothing to record if the value is missing
        }
        // Locale.ROOT keeps lowercasing independent of the server's default locale
        switch (confidence.toLowerCase(Locale.ROOT)) {
            case "high" -> highConfidenceCount
                    .computeIfAbsent(toolName, k -> new AtomicLong())
                    .incrementAndGet();
            case "medium" -> mediumConfidenceCount
                    .computeIfAbsent(toolName, k -> new AtomicLong())
                    .incrementAndGet();
            case "low" -> lowConfidenceCount
                    .computeIfAbsent(toolName, k -> new AtomicLong())
                    .incrementAndGet();
            default -> { } // Ignore values outside low/medium/high
        }
    }

    // The maps are copied, but the AtomicLong values are still the live counters
    public ConfidenceReport getReport() {
        return new ConfidenceReport(
                Map.copyOf(highConfidenceCount),
                Map.copyOf(mediumConfidenceCount),
                Map.copyOf(lowConfidenceCount));
    }

    public record ConfidenceReport(
            Map<String, AtomicLong> high,
            Map<String, AtomicLong> medium,
            Map<String, AtomicLong> low) {}
}
Confidence Patterns to Watch
When analyzing confidence metrics, look for:
- Consistently low confidence on a tool: Description may be unclear
- Confidence drops after model changes: Test thoroughly before deploying
- Low confidence on specific inputs: Edge cases needing special handling
- High variance in confidence: User prompts may be ambiguous
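The first pattern, consistently low confidence on one tool, can be checked mechanically. Below is a rough sketch with a hypothetical `ConfidenceAnalyzer` helper. It assumes the per-tool counts have already been converted to plain `Long` values, and the threshold is a number you would pick yourself rather than anything recommended here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical analysis helper: flags tools whose share of low-confidence
// calls meets or exceeds a threshold.
class ConfidenceAnalyzer {

    static List<String> toolsNeedingAttention(
            Map<String, Long> high,
            Map<String, Long> medium,
            Map<String, Long> low,
            double threshold) {
        // Sorted union of tool names so the result order is deterministic
        Set<String> tools = new TreeSet<>();
        tools.addAll(high.keySet());
        tools.addAll(medium.keySet());
        tools.addAll(low.keySet());

        List<String> flagged = new ArrayList<>();
        for (String tool : tools) {
            long lowCount = low.getOrDefault(tool, 0L);
            long total = high.getOrDefault(tool, 0L)
                    + medium.getOrDefault(tool, 0L)
                    + lowCount;
            // Skip tools with no recorded calls to avoid dividing by zero
            if (total > 0 && (double) lowCount / total >= threshold) {
                flagged.add(tool);
            }
        }
        return flagged;
    }
}
```

A tool flagged this way is a candidate for a clearer description, per the first bullet above.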
Environment
- Spring Boot 3.3.x
- Spring AI 1.0.0
- Java 21
Summary
Add confidence scoring to Spring AI tool calls by defining a DTO with a required confidence field and registering it with AugmentedToolCallbackProvider. The LLM populates “low”, “medium”, or “high” for each tool selection. Use this signal for conditional execution, human-in-the-loop triggers, quality metrics, and fallback mechanisms. Never make confidence optional, or your monitoring data becomes inconsistent.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.