How to Debug AI Tool Selection Decisions in Spring AI

Mar 26, 2026

The Problem

My AI agent called the wrong tool. The user asked about a patient’s medication, but the agent retrieved their appointment history instead. My logs showed:

2026-03-26 14:22:15 INFO  Tool called: retrieveAppointmentHistory
2026-03-26 14:22:16 INFO  Tool result: {"appointments": [...]}

Why did it choose retrieveAppointmentHistory over retrieveMedications? I had no idea. The logs only showed outcomes, not decisions.

The Solution

Spring AI’s Tool Argument Augmenter captures the LLM’s reasoning for each tool call. It forces the model to explain why it selected a specific tool before executing it.

Understanding the Debug Problem

Traditional debugging shows me when tools fire, not why.

// This tells me nothing about WHY
@Tool
public String retrieveAppointmentHistory(String patientId) {
    log.info("Called retrieveAppointmentHistory for {}", patientId);
    return appointmentService.getHistory(patientId);
}

The log tells me the tool was called. It doesn’t tell me what alternatives the model considered or why it rejected them.

Adding Reasoning Capture

I need to inject reasoning parameters into every tool call.

public record ToolReasoning(
    @ToolParam(description = "Explain why you selected this tool over alternatives")
    String innerThought,

    @ToolParam(description = "Your confidence in this tool selection: high, medium, low")
    String confidence
) {}

@Component
public class PatientTools {

    @Tool(description = "Retrieve patient medication list")
    public String retrieveMedications(
        @ToolParam(description = "Patient ID") String patientId,
        ToolReasoning reasoning  // Automatically populated
    ) {
        return medicationService.getMedications(patientId);
    }

    @Tool(description = "Retrieve patient appointment history")
    public String retrieveAppointmentHistory(
        @ToolParam(description = "Patient ID") String patientId,
        ToolReasoning reasoning
    ) {
        return appointmentService.getHistory(patientId);
    }

    @Tool(description = "Retrieve patient health status")
    public String retrievePatientHealthStatus(
        @ToolParam(description = "Patient ID") String patientId,
        ToolReasoning reasoning
    ) {
        return healthService.getStatus(patientId);
    }
}

Now I configure the augmenter with debug-friendly logging.

@Service
public class DebuggableAgentService {
    private final ChatClient chatClient;
    private static final Logger log = LoggerFactory.getLogger(DebuggableAgentService.class);

    public DebuggableAgentService(OpenAiChatModel model) {
        AugmentedToolCallbackProvider<ToolReasoning> provider =
            AugmentedToolCallbackProvider.<ToolReasoning>builder()
                .toolObject(new PatientTools())
                .argumentType(ToolReasoning.class)
                .argumentConsumer(event -> {
                    ToolReasoning reasoning = event.arguments();

                    log.info("""
                        ========== TOOL DECISION ==========
                        Tool: {}
                        Reasoning: {}
                        Confidence: {}
                        ====================================
                        """,
                        event.toolDefinition().name(),
                        reasoning.innerThought(),
                        reasoning.confidence());
                })
                .build();

        chatClient = ChatClient.builder(model)
            .defaultToolCallbacks(provider)
            .build();
    }

    public String process(String userMessage) {
        return chatClient.prompt()
            .user(userMessage)
            .call()
            .content();
    }
}

Debug Output That Reveals Decisions

Now when I run my agent, I see the reasoning chain.

========== TOOL DECISION ==========
Tool: retrievePatientHealthStatus
Reasoning: The user asked about medication. I need to first check the patient's
current health status to understand their medication context and ensure the
medication list is appropriate for their condition.
Confidence: high
====================================

========== TOOL DECISION ==========
Tool: retrieveMedications
Reasoning: Now that I have confirmed the patient's health status, I can
retrieve their medication list. The health status shows no allergies that
would conflict with the medications I'm about to retrieve.
Confidence: high
====================================

I can see the model’s thought process. It checks health status first for context, then retrieves medications.
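
Multi-line blocks are easy to read but awkward to grep. As a minimal sketch, assuming I want one line per decision, the consumer could hand its three values to a small formatter like the one below. DecisionLogLine is a hypothetical helper of my own, not part of Spring AI, and the key=value layout is just one possible convention.

```java
// Hypothetical helper for illustration; not part of Spring AI.
class DecisionLogLine {

    // Collapses the model's free-text reasoning onto a single line so that
    // every tool decision can be found with one grep over the log file.
    static String format(String toolName, String innerThought, String confidence) {
        String thought = innerThought == null ? "" : innerThought;
        String flattened = thought
                .replaceAll("\\s+", " ")   // newlines and runs of spaces become one space
                .trim()
                .replace('"', '\'');       // keep the quoted value unambiguous
        return String.format("tool=%s confidence=%s reasoning=\"%s\"",
                toolName, confidence, flattened);
    }
}
```

Inside the argumentConsumer shown earlier, this would be called with event.toolDefinition().name(), reasoning.innerThought(), and reasoning.confidence(), and the result passed to log.info("{}", line).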

Debugging Wrong Tool Selection

When the model chooses the wrong tool, the reasoning reveals the issue.

========== TOOL DECISION ==========
Tool: retrieveAppointmentHistory
Reasoning: The user asked about "what the patient is taking". I interpret
"taking" as scheduling or appointments they're taking on, so I'll retrieve
their appointment history.
Confidence: medium
====================================

The confidence is medium, and the reasoning shows a misinterpretation. The model thought “taking” meant appointments rather than medications. This tells me my tool descriptions need improvement.
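
To avoid reading every log entry by hand, I can flag any decision that is not clearly high confidence. The following is a minimal sketch in plain Java with no Spring dependency. DecisionReview, ToolDecision, and Confidence are hypothetical names introduced for illustration, and the sketch assumes the model answers with high, medium, or low as the ToolReasoning description asks. Since the model returns free text, anything unexpected is treated as unknown and flagged as well.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper types for illustration; not part of Spring AI.
class DecisionReview {

    enum Confidence { LOW, MEDIUM, HIGH, UNKNOWN }

    // Mirrors the values captured per tool call: tool name plus the two ToolReasoning fields.
    record ToolDecision(String toolName, String innerThought, String confidence) {}

    // Normalizes the model's free-text confidence; unexpected values map to UNKNOWN.
    static Confidence parse(String raw) {
        if (raw == null) {
            return Confidence.UNKNOWN;
        }
        return switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "high" -> Confidence.HIGH;
            case "medium" -> Confidence.MEDIUM;
            case "low" -> Confidence.LOW;
            default -> Confidence.UNKNOWN;
        };
    }

    // Returns every decision that is not clearly high confidence, in original order.
    static List<ToolDecision> needsReview(List<ToolDecision> decisions) {
        return decisions.stream()
                .filter(d -> parse(d.confidence()) != Confidence.HIGH)
                .toList();
    }
}
```

A self-reported confidence is only a hint. A confidently wrong decision will pass this filter, so it narrows the review list rather than replacing a read of the reasoning text.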

Fixing Tool Descriptions

Based on the debug output, I improve my tool descriptions.

@Component
public class PatientTools {

    @Tool(description = """
        Retrieve patient's medication list.
        Use when user asks about:
        - Medications, prescriptions, drugs the patient is taking
        - What medications the patient is on
        - Current prescriptions
        """)
    public String retrieveMedications(
        @ToolParam(description = "Patient ID") String patientId,
        ToolReasoning reasoning
    ) {
        return medicationService.getMedications(patientId);
    }

    @Tool(description = """
        Retrieve patient's appointment history.
        Use when user asks about:
        - Appointments, visits, scheduled visits
        - Past or upcoming appointments
        - When the patient has visited
        """)
    public String retrieveAppointmentHistory(
        @ToolParam(description = "Patient ID") String patientId,
        ToolReasoning reasoning
    ) {
        return appointmentService.getHistory(patientId);
    }
}

Chain-of-Tool Debugging

When multiple tools are called in sequence, I see each step’s reasoning.

========== TOOL DECISION ==========
Tool: retrievePatientId
Reasoning: I encountered an issue - the user provided a patient name
"P002" but I need a patient ID. I'll first get the patient ID using
the provided identifier.
Confidence: high
====================================

========== TOOL DECISION ==========
Tool: retrievePatientHealthStatus
Reasoning: Now that I have the patient ID (PAT-12345), I can use it
to retrieve the health status. The ID was successfully resolved from
the previous tool call.
Confidence: high
====================================

I can trace the full decision chain. Each step explains what data it needed and how previous results informed the next choice.
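
When the chain gets longer, it helps to keep the steps together instead of scattered across the log. Below is a minimal sketch of an in-memory trace, assuming one trace object is created per request and filled from the argument consumer. DecisionTrace and Step are hypothetical names, and the class is deliberately not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-request trace for illustration; not part of Spring AI.
// Not thread-safe: the sketch assumes one instance per request.
class DecisionTrace {

    record Step(String toolName, String innerThought, String confidence) {}

    private final List<Step> steps = new ArrayList<>();

    // Appends one tool decision in the order the model made it.
    void record(String toolName, String innerThought, String confidence) {
        steps.add(new Step(toolName, innerThought, confidence));
    }

    // The bare sequence of tool names, handy for asserting on call order in tests.
    List<String> toolSequence() {
        return steps.stream().map(Step::toolName).toList();
    }

    // Renders the chain as numbered lines: "1. tool [confidence] reasoning".
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < steps.size(); i++) {
            Step s = steps.get(i);
            sb.append(i + 1).append(". ")
              .append(s.toolName())
              .append(" [").append(s.confidence()).append("] ")
              .append(s.innerThought())
              .append('\n');
        }
        return sb.toString();
    }
}
```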

Common Mistake: Print Statements Around Tool Calls

I tried adding print statements in my service layer.

// DON'T DO THIS - shows WHEN, not WHY
public String processRequest(String message) {
    log.info("Processing: {}", message);
    String result = chatClient.prompt().user(message).call().content();
    log.info("Result: {}", result);  // No visibility into tool decisions
    return result;
}

This approach misses the model’s internal reasoning. The augmenter pattern captures reasoning before the tool executes.

Environment

Spring Boot 3.3.x
Spring AI 1.0.0
Java 21

Debugging Workflow

My debugging workflow now:

Run the agent with reasoning capture enabled
Review the debug logs to see tool selection reasoning
Identify low-confidence decisions or misinterpretations
Update tool descriptions based on observed reasoning errors
Re-test to verify improvements
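
The re-test step can be turned into a small regression check. The sketch below is one possible shape, in plain Java. SelectionRegression, Case, and Failure are hypothetical names, and firstToolFor stands in for whatever runs the agent and reports the first tool it selected for a prompt (for example, read from the captured decisions). That wiring is an assumption and is not shown here.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical regression harness for illustration; not part of Spring AI.
class SelectionRegression {

    // A prompt together with the tool I expect the model to pick first.
    record Case(String prompt, String expectedTool) {}

    // A prompt where the observed tool differed from the expected one.
    record Failure(String prompt, String expectedTool, String actualTool) {}

    // Runs every case through firstToolFor and returns only the mismatches.
    // A null result (no tool called at all) also counts as a mismatch.
    static List<Failure> run(List<Case> cases, Function<String, String> firstToolFor) {
        return cases.stream()
                .map(c -> new Failure(c.prompt(), c.expectedTool(), firstToolFor.apply(c.prompt())))
                .filter(f -> !f.expectedTool().equals(f.actualTool()))
                .toList();
    }
}
```

Since model output can vary between runs, I would treat an empty failure list as encouraging rather than as proof, and repeat ambiguous prompts such as "what is the patient taking" a few times.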

Summary

Debugging AI tool selection requires visibility into the model’s decision process. Spring AI’s Tool Argument Augmenter captures this by injecting @ToolParam fields that force the LLM to explain its choices.

Key insights from debug output:

Why a tool was selected: The innerThought reveals interpretation
Confidence level: Low confidence signals potential issues
Decision chain: Multiple tool calls show how reasoning flows
Misinterpretations: Incorrect tool choices trace back to ambiguous descriptions

The fix is usually improving tool descriptions based on what the reasoning reveals.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Spring AI Documentation
👨‍💻 Tool Calling Guide

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!