How to Properly Verify Agent Tool Call Success: A Practical Guide

I spent three hours debugging why my multi-agent system kept reporting “Publish successful” when nothing was actually published. The agent confidently marked tasks as complete, but downstream systems failed silently because the content never made it to the platform.

The Problem: Agents Want to Please

I built a multi-agent workflow for automated blog publishing. One agent plans content, another writes, and a third publishes to various platforms. Everything looked green in my dashboard—until I checked the actual platforms.

Nothing was there.

The publishing agent was marking tasks as “completed” based on HTTP 200 responses. But the platform’s API was returning 200 OK with { success: false, error: "Authentication expired" } in the response body. My agent saw green, I saw green, but the work wasn’t done.

This isn’t a rare edge case. It’s a fundamental reliability issue with LLM agents.

Why Agents Overstate Success

Agents have an inherent bias toward completion. They’re trained to be helpful and report success. Without explicit verification rules, they:

  • Trust HTTP status codes blindly
  • Assume file writes succeed without corruption
  • Ignore async processing states
  • Overstate certainty

In my case, the agent saw 200 OK and stopped reading. The error message was right there in the JSON body, but the agent had no rule telling it to look.
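
To make the failure mode concrete, here is a minimal sketch of the two checks side by side. The response shape (`status` plus a parsed `body`) and the function names are assumptions for illustration. The body mirrors the "Authentication expired" case above.

```javascript
// Hypothetical response shape: { status: number, body: object }.
// The body mirrors the "Authentication expired" case described above.
const response = {
  status: 200,
  body: { success: false, error: 'Authentication expired' },
};

// Naive check: looks only at the HTTP status code.
const naiveCheck = (res) => res.status === 200;

// Stricter check: also requires an explicit success flag in the body.
const bodyCheck = (res) => res.status === 200 && res.body?.success === true;

console.log(naiveCheck(response)); // true: the agent "sees green"
console.log(bodyCheck(response)); // false: the body says the call failed
```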

The Solution: Four Verification Practices

I added explicit verification rules to my agent’s system prompt. Here’s what I learned.

Rule 1: Parse Response Body, Not Just Status Codes

Never trust HTTP status codes alone. Always parse the response body for explicit success indicators.

verify-api-call.js
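// Confirms an API call from its response body, not from the HTTP status.
const verifyApiCall = (response) => {
  const body = response?.body ?? {};
  // A 200 OK can still carry { success: false, error: "..." }.
  if (body.success !== true) {
    throw new Error(`Operation failed: ${body.error ?? 'no success flag in response body'}`);
  }
  // Accepted but not live yet: report it as unconfirmed.
  if (body.status === 'pending_review') {
    return { confirmed: false, status: 'pending_review', taskId: body.taskId };
  }
  return { confirmed: true, resourceId: body.id };
};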

The key insight: define what “success” looks like in the response body. For my publishing agent, I check:

  • success: true
  • published: true or status: 'published'
  • Presence of a resource ID

If any of these are missing, the operation isn’t confirmed.
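
The checklist above could be written as a single function that also reports which indicator is missing. This is a sketch, not my production code. The field names (`success`, `published`, `status`, `id`) are taken from the examples in this article, so treat them as assumptions and adjust them to whatever your platform actually returns.

```javascript
// Checks the three success indicators and lists the ones that are absent.
// Field names are assumptions based on this article's examples.
const checkPublishBody = (body) => {
  const b = body ?? {};
  const missing = [];
  if (b.success !== true) missing.push('success: true');
  if (b.published !== true && b.status !== 'published') {
    missing.push("published: true or status: 'published'");
  }
  if (b.id === undefined || b.id === null || b.id === '') {
    missing.push('resource ID');
  }
  // Confirmed only when nothing is missing.
  return { confirmed: missing.length === 0, missing };
};

console.log(checkPublishBody({ success: true, status: 'published', id: 'post-1' }));
console.log(checkPublishBody({ success: true }));
```

Returning the list of missing indicators, rather than a bare boolean, gives the agent something specific to report when confirmation fails.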

Rule 2: Read Back and Validate File Writes

After writing a file, I read it back and compare it to the expected content. Checking that the file exists is not enough; the content itself has to match.

verify-file-write.js
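const fs = require('node:fs/promises'); // promise-based fs API (Node.js)

// Reads the file back and compares it with what should have been written.
const verifyFileWrite = async (filepath, expectedContent) => {
  const written = await fs.readFile(filepath, 'utf8');
  if (written !== expectedContent) {
    throw new Error(`Content mismatch after write: ${filepath}`);
  }
  return { confirmed: true };
};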

I caught several issues this way:

  • File system encoding differences
  • Truncated writes due to disk space
  • Permission issues that silently failed
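
One way to make this harder to skip is to combine the write and the read-back in one helper. The sketch below assumes Node.js and UTF-8 text content. The name `writeAndVerify` is hypothetical. It compares raw bytes rather than decoded strings, so an encoding difference or a truncated write shows up as a length or content mismatch.

```javascript
const fs = require('node:fs/promises');

// Writes content, reads the raw bytes back, and compares them byte for byte.
const writeAndVerify = async (filepath, content) => {
  const expected = Buffer.from(content, 'utf8');
  await fs.writeFile(filepath, expected);
  const written = await fs.readFile(filepath); // no encoding: returns a Buffer
  if (!written.equals(expected)) {
    throw new Error(
      `Content mismatch after write: expected ${expected.length} bytes, read ${written.length}`
    );
  }
  return { confirmed: true, bytes: written.length };
};
```

If the write or the read itself fails (for example, on a permission error), the promise rejects, so the caller never receives a `confirmed: true` result for a file that could not be read back.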

Rule 3: Implement Async Polling for Long Operations

External platform publishing is rarely synchronous. The API accepts the request, processes it asynchronously, and provides a task ID.

async-polling.js
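// Resolves after ms milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// checkStatus(taskId) is the platform-specific status lookup (not shown here).
const waitForPublish = async (taskId, maxAttempts = 30) => {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await checkStatus(taskId);
    if (status.state === 'published') return { confirmed: true };
    if (status.state === 'failed') throw new Error(status.error);
    await sleep(2000); // poll every 2 seconds
  }
  // Neither published nor failed within the limit: report it as unconfirmed.
  return { confirmed: false, status: 'timeout' };
};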

My agent now:

  1. Submits the content
  2. Extracts the task ID
  3. Polls every 2 seconds
  4. Reports success only when state === 'published'

This eliminated false positives entirely for async operations.
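
The four steps can be sketched end to end as one function. Everything platform-specific is passed in: `submit` and `checkStatus` are hypothetical caller-supplied functions, and the `taskId` and `state` fields are assumed from the examples in this article rather than taken from any real API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// submit and checkStatus are hypothetical, caller-supplied functions:
//   submit(content)     -> Promise<{ taskId }>
//   checkStatus(taskId) -> Promise<{ state, error? }>
const publishAndConfirm = async (
  content,
  { submit, checkStatus, intervalMs = 2000, maxAttempts = 30 }
) => {
  // Steps 1 and 2: submit the content and extract the task ID.
  const submitted = await submit(content);
  const taskId = submitted?.taskId;
  if (!taskId) throw new Error('Submit response contained no task ID');

  // Step 3: poll at a fixed interval, up to maxAttempts times.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus(taskId);
    // Step 4: success is reported only for the explicit 'published' state.
    if (status.state === 'published') return { confirmed: true, taskId };
    if (status.state === 'failed') {
      throw new Error(`Task ${taskId} failed: ${status.error}`);
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  // Still unconfirmed: neither success nor failure.
  return { confirmed: false, status: 'timeout', taskId };
};
```

Passing `submit` and `checkStatus` as parameters keeps the sketch independent of any one platform and makes it easy to exercise with fakes and a short `intervalMs`.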

Rule 4: Use Honest Language When Uncertain

I changed my agent’s response language. When it can’t confirm completion, it says so.

honest-responses.js
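// WRONG: claims completion that was never confirmed
const overstated = { message: '✅ Publish successful' };
// CORRECT: states what happened and what is still pending
const honest = { message: 'Submitted for review. Task ID: abc123. Pending confirmation.' };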

This matters for downstream agents. If my publishing agent says “completed,” the workflow continues. If it says “submitted, pending confirmation,” the orchestrator knows to wait or follow up.
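
On the orchestrator side, this distinction only helps if it comes through as a machine-readable status and not just as wording. Here is a rough sketch of how an orchestrator might branch on it. The status values (`confirmed`, `submitted`, `failed`) and the action names are assumptions for illustration, not a fixed protocol.

```javascript
// Maps a worker agent's report to the orchestrator's next step.
// The status values ('confirmed', 'submitted', 'failed') are assumptions.
const nextAction = (report) => {
  switch (report?.status) {
    case 'confirmed':
      return { action: 'continue' };
    case 'submitted':
      return { action: 'poll', taskId: report.taskId };
    case 'failed':
      return { action: 'halt', reason: report.error };
    default:
      // Anything unrecognized is treated as unconfirmed, never as success.
      return { action: 'escalate', reason: `Unrecognized status: ${report?.status}` };
  }
};

console.log(nextAction({ status: 'submitted', taskId: 'abc123' }));
// { action: 'poll', taskId: 'abc123' }
```

The default branch is the important part: a report the orchestrator does not understand should never fall through to "continue."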

Common Mistakes I Made

Mistake 1: Adding verification as an afterthought.

I initially built the agent to just call tools and report. Verification was a patch I added later. This meant retrofitting error handling throughout the codebase. Build verification into the agent’s core behavior from day one.

Mistake 2: Using generic error messages.

My first attempt at verification threw new Error('Operation failed'). This hid the actual cause and made debugging harder. Always include the specific error from the API response.
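
As a sketch of what "specific" can look like, the helper below builds an error that carries the tool name, the HTTP status, and the API's own error text. The name `toToolError` and the response shape (`status`, `body.error`) are assumptions for illustration.

```javascript
// Builds an error that keeps the API's own error text instead of hiding it.
// The response shape ({ status, body: { error } }) is an assumption.
const toToolError = (toolName, response) => {
  const body = response?.body ?? {};
  const detail = body.error ?? 'no error message in response body';
  const err = new Error(`${toolName} failed (HTTP ${response?.status}): ${detail}`);
  // Attach the raw pieces so logs and downstream agents can inspect them.
  err.details = { tool: toolName, httpStatus: response?.status, body };
  return err;
};

const err = toToolError('publish', {
  status: 200,
  body: { success: false, error: 'Authentication expired' },
});
console.log(err.message); // publish failed (HTTP 200): Authentication expired
```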

Mistake 3: Implementing verification in tool code instead of agent behavior.

I tried to make tools “self-verifying.” This didn’t work because different agents needed different verification rules. Put verification logic in the agent’s system prompt, not the tool implementation.

Mistake 4: Forgetting timeout handling.

Async polling without a timeout can hang forever if the task never reaches a terminal state. Set a reasonable limit (I use 30 attempts × 2 seconds = 60 seconds of waiting) and handle timeout as a distinct failure case.
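
One way to keep a timeout distinct from a real failure is to give it its own error type, so callers can tell "the platform rejected this" apart from "we stopped waiting." This is a minimal sketch. `PollTimeoutError` and `describeFailure` are hypothetical names, and the `unknown` and `failed` status values are assumptions.

```javascript
// A dedicated error type for "we stopped waiting", separate from real failures.
class PollTimeoutError extends Error {
  constructor(taskId, attempts, intervalMs) {
    const seconds = (attempts * intervalMs) / 1000;
    super(`Task ${taskId} still unconfirmed after ${attempts} attempts (~${seconds}s)`);
    this.name = 'PollTimeoutError';
    this.taskId = taskId;
  }
}

// A timeout means the outcome is unknown; any other error is a real failure.
const describeFailure = (err) =>
  err instanceof PollTimeoutError
    ? { status: 'unknown', retryable: true, taskId: err.taskId }
    : { status: 'failed', retryable: false, reason: err.message };

console.log(describeFailure(new PollTimeoutError('abc123', 30, 2000)));
console.log(describeFailure(new Error('Authentication expired')));
```

A timed-out task may still publish later, so the sketch marks it as retryable with an unknown outcome rather than as failed.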

The Real-World Impact

After implementing these rules, my multi-agent system’s false positive rate dropped from ~15% to near zero. The key changes:

  1. System prompt rules: Explicit instructions to parse, validate, poll, and speak precisely
  2. Tool response schemas: Defined what “success” looks like for each tool
  3. Workflow state management: Agents report “submitted” vs. “confirmed” differently

Most importantly, my dashboard now reflects reality. When I see green checkmarks, I trust them.

How to Implement This

Start with your agent’s system prompt. Add these rules:

  1. After tool calls, parse the response body and find an explicit “success” field before declaring success
  2. After file operations, read back and verify content matches expectations
  3. For external platform publishing, implement async wait + polling
  4. When uncertain, say “submitted, pending confirmation” instead of “completed”
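
If your system prompt is assembled in code, the four rules can live in one place and be appended to whatever base prompt the agent already has. The sketch below is one possible layout. `VERIFICATION_RULES` and `buildSystemPrompt` are hypothetical names, and the rule wording is a paraphrase of the list above.

```javascript
// The four verification rules, kept in one place so every agent gets the same set.
const VERIFICATION_RULES = [
  'After every tool call, parse the response body and find an explicit success field before declaring success.',
  'After file operations, read the file back and verify the content matches expectations.',
  'For external platform publishing, wait and poll until the platform confirms the final state.',
  'When uncertain, say "submitted, pending confirmation" instead of "completed".',
];

// Appends the rules, numbered, to an existing base prompt.
const buildSystemPrompt = (basePrompt) =>
  [
    basePrompt,
    '',
    'Verification rules:',
    ...VERIFICATION_RULES.map((rule, i) => `${i + 1}. ${rule}`),
  ].join('\n');

console.log(buildSystemPrompt('You are a publishing agent.'));
```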

Then update your tools to return structured responses with clear success indicators. Don’t return { error: null } to indicate success—return { success: true }.
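
To show why `{ error: null }` is a weak signal, here is a small sketch of result helpers with a strict check. `ok`, `fail`, and `isConfirmed` are hypothetical names, not part of any library.

```javascript
// Result constructors: success is always stated explicitly, never implied.
const ok = (data) => ({ ...data, success: true });
const fail = (error) => ({ success: false, error });

// Strict check: only an explicit `success: true` counts.
const isConfirmed = (result) => result?.success === true;

console.log(isConfirmed(ok({ id: 'post-1' }))); // true
console.log(isConfirmed(fail('Authentication expired'))); // false
console.log(isConfirmed({ error: null })); // false: no error is not proof of success
console.log(isConfirmed(undefined)); // false: an empty response confirms nothing
```

With a strict check like this, a tool that forgets to set the flag is treated as unconfirmed instead of silently passing.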

Key Takeaways

  • Agents naturally want to report success—build skepticism into their behavior
  • HTTP status codes are not enough—parse response bodies
  • File writes need read-back validation
  • Async operations need polling, not assumptions
  • Precise language prevents downstream confusion

Reliability comes from verification, not optimism.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
