Why Does My Local LLM Fail at Tool Calling for Coding?

Mar 21, 2026

Problem

“The model didn’t know how to use the tools for coding.”

That’s the error I kept getting when trying to use Roo Code with my local LLM. After spending time setting up Ollama and pulling Qwen Coder, I expected it to just work. Instead, Roo Code was “pretty much useless” - it couldn’t perform basic coding tasks because the model didn’t know how to use tools.

I tried Continue.dev next, hoping for better results. It was slightly better but still failed at basic tasks - confusing directory creation with file creation, throwing errors on anything complex. Both assistants were essentially glorified autocomplete tools.

Environment

Here’s what I was working with:

Ollama: 0.5.x running locally
Models tested: Qwen 2.5 Coder (7B), Devstral
Coding assistants: Roo Code, Cline, Continue.dev
VS Code: Latest version
OS: macOS (Apple Silicon)

What Happened

I had set up Ollama and pulled what I thought were capable models:

ollama pull qwen2.5-coder:7b

Then I configured Roo Code to use my local model. The extension connected successfully, and I could see the model responding in the chat. But when I tried to perform actual coding tasks:

Error: The model didn't know how to use the tools for coding

I thought maybe I picked the wrong model. I tried Devstral, specifically advertised as an agentic coding model. Same result. I even tried smaller models like Qwen 2.5 Coder 3B, thinking maybe the smaller footprint would help.

Nothing worked.

Continue.dev was marginally better - it could respond to chat prompts. But when I asked it to create a file:

Created directory: /src/utils.ts

It created a directory instead of a file. And when I tried more complex tasks like refactoring across multiple files? Complete failure with cryptic errors.

How to Solve It

Step 1: Use a Model That Actually Supports Tool Calling

The first revelation came from a Reddit comment: “You’re having issues with tool calling because small and old models weren’t trained on tool usage.”

Not all models are created equal. Tool calling (also called function calling) is a specific capability that requires training. I needed to verify my chosen model actually had this capability.

# Print the model's Modelfile; a template that references .Tools indicates tool calling support
ollama show qwen2.5-coder:7b --modelfile

The key insight: models under 7B parameters rarely have good tool calling support. I needed at least 7B, and ideally a model specifically fine-tuned for agentic use.
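A more direct check than reading the Modelfile is to offer the model a tool and see whether the reply contains a structured tool call. The sketch below assumes Ollama's `/api/chat` endpoint on the default `http://localhost:11434`. The `create_file` tool and the helper names (`build_probe_request`, `has_tool_calls`, `probe`) are made up for illustration and are not part of any library.

```python
import json
import urllib.request

# Hypothetical tool, used only to see whether the model emits a tool call.
CREATE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "create_file",
        "description": "Create a file with the given content",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}


def build_probe_request(model: str) -> dict:
    """Build a non-streaming chat request that offers one tool to the model."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "user", "content": "Create src/utils.ts with an empty hello() function."}
        ],
        "tools": [CREATE_FILE_TOOL],
    }


def has_tool_calls(response: dict) -> bool:
    """True if the chat response contains at least one structured tool call."""
    return bool(response.get("message", {}).get("tool_calls"))


def probe(model: str, host: str = "http://localhost:11434") -> bool:
    """Send the probe to a running Ollama server (assumed default host and port)."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_probe_request(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return has_tool_calls(json.load(reply))
```

Calling `probe("qwen2.5-coder:7b")` against a running server returns `True` only when the model answered with a tool call. A model without tool support may instead be rejected with an HTTP error, which `urlopen` raises as an exception.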

Step 2: Explicitly Enable Tool Calling in Configuration

This was my “aha” moment. Another comment pointed out: “With Roo or Cline you might have to tell the app the model has tool calling and vision capabilities.”

Coding assistants don’t always detect model capabilities automatically, and they often default to assuming a local model lacks tool calling support. I had to configure it explicitly.

For Roo Code:

{
  "models": {
    "qwen-coder": {
      "provider": "ollama",
      "modelId": "qwen2.5-coder:7b",
      "capabilities": {
        "toolCalling": true,
        "vision": false
      }
    }
  }
}

For Cline:

{
  "apiProvider": "ollama",
  "modelId": "devstral:latest",
  "supportsFunctions": true,
  "supportsImages": false
}

For Continue.dev:

models:
  - name: "Qwen Coder Local"
    provider: "ollama"
    model: "qwen2.5-coder:7b"
    roles:
      - "chat"
      - "edit"
    capabilities:
      - tool_use

Step 3: Disable Excessive Reasoning

After enabling tool calling, I hit another issue. The model would generate pages of reasoning tokens before actually making a tool call, or sometimes skip the tool call entirely.

One user described a similar debugging journey: “With Roo, I only got it working by stopping the excessive thinking. It took me a couple of days working with Claude troubleshooting to get VSCode + Roo + Qwen Coder Next to all work together.”

The fix was to tighten the model’s generation parameters:

# Write a Modelfile with a lower temperature, a larger context window, and explicit stop tokens
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
PARAMETER num_ctx 32768
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
EOF

# Build the custom model from that Modelfile
ollama create qwen-coder-tools -f Modelfile
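The same parameters can also be sent per request instead of being baked into a custom model. This is a minimal sketch, assuming the `options` object that Ollama's `/api/chat` endpoint accepts. `with_generation_options` is a hypothetical helper name, and the values are simply copied from the Modelfile above.

```python
import copy
from typing import Optional

# Values copied from the Modelfile above; the stop strings are the chat markers it lists.
DEFAULT_OPTIONS = {
    "temperature": 0.3,
    "num_ctx": 32768,
    "stop": ["<|im_start|>", "<|im_end|>"],
}


def with_generation_options(request: dict, overrides: Optional[dict] = None) -> dict:
    """Return a copy of a chat request with an `options` object attached.

    Precedence, lowest to highest: defaults, options already on the request,
    then explicit overrides.
    """
    updated = copy.deepcopy(request)
    updated["options"] = {
        **copy.deepcopy(DEFAULT_OPTIONS),
        **updated.get("options", {}),
        **(overrides or {}),
    }
    return updated
```

The input request is left untouched, so one base request can be reused with different settings while experimenting.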

Or, in the assistant configuration, turn thinking off:

{
  "models": {
    "qwen-coder": {
      "provider": "ollama",
      "modelId": "qwen2.5-coder:7b",
      "capabilities": {
        "toolCalling": true
      },
      "thinking": {
        "enabled": false
      }
    }
  }
}
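If thinking cannot be switched off, another option is to discard the reasoning text before looking for a tool call. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags. Some reasoning models do this, but nothing in my setup confirms it for these models. `strip_thinking` and `thinking_ratio` are hypothetical helper names.

```python
import re

# Assumption: reasoning is delimited by <think> tags; adjust for your model.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def thinking_ratio(text: str) -> float:
    """Fraction of the output's characters spent inside <think> blocks."""
    if not text:
        return 0.0
    thinking = sum(len(match.group(0)) for match in THINK_BLOCK.finditer(text))
    return thinking / len(text)
```

A high `thinking_ratio` on replies that never reach a tool call is a hint that the model is overthinking rather than failing outright.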

Alternative: Use a Different Assistant

If you’re still struggling, consider assistants designed from the ground up for local models:

OpenCode: Designed for local Ollama servers
Codex: Supports local inference
Claude Code: Can work with local Ollama servers

These assistants have better defaults for local models and don’t assume OpenAI-style capabilities by default.

The Reason

Why Tool Calling Isn’t Universal

Tool calling requires the model to:

Understand structured tool definitions (JSON schemas)
Output correctly formatted tool calls in specific formats
Handle multi-step tool sequences without losing context

This isn’t something every model can do. It requires specific training data that includes tool usage examples. Many local models are fine-tuned for:

Chat conversations
Code completion
Instruction following

But not specifically for agentic tool use. The model needs to understand that when given a tool definition like:

{
  "name": "create_file",
  "parameters": {
    "path": "string",
    "content": "string"
  }
}

It should output something like:

{
  "tool": "create_file",
  "arguments": {
    "path": "/src/utils.ts",
    "content": "export function hello() {}"
  }
}

Without specific training, the model might:

Ignore the tool definition entirely
Output malformed JSON
Generate reasoning text instead of tool calls
Confuse tool calling with regular chat
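These failure modes can be told apart mechanically. This is a minimal sketch, assuming the simplified `{"tool": ..., "arguments": ...}` output shape shown above rather than any particular assistant's wire format. `classify_output`, the `TOOLS` registry, and the labels it returns are all hypothetical.

```python
import json

# Simplified tool registry: tool name -> required argument names.
TOOLS = {"create_file": {"path", "content"}}


def classify_output(raw: str) -> str:
    """Label a model reply as a valid tool call or one of several failure modes."""
    text = raw.strip()
    if not text.startswith("{"):
        return "plain_text"  # reasoning or chat instead of a tool call
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return "malformed_json"
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return "unknown_tool"
    arguments = call.get("arguments")
    if not isinstance(arguments, dict) or not TOOLS[name].issubset(arguments):
        return "missing_arguments"
    return "ok"
```

Logging these labels over a handful of prompts shows quickly whether a model mostly ignores the tools, mangles the JSON, or gets the arguments wrong.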

Why Configuration Matters

Coding assistants make assumptions about model capabilities. By default, many assume local models lack:

Tool calling support
Vision capabilities
Structured output formatting

This is actually reasonable - if they assumed every model had these capabilities, they’d break on models that don’t. But it means you must explicitly enable them.
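The conservative default can be pictured roughly like this. The sketch is illustrative only: `ModelCapabilities` and `build_request` are hypothetical names and do not reflect the actual internals of Roo Code, Cline, or Continue.dev.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    """Capability flags; everything is off unless the user turns it on."""
    tool_calling: bool = False
    vision: bool = False


def build_request(model: str, prompt: str, tools: list,
                  capabilities: ModelCapabilities = ModelCapabilities()) -> dict:
    """Attach tool definitions only when tool calling is explicitly enabled."""
    request = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if capabilities.tool_calling and tools:
        request["tools"] = tools
    return request
```

With the default flags the model never sees the tool definitions at all, which would explain an assistant that connects and chats but cannot act.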

Why Model Size Matters

Smaller models (under 7B parameters) typically lack the capacity to:

Remember tool definitions across long contexts
Generate precisely formatted outputs
Distinguish between “thinking” and “acting”

The 7B figure is a rule of thumb rather than a hard limit - it’s roughly the size at which these capabilities start to show up reliably. Even then, not all 7B models are equal. Models specifically trained for agentic coding (like Devstral) will typically outperform general-purpose models of the same size.

Summary

Local LLM tool calling failures come down to three issues:

Wrong model: Using models not trained for tool calling
Missing configuration: Not explicitly enabling tool calling in your assistant
Excessive reasoning: Models overthinking instead of acting

The fix is straightforward:

Use a model explicitly trained for tool calling (Qwen 2.5 Coder 7B+, Devstral)
Explicitly enable tool calling capabilities in your assistant’s configuration
Adjust reasoning/thinking parameters if the model generates too much internal thought

With proper configuration, local LLMs become genuinely useful coding assistants - not just expensive autocomplete tools.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.
