Why Does My Local LLM Fail at Tool Calling for Coding?
Problem
“The model didn’t know how to use the tools for coding.”
That’s the error I kept getting when trying to use Roo Code with my local LLM. After spending time setting up Ollama and pulling Qwen Coder, I expected it to just work. Instead, Roo Code was “pretty much useless” - it couldn’t perform basic coding tasks because the model didn’t know how to use tools.
I tried Continue.dev next, hoping for better results. It was slightly better but still failed at basic tasks - confusing directory creation with file creation, throwing errors on anything complex. Both assistants were essentially glorified autocomplete tools.
Environment
Here’s what I was working with:
- Ollama: 0.5.x running locally
- Models tested: Qwen 2.5 Coder (7B), Devstral
- Coding assistants: Roo Code, Cline, Continue.dev
- VS Code: Latest version
- OS: macOS (Apple Silicon)
What Happened
I had set up Ollama and pulled what I thought were capable models:
    ollama pull qwen2.5-coder:7b
Then I configured Roo Code to use my local model. The extension connected successfully, and I could see the model responding in the chat. But when I tried to perform actual coding tasks:
    Error: The model didn't know how to use the tools for coding
I thought maybe I picked the wrong model. I tried Devstral, specifically advertised as an agentic coding model. Same result. I even tried smaller models like Qwen 2.5 Coder 3B, thinking maybe the smaller footprint would help.
Nothing worked.
Continue.dev was marginally better - it could respond to chat prompts. But when I asked it to create a file:
    Created directory: /src/utils.ts
It created a directory instead of a file. And when I tried more complex tasks like refactoring across multiple files? Complete failure with cryptic errors.
How to Solve It
Step 1: Use a Model That Actually Supports Tool Calling
The first revelation came from a Reddit comment: “You’re having issues with tool calling because small and old models weren’t trained on tool usage.”
Not all models are created equal. Tool calling (also called function calling) is a specific capability that requires training. I needed to verify my chosen model actually had this capability.
    # Verify tool calling support in Ollama
    ollama show qwen2.5-coder:7b --modelfile
The key insight: models under 7B parameters rarely have good tool calling support. I needed at least 7B, and ideally a model specifically fine-tuned for agentic use.
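Reading the Modelfile only tells you what the template claims. A more direct check is to offer the model one trivial tool and see whether it answers with a structured call. Here is a minimal Python sketch of that idea, assuming Ollama is listening on its default local address and using its `/api/chat` endpoint; the `create_file` tool and the function names are made up for the probe and are not part of any assistant.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_probe_request(model: str) -> dict:
    """Build a non-streaming chat request that offers the model one trivial tool."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "user", "content": "Create the file notes.txt containing the word hello."}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "create_file",  # hypothetical tool, used only for probing
                "description": "Create a file with the given content",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "content": {"type": "string"},
                    },
                    "required": ["path", "content"],
                },
            },
        }],
    }


def called_tool(response: dict, tool_name: str) -> bool:
    """Return True if the chat response contains a structured call to tool_name."""
    calls = response.get("message", {}).get("tool_calls") or []
    return any(c.get("function", {}).get("name") == tool_name for c in calls)


def probe(model: str) -> bool:
    """Send the probe to a running Ollama server and report whether a tool call came back."""
    body = json.dumps(build_probe_request(model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return called_tool(json.load(resp), "create_file")
```

Calling `probe("qwen2.5-coder:7b")` a few times gives a rough signal: a model that answers in plain prose, or never fills in `tool_calls`, is unlikely to behave any better inside a coding assistant.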
Step 2: Explicitly Enable Tool Calling in Configuration
This was my “aha” moment. Another comment pointed out: “With Roo or Cline you might have to tell the app the model has tool calling and vision capabilities.”
Coding assistants don’t automatically detect model capabilities. They often default to assuming the model lacks tool calling support. I had to explicitly configure it.
For Roo Code:
    {
      "models": {
        "qwen-coder": {
          "provider": "ollama",
          "modelId": "qwen2.5-coder:7b",
          "capabilities": {
            "toolCalling": true,
            "vision": false
          }
        }
      }
    }
For Cline:
    {
      "apiProvider": "ollama",
      "modelId": "devstral:latest",
      "supportsFunctions": true,
      "supportsImages": false
    }
For Continue.dev:
    models:
      - name: "Qwen Coder Local"
        provider: "ollama"
        model: "qwen2.5-coder:7b"
        roles:
          - "chat"
          - "edit"
        capabilities:
          tool_use: true
Step 3: Disable Excessive Reasoning
After enabling tool calling, I hit another issue. The model would generate pages of reasoning tokens before actually making a tool call, or sometimes skip the tool call entirely.
One user described a similar debugging journey: “With Roo, I only got it working by stopping the excessive thinking. It took me a couple of days working with Claude troubleshooting to get VSCode + Roo + Qwen Coder Next to all work together.”
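To make the problem concrete, here is a minimal Python sketch of what the client side is up against when a model buries its tool call under reasoning text. It assumes the reasoning is wrapped in `<think>...</think>` tags and that tool calls are JSON objects with a `tool` key; both are assumptions for illustration, not how Roo Code actually parses output.

```python
import json
import re

# Assumed reasoning delimiters; real models may use different markers.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_tool_call(raw: str):
    """Drop reasoning blocks, then return the first JSON object with a "tool" key, or None."""
    text = THINK_BLOCK.sub("", raw)
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # Try to decode a JSON value starting at this brace.
            obj, _ = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "tool" in obj:
            return obj
    return None
```

If the model spends its whole output budget inside the reasoning block, `extract_tool_call` returns `None` and the assistant has nothing to execute, which matches the "skipped the tool call entirely" behavior I saw.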
The fix was to adjust the model’s reasoning parameters:
    # Create a custom model with reduced thinking
    ollama create qwen-coder-tools -f - <<EOF
    FROM qwen2.5-coder:7b
    PARAMETER temperature 0.3
    PARAMETER num_ctx 32768
    PARAMETER stop "<|im_start|>"
    PARAMETER stop "<|im_end|>"
    EOF
Or in the assistant configuration, limit the thinking budget:
    {
      "models": {
        "qwen-coder": {
          "provider": "ollama",
          "modelId": "qwen2.5-coder:7b",
          "capabilities": {
            "toolCalling": true
          },
          "thinking": {
            "enabled": false
          }
        }
      }
    }
Alternative: Use a Different Assistant
If you’re still struggling, consider assistants designed from the ground up for local models:
- OpenCode: Designed for local Ollama servers
- Codex: Supports local inference
- Claude Code: Can work with local Ollama servers
These assistants have better defaults for local models and don’t assume OpenAI-style capabilities by default.
The Reason
Why Tool Calling Isn’t Universal
Tool calling requires the model to:
- Understand structured tool definitions (JSON schemas)
- Output correctly formatted tool calls in specific formats
- Handle multi-step tool sequences without losing context
This isn’t something every model can do. It requires specific training data that includes tool usage examples. Many local models are fine-tuned for:
- Chat conversations
- Code completion
- Instruction following
But not specifically for agentic tool use. The model needs to understand that when given a tool definition like:
    {
      "name": "create_file",
      "parameters": {
        "path": "string",
        "content": "string"
      }
    }
It should output something like:
    {
      "tool": "create_file",
      "arguments": {
        "path": "/src/utils.ts",
        "content": "export function hello() {}"
      }
    }
Without specific training, the model might:
- Ignore the tool definition entirely
- Output malformed JSON
- Generate reasoning text instead of tool calls
- Confuse tool calling with regular chat
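The failure modes above are easy to tell apart mechanically. The following Python sketch classifies a raw model reply against the simplified `create_file` definition used in this article. The category names and the `classify_output` helper are my own invention for illustration, and the check only understands `"string"` parameters.

```python
import json

# The simplified tool definition from this article.
TOOL_DEFINITION = {
    "name": "create_file",
    "parameters": {"path": "string", "content": "string"},
}


def classify_output(raw: str, tool: dict) -> str:
    """Label a raw model reply as 'ok' or as one of the common failure modes."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        # Looks like an attempted JSON object that broke, or just chat/reasoning text.
        return "malformed_json" if raw.lstrip().startswith("{") else "plain_text"
    if not isinstance(call, dict) or call.get("tool") != tool["name"]:
        return "wrong_or_missing_tool"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return "bad_arguments"
    # Every declared parameter must be present and be a string.
    for name in tool["parameters"]:
        if not isinstance(args.get(name), str):
            return "bad_arguments"
    return "ok"
```

Running a handful of prompts through a check like this makes it obvious whether a model is ignoring the tool, mangling the JSON, or just chatting.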
Why Configuration Matters
Coding assistants make assumptions about model capabilities. By default, many assume local models lack:
- Tool calling support
- Vision capabilities
- Structured output formatting
This is actually reasonable - if they assumed every model had these capabilities, they’d break on models that don’t. But it means you must explicitly enable them.
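As a rough mental model, the capability flag acts as a gate on what the assistant sends to the model. The sketch below is hypothetical Python, not code from Roo Code, Cline, or Continue.dev; `ModelCapabilities` and `build_request` are names I made up to show the conservative default.

```python
from dataclasses import dataclass


@dataclass
class ModelCapabilities:
    # Conservative defaults: assume the model supports nothing special.
    tool_calling: bool = False
    vision: bool = False


def build_request(model_id: str, prompt: str, tools: list, caps: ModelCapabilities) -> dict:
    """Build a chat request, attaching tool definitions only when the flag allows it."""
    request = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    if caps.tool_calling:
        request["tools"] = tools
    return request
```

With the defaults, the tool definitions never reach the model, so even a perfectly capable model has no way to use them. Flipping `tool_calling` to `True` is the equivalent of the configuration changes in Step 2.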
Why Model Size Matters
Smaller models (under 7B parameters) typically lack the capacity to:
- Remember tool definitions across long contexts
- Generate precisely formatted outputs
- Distinguish between “thinking” and “acting”
The 7B threshold isn’t arbitrary, but it is a rule of thumb: it’s roughly the minimum size where these capabilities tend to emerge reliably. Even then, not all 7B models are equal. Models specifically trained for agentic coding (like Devstral) will outperform general-purpose models of the same size.
Summary
Local LLM tool calling failures come down to three issues:
- Wrong model: Using models not trained for tool calling
- Missing configuration: Not explicitly enabling tool calling in your assistant
- Excessive reasoning: Models overthinking instead of acting
The fix is straightforward:
- Use a model explicitly trained for tool calling (Qwen 2.5 Coder 7B+, Devstral)
- Explicitly enable tool calling capabilities in your assistant’s configuration
- Adjust reasoning/thinking parameters if the model generates too much internal thought
With proper configuration, local LLMs become genuinely useful coding assistants - not just expensive autocomplete tools.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.