Skip to content

How to Set Up GLM-5.1 + Qwen 3.5 for Planning and Execution Roles

When I started building AI-powered applications, I hit a wall. Frontier models like Claude or GPT-4 delivered great results, but the costs added up fast. Smaller open-source models were cheaper, but they struggled with complex reasoning tasks.

AI Workflow Setup

I needed a way to get the best of both worlds: deep reasoning when it matters, and fast cheap execution for everything else.

The Core Problem

Using a single model for all tasks is wasteful. Planning needs deep reasoning—breaking down problems, considering edge cases, designing architecture. Execution needs speed and consistency—generating code, processing data, making API calls.

When I ran everything through a frontier model, I burned budget on tasks that didn’t need that level of intelligence. When I used only smaller models, complex planning suffered.

The Solution: Role-Based Model Routing

I found a pattern that works: route planning tasks to GLM-5.1 and execution tasks to Qwen 3.5. Both run through Ollama Cloud, so I don’t need local GPU hardware.

Here’s why this combination clicks:

GLM-5.1 (Planning Role):

  • Handles complex reasoning and task decomposition
  • Achieves ~94.6% of Claude Opus 4.6’s coding benchmark performance
  • Slower but smarter—exactly what planning needs

Qwen 3.5 (Execution Role):

  • Apache 2.0 licensed (fully open source, commercial-friendly)
  • Near-frontier performance on agentic tasks
  • Fast and cost-effective for high-volume operations

Setting Up Ollama Cloud

First, I installed Ollama and pulled the models with the :cloud suffix. This routes requests through ollama.com instead of running locally.

Install and configure Ollama Cloud
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull models with cloud suffix
ollama pull glm-5.1:cloud
ollama pull qwen-3.5:cloud
# Verify cloud routing is active
ollama list | grep cloud

The :cloud suffix is critical. Without it, Ollama tries to run models locally, which requires GPU resources I don’t have on my laptop.

Configuring Role-Based Routing

I use oh-my-pi to define which model handles which role. The configuration is straightforward:

oh-my-pi routing configuration
models:
slow/plan:
model: glm-5.1:cloud
temperature: 0.3
max_tokens: 4096
default:
model: qwen-3.5:cloud
temperature: 0.7
max_tokens: 2048
routing:
planning_tasks:
- task_decomposition
- strategy_design
- code_review_planning
- architecture_decisions
execution_tasks:
- code_generation
- data_processing
- api_calls
- routine_operations

The slow/plan role uses GLM-5.1 with lower temperature for more deterministic output. The default role uses Qwen 3.5 with higher temperature for creative execution.

Using the Pipeline in Code

Here’s how I use this setup in a Python project:

Using role-based agents
from oh_my_pi import Agent
# Planning phase uses GLM-5.1
planner = Agent(role="slow/plan")
plan = planner.run("Design a REST API for user management")
# Execution phase uses Qwen 3.5
executor = Agent(role="default")
result = executor.run(f"Implement: {plan}")

The planner breaks down the problem, considers edge cases, and produces a detailed plan. The executor takes that plan and generates working code.

What I Learned

After running this setup for a few weeks, a few patterns emerged:

Cost dropped significantly. I only pay for deep reasoning on tasks that actually need it. Routine code generation, data processing, and API calls run through the cheaper Qwen 3.5.

Quality stayed high. GLM-5.1’s planning quality rivals frontier models. The plans it produces are thorough and well-structured.

Flexibility improved. When better models come out, I swap them in the config. No code changes needed.

Mistakes I Made

A few things I got wrong at first:

Using GLM-5.1 for everything. The first week, I routed all tasks through the planner. Costs went up, not down. Be selective about what needs deep reasoning.

Skipping the :cloud suffix. Requests failed silently because I forgot the suffix. Always double-check the model names.

Not defining clear role boundaries. Initially, I had vague rules about which role to use. The routing became inconsistent. Explicit task lists for each role fixed this.

When to Use This Setup

This pattern works well when:

  • You have a mix of planning-heavy and execution-heavy tasks
  • Cost optimization matters
  • You want open-source flexibility for some components
  • Your tasks can be clearly categorized

It’s overkill for simple projects with uniform task types. If all your tasks are similar, a single model is easier to manage.

Getting Started

  1. Install Ollama and pull both models with :cloud suffix
  2. Set up oh-my-pi with the routing configuration above
  3. Identify which of your tasks are planning vs. execution
  4. Test with a few representative tasks
  5. Adjust the routing rules based on results

The architecture scales well. As new models emerge, you can swap them in without rebuilding your pipeline.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments