How to Set Up GLM-5.1 + Qwen 3.5 for Planning and Execution Roles

Apr 20, 2026

When I started building AI-powered applications, I hit a wall. Frontier models like Claude or GPT-4 delivered great results, but the costs added up fast. Smaller open-source models were cheaper, but they struggled with complex reasoning tasks.

AI Workflow Setup

I needed a way to get the best of both worlds: deep reasoning when it matters, and fast cheap execution for everything else.

The Core Problem

Using a single model for all tasks is wasteful. Planning needs deep reasoning—breaking down problems, considering edge cases, designing architecture. Execution needs speed and consistency—generating code, processing data, making API calls.

When I ran everything through a frontier model, I burned budget on tasks that didn’t need that level of intelligence. When I used only smaller models, complex planning suffered.

The Solution: Role-Based Model Routing

I found a pattern that works: route planning tasks to GLM-5.1 and execution tasks to Qwen 3.5. Both run through Ollama Cloud, so I don’t need local GPU hardware.

Here’s why this combination clicks:

GLM-5.1 (Planning Role):

Handles complex reasoning and task decomposition
Achieves ~94.6% of Claude Opus 4.6’s coding benchmark performance
Slower but smarter—exactly what planning needs

Qwen 3.5 (Execution Role):

Apache 2.0 licensed (fully open source, commercial-friendly)
Near-frontier performance on agentic tasks
Fast and cost-effective for high-volume operations

Setting Up Ollama Cloud

First, I installed Ollama and pulled the models with the :cloud suffix. This routes requests through ollama.com instead of running locally.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models with cloud suffix
ollama pull glm-5.1:cloud
ollama pull qwen-3.5:cloud

# Verify cloud routing is active
ollama list | grep cloud

The :cloud suffix is critical. Without it, Ollama tries to run models locally, which requires GPU resources I don’t have on my laptop.

Configuring Role-Based Routing

I use oh-my-pi to define which model handles which role. The configuration is straightforward:

models:
  slow/plan:
    model: glm-5.1:cloud
    temperature: 0.3
    max_tokens: 4096

  default:
    model: qwen-3.5:cloud
    temperature: 0.7
    max_tokens: 2048

routing:
  planning_tasks:
    - task_decomposition
    - strategy_design
    - code_review_planning
    - architecture_decisions

  execution_tasks:
    - code_generation
    - data_processing
    - api_calls
    - routine_operations

The slow/plan role uses GLM-5.1 with lower temperature for more deterministic output. The default role uses Qwen 3.5 with higher temperature for creative execution.

Using the Pipeline in Code

Here’s how I use this setup in a Python project:

from oh_my_pi import Agent

# Planning phase uses GLM-5.1
planner = Agent(role="slow/plan")
plan = planner.run("Design a REST API for user management")

# Execution phase uses Qwen 3.5
executor = Agent(role="default")
result = executor.run(f"Implement: {plan}")

The planner breaks down the problem, considers edge cases, and produces a detailed plan. The executor takes that plan and generates working code.

What I Learned

After running this setup for a few weeks, a few patterns emerged:

Cost dropped significantly. I only pay for deep reasoning on tasks that actually need it. Routine code generation, data processing, and API calls run through the cheaper Qwen 3.5.

Quality stayed high. GLM-5.1’s planning quality rivals frontier models. The plans it produces are thorough and well-structured.

Flexibility improved. When better models come out, I swap them in the config. No code changes needed.

Mistakes I Made

A few things I got wrong at first:

Using GLM-5.1 for everything. The first week, I routed all tasks through the planner. Costs went up, not down. Be selective about what needs deep reasoning.

Skipping the :cloud suffix. Requests failed silently because I forgot the suffix. Always double-check the model names.

Not defining clear role boundaries. Initially, I had vague rules about which role to use. The routing became inconsistent. Explicit task lists for each role fixed this.

When to Use This Setup

This pattern works well when:

You have a mix of planning-heavy and execution-heavy tasks
Cost optimization matters
You want open-source flexibility for some components
Your tasks can be clearly categorized

It’s overkill for simple projects with uniform task types. If all your tasks are similar, a single model is easier to manage.

Getting Started

Install Ollama and pull both models with :cloud suffix
Set up oh-my-pi with the routing configuration above
Identify which of your tasks are planning vs. execution
Test with a few representative tasks
Adjust the routing rules based on results

The architecture scales well. As new models emerge, you can swap them in without rebuilding your pipeline.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Ollama Cloud
👨‍💻 GLM-5.1 Model
👨‍💻 Qwen 3.5 Model

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!