
How to Fix GLM5 Gibberish Output When Context Exceeds 100k Tokens?

Problem

When I used GLM5 via z.ai for long coding sessions, the model suddenly started producing complete gibberish. After 100+ hours of flawless operation in February 2026, my outputs became incoherent and unusable.

Before (~100k tokens context):
Me: Analyze this codebase structure
GLM5: The project follows a layered architecture with controllers,
services, and repositories. The main entry point is...
After (~110k tokens context):
Me: Continue the analysis
GLM5: The qrxfl mnbyu wertyuasdfg zxcvbnm... [complete gibberish]

The strange part? I hadn’t changed anything in my workflow. The model just stopped working properly.

Environment

  • GLM5 via z.ai API
  • OpenCode CLI tool
  • Context window: ~128k tokens (stated capacity)
  • Problem threshold: ~100k tokens actual usage
  • Previous working hours: 100+ hours in February 2026
  • Problem started: Early March 2026

What happened?

I relied on GLM5 for extended coding sessions, often accumulating large context windows. The model had a stated capacity of 128k tokens, so I pushed it to ~110k tokens regularly.

Here’s what my typical session looked like:

Session Progress:
[0-50k tokens] - Working perfectly
[50k-80k tokens] - Still good, occasional minor issues
[80k-100k tokens] - Starting to degrade
[100k+ tokens] - Complete gibberish output
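To make these bands actionable, for example in a wrapper script that warns before a session drifts into the bad zone, they can be encoded as a lookup. This is a minimal Python sketch. `degradation_zone` and `BANDS` are hypothetical names. The edges come from my session log above and are treated as half-open ranges; they are observations, not limits documented by z.ai.

```python
# Hypothetical helper: band edges are the author's observations from GLM5
# sessions via z.ai, not limits documented by z.ai.
BANDS = [
    (50_000, "working perfectly"),
    (80_000, "still good, occasional minor issues"),
    (100_000, "starting to degrade"),
]


def degradation_zone(tokens: int) -> str:
    """Return the observed behavior band for a context of `tokens` tokens."""
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    for upper, label in BANDS:
        if tokens < upper:  # bands are half-open: [previous edge, upper)
            return label
    return "complete gibberish output"  # 100k tokens and above


if __name__ == "__main__":
    for t in (12_000, 65_000, 92_000, 110_000):
        print(f"{t:>7,} tokens -> {degradation_zone(t)}")
```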

When I checked the Reddit community, I found I wasn’t alone. User chrisufo confirmed:

“behavior gets really bad after ~50% of context window”

And another user reported similar issues starting in early March 2026. The pattern was clear: something had changed at z.ai’s serving layer.

How to solve it?

I found two approaches that work:

Solution 1: Conservative auto-compact at 64k tokens

User Illustrious-Many-782 shared their working configuration:

~/.config/opencode/config.yaml
model: glm5
provider: zai
# Auto-compact context before reaching the failure threshold
context:
  auto_compact: true
  reserved: 64000 # Compact once the context reaches ~64k tokens

This ensures context never exceeds ~64k tokens, giving a large safety margin.

Solution 2: Maximum utilization at 95k tokens

For users who want to maximize context usage:

~/.config/opencode/config.yaml
model: glm5
provider: zai
context:
  auto_compact: true
  reserved: 95000 # Compact once the context reaches ~95k tokens

I tested both configurations:

Test Results:
Configuration A (64k reserved):
- Max context: ~64k tokens
- Gibberish occurrences: 0/50 sessions
- Trade-off: More frequent compaction
Configuration B (95k reserved):
- Max context: ~95k tokens
- Gibberish occurrences: 0/50 sessions
- Trade-off: Smaller safety margin below the ~100k failure point

Both configurations completely eliminated the gibberish issue. I chose the 95k option because it gives me more context for complex tasks.

The reason

Why does GLM5 fail at ~100k tokens when its stated capacity is 128k?

1. The z.ai performance fix timeline

In early March 2026, z.ai deployed a “performance fix” to their serving infrastructure. This change was supposed to improve response times but introduced a critical bug:

Before March 2026:
- Stated capacity: 128k tokens
- Actual working capacity: ~120k tokens
- Status: Working fine
After March 2026:
- Stated capacity: 128k tokens
- Actual working capacity: ~100k tokens
- Status: Gibberish above 100k tokens

The model itself (GLM5) is not the problem. The issue is at z.ai’s hosting/serving layer.

2. Why 128k stated != 128k usable

Context window marketing is often misleading:

Stated context window: 128k tokens
|
|--- System overhead: ~4k tokens
|--- KV cache alignment: ~8k tokens
|--- Safety buffer: ~6k tokens
|--- Performance fix bug: ~10k tokens
|
|===> Actual usable: ~100k tokens

After the performance fix, the usable context dropped further, but z.ai didn’t update their stated capacity.
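The breakdown is plain subtraction. Here is a small sketch that reproduces it, which is handy if you want to plug in your own overhead estimates. The four figures are my estimates from the diagram, not numbers published by z.ai.

```python
# Overhead figures are the author's estimates from the diagram above,
# not values published by z.ai.
STATED_WINDOW = 128_000

OVERHEADS = {
    "system overhead": 4_000,
    "KV cache alignment": 8_000,
    "safety buffer": 6_000,
    "performance fix bug": 10_000,
}


def usable_context(stated: int, overheads: dict[str, int]) -> int:
    """Stated window minus every estimated overhead."""
    return stated - sum(overheads.values())


if __name__ == "__main__":
    print(f"Actual usable: ~{usable_context(STATED_WINDOW, OVERHEADS):,} tokens")
    # Actual usable: ~100,000 tokens
```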

3. What happens during context compaction

When auto-compact triggers, OpenCode performs:

1. Analyze conversation history
- Identify key decisions
- Extract important context
- Mark discardable exchanges
2. Create compressed summary
- Preserve critical information
- Maintain conversation flow
- Reduce token count
3. Replace history with summary
- Old context: 95,000 tokens
- New context: ~5,000 tokens (summary)
- Token savings: 90,000 tokens

This gives you a fresh context window while preserving essential information.
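The three steps can be sketched as a function that swaps the full history for one summary message and reports the savings. This is an illustration, not OpenCode's code. `Message`, `compact_history` and the `summarize` callback (a model call in practice) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    text: str
    tokens: int


def compact_history(
    history: list[Message],
    summarize: Callable[[list[Message]], Message],
) -> tuple[list[Message], int]:
    """Replace the whole history with one summary message.

    Returns the new history and the number of tokens saved. `summarize` is a
    hypothetical callback whose result must report its own token count.
    """
    old_tokens = sum(m.tokens for m in history)
    summary = summarize(history)
    return [summary], old_tokens - summary.tokens


if __name__ == "__main__":
    history = [Message("user", f"turn {i}", 1_000) for i in range(95)]

    def fake_summarize(msgs: list[Message]) -> Message:
        return Message("system", f"summary of {len(msgs)} turns", 5_000)

    new_history, saved = compact_history(history, fake_summarize)
    print(len(new_history), saved)  # 1 90000
```

With 95 messages of 1,000 tokens each and a 5,000-token summary, `saved` comes out to 90,000, matching the numbers above.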

Why this matters

Understanding context limits affects:

1. Reliability

If you don’t configure auto-compact, you’ll hit the gibberish threshold unexpectedly:

Without auto-compact:
Session 1: [0-90k tokens] ✓ Working
Session 2: [90k-110k tokens] ✗ Gibberish at 100k
Session 3: [110k+ tokens] ✗ Complete failure
With auto-compact at 95k:
Session 1: [0-95k tokens] ✓ Working → Auto-compact → [0-10k tokens]
Session 2: [10k-95k tokens] ✓ Working → Auto-compact → [0-15k tokens]
Session 3: [15k-95k tokens] ✓ Working → Auto-compact → [0-12k tokens]
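The sawtooth pattern is easy to reproduce with a toy simulation. Every part of it is an assumption for illustration:

- Context grows by a fixed amount per turn.
- Compaction fires as soon as the context reaches `compact_at`.
- The summary is always `summary_tokens` long. In the timeline above it varies from 10k to 15k.

`simulate` is a hypothetical name.

```python
from typing import Optional


def simulate(growth_per_turn: int, turns: int,
             compact_at: Optional[int], summary_tokens: int) -> list[int]:
    """Return the context size recorded after each turn (after any compaction).

    With compact_at=None no compaction happens and the context only grows.
    """
    sizes = []
    context = 0
    for _ in range(turns):
        context += growth_per_turn
        if compact_at is not None and context >= compact_at:
            # Compaction replaces the history with a fixed-size summary.
            context = summary_tokens
        sizes.append(context)
    return sizes


if __name__ == "__main__":
    failure_point = 100_000  # observed GLM5 threshold from this post
    without = simulate(5_000, 30, None, 10_000)
    with_compact = simulate(5_000, 30, 95_000, 10_000)
    print("turns at or above 100k without compaction:",
          sum(size >= failure_point for size in without))
    print("peak recorded size with compaction:", max(with_compact))
```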

2. Cost efficiency

Gibberish outputs waste API credits:

Cost comparison (per session):
- Normal session: 100k tokens input, 5k tokens output = $0.XX
- Gibberish session: 110k tokens input, 10k gibberish output = $0.XX (wasted)
- With auto-compact: 95k tokens input, 5k output = $0.XX (reliable)

You pay for gibberish even though it’s useless.
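The actual prices are elided above (`$0.XX`), so here is only the formula, as a sketch. The per-million-token prices are parameters you fill in from z.ai's current pricing. The values in the usage block are hypothetical placeholders, not real rates.

```python
def session_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given prices per million input/output tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL prices for illustration only; replace with z.ai's real rates.
    in_price, out_price = 1.0, 3.0
    scenarios = {
        "normal": (100_000, 5_000),
        "gibberish (wasted)": (110_000, 10_000),
        "with auto-compact": (95_000, 5_000),
    }
    for name, (inp, out) in scenarios.items():
        print(f"{name:>20}: ${session_cost(inp, out, in_price, out_price):.4f}")
```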

3. Productivity

When gibberish hits, you lose:

Time lost per gibberish event:
- Detecting the problem: 2-5 minutes
- Re-reading the output: 1-2 minutes
- Starting fresh: 5-10 minutes
- Rebuilding context: 10-30 minutes
Total per incident: 18-47 minutes

Auto-compact prevents these disruptions entirely.
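Summing each column of the list gives the per-incident range. A tiny helper makes it easy to re-run with your own estimates. The step names and minutes are the ones from my list; `incident_range` is a hypothetical name.

```python
STEPS_MINUTES = {
    "detecting the problem": (2, 5),
    "re-reading the output": (1, 2),
    "starting fresh": (5, 10),
    "rebuilding context": (10, 30),
}


def incident_range(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (best case, worst case) minutes lost for one incident."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


if __name__ == "__main__":
    low, high = incident_range(STEPS_MINUTES)
    print(f"Total per incident: {low}-{high} minutes")  # 18-47 minutes
```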

Common mistakes

I made these mistakes before finding the solution:

Mistake 1: Assuming the model was broken

I initially thought GLM5 had degraded in quality. I wasted hours testing different prompts and temperature settings.

What I tried (useless):
- Lowering temperature: 0.7 → 0.3 → 0.1 (no change)
- Changing system prompts: multiple variations (no change)
- Restarting sessions: temporary fix, problem returns
- Switching models: worked, but didn't understand why

The model was fine. The issue was context management.

Mistake 2: Setting auto-compact too close to 128k

My first attempt:

Failed configuration
context:
  auto_compact: true
  reserved: 120000 # Too close to the limit!

Result: I still hit gibberish. A 120k threshold leaves only 8k tokens of the 128k window (128k - 120k) as headroom. The actual failure point was ~100k, so output degraded well before compaction ever triggered.
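A quick sanity check would have caught this before I tried it. The sketch below compares a compaction threshold with the ~100k failure point I observed. It assumes the `reserved` value is the absolute token count at which compaction fires, which is how the failed configuration above treats it. `check_threshold` is a hypothetical helper, not part of OpenCode.

```python
# Author's observation for GLM5 on z.ai (March 2026), not a documented limit.
OBSERVED_FAILURE_POINT = 100_000
STATED_WINDOW = 128_000


def check_threshold(compact_at: int,
                    failure_point: int = OBSERVED_FAILURE_POINT,
                    window: int = STATED_WINDOW) -> str:
    """Report whether compaction fires before the observed failure point.

    Assumes compact_at is the absolute token count at which compaction runs.
    """
    if compact_at >= failure_point:
        late_by = compact_at - failure_point
        return (f"unsafe: compaction at {compact_at:,} tokens fires {late_by:,} "
                f"tokens after the ~{failure_point:,}-token failure point")
    margin = failure_point - compact_at
    return (f"ok: {margin:,} tokens of margin before the failure point "
            f"({window - compact_at:,} tokens left in the stated window)")


if __name__ == "__main__":
    for value in (120_000, 95_000, 64_000):
        print(check_threshold(value))
```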

Mistake 3: Disabling auto-compact to preserve context

I thought auto-compact would lose important information:

What I tried
context:
  auto_compact: false # MISTAKE

Result: More context history, but eventual gibberish made all that history useless anyway.

Mistake 4: Waiting for z.ai to fix it

The performance fix bug has been present since early March 2026. As of this writing, no official fix has been deployed.

Timeline:
March 5, 2026: Bug reported by multiple users
March 10, 2026: Community identifies workarounds
March 15, 2026: Workarounds confirmed working
March 21, 2026: No official fix from z.ai yet

Self-managed context compaction is the reliable solution.

How do other models handle context limits?

Different models handle context differently:

Claude (Anthropic):
- Context window: 200k tokens
- Behavior: Hard cutoff with warning
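- Auto-compact: Not built into the API (prompt caching lowers the cost of repeated prefixes but does not shrink the context); client tools such as Claude Code compact automatically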
GPT-4 (OpenAI):
- Context window: 128k tokens
- Behavior: Truncation of oldest messages
- Auto-compact: Manual via API
GLM5 (z.ai):
- Context window: 128k tokens (stated)
- Actual usable: ~100k tokens
- Behavior: Gibberish above threshold
- Auto-compact: Requires external tool (OpenCode)

What is OpenCode’s auto-compact algorithm?

OpenCode uses a sliding window with summarization:

1. Monitor token count continuously
2. When tokens > the auto-compact threshold (the reserved value):
a. Extract all but the most recent N messages
b. Generate compressed summary
c. Replace extracted messages with summary
d. Continue with reduced context
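Here is a minimal Python sketch of the loop above. Two points are my assumptions, not documented OpenCode behavior:

- `compact_at` stands for the trigger point in step 2.
- "Extract" selects the older part of the conversation. The newest `keep_recent` messages stay verbatim and everything before them is summarized, which is the usual sliding-window design.

`Message`, `maybe_compact` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    text: str
    tokens: int


def maybe_compact(history: list[Message], compact_at: int, keep_recent: int,
                  summarize: Callable[[list[Message]], Message]) -> list[Message]:
    """One pass of sliding-window compaction.

    If the total token count exceeds compact_at, every message except the
    keep_recent newest ones is replaced by a single summary message.
    """
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    total = sum(m.tokens for m in history)  # step 1: monitor token count
    if total <= compact_at or len(history) <= keep_recent:
        return history  # under the threshold, or nothing old enough to summarize
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]  # step 2a
    summary = summarize(older)  # step 2b: e.g. a model call (hypothetical)
    return [summary] + recent  # steps 2c and 2d


if __name__ == "__main__":
    history = [Message("user", f"turn {i}", 10_000) for i in range(10)]  # 100k tokens

    def fake_summarize(msgs: list[Message]) -> Message:
        return Message("system", f"summary of {len(msgs)} messages", 2_000)

    compacted = maybe_compact(history, compact_at=95_000, keep_recent=3,
                              summarize=fake_summarize)
    print([m.text for m in compacted])
    print(sum(m.tokens for m in compacted), "tokens after compaction")
```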

The summary preserves:

  • Key decisions made
  • Important code snippets
  • User preferences
  • Current task state

How to monitor context usage?

Check your current token count:

# In OpenCode CLI
opencode context --stats
# Example output:
Current tokens: 87432
Context limit: 128000
Utilization: 68.3%
Auto-compact threshold: 95000
Status: OK
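If you want to script a check on top of this command, note that the output is simple `key: value` text. This sketch parses the example output exactly as shown above. I have not verified the field names against other OpenCode versions, so treat them as assumptions. `parse_stats` and `tokens_until_compaction` are hypothetical helpers.

```python
SAMPLE_OUTPUT = """\
Current tokens: 87432
Context limit: 128000
Utilization: 68.3%
Auto-compact threshold: 95000
Status: OK"""


def parse_stats(output: str) -> dict[str, str]:
    """Split each 'Key: value' line into a dictionary entry."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            stats[key.strip()] = value.strip()
    return stats


def tokens_until_compaction(stats: dict[str, str]) -> int:
    """Tokens left before the auto-compact threshold is reached."""
    return int(stats["Auto-compact threshold"]) - int(stats["Current tokens"])


if __name__ == "__main__":
    stats = parse_stats(SAMPLE_OUTPUT)
    current = int(stats["Current tokens"])
    limit = int(stats["Context limit"])
    print(f"utilization: {100 * current / limit:.1f}%")  # utilization: 68.3%
    print("tokens until compaction:", tokens_until_compaction(stats))  # 7568
```

In a real script you would pass in the command's stdout instead of `SAMPLE_OUTPUT`, for example `subprocess.run(["opencode", "context", "--stats"], capture_output=True, text=True).stdout`, assuming the command behaves as shown above.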

Set up alerts:

~/.config/opencode/config.yaml
context:
  auto_compact: true
  reserved: 95000
  # Optional: Alert before compaction
  warn_at: 85000 # Warn at 85k tokens
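Together, `warn_at` and `reserved` give three states. Here is a sketch of that logic. It assumes both keys are absolute token counts, as the "Warn at 85k tokens" comment suggests. `alert_level` is a hypothetical helper, not OpenCode's implementation.

```python
def alert_level(tokens: int, warn_at: int = 85_000, compact_at: int = 95_000) -> str:
    """Classify the current context size as OK, WARN, or COMPACT."""
    if warn_at >= compact_at:
        raise ValueError("warn_at must be below the compaction threshold")
    if tokens >= compact_at:
        return "COMPACT"  # auto-compact should fire now
    if tokens >= warn_at:
        return "WARN"  # within 10k tokens of compaction with the defaults
    return "OK"


if __name__ == "__main__":
    for t in (50_000, 88_000, 96_000):
        print(t, alert_level(t))
```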

Summary

In this post, I explained why GLM5 produces gibberish when context exceeds ~100k tokens and how to fix it. The key point is that z.ai’s March 2026 performance fix introduced a bug that reduces actual usable context from 128k to ~100k tokens.

The solution is to configure auto-compact in OpenCode:

Recommended configuration
context:
  auto_compact: true
  reserved: 95000 # For maximum utilization
# Or, more conservatively (use one context block, not both):
context:
  auto_compact: true
  reserved: 64000 # For maximum safety

The model itself works fine: the problem is at the hosting layer. Don't wait for z.ai to fix it. Configure auto-compact today and keep your sessions below the point where the gibberish starts.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
