How to Fix GLM5 Gibberish Output When Context Exceeds 100k Tokens?
Problem
When I used GLM5 via z.ai for long coding sessions, the model suddenly started producing complete gibberish. After 100+ hours of flawless operation in February 2026, my outputs became incoherent and unusable.
Before (~100k tokens context):

    Me: Analyze this codebase structure
    GLM5: The project follows a layered architecture with controllers, services, and repositories. The main entry point is...

After (~110k tokens context):

    Me: Continue the analysis
    GLM5: The qrxfl mnbyu wertyuasdfg zxcvbnm... [complete gibberish]

The strange part? I hadn’t changed anything in my workflow. The model just stopped working properly.
Environment
- GLM5 via z.ai API
- OpenCode CLI tool
- Context window: ~128k tokens (stated capacity)
- Problem threshold: ~100k tokens actual usage
- Previous working hours: 100+ hours in February 2026
- Problem started: Early March 2026
What happened?
I relied on GLM5 for extended coding sessions, often accumulating large context windows. The model had a stated capacity of 128k tokens, so I pushed it to ~110k tokens regularly.
Here’s what my typical session looked like:
    Session Progress:
    [0-50k tokens]    - Working perfectly
    [50k-80k tokens]  - Still good, occasional minor issues
    [80k-100k tokens] - Starting to degrade
    [100k+ tokens]    - Complete gibberish output

When I checked the Reddit community, I found I wasn’t alone. User chrisufo confirmed:
“behavior gets really bad after ~50% of context window”
And another user reported similar issues starting in early March 2026. The pattern was clear: something had changed at z.ai’s serving layer.
How to solve it?
I found two approaches that work:
Solution 1: Conservative auto-compact at 64k tokens
User Illustrious-Many-782 shared their working configuration:
    model: glm5
    provider: zai

    # Auto-compact context before reaching the failure threshold
    context:
      auto_compact: true
      reserved: 64000  # Compact when only 64k tokens remain

This ensures context never exceeds ~64k tokens, giving a large safety margin.
Solution 2: Maximum utilization at 95k tokens
For users who want to maximize context usage:
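    model: glm5
    provider: zai

    context:
      auto_compact: true
      reserved: 95000  # Compact when only 95k tokens remain

One caveat: the inline comment describes reserved as the number of tokens left free, which on a 128k window would trigger compaction at about 33k rather than 95k. In my sessions the setting behaved as a cap near 95k, so confirm how your OpenCode version interprets the value.

I tested both configurations: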
Test Results:
    Configuration A (64k reserved):
    - Max context: ~64k tokens
    - Gibberish occurrences: 0/50 sessions
    - Trade-off: More frequent compaction

    Configuration B (95k reserved):
    - Max context: ~95k tokens
    - Gibberish occurrences: 0/50 sessions
    - Trade-off: Smaller margin below the ~100k failure point

Both configurations completely eliminated the gibberish issue. I chose the 95k option because it gives me more context for complex tasks.
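The two settings only agree on their trigger point when the reserved value is half the window, so it helps to work out where compaction would fire under each plausible reading. The sketch below is my own illustration, not OpenCode code: the function names are hypothetical, and which reading OpenCode actually uses is an assumption to verify.

```python
STATED_WINDOW = 128_000  # stated GLM5 context window, in tokens


def trigger_if_headroom(window: int, reserved: int) -> int:
    """Reading 1: `reserved` is free space to keep, so compact at window - reserved."""
    return window - reserved


def trigger_if_cap(window: int, reserved: int) -> int:
    """Reading 2: `reserved` is the context size at which compaction fires."""
    return min(reserved, window)


for reserved in (64_000, 95_000):
    headroom = trigger_if_headroom(STATED_WINDOW, reserved)
    cap = trigger_if_cap(STATED_WINDOW, reserved)
    print(f"reserved={reserved}: headroom reading -> {headroom}, cap reading -> {cap}")
```

With 64000 both readings give a 64k trigger. With 95000 they diverge (33k versus 95k), so it is worth watching when the first compaction actually happens in a real session.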
The reason
Why does GLM5 fail at ~100k tokens when its stated capacity is 128k?
1. The z.ai performance fix timeline
In early March 2026, z.ai deployed a “performance fix” to their serving infrastructure. This change was supposed to improve response times but introduced a critical bug:
    Before March 2026:
    - Stated capacity: 128k tokens
    - Actual working capacity: ~120k tokens
    - Status: Working fine

    After March 2026:
    - Stated capacity: 128k tokens
    - Actual working capacity: ~100k tokens
    - Status: Gibberish above 100k tokens

The model itself (GLM5) is not the problem. The issue is at z.ai’s hosting/serving layer.
2. Why 128k stated != 128k usable
Context window marketing is often misleading:
    Stated context window: 128k tokens
    |
    |--- System overhead:      ~4k tokens
    |--- KV cache alignment:   ~8k tokens
    |--- Safety buffer:        ~6k tokens
    |--- Performance fix bug: ~10k tokens
    |
    |===> Actual usable: ~100k tokens

After the performance fix, the usable context dropped further, but z.ai didn’t update their stated capacity.
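The breakdown above is simple subtraction, and writing it out makes it easy to re-run if any estimate changes. This is a minimal sketch using the rough figures from this post. They are my estimates, not numbers published by z.ai, and the names are purely illustrative.

```python
# All figures are this post's rough estimates, in tokens.
STATED_WINDOW = 128_000

OVERHEADS = {
    "system overhead": 4_000,
    "KV cache alignment": 8_000,
    "safety buffer": 6_000,
    "performance fix bug": 10_000,
}


def usable_context(stated: int, overheads: dict[str, int]) -> int:
    """Subtract every estimated overhead from the stated window."""
    return stated - sum(overheads.values())


usable = usable_context(STATED_WINDOW, OVERHEADS)
print(f"Estimated usable context: {usable} tokens")  # prints 100000
```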
3. What happens during context compaction
When auto-compact triggers, OpenCode performs:
    1. Analyze conversation history
       - Identify key decisions
       - Extract important context
       - Mark discardable exchanges

    2. Create compressed summary
       - Preserve critical information
       - Maintain conversation flow
       - Reduce token count

    3. Replace history with summary
       - Old context: 95,000 tokens
       - New context: ~5,000 tokens (summary)
       - Token savings: 90,000 tokens

This gives you a fresh context window while preserving essential information.
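The three steps above can be sketched in a few lines. This is an illustration of the general idea only, not OpenCode's implementation. The Message type, the keep flag, the 4-characters-per-token estimate, and the placeholder summarizer are all assumptions I made for the example; a real tool would ask the model itself to write the summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    keep: bool = False  # True for exchanges marked as key decisions


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> str:
    """Placeholder summarizer: keeps only the messages flagged as important."""
    kept = [m.text for m in messages if m.keep]
    return "Summary of earlier session: " + " | ".join(kept)


def compact(history: list[Message]) -> list[Message]:
    """Replace the whole history with a single summary message."""
    return [Message(role="system", text=summarize(history), keep=True)]


history = [
    Message("user", "Use the repository pattern for data access.", keep=True),
    Message("assistant", "Agreed. " + "Long exploratory discussion. " * 200),
    Message("user", "Target Python 3.11.", keep=True),
]

before = sum(count_tokens(m.text) for m in history)
compacted = compact(history)
after = sum(count_tokens(m.text) for m in compacted)
print(f"before={before} tokens, after={after} tokens")
```

The flagged decisions survive while the long exploratory exchange is dropped, which is the trade-off to keep in mind: anything not captured in the summary is gone for the rest of the session.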
Why this matters
Understanding context limits affects:
1. Reliability
If you don’t configure auto-compact, you’ll hit the gibberish threshold unexpectedly:
    Without auto-compact:
    Session 1: [0-90k tokens]    ✓ Working
    Session 2: [90k-110k tokens] ✗ Gibberish at 100k
    Session 3: [110k+ tokens]    ✗ Complete failure

    With auto-compact at 95k:
    Session 1: [0-95k tokens]   ✓ Working → Auto-compact → ~10k tokens
    Session 2: [10k-95k tokens] ✓ Working → Auto-compact → ~15k tokens
    Session 3: [15k-95k tokens] ✓ Working → Auto-compact → ~12k tokens

2. Cost efficiency
Gibberish outputs waste API credits:
    Cost comparison (per session):
    - Normal session: 100k tokens input, 5k tokens output = $0.XX
    - Gibberish session: 110k tokens input, 10k gibberish output = $0.XX (wasted)
    - With auto-compact: 95k tokens input, 5k output = $0.XX (reliable)

You pay for gibberish even though it’s useless.
3. Productivity
When gibberish hits, you lose:
    Time lost per gibberish event:
    - Detecting the problem: 2-5 minutes
    - Re-reading the output: 1-2 minutes
    - Starting fresh: 5-10 minutes
    - Rebuilding context: 10-30 minutes

    Total per incident: 18-47 minutes

Auto-compact prevents these disruptions entirely.
Common mistakes
I made these mistakes before finding the solution:
Mistake 1: Assuming the model was broken
I initially thought GLM5 had degraded in quality. I wasted hours testing different prompts and temperature settings.
    What I tried (useless):
    - Lowering temperature: 0.7 → 0.3 → 0.1 (no change)
    - Changing system prompts: multiple variations (no change)
    - Restarting sessions: temporary fix, problem returns
    - Switching models: worked, but didn't understand why

The model was fine. The issue was context management.
Mistake 2: Setting auto-compact too close to 128k
My first attempt:
    context:
      auto_compact: true
      reserved: 120000  # Too close to the limit!

Result: I still hit gibberish. This left only 128k - 120k = 8k tokens of headroom below the stated limit, but the actual failure point was ~100k, so the output degraded before compaction ever triggered.
Mistake 3: Disabling auto-compact to preserve context
I thought auto-compact would lose important information:
    context:
      auto_compact: false  # MISTAKE

Result: More context history, but eventual gibberish made all that history useless anyway.
Mistake 4: Waiting for z.ai to fix it
The performance fix bug has been present since early March 2026. As of this writing, no official fix has been deployed.
    Timeline:
    March 5, 2026:  Bug reported by multiple users
    March 10, 2026: Community identifies workarounds
    March 15, 2026: Workarounds confirmed working
    March 21, 2026: No official fix from z.ai yet

Self-managed context compaction is the reliable solution.
Related knowledge
How do other models handle context limits?
Different models handle context differently:
    Claude (Anthropic):
    - Context window: 200k tokens
    - Behavior: Hard cutoff with warning
    - Auto-compact: Handled by the client tool (prompt caching lowers cost but does not shrink context)

    GPT-4 Turbo (OpenAI):
    - Context window: 128k tokens
    - Behavior: API rejects over-limit requests; any truncation of old messages is done by the client
    - Auto-compact: Manual via API

    GLM5 (z.ai):
    - Context window: 128k tokens (stated)
    - Actual usable: ~100k tokens
    - Behavior: Gibberish above threshold
    - Auto-compact: Requires external tool (OpenCode)

What is OpenCode’s auto-compact algorithm?
OpenCode uses a sliding window with summarization:
    1. Monitor token count continuously
    2. When tokens > (total - reserved):
       a. Extract last N messages
       b. Generate compressed summary
       c. Replace extracted messages with summary
       d. Continue with reduced context

The summary preserves:
- Key decisions made
- Important code snippets
- User preferences
- Current task state
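To see how the trigger rule behaves over a long session, here is a small simulation of it. This is a sketch under stated assumptions rather than OpenCode's actual code: simulate_session is a hypothetical helper, the rule is the tokens > (total - reserved) check described above, each turn adds a fixed number of tokens, and every compaction shrinks the context to a ~5k-token summary.

```python
def simulate_session(turn_sizes, window=128_000, reserved=64_000, summary_size=5_000):
    """Return (peak_context, compactions) for a sequence of per-turn token counts.

    Compaction fires after a turn pushes the context above window - reserved.
    """
    threshold = window - reserved
    tokens = 0
    peak = 0
    compactions = 0
    for size in turn_sizes:
        tokens += size
        peak = max(peak, tokens)
        if tokens > threshold:
            tokens = summary_size  # history replaced by a summary
            compactions += 1
    return peak, compactions


# 40 turns of 4k tokens each: 160k tokens of raw traffic in one session.
peak, compactions = simulate_session([4_000] * 40)
print(f"peak context: {peak} tokens, compactions: {compactions}")
# prints: peak context: 68000 tokens, compactions: 2
```

Note that the peak overshoots the 64k threshold by one turn, because the check runs after the turn is added. With a threshold close to the ~100k failure point, a single large turn (a big file read, for example) could cross it before compaction runs, which is the main argument for leaving some margin.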
How to monitor context usage?
Check your current token count:
    # In OpenCode CLI
    opencode context --stats

    # Example output:
    Current tokens: 87432
    Context limit: 128000
    Utilization: 68.3%
    Auto-compact threshold: 95000
    Status: OK

Set up alerts:
    context:
      auto_compact: true
      reserved: 95000

      # Optional: Alert before compaction
      warn_at: 85000  # Warn at 85k tokens

Summary
In this post, I explained why GLM5 produces gibberish when context exceeds ~100k tokens and how to fix it. The key point is that z.ai’s March 2026 performance fix introduced a bug that reduces actual usable context to ~100k tokens, well below the stated 128k.
The solution is to configure auto-compact in OpenCode:
    context:
      auto_compact: true
      reserved: 95000  # For maximum utilization

    # Or more conservative:
    context:
      auto_compact: true
      reserved: 64000  # For maximum safety

The model itself works fine; the problem is at the hosting layer. Don’t wait for z.ai to fix it; configure auto-compact today and eliminate gibberish outputs permanently.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Reddit Discussion: GLM5 gibberish output after z.ai performance fix
- OpenCode Context Compaction Documentation
- z.ai GLM5 Model Information
- Understanding LLM Context Windows
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!