Are Flat-Rate AI Coding Subscriptions Really Unlimited? Understanding Throughput Caps in OpenCode Go

Jun 5, 2026

I recently ran into something that made me stop and think. I was running a full 9-phase agentic workflow with OpenCode Go, and somewhere around the second week of the billing cycle, the responses started slowing down. Not dramatically — just enough to notice. No error messages, no “you’ve hit your limit” banners. Just… friction.

I went looking for answers and found a Reddit thread where others were asking the same thing: “How do you not run out of go quota on the second day of the second week tho?”

[Figure: Stacked bar chart showing cumulative token usage across 10 rounds of agent tool calls]

The Marketing vs. The Architecture

Flat-rate LLM plans are marketed as “unlimited.” But as one commenter put it: “flat-rate plans still ration under the hood. that $12 holds until you hit the throughput cap nobody lists.”

The honest answer: flat-rate LLM pricing is mostly a UX layer on top of throttled compute. It isn’t really unlimited; the limits have simply moved from “price per token” to “opaque throughput caps + rate shaping.”

For OpenCode Go’s $10/month plan, here’s what you actually get:

Models included: DS V4 Flash, DS V4 Pro, Kimi K2.6, GLM-5.1
The real constraint: throughput, not total tokens
Rate shaping: the system slows down rather than cutting you off
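
To make "slows down rather than cutting you off" concrete, here is a minimal sketch of rate shaping with a token bucket that is allowed to go into debt. This is not OpenCode Go's actual implementation, which isn't published; the class name, the refill rate, and the burst capacity are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ShapingBucket:
    """Token bucket that delays over-budget requests instead of rejecting them."""
    rate: float      # tokens refilled per second (hypothetical value)
    capacity: float  # maximum burst size in tokens (hypothetical value)
    level: float = field(init=False)
    clock: float = field(default=0.0, init=False)  # simulated time, seconds

    def __post_init__(self) -> None:
        self.level = self.capacity  # start with a full bucket

    def request(self, now: float, cost: float) -> float:
        """Charge `cost` tokens at time `now`; return the delay in seconds."""
        now = max(now, self.clock)
        # Refill for the elapsed time, never above capacity.
        self.level = min(self.capacity, self.level + (now - self.clock) * self.rate)
        self.clock = now
        # Always accept the request; the bucket may go negative (debt).
        self.level -= cost
        if self.level >= 0:
            return 0.0
        # The debt is repaid at the refill rate, so the caller waits that long.
        return -self.level / self.rate


# One 500-token request per second against a 100 tokens/s refill rate.
bucket = ShapingBucket(rate=100, capacity=1000)
for t in range(4):
    print(f"t={t}s delay={bucket.request(now=t, cost=500):.1f}s")
```

The first two requests ride on the initial burst allowance; after that the delay grows with every call, and no request is ever refused. That matches the "friction without an error" feeling described above.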

Why Agentic Workflows Hit the Cap Faster

The caps exist for everyone, but they become visible first in agentic workflows. Here’s why: an agentic session makes many more sequential tool calls than a chat session. Each call consumes a chunk of your throughput budget.

In OpenCode Go’s SDD workflow, different phases have very different throughput demands:

Phase         Impact    Why
sdd-init      Low       Mostly reading, minimal tool calls
sdd-explore   High      Rapid sequential searches, file reads, browser fetches
sdd-propose   Medium    Some tool calls for drafting
sdd-spec      Medium    Documentation lookups
sdd-design    Medium    Analysis, moderate tool usage
sdd-tasks     Low       Mostly planning, few external calls
sdd-apply     Medium    File edits, code generation
sdd-verify    Med-High  Builds, tests, lint runs
sdd-archive   Low       Summarization, clean-up

Running all 9 phases in one continuous session is the fastest way to exhaust your daily throughput budget.
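
As a rough way to reason about the table, the sketch below assigns each impact label a numeric weight and greedily packs the phases, in order, into sessions that stay under a per-session budget. The weights and the budget are hypothetical numbers I picked for illustration, not measured OpenCode Go figures.

```python
# Hypothetical weights for the impact labels in the table above;
# illustrative only, not measured OpenCode Go figures.
IMPACT_WEIGHT = {"Low": 1, "Medium": 2, "Med-High": 3, "High": 4}

PHASES = [
    ("sdd-init", "Low"),
    ("sdd-explore", "High"),
    ("sdd-propose", "Medium"),
    ("sdd-spec", "Medium"),
    ("sdd-design", "Medium"),
    ("sdd-tasks", "Low"),
    ("sdd-apply", "Medium"),
    ("sdd-verify", "Med-High"),
    ("sdd-archive", "Low"),
]


def plan_sessions(phases, budget):
    """Greedily pack phases, in order, into sessions within `budget`."""
    sessions, current, used = [], [], 0
    for name, impact in phases:
        weight = IMPACT_WEIGHT[impact]
        # Start a new session when this phase would exceed the budget.
        if current and used + weight > budget:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += weight
    if current:
        sessions.append(current)
    return sessions


total = sum(IMPACT_WEIGHT[impact] for _, impact in PHASES)
sessions = plan_sessions(PHASES, budget=5)
print(f"total weight: {total}")
for i, session in enumerate(sessions, start=1):
    print(f"session {i}: {', '.join(session)}")
```

With these assumed weights, the nine phases add up to 18 units and split into four sessions under a budget of 5, which is why a single continuous run feels so much heavier than a paced one.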

[Figure: AI agent loop diagram showing plan, execute, observe, reflect steps with token cost annotations]

The Pace-Control Strategy

If you want to stay productive without hitting throttling, the trick is to spread heavy phases across time. Instead of running everything in one session:

# Bad: rapid sequential phases (maximizes throughput consumption)
sdd-init → sdd-explore → sdd-propose → sdd-spec → ... (all in one session)

# Good: paced execution with delays between agentic phases
sdd-init (morning) → [pause 2h] → sdd-explore (afternoon)
  → sdd-propose (next morning) → sdd-spec (next afternoon)
  → sdd-design → sdd-tasks (following day)
  → sdd-apply (morning) → [pause] → sdd-verify (afternoon)
  → sdd-archive (end of day)

You can also use the orchestrator to introduce explicit delays between phases. This doesn’t just help with rate limits — it gives you time to review intermediate outputs.
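
If you script your own pipeline, explicit delays can look something like the sketch below. It is a generic paced runner, not the OpenCode orchestrator's API: `run_paced`, `run_phase`, and the pause lengths are hypothetical names and values. The sleep function is injectable so the schedule can be dry-run without actually waiting.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_paced(
    phases: Iterable[Tuple[str, float]],
    run_phase: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run each phase in order, pausing `pause_after` seconds after it."""
    results = []
    for name, pause_after in phases:
        results.append(run_phase(name))
        if pause_after > 0:
            # A natural point to review the phase output before continuing.
            sleep(pause_after)
    return results


if __name__ == "__main__":
    # Hypothetical schedule: a 2-hour pause after init, then a shorter one.
    schedule = [("sdd-init", 2 * 3600), ("sdd-explore", 1800), ("sdd-propose", 0)]
    # Dry run: print the pauses instead of sleeping through them.
    run_paced(
        schedule,
        run_phase=lambda name: f"ran {name}",
        sleep=lambda seconds: print(f"[pause {seconds / 3600:.1f}h]"),
    )
```

In a real run you would swap the lambda for whatever actually starts a phase and leave `sleep` at its default.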

Cost Comparison: Flat-Rate vs. Pay-Per-Token

To understand whether the trade-off is worth it, here’s what the alternatives look like:

Single model approach (Claude Opus):   $200-400/month at 5-6h/day
Multi-model harness (OpenCode Go):     $12-15/month flat

Trade-off: throughput cap vs. no cap but pay-per-token
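
For a sense of scale, here is the arithmetic on the two ranges quoted above. It only uses the figures from this comparison; the variable names are mine.

```python
# Monthly cost ranges quoted in the comparison above, in USD.
pay_per_token = (200, 400)  # single-model approach at 5-6h/day
flat_rate = (12, 15)        # multi-model harness, flat

# Most conservative ratio: cheapest pay-per-token vs. priciest flat rate.
low = pay_per_token[0] / flat_rate[1]
# Most extreme ratio: priciest pay-per-token vs. cheapest flat rate.
high = pay_per_token[1] / flat_rate[0]

print(f"pay-per-token costs roughly {low:.1f}x to {high:.1f}x the flat rate")
```

So even at the conservative end, the flat plan is more than ten times cheaper, which is the budget the throughput cap is quietly protecting.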

For typical development work (5-6 hours/day), the caps are generous enough that I haven’t noticed throttling in normal use. It only became visible when I ran fully automated agentic pipelines back-to-back.

Common Misconceptions

“Flat-rate means infinite concurrency” — No. The throughput cap limits how fast you can consume tokens, not how many total.
“All unlimited plans are the same” — Some plans are throughput-limited (OpenCode Go), others are token-capped. They feel different under load.
“You’ll get an error if you hit the cap” — Usually not. You’ll just get progressive slowdowns. The system rate-shapes rather than rejecting requests.
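
Because throttling shows up as latency rather than as an error, the only way to notice it programmatically is to watch response times. Below is a minimal sketch, assuming you can time each request yourself; `SlowdownDetector`, the window size, and the 2x threshold are hypothetical choices, not anything OpenCode Go exposes.

```python
from collections import deque
from statistics import median


class SlowdownDetector:
    """Flag when recent median latency exceeds a multiple of the baseline."""

    def __init__(self, window: int = 5, factor: float = 2.0) -> None:
        self.window = window
        self.factor = factor
        self.baseline = None  # median of the first `window` samples
        self._warmup = []
        self._recent = deque(maxlen=window)

    def observe(self, latency_s: float) -> bool:
        """Record one latency sample; return True if a slowdown is likely."""
        if self.baseline is None:
            # Use the first `window` samples to establish the baseline.
            self._warmup.append(latency_s)
            if len(self._warmup) == self.window:
                self.baseline = median(self._warmup)
            return False
        self._recent.append(latency_s)
        window_full = len(self._recent) == self.window
        return window_full and median(self._recent) > self.factor * self.baseline


# Made-up latencies in seconds: three normal responses, then a drift upward.
detector = SlowdownDetector(window=3, factor=2.0)
for latency in [1.0, 1.2, 0.9, 1.1, 2.5, 2.6, 3.0]:
    print(latency, detector.observe(latency))
```

Using the median rather than the mean keeps a single slow response from triggering a false alarm; a pipeline could pause itself whenever `observe` returns True.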

Wrapping Up

In this post, I explained why flat-rate AI coding plans like OpenCode Go ($10/month) use throughput-based rationing rather than truly unlimited compute. The caps are generous for typical work but become visible in heavy agentic sessions. The solution isn’t to avoid flat-rate plans — it’s to understand the architecture and pace your workflows accordingly.

Put simply: the $10 plan is “unlimited at human scale, not automation scale.”

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Reddit Discussion: r/opencode - Flat-rate LLM limits

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!