How to Make OpenClaw Work Reliably: Sandbox Modes and Approval Chains Explained
I spent weeks fighting OpenClaw. My agents would run fine one day, then completely fall apart the next. Sometimes they’d execute the wrong tool. Other times they’d get stuck in loops asking for approvals I’d already granted. The unpredictability was killing my automation dreams.
Then I finally understood what was happening under the hood.
The problem wasn’t OpenClaw itself—it was my complete misunderstanding of how sandbox modes and approval chains actually work. Once I grasped these two mechanisms, my workflows transformed from fragile experiments into reliable daily automation.
Let me walk you through what I learned, including all the mistakes I made along the way.
The Core Problem: Mixed Execution Models
I kept treating OpenClaw like a simple script runner. Write some instructions, point it at tools, let it rip. That approach works for about five minutes, then everything falls apart.
Here’s what I didn’t understand: OpenClaw actually runs two completely different types of operations:
- Deterministic operations - Message delivery, channel bindings, state management. These are 100% predictable and have nothing to do with LLM reasoning.
- Reasoning operations - Agent decision-making, tool selection, output generation. These involve LLM inference and have inherent variability.
┌─────────────────────────────────────────────────────────┐
│                    OpenClaw Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Deterministic Layer           Reasoning Layer          │
│  ┌──────────────────┐          ┌──────────────────┐     │
│  │ Message Delivery │          │ Tool Selection   │     │
│  │ Channel Bindings │◄────────►│ Output Gen       │     │
│  │ State Management │          │ Decision Making  │     │
│  └──────────────────┘          └──────────────────┘     │
│           ▲                             ▲               │
│           │                             │               │
│   100% Predictable               LLM Variability        │
│                                                         │
└─────────────────────────────────────────────────────────┘

The insight from Reddit that clicked for me:
“Message delivery and channel bindings have nothing to do with agent reasoning either, they’re supposed to be 100% deterministic”
When I started structuring my workflows so critical operations used deterministic mechanisms—and reserved agent reasoning for genuine decision points—everything changed.
Sandbox Modes: Your Isolation Layer
OpenClaw’s sandbox modes control how isolated your agent is from the outside world. I initially thought this was just about security. It’s actually about reliability.
Full Sandbox (Maximum Isolation)
I started here because it seemed “safest.” My agent ran in complete isolation with no external access.
    sandbox:
      mode: full
      permissions:
        network: none
        filesystem: read-only
        tools: []

What I learned: Full sandbox is great for testing and debugging, but useless for production. My agent couldn’t do anything. I’d ask it to process a file, and it would fail because it couldn’t read the file system.
Use full sandbox for:
- Testing new workflows before deployment
- Processing untrusted input
- Debugging why an agent behaves unexpectedly
Avoid full sandbox for:
- Any production workflow that needs to actually do something
Partial Sandbox (The Sweet Spot)
This is where I found the balance. The agent has controlled access to specific tools and resources.
    sandbox:
      mode: partial
      permissions:
        network:
          allow:
            - api.example.com
            - storage.googleapis.com
        filesystem:
          allow:
            - /data/input/
            - /data/output/
        tools:
          - file_reader
          - api_client

The mistake I made: I initially tried to grant permissions broadly, thinking more access meant more flexibility. Wrong. Each permission I added created another failure point.
What actually worked: I started with zero permissions and added them one by one as my workflow needed them. This made failures obvious and debugging straightforward.
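To make the "start with zero permissions" idea concrete, here is a minimal Python sketch of a deny-by-default allowlist. It is not OpenClaw's API: `SandboxPolicy`, `allows_host`, and `allows_path` are hypothetical names I made up for illustration, and the hosts and paths are the ones from the example config above.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SandboxPolicy:
    """Hypothetical allowlist: nothing is permitted until it is added."""
    allowed_hosts: set[str] = field(default_factory=set)
    allowed_dirs: list[PurePosixPath] = field(default_factory=list)

    def allows_host(self, host: str) -> bool:
        # Exact host match only; no wildcards in this sketch.
        return host.lower() in self.allowed_hosts

    def allows_path(self, path: str) -> bool:
        candidate = PurePosixPath(path)
        # Reject ".." segments so "/data/input/../../etc" cannot slip through.
        if ".." in candidate.parts:
            return False
        return any(candidate.is_relative_to(d) for d in self.allowed_dirs)


policy = SandboxPolicy()                      # starts with zero permissions
policy.allowed_hosts.add("api.example.com")   # add one permission at a time
policy.allowed_dirs.append(PurePosixPath("/data/input"))
```

Because every permission is added explicitly, a failed check points at exactly one missing entry, which is what made my debugging straightforward.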
No Sandbox (Maximum Flexibility)
I avoided this mode for months, thinking it was reckless. Then I realized: if you control the environment completely, no sandbox is fine.
    sandbox:
      mode: none
    approval_chain: production-critical

The catch: You need a rock-solid approval chain (more on that next). Without sandbox restrictions, your approval chain becomes your only safety net.
Approval Chains: Your Control Layer
Approval chains define the sequence of checks and confirmations before an agent executes an action. This is where I finally gained control over reliability.
My First Approval Chain (It Was Terrible)
    approval_chain:
      default: auto_approve  # What could go wrong?

Everything went wrong. My agent made API calls I didn’t expect, modified files I didn’t want touched, and generally ran wild.
Learning to Love Manual Approval
    approval_chain:
      default: require_approval
      rules:
        - match:
            tool: file_reader
          action: auto_approve
        - match:
            operation: read
          action: auto_approve

This worked but was annoying. I had to approve every little thing. Good for learning, bad for production.
The Breakthrough: Conditional Approvals
The moment everything clicked was when I understood that I could layer approvals based on operation type.
    approval_chain:
      name: production-reliable

      rules:
        # Deterministic operations: auto-approve
        - match:
            type: message_delivery
          action: auto_approve

        - match:
            type: channel_binding
          action: auto_approve

        # Safe read operations: auto-approve
        - match:
            operation: read
            source: internal
          action: auto_approve

        # External API calls: require approval
        - match:
            type: api_call
            external: true
          action: require_approval

        # File modifications: require approval with context
        - match:
            operation: write
          action: require_approval
          message: "File modification: {filename}"

        # Network requests outside whitelist: require approval
        - match:
            network_request:
              not_in_whitelist: true
          action: require_approval

The pattern I follow now:
- Auto-approve deterministic operations - Message delivery, channel bindings, state reads
- Auto-approve safe reads - Internal data access that can’t cause side effects
- Require approval for:
- External API calls
- File modifications
- Network requests outside whitelist
- Any operation with side effects
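The pattern above can be sketched as a small rule evaluator in Python. This is an illustration under my own assumptions, not how OpenClaw evaluates chains internally: I assume rules are checked top to bottom, the first match wins, and anything unmatched falls back to requiring approval. `RULES` and `decide` are hypothetical names; the field names come from the YAML sketch earlier.

```python
from typing import Any

AUTO = "auto_approve"
ASK = "require_approval"

# Each entry is (fields that must all match, resulting action).
RULES: list[tuple[dict[str, Any], str]] = [
    ({"type": "message_delivery"}, AUTO),
    ({"type": "channel_binding"}, AUTO),
    ({"operation": "read", "source": "internal"}, AUTO),
    ({"type": "api_call", "external": True}, ASK),
    ({"operation": "write"}, ASK),
]


def decide(operation: dict[str, Any],
           rules: list[tuple[dict[str, Any], str]] = RULES,
           default: str = ASK) -> str:
    """Return the action of the first rule whose fields all match."""
    for match, action in rules:
        if all(operation.get(key) == value for key, value in match.items()):
            return action
    # Assumption: unknown operations are never silently approved.
    return default


print(decide({"type": "message_delivery"}))
print(decide({"operation": "read", "source": "external"}))
```

The important design choice is the fallback: an operation nobody thought about ends up in front of a human instead of running unattended.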
My Configuration Strategy for Reliability
After months of trial and error, here’s the approach that finally works:
Step 1: Start Small, Build Custom Skills
The ClawHub ecosystem is tempting. Pre-built skills ready to drop in. I tried that route first.
    Week 1: "Wow, so many skills available!"
    Week 2: "This skill doesn't quite do what I need..."
    Week 3: "Why is this making unexpected API calls?"
    Week 4: "I'm spending more time debugging than building."

The Reddit wisdom that finally sank in:
“The skill ecosystem is rough though. Treat ClawHub stuff as a starting point you need to audit, not a drop-in”
“Building small skills from scratch works better”
Now I build small, focused skills from scratch:
    skill:
      name: file_processor
      operations:
        - read_file
        - process_content
        - write_output
      approval_needs:
        - write_output

Step 2: Use Deterministic Operations for Critical Paths
This was the key insight. Any operation that absolutely must work should use deterministic mechanisms.
    Critical Path:
      1. Message routing     → Deterministic (auto-approve)
      2. State management    → Deterministic (auto-approve)
      3. Data transformation → Reasoning (needs approval)
      4. External API call   → Reasoning (needs approval)
      5. Result delivery     → Deterministic (auto-approve)

The deterministic layers are 100% reliable. The reasoning layers are where I apply approval chains.
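Here is a rough Python sketch of that critical path, assuming a pipeline where deterministic steps run directly and reasoning steps must pass an approval callback before they execute. `Step`, `run_pipeline`, and the `approve` callback are hypothetical names for illustration only; the string functions stand in for real work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    kind: str  # "deterministic" or "reasoning"
    run: Callable[[str], str]


def run_pipeline(steps: list[Step], payload: str,
                 approve: Callable[[Step, str], bool]) -> str:
    """Run steps in order; only reasoning steps go through the approval gate."""
    for step in steps:
        # The gate is checked before the step runs, never after.
        if step.kind == "reasoning" and not approve(step, payload):
            raise PermissionError(f"step {step.name!r} was not approved")
        payload = step.run(payload)
    return payload


steps = [
    Step("route_message", "deterministic", str.strip),
    Step("transform_data", "reasoning", str.upper),
    Step("deliver_result", "deterministic", lambda text: f"delivered:{text}"),
]

# An approver that says yes to everything, as in a trusted test run.
print(run_pipeline(steps, "  hello ", approve=lambda step, payload: True))
# prints "delivered:HELLO"
```

If the approver rejects `transform_data`, the pipeline stops there and the delivery step never runs, so a bad reasoning result cannot reach the deterministic tail of the workflow.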
Step 3: Layer Your Approval Chains
I used to have one approval chain for everything. Now I have three:
┌─────────────────────────────────────────────────────────┐
│                  Approval Chain Layers                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Layer 1: Testing                                       │
│    - Everything requires approval                       │
│    - Maximum visibility into agent behavior             │
│    - Purpose: Learn, debug, validate                    │
│                                                         │
│  Layer 2: Staging                                       │
│    - Auto-approve safe operations                       │
│    - Require approval for external calls                │
│    - Purpose: Validate workflow logic                   │
│                                                         │
│  Layer 3: Production                                    │
│    - Auto-approve known-safe patterns                   │
│    - Require approval only for anomalies                │
│    - Purpose: Run reliably with minimal interruption    │
│                                                         │
└─────────────────────────────────────────────────────────┘

What Actually Works in Production
I now run daily automation on OpenClaw with the core loop working reliably. Here’s my actual configuration (simplified):
    workflow:
      name: daily_data_sync

      sandbox:
        mode: partial
        permissions:
          network:
            allow:
              - api.internal.company.com
              - storage.googleapis.com/my-bucket
          filesystem:
            allow:
              - /data/input/
              - /data/output/

      approval_chain:
        name: production-sync

        rules:
          # All deterministic ops: auto-approve
          - match:
              type: [message_delivery, channel_binding, state_read]
            action: auto_approve

          # Internal API: auto-approve (whitelisted)
          - match:
              api_host: api.internal.company.com
            action: auto_approve

          # External storage: log and auto-approve
          - match:
              api_host: storage.googleapis.com
            action:
              type: auto_approve
              log: true

          # File writes: require approval
          - match:
              operation: write
            action: require_approval

          # Everything else: require approval
          - match:
              default: true
            action: require_approval

This runs daily without issues. When something does go wrong, the approval chain catches it before damage is done.
Common Mistakes I Made
Mistake 1: Fighting Variability Instead of Controlling It
I kept trying to make LLM reasoning 100% predictable. That’s impossible by design. The breakthrough was accepting variability in reasoning but ensuring critical paths used deterministic operations.
Mistake 2: Over-permissioning the Sandbox
More permissions ≠ more capability. Each permission is a failure point. Start with nothing, add only what you need.
Mistake 3: Under-specifying Approval Chains
Vague approval rules lead to unpredictable behavior. Be explicit about what gets approved automatically and what needs human oversight.
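One way to catch under-specified chains early is a small lint pass before deployment. The sketch below is my own illustration, not an OpenClaw feature: `lint_chain` is a hypothetical helper, and it assumes the chain has already been loaded into a plain dict with a top-level `default` key and string actions, as in the manual-approval example earlier.

```python
VALID_ACTIONS = {"auto_approve", "require_approval"}


def lint_chain(chain: dict) -> list[str]:
    """Return human-readable problems found in a hypothetical chain dict."""
    problems = []
    for index, rule in enumerate(chain.get("rules", [])):
        # An empty match block would apply to every operation.
        if not rule.get("match"):
            problems.append(f"rule {index}: empty match applies to everything")
        if rule.get("action") not in VALID_ACTIONS:
            problems.append(f"rule {index}: unknown or missing action")
    # Anything not matched by a rule should end up with a human.
    if chain.get("default") != "require_approval":
        problems.append("default is not require_approval")
    return problems


vague = {"default": "auto_approve", "rules": [{"match": {}, "action": "maybe"}]}
for problem in lint_chain(vague):
    print(problem)
```

A chain that passes returns an empty list, so the check is easy to wire into whatever test or deploy step you already have.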
Mistake 4: Using Third-Party Skills Blindly
Every ClawHub skill I used required auditing and modification. The time I saved was lost to debugging. Custom skills from scratch took longer upfront but worked reliably.
The Bottom Line
OpenClaw’s reliability isn’t magic; it’s configuration. Understanding the execution model, choosing the right sandbox mode, and building thoughtful approval chains turn unpredictable agent behavior into reliable, production-ready automation.
The key principles:
- Separate deterministic from reasoning operations - Structure workflows so critical paths use deterministic mechanisms
- Choose sandbox mode based on environment control - Partial sandbox for most production work
- Layer approval chains - Different chains for testing, staging, and production
- Build custom skills - Third-party skills are starting points, not drop-in solutions
Once I understood these mechanisms, OpenClaw went from frustrating to reliable. The core loop does work—it just needs proper configuration.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.