
How to Make OpenClaw Work Reliably: Sandbox Modes and Approval Chains Explained

I spent weeks fighting OpenClaw. My agents would run fine one day, then completely fall apart the next. Sometimes they’d execute the wrong tool. Other times they’d get stuck in loops asking for approvals I’d already granted. The unpredictability was killing my automation dreams.

Then I finally understood what was happening under the hood.

The problem wasn’t OpenClaw itself—it was my complete misunderstanding of how sandbox modes and approval chains actually work. Once I grasped these two mechanisms, my workflows transformed from fragile experiments into reliable daily automation.

Let me walk you through what I learned, including all the mistakes I made along the way.

The Core Problem: Mixed Execution Models

I kept treating OpenClaw like a simple script runner. Write some instructions, point it at tools, let it rip. That approach works for about five minutes, then everything falls apart.

Here’s what I didn’t understand: OpenClaw actually runs two completely different types of operations:

  1. Deterministic operations - Message delivery, channel bindings, state management. These are 100% predictable and have nothing to do with LLM reasoning.
  2. Reasoning operations - Agent decision-making, tool selection, output generation. These involve LLM inference and have inherent variability.

OpenClaw Execution Model
┌─────────────────────────────────────────────────────────┐
│                    OpenClaw Workflow                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Deterministic Layer              Reasoning Layer       │
│  ┌──────────────────┐             ┌──────────────────┐  │
│  │ Message Delivery │             │ Tool Selection   │  │
│  │ Channel Bindings │ ◄─────────► │ Output Gen       │  │
│  │ State Management │             │ Decision Making  │  │
│  └──────────────────┘             └──────────────────┘  │
│           ▲                                ▲            │
│           │                                │            │
│   100% Predictable                  LLM Variability     │
│                                                         │
└─────────────────────────────────────────────────────────┘

The insight from Reddit that clicked for me:

“Message delivery and channel bindings have nothing to do with agent reasoning either, they’re supposed to be 100% deterministic”

When I started structuring my workflows so critical operations used deterministic mechanisms—and reserved agent reasoning for genuine decision points—everything changed.
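To make that split concrete, here is a minimal Python sketch. It is not OpenClaw's API: `CHANNEL_BINDINGS`, `deliver`, `choose_tool` and the `ask_model` callback are hypothetical names, and the LLM is replaced by a stub. Routing is a plain table lookup that never consults the model, while the one decision left to the model is checked against a fixed set of tools before anything runs.

```python
from typing import Callable, Dict, List

# Hypothetical sketch, not OpenClaw's API.
# Deterministic layer: channel bindings are a plain lookup table,
# so the same message always lands in the same place.
CHANNEL_BINDINGS: Dict[str, str] = {
    "alerts": "ops-team",
    "reports": "analytics-team",
}


def deliver(channel: str, message: str, outbox: Dict[str, List[str]]) -> str:
    """Route a message by table lookup; the model is never consulted."""
    target = CHANNEL_BINDINGS[channel]  # unknown channel -> KeyError, fail loudly
    outbox.setdefault(target, []).append(message)
    return target


# Reasoning layer: the model only picks among known tools,
# and its answer is validated before anything runs.
ALLOWED_TOOLS = {"file_reader", "api_client"}


def choose_tool(task: str, ask_model: Callable[[str], str]) -> str:
    """Ask the (stubbed) model for a tool name and reject anything unknown."""
    choice = ask_model(task).strip()
    if choice not in ALLOWED_TOOLS:
        raise ValueError(f"model picked unknown tool {choice!r}")
    return choice


if __name__ == "__main__":
    outbox: Dict[str, List[str]] = {}
    deliver("alerts", "disk usage above 90%", outbox)
    # A lambda stands in for the real LLM call.
    tool = choose_tool("summarize /data/input/report.csv", lambda task: "file_reader")
    print(outbox, tool)
```

The design choice is that variability is allowed only where a decision is genuinely needed, and even there the output space is closed.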

Sandbox Modes: Your Isolation Layer

OpenClaw’s sandbox modes control how isolated your agent is from the outside world. I initially thought this was just about security. It’s actually about reliability.

Full Sandbox (Maximum Isolation)

I started here because it seemed “safest.” My agent ran in complete isolation with no external access.

full-sandbox-config.yaml
sandbox:
  mode: full
  permissions:
    network: none
    filesystem: read-only
    tools: []

What I learned: Full sandbox is great for testing and debugging, but useless for production. My agent couldn’t do anything. I’d ask it to process a file, and it would fail: with an empty tool list and no network access, it had no way to read the file, even though the filesystem was mounted read-only.

Use full sandbox for:

  • Testing new workflows before deployment
  • Processing untrusted input
  • Debugging why an agent behaves unexpectedly

Avoid full sandbox for:

  • Any production workflow that needs to actually do something

Partial Sandbox (The Sweet Spot)

This is where I found the balance. The agent has controlled access to specific tools and resources.

partial-sandbox-config.yaml
sandbox:
  mode: partial
  permissions:
    network:
      allow:
        - api.example.com
        - storage.googleapis.com
    filesystem:
      allow:
        - /data/input/
        - /data/output/
    tools:
      - file_reader
      - api_client

The mistake I made: I initially tried to grant permissions broadly, thinking more access meant more flexibility. Wrong. Each permission I added created another failure point.

What actually worked: I started with zero permissions and added them one by one as my workflow needed them. This made failures obvious and debugging straightforward.
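The deny-by-default idea fits in a small policy object. This is a minimal Python sketch, assuming a hypothetical `SandboxPolicy` class whose three sets mirror the `network`, `filesystem` and `tools` lists in the YAML above; it illustrates the principle and is not how OpenClaw itself enforces its sandbox.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Set


@dataclass
class SandboxPolicy:
    """Hypothetical deny-by-default policy mirroring the YAML's
    network/filesystem/tools lists. Empty sets mean nothing is allowed."""
    hosts: Set[str] = field(default_factory=set)
    paths: Set[str] = field(default_factory=set)
    tools: Set[str] = field(default_factory=set)

    def allows_host(self, host: str) -> bool:
        return host in self.hosts  # exact hostname match, no wildcards

    def allows_path(self, path: str) -> bool:
        p = PurePosixPath(path)
        if ".." in p.parts:  # pure paths don't resolve "..", so refuse it
            return False
        # is_relative_to compares whole path components (Python 3.9+),
        # so /data/inputs/ does not sneak in under /data/input/.
        return any(p.is_relative_to(root) for root in self.paths)

    def allows_tool(self, tool: str) -> bool:
        return tool in self.tools


# Start from zero and grant one permission at a time as the workflow needs it.
policy = SandboxPolicy()
policy.tools.add("file_reader")
policy.paths.add("/data/input/")
```

Because every grant is an explicit `add`, a failure points straight at the one permission that is missing.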

No Sandbox (Maximum Flexibility)

I avoided this mode for months, thinking it was reckless. Then I realized: if you control the environment completely, running without a sandbox is fine.

no-sandbox-config.yaml
sandbox:
  mode: none
approval_chain: production-critical

The catch: You need a rock-solid approval chain (more on that next). Without sandbox restrictions, your approval chain becomes your only safety net.
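One way to enforce that rule is a pre-flight check that refuses to start an unsandboxed workflow whose approval chain would let unmatched actions through. The schema here is an assumption inferred from the YAML in this article (a top-level `approval_chain` that is either a chain name or an inline dict, a `default` key, and an optional final `match: {default: true}` rule); `resolve_chain`, `fails_closed` and `validate_workflow` are hypothetical helpers, not OpenClaw functions.

```python
from typing import Any, Dict, Mapping

Chain = Dict[str, Any]


def resolve_chain(ref: Any, chains: Mapping[str, Chain]) -> Chain:
    """A chain may be referenced by name (as in no-sandbox-config.yaml)
    or written inline (as in production-config.yaml)."""
    if isinstance(ref, str):
        if ref not in chains:
            raise ValueError(f"unknown approval chain {ref!r}")
        return chains[ref]
    if isinstance(ref, dict):
        return ref
    raise ValueError("sandbox mode 'none' requires an approval chain")


def fails_closed(chain: Chain) -> bool:
    """True if actions that no rule matches will still need a human."""
    if chain.get("default") == "require_approval":
        return True
    rules = chain.get("rules") or []
    last = rules[-1] if rules else {}
    return last.get("match") == {"default": True} and last.get("action") == "require_approval"


def validate_workflow(config: Dict[str, Any], chains: Mapping[str, Chain]) -> None:
    """Refuse to start an unsandboxed workflow with a fail-open chain."""
    if config.get("sandbox", {}).get("mode") != "none":
        return  # sandboxed workflows still have the sandbox as a backstop
    chain = resolve_chain(config.get("approval_chain"), chains)
    if not fails_closed(chain):
        raise ValueError("with no sandbox, unmatched actions must require approval")
```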

Approval Chains: Your Control Layer

Approval chains define the sequence of checks and confirmations before an agent executes an action. This is where I finally gained control over reliability.

My First Approval Chain (It Was Terrible)

bad-approval-chain.yaml
approval_chain:
  default: auto_approve  # What could go wrong?

Everything went wrong. My agent made API calls I didn’t expect, modified files I didn’t want touched, and generally ran wild.

Learning to Love Manual Approval

manual-approval-chain.yaml
approval_chain:
  default: require_approval
  rules:
    - match:
        tool: file_reader
      action: auto_approve
    - match:
        operation: read
      action: auto_approve

This worked but was annoying. Apart from reads, I had to approve every little thing. Good for learning, bad for production.
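A chain like the one above can be read as "try each rule, otherwise fall back to `default`". Here is a minimal Python sketch of that reading. It assumes first-match-wins ordering and that a rule matches when every key in its `match` block equals the action's value; both are assumptions about the semantics, and `decide` is a hypothetical helper, not part of OpenClaw.

```python
from typing import Any, Dict


def decide(action: Dict[str, Any], chain: Dict[str, Any]) -> str:
    """Return the outcome for one proposed action.

    Assumed semantics: rules are checked top to bottom; a rule matches when
    every key in its `match` block equals the action's value for that key.
    If nothing matches, the chain's `default` applies."""
    for rule in chain.get("rules", []):
        if all(action.get(k) == v for k, v in rule["match"].items()):
            return rule["action"]
    return chain.get("default", "require_approval")  # no default: fail closed


manual_chain = {
    "default": "require_approval",
    "rules": [
        {"match": {"tool": "file_reader"}, "action": "auto_approve"},
        {"match": {"operation": "read"}, "action": "auto_approve"},
    ],
}
```

Under these assumptions, anything that is neither a `file_reader` call nor a read falls through to the `default`, so every write and API call stops for a prompt.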

The Breakthrough: Conditional Approvals

The moment everything clicked was when I understood that I could layer approvals based on operation type.

conditional-approval-chain.yaml
approval_chain:
  name: production-reliable
  rules:
    # Deterministic operations: auto-approve
    - match:
        type: message_delivery
      action: auto_approve
    - match:
        type: channel_binding
      action: auto_approve
    # Safe read operations: auto-approve
    - match:
        operation: read
        source: internal
      action: auto_approve
    # External API calls: require approval
    - match:
        type: api_call
        external: true
      action: require_approval
    # File modifications: require approval with context
    - match:
        operation: write
      action: require_approval
      message: "File modification: {filename}"
    # Network requests outside whitelist: require approval
    - match:
        network_request:
          not_in_whitelist: true
      action: require_approval

The pattern I follow now:

  1. Auto-approve deterministic operations - Message delivery, channel bindings, state reads
  2. Auto-approve safe reads - Internal data access that can’t cause side effects
  3. Require approval for:
    • External API calls
    • File modifications
    • Network requests outside whitelist
    • Any operation with side effects
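The three-bucket pattern can be written as a single function that fails closed: anything it does not recognise ends up needing approval. This Python sketch uses a hypothetical `Operation` record and a made-up `WHITELIST`; it also shows how a templated message such as `File modification: {filename}` might be filled in with `str.format`.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical operation record; the field names are illustrative only.
@dataclass(frozen=True)
class Operation:
    kind: str                      # "message_delivery", "read", "write", "api_call", ...
    internal: bool = True          # touches only our own data and services?
    host: Optional[str] = None     # network destination, if any
    filename: Optional[str] = None


DETERMINISTIC = {"message_delivery", "channel_binding", "state_read"}
WHITELIST = {"api.example.com"}    # made-up allowlist


def classify(op: Operation) -> str:
    # 1. Deterministic plumbing: no reasoning involved, let it through.
    if op.kind in DETERMINISTIC:
        return "auto_approve"
    # 2. Reads of internal data cannot cause side effects.
    if op.kind == "read" and op.internal:
        return "auto_approve"
    # 3. Everything else, including anything unrecognised, needs a human.
    return "require_approval"


def approval_prompt(op: Operation) -> str:
    """Build the context shown to whoever approves the operation."""
    if op.kind == "write":
        return "File modification: {filename}".format(filename=op.filename)
    if op.host is not None and op.host not in WHITELIST:
        return f"Network request outside whitelist: {op.host}"
    return f"Approve {op.kind} operation?"
```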

My Configuration Strategy for Reliability

After months of trial and error, here’s the approach that finally works:

Step 1: Start Small, Build Custom Skills

The ClawHub ecosystem is tempting. Pre-built skills ready to drop in. I tried that route first.

My ClawHub Journey
Week 1: "Wow, so many skills available!"
Week 2: "This skill doesn't quite do what I need..."
Week 3: "Why is this making unexpected API calls?"
Week 4: "I'm spending more time debugging than building."

The Reddit wisdom that finally sank in:

“The skill ecosystem is rough though. Treat ClawHub stuff as a starting point you need to audit, not a drop-in”

“Building small skills from scratch works better”

Now I build small, focused skills from scratch:

minimal-skill.yaml
skill:
  name: file_processor
  operations:
    - read_file
    - process_content
    - write_output
  approval_needs:
    - write_output
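To show what `approval_needs` buys you, here is a minimal in-process Python model of that skill. The `Skill` class, the `approve` callback and the three stand-in operations are hypothetical; they only illustrate the idea that operations run in order and the listed ones pause for approval before they execute.

```python
from typing import Callable, Dict, List

Step = Callable[[object], object]


class Skill:
    """Hypothetical model of minimal-skill.yaml: ordered operations,
    some of which need approval before they run."""

    def __init__(self, name: str, operations: Dict[str, Step], approval_needs: List[str]):
        unknown = set(approval_needs) - set(operations)
        if unknown:  # catch typos in approval_needs up front
            raise ValueError(f"approval_needs lists unknown operations: {sorted(unknown)}")
        self.name = name
        self.operations = operations  # dicts preserve insertion order
        self.approval_needs = set(approval_needs)

    def run(self, data: object, approve: Callable[[str, object], bool]) -> object:
        for op_name, step in self.operations.items():
            if op_name in self.approval_needs and not approve(op_name, data):
                raise PermissionError(f"{self.name}: {op_name} was not approved")
            data = step(data)
        return data


written: List[str] = []


def read_file(path: object) -> str:
    return "a,b\n1,2\n"  # stand-in for reading `path` from /data/input/


def process_content(text: object) -> str:
    return str(text).upper()


def write_output(text: object) -> int:
    written.append(str(text))  # stand-in for writing to /data/output/
    return len(str(text))


file_processor = Skill(
    "file_processor",
    {"read_file": read_file, "process_content": process_content, "write_output": write_output},
    approval_needs=["write_output"],
)
```

Because the approver sees the data just before `write_output` runs, a rejection leaves nothing half-written.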

Step 2: Use Deterministic Operations for Critical Paths

This was the key insight. Any operation that absolutely must work should use deterministic mechanisms.

Reliability Layering
Critical Path:
1. Message routing → Deterministic (auto-approve)
2. State management → Deterministic (auto-approve)
3. Data transformation → Reasoning (needs approval)
4. External API call → Reasoning (needs approval)
5. Result delivery → Deterministic (auto-approve)

The deterministic layers behave the same way on every run, so I auto-approve them. The reasoning layers are where I apply approval chains.
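The critical path above can be modeled as a list of tagged stages where only the reasoning stages are gated. This is a hypothetical Python sketch: `Stage`, `run_pipeline` and the lambdas standing in for each step are illustrative, not OpenClaw constructs.

```python
from typing import Callable, List, NamedTuple


class Stage(NamedTuple):
    name: str
    kind: str                         # "deterministic" or "reasoning"
    run: Callable[[dict], dict]


def run_pipeline(stages: List[Stage], state: dict,
                 approve: Callable[[str, dict], bool]) -> dict:
    """Run stages in order. Deterministic stages pass straight through;
    reasoning stages must be approved first (hypothetical gating policy)."""
    for stage in stages:
        if stage.kind not in ("deterministic", "reasoning"):
            raise ValueError(f"{stage.name}: unknown kind {stage.kind!r}")
        if stage.kind == "reasoning" and not approve(stage.name, state):
            raise PermissionError(f"stage {stage.name!r} was rejected")
        state = stage.run(state)
    return state


# The critical path from the list above; each lambda is a stand-in.
critical_path = [
    Stage("message_routing", "deterministic", lambda s: {**s, "routed": True}),
    Stage("state_management", "deterministic", lambda s: {**s, "loaded": True}),
    Stage("data_transformation", "reasoning", lambda s: {**s, "summary": "2 rows"}),
    Stage("external_api_call", "reasoning", lambda s: {**s, "posted": True}),
    Stage("result_delivery", "deterministic", lambda s: {**s, "delivered": True}),
]
```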

Step 3: Layer Your Approval Chains

I used to have one approval chain for everything. Now I have three:

Approval Chain Strategy
┌─────────────────────────────────────────────────────────┐
│                  Approval Chain Layers                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Layer 1: Testing                                       │
│    - Everything requires approval                       │
│    - Maximum visibility into agent behavior             │
│    - Purpose: Learn, debug, validate                    │
│                                                         │
│  Layer 2: Staging                                       │
│    - Auto-approve safe operations                       │
│    - Require approval for external calls                │
│    - Purpose: Validate workflow logic                   │
│                                                         │
│  Layer 3: Production                                    │
│    - Auto-approve known-safe patterns                   │
│    - Require approval only for anomalies                │
│    - Purpose: Run reliably with minimal interruption    │
│                                                         │
└─────────────────────────────────────────────────────────┘
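Here is a minimal sketch of how the three layers could differ. It assumes actions are reduced to a `(tool, operation)` pair and that "known-safe patterns" are a set you build up and sign off while running in staging; `KNOWN_SAFE`, `SAFE_OPERATIONS` and `decide_for_env` are hypothetical names with made-up contents.

```python
from typing import FrozenSet, Tuple

Action = Tuple[str, str]  # (tool, operation): a deliberately simplified view

SAFE_OPERATIONS = {"read"}

# Patterns reviewed and signed off while running in staging (made-up examples).
KNOWN_SAFE: FrozenSet[Action] = frozenset({
    ("file_reader", "read"),
    ("api_client", "get"),
})


def decide_for_env(env: str, action: Action, external: bool) -> str:
    _tool, operation = action
    if env == "testing":
        # Layer 1: every action stops for a human.
        return "require_approval"
    if env == "staging":
        # Layer 2: external calls stop; safe internal operations pass.
        if external:
            return "require_approval"
        return "auto_approve" if operation in SAFE_OPERATIONS else "require_approval"
    if env == "production":
        # Layer 3: anything outside the known-safe set counts as an anomaly.
        return "auto_approve" if action in KNOWN_SAFE else "require_approval"
    raise ValueError(f"unknown environment {env!r}")
```

Defining "anomaly" as "not on the reviewed list" keeps production quiet without letting new behavior through unseen.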

What Actually Works in Production

I now run daily automation on OpenClaw with the core loop working reliably. Here’s my actual configuration (simplified):

production-config.yaml
workflow:
  name: daily_data_sync
  sandbox:
    mode: partial
    permissions:
      network:
        allow:
          - api.internal.company.com
          - storage.googleapis.com/my-bucket
      filesystem:
        allow:
          - /data/input/
          - /data/output/
  approval_chain:
    name: production-sync
    rules:
      # All deterministic ops: auto-approve
      - match:
          type: [message_delivery, channel_binding, state_read]
        action: auto_approve
      # Internal API: auto-approve (whitelisted)
      - match:
          api_host: api.internal.company.com
        action: auto_approve
      # External storage: log and auto-approve
      - match:
          api_host: storage.googleapis.com
        action:
          type: auto_approve
          log: true
      # File writes: require approval
      - match:
          operation: write
        action: require_approval
      # Everything else: require approval
      - match:
          default: true
        action: require_approval

This runs daily, and when something does go wrong, the approval chain catches it before any damage is done.
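To see how rules like `production-sync` might be evaluated, here is a hypothetical Python sketch. It assumes that a list in a `match` block means "any of these values", that `match: {default: true}` is a catch-all, that an action written as a mapping carries its outcome in `type` plus an optional `log` flag, and that the first matching rule wins. None of this is confirmed OpenClaw behavior; `matches`, `evaluate` and `PRODUCTION_SYNC` are illustrative names.

```python
import logging
from typing import Any, Dict, List

log = logging.getLogger("approval")


def matches(match: Dict[str, Any], action: Dict[str, Any]) -> bool:
    if match.get("default") is True:
        return True                        # catch-all rule
    for key, expected in match.items():
        value = action.get(key)
        if isinstance(expected, list):     # a list means "any of these"
            if value not in expected:
                return False
        elif value != expected:
            return False
    return True


def evaluate(rules: List[Dict[str, Any]], action: Dict[str, Any]) -> str:
    for rule in rules:                     # assumed: first matching rule wins
        if not matches(rule["match"], action):
            continue
        outcome = rule["action"]
        if isinstance(outcome, dict):      # e.g. {type: auto_approve, log: true}
            if outcome.get("log"):
                log.info("logged %s: %s", outcome["type"], action)
            return outcome["type"]
        return outcome
    return "require_approval"              # nothing matched: fail closed


# production-sync rules transcribed from the YAML above.
PRODUCTION_SYNC = [
    {"match": {"type": ["message_delivery", "channel_binding", "state_read"]},
     "action": "auto_approve"},
    {"match": {"api_host": "api.internal.company.com"}, "action": "auto_approve"},
    {"match": {"api_host": "storage.googleapis.com"},
     "action": {"type": "auto_approve", "log": True}},
    {"match": {"operation": "write"}, "action": "require_approval"},
    {"match": {"default": True}, "action": "require_approval"},
]

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(evaluate(PRODUCTION_SYNC, {"type": "api_call", "api_host": "storage.googleapis.com"}))
```

Writing the rules out this way exposes an ordering question. If rules are evaluated top to bottom, as the sketch assumes, an action carrying both `api_host: api.internal.company.com` and `operation: write` is auto-approved by the internal-API rule before the file-write rule is ever reached, so stricter rules belong higher in the list.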

Common Mistakes I Made

Mistake 1: Fighting Variability Instead of Controlling It

I kept trying to make LLM reasoning 100% predictable. That’s impossible by design. The breakthrough was accepting variability in reasoning but ensuring critical paths used deterministic operations.

Mistake 2: Over-permissioning the Sandbox

More permissions ≠ more capability. Each permission is a failure point. Start with nothing, add only what you need.

Mistake 3: Under-specifying Approval Chains

Vague approval rules lead to unpredictable behavior. Be explicit about what gets approved automatically and what needs human oversight.

Mistake 4: Using Third-Party Skills Blindly

Every ClawHub skill I used required auditing and modification. The time I saved was lost to debugging. Custom skills from scratch took longer upfront but worked reliably.

The Bottom Line

OpenClaw’s reliability isn’t magic: it’s configuration. Understanding the execution model, choosing the right sandbox mode, and building thoughtful approval chains turns unpredictable agent behavior into predictable, production-ready automation.

The key principles:

  1. Separate deterministic from reasoning operations - Structure workflows so critical paths use deterministic mechanisms
  2. Choose sandbox mode based on environment control - Partial sandbox for most production work
  3. Layer approval chains - Different chains for testing, staging, and production
  4. Build custom skills - Third-party skills are starting points, not drop-in solutions

Once I understood these mechanisms, OpenClaw went from frustrating to reliable. The core loop does work—it just needs proper configuration.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

