Using a Local LLM as a Security Gatekeeper: Implementing Semantic Review for AI Agents

Mar 30, 2026

Problem

After understanding Dual-Brain Architecture, I wanted to know: how do I actually implement the local “Cerebellum” model for security review?

I had questions:

What model size is practical?
How do I run it offline?
What should the review prompt look like?
What happens if the model is unavailable?

Environment

Local LLM: 0.8B-1.5B parameter quantized model (e.g., Llama-3.2-1B, Qwen2.5-1.5B)
Inference engine: llama.cpp with CGO bindings
Memory: 1-2GB RAM for quantized model
Platform: Desktop AI Agent application

Why Use a Local Model?

I considered three approaches for security review:

Approach           | Privacy              | Latency       | Independence
-------------------|----------------------|---------------|--------------------
Cloud model review | Sends sensitive data | Network delay | Depends on provider
Rule-based only    | Private              | Minimal       | Static patterns
Local LLM review   | Private              | 100-500ms     | Self-contained

Privacy: A local model keeps all data on the device. Sensitive file paths, user messages, and tool parameters never leave my machine.

Latency: 100-500ms per review is acceptable for security-critical operations. Most agents make a few tool calls per request, so total delay stays reasonable.

Independence: Security doesn’t depend on cloud provider policies or model changes. I control the review logic.

Model Selection

I looked at what model sizes work for security review:

Model Size    | Memory    | Speed       | Reasoning Quality
--------------|-----------|-------------|------------------
0.5B params   | ~500MB    | Very fast   | Limited reasoning
0.8B params   | ~800MB    | Fast        | Basic semantic understanding
1.5B params   | ~1.5GB    | Moderate    | Good semantic review
3B params     | ~3GB      | Slower      | Better reasoning, more latency
7B params     | ~7GB      | Slow        | Overkill for security review

For security review, 0.8B-1.5B is the sweet spot. Models in this range have enough reasoning capability to understand intent alignment and detect suspicious patterns, while remaining fast enough for interactive use.

I recommend quantized models (Q4_K_M or Q5_K_M) for reduced memory footprint.

Implementation

Here’s how I implement the review function:

func (m *Manager) ReviewToolCall(req ToolCallReviewRequest) (ToolCallReviewResult, error) {
    status := m.local.Status()

    // Graceful degradation: auto-approve if cerebellum unavailable
    if status != StatusRunning {
        return ToolCallReviewResult{
            Verdict: "approve",
            Reason:  "cerebellum not running; degraded to rule-only check",
            Risk:    "none",
        }, nil
    }

    // Build review prompt
    prompt := buildToolCallReviewPrompt(
        req.UserMessage,
        req.ToolName,
        req.ToolParams,
    )

    // Synchronous inference with llama.cpp
    output, err := m.inferSync(prompt, 4096)
    if err != nil {
        // Also degrade gracefully on inference failure
        return ToolCallReviewResult{
            Verdict: "approve",
            Reason:  "inference failed; degraded to rule-only check",
            Risk:    "none",
        }, nil
    }

    // Parse JSON result
    result := parseToolCallReviewOutput(output)
    return result, nil
}

The key design decisions:

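Graceful degradation: If the cerebellum is unavailable or inference fails, the system falls back to rule-based checks instead of blocking the workflow. This is a fail-open choice: the call is approved without semantic review, so during an outage the rule-based layer is the only protection.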
Synchronous inference: Security review should complete before tool execution. 100-500ms is acceptable.
JSON output: Structured output makes parsing reliable.
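Small models sometimes wrap the JSON in extra text, so the parsing step benefits from being defensive. The following is a minimal sketch of what a parser like parseToolCallReviewOutput could look like; the type reviewResult, its JSON tags, and the function name parseReviewOutput are assumptions for illustration, not the project's actual code. It also assumes that unparseable output should be treated as "flag" rather than "approve", which is one possible policy among several.

```go
package main

import (
	"encoding/json"
	"strings"
)

// reviewResult mirrors the JSON object requested in the review prompt.
// The type name and tags are hypothetical.
type reviewResult struct {
	Verdict string `json:"verdict"`
	Reason  string `json:"reason"`
	Risk    string `json:"risk"`
}

// parseReviewOutput extracts the outermost {...} span from the model output
// and validates it. Anything it cannot parse or validate becomes a "flag"
// verdict (an assumed policy), so malformed output is never silently approved.
func parseReviewOutput(output string) reviewResult {
	fallback := reviewResult{
		Verdict: "flag",
		Reason:  "unparseable review output",
		Risk:    "medium",
	}

	// Tolerate chatter before and after the JSON object.
	start := strings.Index(output, "{")
	end := strings.LastIndex(output, "}")
	if start < 0 || end < start {
		return fallback
	}

	var res reviewResult
	if err := json.Unmarshal([]byte(output[start:end+1]), &res); err != nil {
		return fallback
	}

	// Only the three verdicts named in the prompt are accepted.
	switch res.Verdict {
	case "approve", "flag", "reject":
	default:
		return fallback
	}

	// An unknown risk level is normalized instead of discarding the verdict.
	switch res.Risk {
	case "none", "low", "medium", "high":
	default:
		res.Risk = "medium"
	}
	return res
}
```

Validating the verdict against a fixed set keeps a hallucinated value such as "maybe" from reaching the code that decides whether a tool runs.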

Review Prompt Design

The review prompt must guide the model to evaluate multiple security dimensions:

You are a security review assistant. Your task is to review AI Agent tool calls.

User's original request: {user_message}

Tool to execute: {tool_name}
Tool parameters: {tool_params}

## Review Points
1. Does this tool call match the user's original intent?
2. Is there data exfiltration risk (e.g., curl uploading sensitive files)?
3. Are there destructive operations beyond user expectations?
4. Are there injection attacks in parameters (command injection, path traversal)?
5. Is the operation scope limited to authorized sandbox directories?

## Output Format (strict JSON)
{"verdict": "approve|flag|reject", "reason": "brief explanation", "risk": "none|low|medium|high"}

Respond only with the JSON object, no additional text.

I found this prompt works well with small models. It provides clear structure and asks for specific evaluation dimensions.

Smart Skip Mechanism

Not every tool call needs full semantic review. I implement a smart skip mechanism:

func ShouldReviewToolCall(toolName string, toolParams map[string]any, isElevated bool) bool {
    // Config mode doesn't need review.
    // isConfigMode is assumed to be package-level state set elsewhere.
    if isConfigMode {
        return false
    }

    // Elevated operations are always reviewed, even for read-only tools
    if isElevated {
        return true
    }

    // Low-risk read-only tools skip review unless a sensitive keyword appears
    if isLowRiskReadOnly(toolName) {
        return containsSensitiveKeywords(toolParams)
    }

    // Default: review everything else (write, delete, exec, unknown tools)
    return true
}

Low-risk operations (skip review):

memory_search, sessions_list - read-only, non-sensitive
list_files - within sandbox, no sensitive keywords

High-risk operations (must review):

write_file, delete_file, exec - write/delete/execute
Any operation with elevated permissions
Any operation containing sensitive keywords (password, credential, token, .aws, etc.)

This balances security with performance: typical operations remain fast while dangerous ones are thoroughly checked.
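The two helpers used by the skip check can be quite simple. Below is a minimal sketch of how isLowRiskReadOnly and containsSensitiveKeywords might be written; the tool list and keyword list are taken from the examples above, while the matching strategy (serialize the parameters to JSON and do a case-insensitive substring search) is an assumption rather than the project's actual approach.

```go
package main

import (
	"encoding/json"
	"strings"
)

// lowRiskReadOnlyTools lists the read-only tools named above.
var lowRiskReadOnlyTools = map[string]bool{
	"memory_search": true,
	"sessions_list": true,
	"list_files":    true,
}

// sensitiveKeywords is an illustrative, deliberately incomplete list.
var sensitiveKeywords = []string{"password", "credential", "token", ".aws"}

// isLowRiskReadOnly reports whether the tool is on the read-only allowlist.
// Unknown tools are not on the list, so they are never skipped.
func isLowRiskReadOnly(toolName string) bool {
	return lowRiskReadOnlyTools[toolName]
}

// containsSensitiveKeywords serializes the parameters to JSON and searches
// the lowercased text, which covers keys and nested values alike.
func containsSensitiveKeywords(toolParams map[string]any) bool {
	raw, err := json.Marshal(toolParams)
	if err != nil {
		return true // cannot inspect the parameters, so treat them as sensitive
	}
	text := strings.ToLower(string(raw))
	for _, kw := range sensitiveKeywords {
		if strings.Contains(text, kw) {
			return true
		}
	}
	return false
}
```

Substring matching errs on the side of reviewing too much (for example, "tokenizer" matches "token"), which costs one extra review rather than a missed one.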

The Reason

I think local LLM security review works because it bridges the semantic gap. Rule-based systems cannot understand intent alignment. Cloud models don’t understand local file sensitivity.

A local model has access to everything it needs, with no sensitive data sent externally (privacy):

The user’s original request (context)
The tool name and parameters (what will happen)

It can evaluate whether the action makes sense given the request, without exposing sensitive information.

Summary

In this post, I showed how to implement local LLM-based semantic security review for AI Agents. The key point is using a small offline model (0.8B-1.5B parameters) with llama.cpp to review tool calls without sending sensitive data to external servers. Graceful degradation and smart skip mechanisms ensure the system remains usable while maintaining security.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Kocort Project - Local LLM Security Review Implementation
👨‍💻 llama.cpp - Efficient LLM Inference

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!