Using Local LLM as Security Gatekeeper: Implementing Semantic Review for AI Agents
Problem
After understanding Dual-Brain Architecture, I wanted to know: how do I actually implement the local “Cerebellum” model for security review?
I had questions:
- What model size is practical?
- How do I run it offline?
- What should the review prompt look like?
- What happens if the model is unavailable?
Environment
- Local LLM: 0.8B-1.5B parameter quantized model (e.g., Llama-3.2-1B, Qwen2.5-1.5B)
- Inference engine: llama.cpp with CGO bindings
- Memory: 1-2GB RAM for quantized model
- Platform: Desktop AI Agent application
Why Use a Local Model?
I considered three approaches for security review:
| Approach | Privacy | Latency | Independence |
|---|---|---|---|
| Cloud model review | Sends sensitive data | Network delay | Depends on provider |
| Rule-based only | Private | Minimal | Static patterns |
| Local LLM review | Private | 100-500ms | Self-contained |
Privacy: A local model keeps all data on the device. Sensitive file paths, user messages, and tool parameters never leave my machine.
Latency: 100-500ms per review is acceptable for security-critical operations. Most agents make a few tool calls per request, so total delay stays reasonable.
Independence: Security doesn’t depend on cloud provider policies or model changes. I control the review logic.
Model Selection
I looked at what model sizes work for security review:
| Model Size | Memory | Speed | Reasoning Quality |
|---|---|---|---|
| 0.5B params | ~500MB | Very fast | Limited reasoning |
| 0.8B params | ~800MB | Fast | Basic semantic understanding |
| 1.5B params | ~1.5GB | Moderate | Good semantic review |
| 3B params | ~3GB | Slower | Better reasoning, more latency |
| 7B params | ~7GB | Slow | Overkill for security review |

For security review, 0.8B-1.5B is the sweet spot. Models in this range have enough reasoning capability to understand intent alignment and detect suspicious patterns, while remaining fast enough for interactive use.
I recommend quantized models (Q4_K_M or Q5_K_M) for reduced memory footprint.
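To get a feel for what quantization buys, here is a rough back-of-the-envelope sketch. The bits-per-weight figures are approximations I am assuming for illustration (small models often come out somewhat higher), and `estimateWeightsGB` is a hypothetical helper, not part of llama.cpp. It only estimates the weights; the KV cache and runtime buffers come on top, which is why I budget 1-2GB overall.

```go
package main

import "fmt"

// approxBitsPerWeight holds rough average bits per weight for two llama.cpp
// quantization types. These values are assumptions for illustration; real
// GGUF files vary with model architecture and size.
var approxBitsPerWeight = map[string]float64{
	"Q4_K_M": 4.85,
	"Q5_K_M": 5.7,
}

// estimateWeightsGB returns an approximate size of the quantized weights in
// gigabytes (1 GB = 1e9 bytes). It ignores the KV cache and runtime buffers.
func estimateWeightsGB(paramsBillions float64, quant string) (float64, error) {
	bpw, ok := approxBitsPerWeight[quant]
	if !ok {
		return 0, fmt.Errorf("unknown quantization %q", quant)
	}
	// params * bits / 8 = bytes; the 1e9 factors for params and GB cancel out.
	return paramsBillions * bpw / 8, nil
}

func main() {
	for _, p := range []float64{0.8, 1.5} {
		for _, q := range []string{"Q4_K_M", "Q5_K_M"} {
			gb, _ := estimateWeightsGB(p, q)
			fmt.Printf("%.1fB %s: ~%.2f GB weights\n", p, q, gb)
		}
	}
}
```

The estimate is only meant for sizing; the file size of the actual GGUF you download is the number to trust.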
Implementation
Here’s how I implement the review function:
```go
func (m *Manager) ReviewToolCall(req ToolCallReviewRequest) (ToolCallReviewResult, error) {
	status := m.local.Status()

	// Graceful degradation: auto-approve if cerebellum unavailable
	if status != StatusRunning {
		return ToolCallReviewResult{
			Verdict: "approve",
			Reason:  "cerebellum not running; degraded to rule-only check",
			Risk:    "none",
		}, nil
	}

	// Build review prompt
	prompt := buildToolCallReviewPrompt(req.UserMessage, req.ToolName, req.ToolParams)

	// Synchronous inference with llama.cpp
	output, err := m.inferSync(prompt, 4096)
	if err != nil {
		// Also degrade gracefully on inference failure
		return ToolCallReviewResult{
			Verdict: "approve",
			Reason:  "inference failed; degraded to rule-only check",
			Risk:    "none",
		}, nil
	}

	// Parse JSON result
	result := parseToolCallReviewOutput(output)
	return result, nil
}
```

The key design decisions:
- Graceful degradation: If the cerebellum is unavailable or inference fails, the system continues functioning with rule-based checks. It doesn’t block workflow.
- Synchronous inference: Security review should complete before tool execution. 100-500ms is acceptable.
- JSON output: Structured output makes parsing reliable.
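Small models do not always return clean JSON, so the parsing step deserves some care. The following is a minimal sketch of what a parser could look like, not the project's actual `parseToolCallReviewOutput`. The struct layout, the name `parseReviewOutput`, and the policy of turning unparseable output into a `flag` verdict with `medium` risk are all my assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolCallReviewResult mirrors the JSON object requested from the model.
// The field layout is an assumption; the original struct is not shown.
type ToolCallReviewResult struct {
	Verdict string `json:"verdict"`
	Reason  string `json:"reason"`
	Risk    string `json:"risk"`
}

var (
	validVerdicts = map[string]bool{"approve": true, "flag": true, "reject": true}
	validRisks    = map[string]bool{"none": true, "low": true, "medium": true, "high": true}
)

// parseReviewOutput takes the span from the first '{' to the last '}' in the
// model output, decodes it, and validates the enum fields. Anything that
// cannot be parsed or validated becomes a "flag" verdict (assumed policy).
func parseReviewOutput(output string) ToolCallReviewResult {
	fallback := ToolCallReviewResult{
		Verdict: "flag",
		Reason:  "unparseable review output",
		Risk:    "medium",
	}
	start := strings.Index(output, "{")
	end := strings.LastIndex(output, "}")
	if start < 0 || end <= start {
		return fallback
	}
	var r ToolCallReviewResult
	if err := json.Unmarshal([]byte(output[start:end+1]), &r); err != nil {
		return fallback
	}
	// Normalize case and whitespace before checking the allowed values.
	r.Verdict = strings.ToLower(strings.TrimSpace(r.Verdict))
	r.Risk = strings.ToLower(strings.TrimSpace(r.Risk))
	if !validVerdicts[r.Verdict] || !validRisks[r.Risk] {
		return fallback
	}
	return r
}

func main() {
	raw := `Sure! {"verdict": "Reject", "reason": "uploads ~/.aws", "risk": "high"}`
	fmt.Printf("%+v\n", parseReviewOutput(raw))
	fmt.Printf("%+v\n", parseReviewOutput("I cannot decide"))
}
```

Flagging on bad output is a different choice from the auto-approve used when the model is unavailable: here the model did run, so a malformed answer is treated as a signal worth a second look.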
Review Prompt Design
The review prompt must guide the model to evaluate multiple security dimensions:
```
You are a security review assistant. Your task is to review AI Agent tool calls.

User's original request: {user_message}

Tool to execute: {tool_name}
Tool parameters: {tool_params}

## Review Points
1. Does this tool call match the user's original intent?
2. Is there data exfiltration risk (e.g., curl uploading sensitive files)?
3. Are there destructive operations beyond user expectations?
4. Are there injection attacks in parameters (command injection, path traversal)?
5. Is the operation scope limited to authorized sandbox directories?

## Output Format (strict JSON)
{"verdict": "approve|flag|reject", "reason": "brief explanation", "risk": "none|low|medium|high"}

Respond only with the JSON object, no additional text.
```

I found this prompt works well with small models. It provides clear structure and asks for specific evaluation dimensions.
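Filling the template is simple, but two details are easy to get wrong: serializing the parameters deterministically, and making sure a user message that happens to contain `{tool_name}` is not expanded a second time. Here is a minimal sketch under my own assumptions. `buildReviewPrompt`, `fillTemplate`, and `renderParams` are hypothetical names, not the project's actual `buildToolCallReviewPrompt`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// reviewTemplate is the review prompt with its three placeholders.
const reviewTemplate = `You are a security review assistant. Your task is to review AI Agent tool calls.

User's original request: {user_message}

Tool to execute: {tool_name}
Tool parameters: {tool_params}

## Review Points
1. Does this tool call match the user's original intent?
2. Is there data exfiltration risk (e.g., curl uploading sensitive files)?
3. Are there destructive operations beyond user expectations?
4. Are there injection attacks in parameters (command injection, path traversal)?
5. Is the operation scope limited to authorized sandbox directories?

## Output Format (strict JSON)
{"verdict": "approve|flag|reject", "reason": "brief explanation", "risk": "none|low|medium|high"}

Respond only with the JSON object, no additional text.`

// renderParams serializes tool parameters as compact JSON. encoding/json
// sorts map keys, so the same parameters always produce the same text.
func renderParams(params map[string]any) string {
	b, err := json.Marshal(params)
	if err != nil {
		// Unsupported value types (e.g. channels): fall back to Go's default formatting.
		return fmt.Sprint(params)
	}
	return string(b)
}

// fillTemplate substitutes all placeholders in a single pass, so placeholder
// text that appears inside a substituted value is not expanded again.
func fillTemplate(tmpl, userMessage, toolName, toolParams string) string {
	return strings.NewReplacer(
		"{user_message}", userMessage,
		"{tool_name}", toolName,
		"{tool_params}", toolParams,
	).Replace(tmpl)
}

// buildReviewPrompt is a hypothetical stand-in for the prompt builder.
func buildReviewPrompt(userMessage, toolName string, params map[string]any) string {
	return fillTemplate(reviewTemplate, userMessage, toolName, renderParams(params))
}

func main() {
	fmt.Println(buildReviewPrompt("Summarize notes.md", "read_file",
		map[string]any{"path": "notes.md"}))
}
```

Single-pass substitution does not stop prompt injection in general (the user message is still read by the model), but it does keep the template structure itself from being rewritten by input.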
Smart Skip Mechanism
Not every tool call needs full semantic review. I implement a smart skip mechanism:
```go
// isConfigMode is not declared in this snippet; it is assumed to be state
// defined elsewhere in the application.
func ShouldReviewToolCall(toolName string, toolParams map[string]any, isElevated bool) bool {
	// Config mode doesn't need review
	if isConfigMode {
		return false
	}

	// Elevated operations are always reviewed, so check this before any skip
	if isElevated {
		return true
	}

	// Low-risk read-only tools skip review unless a sensitive keyword is present
	if isLowRiskReadOnly(toolName) {
		return containsSensitiveKeywords(toolParams)
	}

	// Default: review everything else, including calls with sensitive keywords
	return true
}
```

Low-risk operations (skip review):
- memory_search, sessions_list: read-only, non-sensitive
- list_files: within sandbox, no sensitive keywords

High-risk operations (must review):
- write_file, delete_file, exec: write/delete/execute
- Any operation with elevated permissions
- Any operation containing sensitive keywords (password, credential, token, .aws, etc.)
This balances security with performance: typical operations remain fast while dangerous ones are thoroughly checked.
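The two helpers used by the skip check can be quite small. This is a sketch under my own assumptions, not the project's actual code: the tool allowlist and keyword list are taken from the examples above, and matching is a case-insensitive substring search over parameter keys and values, including nested maps and slices.

```go
package main

import (
	"fmt"
	"strings"
)

// lowRiskReadOnlyTools is an assumed allowlist based on the examples above.
var lowRiskReadOnlyTools = map[string]bool{
	"memory_search": true,
	"sessions_list": true,
	"list_files":    true,
}

// sensitiveKeywords is an assumed, deliberately short keyword list.
var sensitiveKeywords = []string{"password", "credential", "token", ".aws"}

func isLowRiskReadOnly(toolName string) bool {
	return lowRiskReadOnlyTools[toolName]
}

// hasKeyword reports whether s contains any keyword, ignoring case.
func hasKeyword(s string) bool {
	s = strings.ToLower(s)
	for _, kw := range sensitiveKeywords {
		if strings.Contains(s, kw) {
			return true
		}
	}
	return false
}

// valueHasKeyword checks a single parameter value, descending into nested
// maps and slices; other types are checked via their default formatting.
func valueHasKeyword(v any) bool {
	switch t := v.(type) {
	case string:
		return hasKeyword(t)
	case map[string]any:
		return containsSensitiveKeywords(t)
	case []any:
		for _, e := range t {
			if valueHasKeyword(e) {
				return true
			}
		}
		return false
	default:
		return hasKeyword(fmt.Sprint(t))
	}
}

// containsSensitiveKeywords checks both parameter names and values.
func containsSensitiveKeywords(params map[string]any) bool {
	for k, v := range params {
		if hasKeyword(k) || valueHasKeyword(v) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsSensitiveKeywords(map[string]any{"path": "/home/u/.aws/credentials"}))
	fmt.Println(containsSensitiveKeywords(map[string]any{"path": "notes/todo.md"}))
}
```

Substring matching produces false positives (a file named `tokenizer.go` contains `token`), but a false positive only costs one extra review, which is the direction I prefer to err in.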
The Reason
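I think local LLM security review works because it bridges the semantic gap. Rule-based systems match patterns but cannot judge whether a tool call aligns with the user's intent. Cloud models can reason about intent, but only if my file paths and tool parameters are sent off the device.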
A local model has access to:
- The user’s original request (context)
- The tool name and parameters (what will happen)
- Both of these on the device, so no sensitive data is sent externally (privacy)
It can evaluate whether the action makes sense given the request, without exposing sensitive information.
Summary
In this post, I showed how to implement local LLM-based semantic security review for AI Agents. The key point is using a small offline model (0.8B-1.5B parameters) with llama.cpp to review tool calls without sending sensitive data to external servers. Graceful degradation and smart skip mechanisms ensure the system remains usable while maintaining security.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Kocort Project - Local LLM Security Review Implementation
- 👨‍💻 llama.cpp - Efficient LLM Inference
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!