
Using Local LLM as Security Gatekeeper: Implementing Semantic Review for AI Agents

Problem

After understanding Dual-Brain Architecture, I wanted to know: how do I actually implement the local “Cerebellum” model for security review?

I had questions:

  • What model size is practical?
  • How do I run it offline?
  • What should the review prompt look like?
  • What happens if the model is unavailable?

Environment

  • Local LLM: 0.8B-1.5B parameter quantized model (e.g., Llama-3.2-1B, Qwen2.5-1.5B)
  • Inference engine: llama.cpp with CGO bindings
  • Memory: 1-2GB RAM for quantized model
  • Platform: Desktop AI Agent application

Why Use a Local Model?

I considered three approaches for security review:

Approach | Privacy | Latency | Independence
--------------|-----------|-------------|------------------
Cloud model review | Sends sensitive data | Network delay | Depends on provider
Rule-based only | Private | Minimal | Static patterns
Local LLM review | Private | 100-500ms | Self-contained

Privacy: A local model keeps all data on the device. Sensitive file paths, user messages, and tool parameters never leave my machine.

Latency: 100-500ms per review is acceptable for security-critical operations. Most agents make a few tool calls per request, so total delay stays reasonable.

Independence: Security doesn’t depend on cloud provider policies or model changes. I control the review logic.

Model Selection

I looked at what model sizes work for security review:

Local Model Size Trade-offs
Model Size | Memory | Speed | Reasoning Quality
--------------|-----------|-------------|------------------
0.5B params | ~500MB | Very fast | Limited reasoning
0.8B params | ~800MB | Fast | Basic semantic understanding
1.5B params | ~1.5GB | Moderate | Good semantic review
3B params | ~3GB | Slower | Better reasoning, more latency
7B params | ~7GB | Slow | Overkill for security review

For security review, 0.8B-1.5B is the sweet spot. Models in this range have enough reasoning capability to judge intent alignment and detect suspicious patterns, while remaining fast enough for interactive use.

I recommend quantized models (Q4_K_M or Q5_K_M) for reduced memory footprint.
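
To see why quantization matters for the memory budget, here is a back-of-the-envelope sketch. The function name and the 4.5 bits-per-weight figure in the usage note are illustrative assumptions, not measured values for any particular quantization format.

```go
package main

// estimateWeightsMB gives a rough size, in mebibytes, of a model's weights
// for a given parameter count and average number of bits per weight.
// It ignores the KV cache and runtime buffers, so real usage is higher.
func estimateWeightsMB(params float64, bitsPerWeight float64) float64 {
	bytes := params * bitsPerWeight / 8
	return bytes / (1024 * 1024)
}
```

For example, `estimateWeightsMB(1.5e9, 4.5)` comes out at a little over 800 MiB for the weights alone, which is why a quantized 1.5B model can fit in the 1-2GB range once runtime overhead is added.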

Implementation

Here’s how I implement the review function:

Security Review Implementation
func (m *Manager) ReviewToolCall(req ToolCallReviewRequest) (ToolCallReviewResult, error) {
	status := m.local.Status()
	// Graceful degradation: auto-approve if cerebellum unavailable
	if status != StatusRunning {
		return ToolCallReviewResult{
			Verdict: "approve",
			Reason:  "cerebellum not running; degraded to rule-only check",
			Risk:    "none",
		}, nil
	}
	// Build review prompt
	prompt := buildToolCallReviewPrompt(
		req.UserMessage,
		req.ToolName,
		req.ToolParams, // trailing comma is required in a multi-line Go call
	)
	// Synchronous inference with llama.cpp
	output, err := m.inferSync(prompt, 4096)
	if err != nil {
		// Also degrade gracefully on inference failure
		return ToolCallReviewResult{
			Verdict: "approve",
			Reason:  "inference failed; degraded to rule-only check",
			Risk:    "none",
		}, nil
	}
	// Parse JSON result
	result := parseToolCallReviewOutput(output)
	return result, nil
}
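
The snippet calls `parseToolCallReviewOutput` without showing it. A minimal sketch of what such a parser might look like follows; the struct tags, the fallback to a "flag" verdict, and the "medium" default risk are assumptions of this sketch, not something the original code confirms. Small models sometimes wrap the JSON in extra text, so the sketch extracts the span between the first `{` and the last `}` before decoding.

```go
package main

import (
	"encoding/json"
	"strings"
)

// ToolCallReviewResult mirrors the fields used in the review function.
// The JSON tags are assumed from the prompt's output format.
type ToolCallReviewResult struct {
	Verdict string `json:"verdict"`
	Reason  string `json:"reason"`
	Risk    string `json:"risk"`
}

// parseToolCallReviewOutput decodes the span from the first '{' to the last
// '}' in the model output. Malformed output or an unknown verdict becomes a
// "flag" result (an assumption of this sketch), so a confused model never
// silently approves a call.
func parseToolCallReviewOutput(output string) ToolCallReviewResult {
	fallback := ToolCallReviewResult{
		Verdict: "flag",
		Reason:  "could not parse review output",
		Risk:    "medium",
	}
	start := strings.Index(output, "{")
	end := strings.LastIndex(output, "}")
	if start < 0 || end <= start {
		return fallback
	}
	var result ToolCallReviewResult
	if err := json.Unmarshal([]byte(output[start:end+1]), &result); err != nil {
		return fallback
	}
	// Normalize case and whitespace before validating the enum values.
	result.Verdict = strings.ToLower(strings.TrimSpace(result.Verdict))
	result.Risk = strings.ToLower(strings.TrimSpace(result.Risk))
	switch result.Verdict {
	case "approve", "flag", "reject":
	default:
		return fallback
	}
	switch result.Risk {
	case "none", "low", "medium", "high":
	default:
		result.Risk = "medium" // unknown risk level: assume the middle
	}
	return result
}
```

Treating unparseable output as "flag" rather than "approve" is a design choice: it keeps a parsing failure visible instead of turning it into a pass.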

The key design decisions:

  1. Graceful degradation: If the cerebellum is unavailable or inference fails, the review auto-approves and the system falls back to rule-based checks, so the workflow is never blocked.

  2. Synchronous inference: Security review should complete before tool execution. 100-500ms is acceptable.

  3. JSON output: Structured output makes parsing reliable.
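
Because the review is synchronous, a stalled inference would stall the tool call. One way to keep the delay bounded is to wrap the inference call in a timeout and treat a timeout like any other inference failure. This is a sketch under assumptions: `inferWithTimeout` and `errReviewTimeout` are hypothetical names, and the `infer` parameter stands in for `m.inferSync`.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// errReviewTimeout is returned when inference exceeds its time budget.
var errReviewTimeout = errors.New("review inference timed out")

// inferWithTimeout runs infer in a goroutine and waits at most timeout for
// its result. The goroutine is not cancelled on timeout; it finishes in the
// background and its result is discarded.
func inferWithTimeout(infer func(prompt string) (string, error), prompt string, timeout time.Duration) (string, error) {
	type outcome struct {
		output string
		err    error
	}
	done := make(chan outcome, 1) // buffered so a late sender never blocks
	go func() {
		out, err := infer(prompt)
		done <- outcome{out, err}
	}()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	select {
	case res := <-done:
		return res.output, res.err
	case <-ctx.Done():
		return "", errReviewTimeout
	}
}
```

A caller could pass a budget such as `2 * time.Second` and take the same degraded path on `errReviewTimeout` as on any other inference error.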

Review Prompt Design

The review prompt must guide the model to evaluate multiple security dimensions:

Security Review Prompt Template
You are a security review assistant. Your task is to review AI Agent tool calls.
User's original request: {user_message}
Tool to execute: {tool_name}
Tool parameters: {tool_params}
## Review Points
1. Does this tool call match the user's original intent?
2. Is there data exfiltration risk (e.g., curl uploading sensitive files)?
3. Are there destructive operations beyond user expectations?
4. Are there injection attacks in parameters (command injection, path traversal)?
5. Is the operation scope limited to authorized sandbox directories?
## Output Format (strict JSON)
{"verdict": "approve|flag|reject", "reason": "brief explanation", "risk": "none|low|medium|high"}
Respond only with the JSON object, no additional text.

I found this prompt works well with small models. It provides clear structure and asks for specific evaluation dimensions.
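
Filling the template could be sketched as below. The constant name and the choice to serialize the parameters as JSON are assumptions; the original only shows the call to `buildToolCallReviewPrompt`. `strings.NewReplacer` substitutes in a single pass, so placeholder-like text inside the user's message is not expanded again.

```go
package main

import (
	"encoding/json"
	"strings"
)

// reviewPromptTemplate is the template from this article, with
// {user_message}, {tool_name} and {tool_params} as placeholders.
const reviewPromptTemplate = `You are a security review assistant. Your task is to review AI Agent tool calls.
User's original request: {user_message}
Tool to execute: {tool_name}
Tool parameters: {tool_params}
## Review Points
1. Does this tool call match the user's original intent?
2. Is there data exfiltration risk (e.g., curl uploading sensitive files)?
3. Are there destructive operations beyond user expectations?
4. Are there injection attacks in parameters (command injection, path traversal)?
5. Is the operation scope limited to authorized sandbox directories?
## Output Format (strict JSON)
{"verdict": "approve|flag|reject", "reason": "brief explanation", "risk": "none|low|medium|high"}
Respond only with the JSON object, no additional text.`

// buildToolCallReviewPrompt fills the template. Parameters are rendered as
// compact JSON (map keys sorted by encoding/json) so the model sees them in
// a stable, unambiguous form.
func buildToolCallReviewPrompt(userMessage, toolName string, toolParams map[string]any) string {
	paramsJSON, err := json.Marshal(toolParams)
	if err != nil {
		// Values such as channels or funcs cannot be serialized.
		paramsJSON = []byte(`"unserializable parameters"`)
	}
	return strings.NewReplacer(
		"{user_message}", userMessage,
		"{tool_name}", toolName,
		"{tool_params}", string(paramsJSON),
	).Replace(reviewPromptTemplate)
}
```

Note that the user message and parameters are untrusted text placed directly into the prompt, so the reviewer model itself can be a target of prompt injection; the rule-based checks remain useful as a second layer.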

Smart Skip Mechanism

Not every tool call needs full semantic review. I implement a smart skip mechanism:

Smart Skip Mechanism
func ShouldReviewToolCall(toolName string, toolParams map[string]any, isElevated bool) bool {
	// Config mode doesn't need review.
	// isConfigMode is not a parameter here; it is assumed to be package-level state.
	if isConfigMode {
		return false
	}
	// Elevated operations always reviewed, even for read-only tools
	if isElevated {
		return true
	}
	// Sensitive keywords always reviewed, even for read-only tools
	if containsSensitiveKeywords(toolParams) {
		return true
	}
	// Low-risk read-only tools without sensitive keywords are safe to skip
	if isLowRiskReadOnly(toolName) {
		return false
	}
	return true // Default: review
}

Low-risk operations (skip review):

  • memory_search, sessions_list - read-only, non-sensitive
  • list_files - within sandbox, no sensitive keywords

High-risk operations (must review):

  • write_file, delete_file, exec - write/delete/execute
  • Any operation with elevated permissions
  • Any operation containing sensitive keywords (password, credential, token, .aws, etc.)

This balances security with performance—typical operations remain fast while dangerous ones are thoroughly checked.
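
The two helpers used by the skip logic are not shown above. A minimal sketch, assuming a fixed allowlist and a simple substring match over the tool names and keywords listed in this section (the real lists are likely longer, and the variable names are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// lowRiskReadOnlyTools is an allowlist; anything not listed is reviewed.
var lowRiskReadOnlyTools = map[string]bool{
	"memory_search": true,
	"sessions_list": true,
	"list_files":    true,
}

// sensitiveKeywords are matched case-insensitively as substrings.
var sensitiveKeywords = []string{"password", "credential", "token", ".aws"}

// isLowRiskReadOnly reports whether the tool is on the allowlist.
func isLowRiskReadOnly(toolName string) bool {
	return lowRiskReadOnlyTools[toolName]
}

// containsSensitiveKeywords scans every parameter name and value.
// fmt.Sprint flattens nested values (slices, maps) into text, so keywords
// inside argument lists are caught as well.
func containsSensitiveKeywords(toolParams map[string]any) bool {
	for key, value := range toolParams {
		haystack := strings.ToLower(key + " " + fmt.Sprint(value))
		for _, kw := range sensitiveKeywords {
			if strings.Contains(haystack, kw) {
				return true
			}
		}
	}
	return false
}
```

Substring matching errs on the side of review: a parameter mentioning "tokenizer" would also trigger it, which costs one extra review rather than a missed one.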

The Reason

I think local LLM security review works because it bridges the semantic gap. Rule-based systems cannot understand intent alignment. Cloud models don’t understand local file sensitivity.

A local model has everything it needs to make that judgment:

  • The user’s original request (context)
  • The tool name and parameters (what will happen)
  • All of it on the device, so no sensitive data has to be sent externally (privacy)

It can evaluate whether the action makes sense given the request, without exposing sensitive information.

Summary

In this post, I showed how to implement local LLM-based semantic security review for AI Agents. The key point is using a small offline model (0.8B-1.5B parameters) with llama.cpp to review tool calls without sending sensitive data to external servers. Graceful degradation and smart skip mechanisms ensure the system remains usable while maintaining security.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
