
How to Set Up a Local OpenAI API Proxy for Privacy and Control

Problem

My OpenAI API key was leaking everywhere. I had it in multiple client applications, environment files, and logs. When I needed to debug an issue, I couldn’t find which request consumed the most tokens. And worst of all—I was making identical API calls repeatedly, paying for the same responses over and over.

Here’s what my setup looked like:

leaky-setup.txt
# My API key was in:
# 1. .env files in multiple projects
# 2. Client-side config (exposed to users)
# 3. CI/CD pipelines (visible in logs)
# 4. Multiple team members' local configs
# When I tried to audit usage:
$ curl https://api.openai.com/v1/usage
ERROR: No detailed usage logs available
# When I made the same request twice:
$ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
$ # Paid $0.05 for first call
$ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
$ # Paid $0.05 again for identical response

I was losing money and weakening my security, with no visibility into either.

What happened?

I searched for solutions and found that most “API gateways” were complex cloud services requiring me to send my keys to yet another third party. That defeated the purpose—I wanted to keep keys local, not share them with more providers.

Then I discovered the BYOK (Bring Your Own Key) pattern and a tool called ai-menshen (门神, Chinese for “door god”). It’s a local-first proxy that:

  • Keeps your upstream API keys on your server only
  • Injects authentication transparently to clients
  • Audits all requests (even streaming)
  • Caches responses to avoid duplicate charges
  • Provides a built-in dashboard with zero external CDN calls

The architecture is simple:

architecture-diagram.txt
BEFORE
┌─────────────┐          ┌─────────────┐
│  Client A   │──key────▶│   OpenAI    │
│  Client B   │──key────▶│    API      │
│  Client C   │──key────▶│             │
└─────────────┘          └─────────────┘
Keys everywhere          No audit logs
No caching               Repeated costs
AFTER (with ai-menshen)
┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│  Client A   │──token──▶│ ai-menshen  │──key────▶│   OpenAI    │
│  Client B   │──token──▶│   (local)   │          │    API      │
│  Client C   │──token──▶│             │          │             │
└─────────────┘          └─────────────┘          └─────────────┘
Clients use proxy token  Key stays here only      Audited + cached
Dashboard at localhost   SQLite audit logs        No duplicate costs
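To make the token-for-key swap in the diagram concrete, here is a minimal sketch of what a proxy could do with the Authorization header. It is not ai-menshen's actual code: the function name `rewrite_headers` is hypothetical, and header names are matched case-sensitively for brevity, which a real proxy would not do.

```python
import hmac


def rewrite_headers(client_headers, proxy_token, upstream_key):
    """Return headers for the upstream call, or None if the client token is wrong."""
    supplied = client_headers.get("Authorization", "")
    expected = f"Bearer {proxy_token}"
    # Constant-time comparison avoids leaking token contents through timing.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return None
    upstream = dict(client_headers)
    # Swap the proxy token for the real key; the client never sees this value.
    upstream["Authorization"] = f"Bearer {upstream_key}"
    return upstream


headers = rewrite_headers(
    {"Authorization": "Bearer your-proxy-token", "Content-Type": "application/json"},
    proxy_token="your-proxy-token",
    upstream_key="sk-proj-your-real-openai-key",
)
print(headers["Authorization"])
```

The client only ever holds `your-proxy-token`; the real key exists only inside the proxy process and on the wire to the upstream API.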

How to solve it?

Step 1: Install ai-menshen

I installed the binary in seconds:

install.sh
# One-liner for Linux and macOS
curl -fsSL https://raw.githubusercontent.com/jiacai2050/ai-menshen/main/install.sh | sh
# The binary ends up in ~/.local/bin/ai-menshen
# Verify it works
ai-menshen -version

The tool is a standalone Go binary with no external dependencies beyond SQLite. No Docker, no npm, no pip install.

Step 2: Generate configuration

I created the config directory and generated a default config:

setup-config.sh
mkdir -p ~/.config/ai-menshen
ai-menshen -gen-config > ~/.config/ai-menshen/config.toml

The generated config looked like this:

config.toml
listen = ":8080"
[auth]
enable = true
token = "your-proxy-token" # Clients use this, NOT your OpenAI key
[providers.openai]
base_url = "https://api.openai.com"
api_key = "sk-proj-your-real-openai-key" # Stays here only
[storage]
retention_days = 90 # Auto-purge old logs
[cache]
enable = true
max_age = 3600 # Cache TTL: 1 hour
[logging]
log_request_body = true
log_response_body = true

I edited the config to add my real OpenAI API key:

edit-config.sh
vi ~/.config/ai-menshen/config.toml
# Replace api_key with your actual OpenAI key
# Replace token with a proxy token for clients
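The proxy token should be long and random, not a memorable word. One way to produce one, assuming any opaque string is accepted in the `token` field (the helper name `new_proxy_token` is my own), is Python's standard `secrets` module:

```python
import secrets


def new_proxy_token(num_bytes=32):
    """Return a URL-safe random token suitable for the [auth] token field."""
    return secrets.token_urlsafe(num_bytes)


print(new_proxy_token())
```

Paste the printed value into `token` in config.toml and hand the same value to your clients.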

Step 3: Run the proxy

I started ai-menshen:

run-proxy.sh
ai-menshen -config ~/.config/ai-menshen/config.toml
# Output
2026-03-30 10:00:00 INFO Server listening on :8080
2026-03-30 10:00:00 INFO Dashboard available at http://localhost:8080/

The dashboard appeared at http://localhost:8080/ immediately. All JS/CSS is embedded—zero external CDN calls, perfect for offline or private environments.
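If you start the proxy from a script, a quick reachability check can confirm it is listening before clients connect. This is a generic TCP check, not a feature of ai-menshen; the function name `is_listening` is hypothetical and the port matches the `listen = ":8080"` setting above.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("proxy reachable:", is_listening())
```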

Step 4: Connect clients

I updated my Python client to use the proxy:

client.py
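from openai import OpenAI
# OLD: Direct connection (key exposed)
# client = OpenAI(api_key="sk-proj-xxx")
# NEW: Use local proxy
client = OpenAI(
    base_url="http://localhost:8080",
    api_key="your-proxy-token",  # Proxy token, NOT your OpenAI key
)
# All calls are now proxied, audited, and cached
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
# With stream=True the call returns an iterator of chunks, so print them as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")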

For REST API calls:

rest-call.sh
curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-token" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
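The same REST call can be built with nothing but the Python standard library, which is handy in environments where the `openai` package is not installed. This sketch only constructs the request; the helper name `build_chat_request` is hypothetical, and the URL and token are the placeholders used above.

```python
import json
import urllib.request


def build_chat_request(prompt, token="your-proxy-token",
                       base_url="http://localhost:8080"):
    """Build (but do not send) a chat completion request aimed at the proxy."""
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_chat_request("Hello!")
print(req.get_method(), req.full_url)
# To send it (requires the proxy to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```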

I tested and got a successful response. The request was logged in the dashboard.

Step 5: Verify caching works

I made the same request twice:

test-cache.sh
# First call
curl http://localhost:8080/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer your-proxy-token" -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
# Second call (identical)
curl http://localhost:8080/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer your-proxy-token" -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

In the dashboard, I saw:

  • First request: Called upstream OpenAI API (paid)
  • Second request: Returned cached response (free)

I saved money on identical requests.
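I don't know how ai-menshen builds its cache keys internally, but the idea can be sketched as follows: hash a canonical form of the request body, store the response with a timestamp, and treat entries older than `max_age` as misses. The names `cache_key` and `TTLCache` are hypothetical, and SHA-256 over sorted JSON is my own choice for illustration.

```python
import hashlib
import json
import time


def cache_key(body):
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age  # seconds, like max_age in config.toml
        self.clock = clock
        self.entries = {}  # key -> (stored_at, response)

    def get(self, body):
        entry = self.entries.get(cache_key(body))
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.max_age:
            return None  # expired: the caller should go upstream again
        return response

    def put(self, body, response):
        self.entries[cache_key(body)] = (self.clock(), response)


cache = TTLCache(max_age=3600)
calls = []  # records each simulated upstream (paid) call


def ask(body):
    cached = cache.get(body)
    if cached is not None:
        return cached
    calls.append(body)  # stands in for the real upstream request
    response = {"answer": "4"}
    cache.put(body, response)
    return response


request = {"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}
ask(request)
ask(request)
print("upstream calls:", len(calls))
```

Only byte-for-byte equivalent requests hit the cache in this sketch; changing the model, the messages, or any parameter produces a different key.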

The reason

Why does this approach work better than direct API calls?

Security through separation: Clients never see your upstream API key. They only get a proxy token that you control. If a client is compromised, you rotate the proxy token and leave the upstream OpenAI key untouched.

Built-in observability: Every request is logged to SQLite, including streaming responses. The dashboard shows:

  • Total token usage by model
  • Request/response pairs for debugging
  • Cost trends over time
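Because the audit log lives in SQLite, the same numbers can in principle be pulled with a plain query. The sketch below uses an in-memory database with a made-up `requests` table; the table and column names are assumptions for illustration and do not reflect ai-menshen's real schema.

```python
import sqlite3


def tokens_by_model(conn):
    """Return {model: total_tokens} from a hypothetical `requests` table."""
    rows = conn.execute(
        "SELECT model, SUM(prompt_tokens + completion_tokens) "
        "FROM requests GROUP BY model ORDER BY model"
    )
    return dict(rows.fetchall())


# Hypothetical schema and sample rows, only to make the query runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (model TEXT, prompt_tokens INTEGER, completion_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [("gpt-4o", 12, 30), ("gpt-4o", 8, 20), ("gpt-4o-mini", 5, 5)],
)
print(tokens_by_model(conn))
```

Before running anything like this against the real database file, inspect its actual schema first and adjust the query to match.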

Automatic caching: Responses are cached with configurable TTL. Identical requests return cached results, cutting costs significantly for repeated queries.

Zero external dependencies: The dashboard makes no CDN calls and everything runs locally. Your audit logs stay on your machine, and your API key is sent only to the upstream provider.

I also learned some common mistakes:

Mistake 1: Skipping authentication

auth-mistake.toml
# BAD: No auth
[auth]
enable = false # Anyone can use your proxy
# GOOD: Enable auth
[auth]
enable = true
token = "secure-proxy-token"

Mistake 2: Not configuring cache TTL

cache-mistake.toml
# BAD: Cache never expires
[cache]
enable = true
max_age = 0 # Responses cached forever
# GOOD: Set reasonable TTL
[cache]
enable = true
max_age = 3600 # 1 hour

Mistake 3: Ignoring retention

retention-mistake.toml
# BAD: Logs grow unbounded
[storage]
retention_days = 0 # Keep everything forever
# GOOD: Auto-purge old logs
[storage]
retention_days = 90

Running as a background service (macOS)

I set up ai-menshen as a launchd service so it runs automatically:

launchd-setup.sh
# Copy the plist
cp configs/net.liujiacai.ai-menshen.plist ~/Library/LaunchAgents/
# Load and start
launchctl load ~/Library/LaunchAgents/net.liujiacai.ai-menshen.plist
# Check status
launchctl list | grep ai-menshen
# View logs
tail -f /tmp/ai-menshen-stderr.log

The service:

  • Starts automatically on login
  • Restarts on crash
  • Runs in the background
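I did not inspect the plist shipped in the repository, so its exact contents are unknown to me. As a rough sketch, a LaunchAgent with these three behaviors could be generated with Python's `plistlib`. `RunAtLoad`, `KeepAlive`, and `StandardErrorPath` are standard launchd keys; the label is inferred from the file name, and the binary and config paths are the ones used earlier in this post.

```python
import plistlib
from pathlib import Path


def build_launch_agent(binary, config):
    """Return plist bytes for a LaunchAgent that keeps the proxy running."""
    agent = {
        "Label": "net.liujiacai.ai-menshen",  # assumed from the plist file name
        "ProgramArguments": [binary, "-config", config],
        "RunAtLoad": True,   # start when the agent is loaded at login
        "KeepAlive": True,   # restart the process if it exits
        "StandardErrorPath": "/tmp/ai-menshen-stderr.log",
    }
    return plistlib.dumps(agent)


home = Path.home()
plist = build_launch_agent(
    str(home / ".local/bin/ai-menshen"),
    str(home / ".config/ai-menshen/config.toml"),
)
print(plist.decode())
```

launchd does not expand `~`, which is why the sketch resolves absolute paths with `Path.home()` before writing them into the plist.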

Summary

In this post, I showed how to set up a local OpenAI API proxy using ai-menshen. The key point is keeping your API keys secure while gaining auditing, caching, and centralized control. Install the binary, configure your upstream provider, point clients to localhost, and let the proxy handle authentication injection, usage logging, and response caching automatically.

I went from having keys scattered across multiple clients with zero visibility to having a single secure gateway with full audit logs and cost savings through caching. The setup took five minutes and the dashboard showed everything I needed.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
