How to Set Up a Local OpenAI API Proxy for Privacy and Control
Problem
My OpenAI API key was leaking everywhere. I had it in multiple client applications, environment files, and logs. When I needed to debug an issue, I couldn’t find which request consumed the most tokens. And worst of all—I was making identical API calls repeatedly, paying for the same responses over and over.
Here’s what my setup looked like:
    # My API key was in:
    # 1. .env files in multiple projects
    # 2. Client-side config (exposed to users)
    # 3. CI/CD pipelines (visible in logs)
    # 4. Multiple team members' local configs

    # When I tried to audit usage:
    $ curl https://api.openai.com/v1/usage
    ERROR: No detailed usage logs available

    # When I made the same request twice:
    $ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
    $ # Paid $0.05 for first call
    $ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
    $ # Paid $0.05 again for identical response

I was losing money and security with no visibility.
What happened?
I searched for solutions and found that most “API gateways” were complex cloud services requiring me to send my keys to yet another third party. That defeated the purpose—I wanted to keep keys local, not share them with more providers.
Then I discovered the BYOK (Bring Your Own Key) pattern and a tool called ai-menshen (门神, Chinese for “door god”). It’s a local-first proxy that:
- Keeps your upstream API keys on your server only
- Injects authentication transparently to clients
- Audits all requests (even streaming)
- Caches responses to avoid duplicate charges
- Provides a built-in dashboard with zero external CDN calls
The architecture is simple:
    BEFORE

    ┌─────────────┐          ┌─────────────┐
    │ Client A    │──key────▶│ OpenAI      │
    │ Client B    │──key────▶│ API         │
    │ Client C    │──key────▶│             │
    └─────────────┘          └─────────────┘

    Keys everywhere
    No audit logs
    No caching
    Repeated costs

    AFTER (with ai-menshen)

    ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
    │ Client A    │──token──▶│ ai-menshen  │──key────▶│ OpenAI      │
    │ Client B    │──token──▶│ (local)     │          │ API         │
    │ Client C    │──token──▶│             │          │             │
    └─────────────┘          └─────────────┘          └─────────────┘

    Clients use proxy token
    Key stays here only
    Audited + cached
    Dashboard at localhost
    SQLite audit logs
    No duplicate costs

How to solve it?
Step 1: Install ai-menshen
I installed the binary in seconds:
    # One-liner for Linux and macOS
    curl -fsSL https://raw.githubusercontent.com/jiacai2050/ai-menshen/main/install.sh | sh

    # The binary ends up in ~/.local/bin/ai-menshen
    # Verify it works
    ai-menshen -version

The tool is a standalone Go binary with zero external dependencies (only SQLite). No Docker, no npm, no pip install.
Step 2: Generate configuration
I created the config directory and generated a default config:
    mkdir -p ~/.config/ai-menshen
    ai-menshen -gen-config > ~/.config/ai-menshen/config.toml

The generated config looked like this:

    listen = ":8080"

    [auth]
    enable = true
    token = "your-proxy-token"  # Clients use this, NOT your OpenAI key

    [providers.openai]
    base_url = "https://api.openai.com"
    api_key = "sk-proj-your-real-openai-key"  # Stays here only

    [storage]
    retention_days = 90  # Auto-purge old logs

    [cache]
    enable = true
    max_age = 3600  # Cache TTL: 1 hour

    [logging]
    log_request_body = true
    log_response_body = true

I edited the config to add my real OpenAI API key:

    vi ~/.config/ai-menshen/config.toml
    # Replace api_key with your actual OpenAI key
    # Replace token with a proxy token for clients

Step 3: Run the proxy
I started ai-menshen:
    ai-menshen -config ~/.config/ai-menshen/config.toml

    # Output
    2026-03-30 10:00:00 INFO Server listening on :8080
    2026-03-30 10:00:00 INFO Dashboard available at http://localhost:8080/

The dashboard appeared at http://localhost:8080/ immediately. All JS/CSS is embedded, so there are zero external CDN calls, which makes it a good fit for offline or private environments.
Step 4: Connect clients
I updated my Python client to use the proxy:
    from openai import OpenAI

    # OLD: Direct connection (key exposed)
    # client = OpenAI(api_key="sk-proj-xxx")

    # NEW: Use local proxy
    client = OpenAI(
        base_url="http://localhost:8080",
        api_key="your-proxy-token",  # Proxy token, NOT your OpenAI key
    )

    # All calls are now proxied, audited, and cached
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    # With stream=True the reply arrives in chunks, so iterate to read it
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

For REST API calls:

    curl http://localhost:8080/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer your-proxy-token" \
      -d '{
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

I tested and got a successful response. The request was logged in the dashboard.
Step 5: Verify caching works
I made the same request twice:
    # First call
    curl http://localhost:8080/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer your-proxy-token" \
      -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

    # Second call (identical)
    curl http://localhost:8080/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer your-proxy-token" \
      -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

In the dashboard, I saw:
- First request: Called upstream OpenAI API (paid)
- Second request: Returned cached response (free)
I saved money on identical requests.
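The same check can be scripted instead of read off the dashboard. The sketch below is only an illustration: it reuses the URL and placeholder token from the curl examples, and it assumes (without the source confirming it) that a cached reply comes back noticeably faster than an upstream call. The helper names `build_request`, `timed_call`, and `compare_two_calls` are hypothetical.

```python
import json
import time
import urllib.request

# Values taken from the curl examples; replace the token with your own.
PROXY_URL = "http://localhost:8080/chat/completions"
PROXY_TOKEN = "your-proxy-token"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the same POST request the curl examples send."""
    body = {"model": "gpt-4o", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {PROXY_TOKEN}",
        },
        method="POST",
    )


def timed_call(prompt: str) -> float:
    """Send one request through the proxy and return the elapsed seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt), timeout=60) as response:
        response.read()
    return time.perf_counter() - start


def compare_two_calls(prompt: str = "What is 2+2?") -> tuple[float, float]:
    """Send the identical request twice and return both timings."""
    return timed_call(prompt), timed_call(prompt)
```

Calling `compare_two_calls()` with the proxy running returns the two timings. Timing is only a hint, so the dashboard remains the place to confirm whether the second request was actually served from the cache.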
The reason
Why does this approach work better than direct API calls?
Security through separation: Clients never see your upstream API key. They only get a proxy token that you control. If a client is compromised, you rotate the proxy token—not your expensive OpenAI key.
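To make the separation concrete, here is a minimal sketch of the token-for-key swap a proxy of this kind performs. It is not ai-menshen's actual code (that is a Go binary whose internals I have not shown here); the function name `inject_upstream_auth` and its parameters are hypothetical.

```python
import hmac


def inject_upstream_auth(
    headers: dict[str, str], proxy_token: str, upstream_key: str
) -> dict[str, str]:
    """Check the client's proxy token, then return headers for the upstream call."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    presented = normalised.get("authorization", "")
    expected = f"Bearer {proxy_token}"

    # compare_digest avoids leaking how much of the token matched via timing.
    if not hmac.compare_digest(presented.encode(), expected.encode()):
        raise PermissionError("invalid proxy token")

    # Swap the proxy token for the real key; the client never sees the key.
    outbound = dict(normalised)
    outbound["authorization"] = f"Bearer {upstream_key}"
    return outbound
```

Because clients only ever hold `proxy_token`, rotating it means changing one value on the proxy side while `upstream_key` stays untouched.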
Built-in observability: Every request is logged to SQLite, including streaming responses. The dashboard shows:
- Total token usage by model
- Request/response pairs for debugging
- Cost trends over time
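As an illustration of what "token usage by model" means over a SQLite log, here is a small sketch. The table name `requests` and its columns are hypothetical and chosen for this example; they are not ai-menshen's real schema.

```python
import sqlite3


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a toy audit database with one row per request."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS requests (
            id INTEGER PRIMARY KEY,
            model TEXT NOT NULL,
            prompt_tokens INTEGER NOT NULL,
            completion_tokens INTEGER NOT NULL,
            cached INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def record_request(
    conn: sqlite3.Connection,
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    cached: bool = False,
) -> None:
    """Append one request to the audit log."""
    conn.execute(
        "INSERT INTO requests (model, prompt_tokens, completion_tokens, cached) "
        "VALUES (?, ?, ?, ?)",
        (model, prompt_tokens, completion_tokens, int(cached)),
    )
    conn.commit()


def tokens_by_model(conn: sqlite3.Connection) -> dict[str, int]:
    """Total tokens (prompt + completion) grouped by model."""
    rows = conn.execute(
        "SELECT model, SUM(prompt_tokens + completion_tokens) "
        "FROM requests GROUP BY model"
    )
    return dict(rows.fetchall())
```

Keeping one row per request is what makes the per-model totals and the request-level debugging view come from the same data.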
Automatic caching: Responses are cached with configurable TTL. Identical requests return cached results, cutting costs significantly for repeated queries.
Zero external dependencies: The dashboard has no CDN calls. Everything runs locally. Your logs and keys never leave your machine.
I also learned some common mistakes:
Mistake 1: Skipping authentication
    # BAD: No auth
    [auth]
    enable = false  # Anyone can use your proxy

    # GOOD: Enable auth
    [auth]
    enable = true
    token = "secure-proxy-token"

Mistake 2: Not configuring cache TTL

    # BAD: Cache never expires
    [cache]
    enable = true
    max_age = 0  # Responses cached forever

    # GOOD: Set reasonable TTL
    [cache]
    enable = true
    max_age = 3600  # 1 hour

Mistake 3: Ignoring retention

    # BAD: Logs grow unbounded
    [storage]
    retention_days = 0  # Keep everything forever

    # GOOD: Auto-purge old logs
    [storage]
    retention_days = 90

Running as a background service (macOS)
I set up ai-menshen as a launchd service so it runs automatically:
    # Copy the plist
    cp configs/net.liujiacai.ai-menshen.plist ~/Library/LaunchAgents/

    # Load and start
    launchctl load ~/Library/LaunchAgents/net.liujiacai.ai-menshen.plist

    # Check status
    launchctl list | grep ai-menshen

    # View logs
    tail -f /tmp/ai-menshen-stderr.log

The service:
- Starts automatically on login
- Restarts on crash
- Runs in the background
Summary
In this post, I showed how to set up a local OpenAI API proxy using ai-menshen. The key point is keeping your API keys secure while gaining auditing, caching, and centralized control. Install the binary, configure your upstream provider, point clients to localhost, and let the proxy handle authentication injection, usage logging, and response caching automatically.
I went from having keys scattered across multiple clients with zero visibility to having a single secure gateway with full audit logs and cost savings through caching. The setup took five minutes and the dashboard showed everything I needed.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in a way that helps others. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!