How to Set Up a Local OpenAI API Proxy for Privacy and Control

Mar 30, 2026

Problem

My OpenAI API key was leaking everywhere. I had it in multiple client applications, environment files, and logs. When I needed to debug an issue, I couldn’t find which request consumed the most tokens. Worst of all, I was making identical API calls repeatedly and paying for the same responses over and over.

Here’s what my setup looked like:

# My API key was in:
# 1. .env files in multiple projects
# 2. Client-side config (exposed to users)
# 3. CI/CD pipelines (visible in logs)
# 4. Multiple team members' local configs

# When I tried to audit usage:
$ curl https://api.openai.com/v1/usage
ERROR: No detailed usage logs available

# When I made the same request twice:
$ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
$ # Paid $0.05 for first call
$ curl https://api.openai.com/v1/chat/completions -d '{"model": "gpt-4o", "messages": [...]}'
$ # Paid $0.05 again for identical response

I was losing money and security with no visibility.

What happened?

I searched for solutions and found that most “API gateways” were complex cloud services requiring me to send my keys to yet another third party. That defeated the purpose—I wanted to keep keys local, not share them with more providers.

Then I discovered the BYOK (Bring Your Own Key) pattern and a tool called ai-menshen (门神, Chinese for “door god”). It’s a local-first proxy that:

Keeps your upstream API keys on your server only
Injects authentication transparently to clients
Audits all requests (even streaming)
Caches responses to avoid duplicate charges
Provides a built-in dashboard with zero external CDN calls

The architecture is simple:

                    BEFORE
    ┌─────────────┐          ┌─────────────┐
    │   Client A  │──key────▶│   OpenAI    │
    │   Client B  │──key────▶│    API      │
    │   Client C  │──key────▶│             │
    └─────────────┘          └─────────────┘
    Keys everywhere          No audit logs
    No caching               Repeated costs

                    AFTER (with ai-menshen)
    ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
    │   Client A  │──token──▶│ ai-menshen  │──key────▶│   OpenAI    │
    │   Client B  │──token──▶│   (local)   │          │    API      │
    │   Client C  │──token──▶│             │          │             │
    └─────────────┘          └─────────────┘          └─────────────┘
    Clients use proxy token  Key stays here only      Audited + cached
    Dashboard at localhost   SQLite audit logs        No duplicate costs

How to solve it?

Step 1: Install ai-menshen

I installed the binary in seconds:

# One-liner for Linux and macOS
curl -fsSL https://raw.githubusercontent.com/jiacai2050/ai-menshen/main/install.sh | sh

# The binary ends up in ~/.local/bin/ai-menshen
# Verify it works
ai-menshen -version

The tool is a standalone Go binary. Its only dependency is SQLite, which it uses for storage. No Docker, no npm, no pip install.

Step 2: Generate configuration

I created the config directory and generated a default config:

mkdir -p ~/.config/ai-menshen
ai-menshen -gen-config > ~/.config/ai-menshen/config.toml

The generated config looked like this:

listen = ":8080"

[auth]
enable = true
token = "your-proxy-token"  # Clients use this, NOT your OpenAI key

[providers.openai]
base_url = "https://api.openai.com"
api_key = "sk-proj-your-real-openai-key"  # Stays here only

[storage]
retention_days = 90  # Auto-purge old logs

[cache]
enable = true
max_age = 3600  # Cache TTL: 1 hour

[logging]
log_request_body = true
log_response_body = true

I edited the config to add my real OpenAI API key:

vi ~/.config/ai-menshen/config.toml
# Replace api_key with your actual OpenAI key
# Replace token with a proxy token for clients
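
The proxy token should be hard to guess, since anyone who has it can spend money through the proxy. One way to generate one, assuming the proxy accepts any opaque string as its token (an assumption on my part), is Python's standard `secrets` module:

```python
import secrets

# 32 random bytes, URL-safe base64 encoded without padding (43 characters).
proxy_token = secrets.token_urlsafe(32)
print(proxy_token)
```

Paste the printed value into the `token` field of the `[auth]` section and hand the same value to your clients.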

Step 3: Run the proxy

I started ai-menshen:

ai-menshen -config ~/.config/ai-menshen/config.toml

# Output
2026-03-30 10:00:00 INFO Server listening on :8080
2026-03-30 10:00:00 INFO Dashboard available at http://localhost:8080/

The dashboard appeared at http://localhost:8080/ immediately. All JS/CSS is embedded—zero external CDN calls, perfect for offline or private environments.

Step 4: Connect clients

I updated my Python client to use the proxy:

from openai import OpenAI

# OLD: Direct connection (key exposed)
# client = OpenAI(api_key="sk-proj-xxx")

# NEW: Use local proxy
client = OpenAI(
    base_url="http://localhost:8080",
    api_key="your-proxy-token"  # Proxy token, NOT your OpenAI key
)

# All calls are now proxied, audited, and cached
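response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

# With stream=True the call returns an iterator of chunks; print them as they arrive
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()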

For REST API calls:

curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-token" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

I tested and got a successful response. The request was logged in the dashboard.
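
The client snippet in this step hard-codes the proxy token for brevity. To keep even the proxy token out of source files, a small helper can read it from the environment. This is a sketch: the variable names `AI_MENSHEN_TOKEN` and `AI_MENSHEN_URL` are ones I made up for this example, not something the tool defines.

```python
import os


def proxy_client_kwargs(env=None) -> dict:
    """Build OpenAI client arguments from environment variables.

    AI_MENSHEN_TOKEN and AI_MENSHEN_URL are hypothetical names chosen
    for this example.
    """
    env = os.environ if env is None else env
    token = env.get("AI_MENSHEN_TOKEN")
    if not token:
        raise RuntimeError("AI_MENSHEN_TOKEN is not set")
    return {
        # Fall back to the listen address used in this post.
        "base_url": env.get("AI_MENSHEN_URL", "http://localhost:8080"),
        "api_key": token,
    }
```

The client is then created with `OpenAI(**proxy_client_kwargs())`, and rotating the token means changing one environment variable instead of editing code.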

Step 5: Verify caching works

I made the same request twice:

# First call
curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-token" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

# Second call (identical)
curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-token" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2+2?"}]}'

In the dashboard, I saw:

First request: Called upstream OpenAI API (paid)
Second request: Returned cached response (free)

I saved money on identical requests.

The reason

Why does this approach work better than direct API calls?

Security through separation: Clients never see your upstream API key. They only get a proxy token that you control. If a client is compromised, you rotate the proxy token, and your upstream OpenAI key stays untouched.

Built-in observability: Every request is logged to SQLite, including streaming responses. The dashboard shows:

Total token usage by model
Request/response pairs for debugging
Cost trends over time
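
To show what an audit log in SQLite makes possible, here is a small sketch of the "token usage by model" view. The table layout and the sample rows are invented for this example; I have not inspected ai-menshen's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real proxy would use a file on disk
conn.execute(
    """
    CREATE TABLE requests (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        prompt_tokens INTEGER NOT NULL,
        completion_tokens INTEGER NOT NULL,
        cached INTEGER NOT NULL DEFAULT 0  -- 1 if served from the cache
    )
    """
)
# Made-up sample rows: (model, prompt_tokens, completion_tokens, cached)
rows = [
    ("gpt-4o", 12, 30, 0),
    ("gpt-4o", 12, 30, 1),  # cache hit, no upstream tokens billed
    ("gpt-4o-mini", 8, 20, 0),
]
conn.executemany(
    "INSERT INTO requests (model, prompt_tokens, completion_tokens, cached) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def tokens_by_model(conn: sqlite3.Connection) -> dict:
    """Total upstream tokens per model, excluding cache hits."""
    cur = conn.execute(
        """
        SELECT model, SUM(prompt_tokens + completion_tokens)
        FROM requests
        WHERE cached = 0
        GROUP BY model
        ORDER BY model
        """
    )
    return dict(cur.fetchall())


usage = tokens_by_model(conn)
print(usage)
```

Because the log is a plain SQLite file, the same kind of query can answer the question from the start of this post: which requests consumed the most tokens.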

Automatic caching: Responses are cached with a configurable TTL. An identical request within that window is served from the cache, so it incurs no upstream charge.
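
How a proxy decides that two requests are "identical" is worth spelling out. The sketch below hashes a canonical form of the JSON body and stores responses with a timestamp. This is a general illustration of TTL caching, not ai-menshen's implementation; the names `cache_key` and `TTLCache` are mine.

```python
import hashlib
import json
import time
from typing import Optional


def cache_key(body: dict) -> str:
    """Hash the request body in canonical form so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, max_age: float):
        self.max_age = max_age  # seconds, like max_age in the config
        self._store = {}        # key -> (stored_at, response)

    def get(self, body: dict, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._store.get(cache_key(body))
        if entry is None:
            return None  # never seen; caller should go upstream
        stored_at, response = entry
        if now - stored_at > self.max_age:
            return None  # expired; caller should go upstream
        return response

    def put(self, body: dict, response: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[cache_key(body)] = (now, response)
```

One consequence of hashing the whole body is that any difference, such as a changed `temperature` or an extra message, produces a different key and therefore a fresh upstream call.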

Zero external dependencies: The dashboard has no CDN calls. Everything runs locally. Your logs and keys never leave your machine.

I also learned some common mistakes:

Mistake 1: Skipping authentication

# BAD: No auth
[auth]
enable = false  # Anyone can use your proxy

# GOOD: Enable auth
[auth]
enable = true
token = "secure-proxy-token"

Mistake 2: Not configuring cache TTL

# BAD: Cache never expires
[cache]
enable = true
max_age = 0  # Responses cached forever

# GOOD: Set reasonable TTL
[cache]
enable = true
max_age = 3600  # 1 hour

Mistake 3: Ignoring retention

# BAD: Logs grow unbounded
[storage]
retention_days = 0  # Keep everything forever

# GOOD: Auto-purge old logs
[storage]
retention_days = 90

Running as a background service (macOS)

I set up ai-menshen as a launchd service so it runs automatically:

# Copy the plist (the path is relative to a checkout of the ai-menshen repository)
cp configs/net.liujiacai.ai-menshen.plist ~/Library/LaunchAgents/

# Load and start
launchctl load ~/Library/LaunchAgents/net.liujiacai.ai-menshen.plist

# Check status
launchctl list | grep ai-menshen

# View logs
tail -f /tmp/ai-menshen-stderr.log

The service:

Starts automatically on login
Restarts on crash
Runs in the background

Summary

In this post, I showed how to set up a local OpenAI API proxy using ai-menshen. The key point is keeping your API keys secure while gaining auditing, caching, and centralized control. Install the binary, configure your upstream provider, point clients to localhost, and let the proxy handle authentication injection, usage logging, and response caching automatically.

I went from having keys scattered across multiple clients with zero visibility to having a single secure gateway with full audit logs and cost savings through caching. The setup took five minutes and the dashboard showed everything I needed.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 ai-menshen GitHub Repository
👨‍💻 BYOK Security Pattern

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!