Security Best Practices When Using Codex: What to Trust AI With and What Not To
The Problem
When I started using Codex for more of my coding work, I wondered: can I trust it with security-critical code?
Then I read a Reddit thread where a developer shared this warning:
"When it comes to security (assuming you mean auth?), I would not trust an LLMto do that properly. You'd be better off offloading auth to some third-partyservice like Clerk."That made me pause. If AI can write code, why shouldn’t it write authentication? I dug deeper into the discussion and found practical advice on what to delegate and what to handle yourself.
What Happened
I had been asking Codex to help me build various features. Most worked fine. But when I asked it to implement user authentication from scratch, I got code that looked reasonable but had subtle issues:
    // What Codex generated for password verification
    async function verifyPassword(plainPassword, hashedPassword) {
      // This looks correct but uses a weak hashing approach
      const hash = await bcrypt.hash(plainPassword, 8); // Low salt rounds
      return hash === hashedPassword; // Wrong comparison method!
    }
The code looked plausible. But it had two problems:
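- A cost factor ("salt rounds") of 8 is low by current standards (I aim for 12 or more)
- Re-hashing the input and comparing with === is the wrong way to verify a password. bcrypt.hash() generates a fresh random salt on every call, so the new hash will not match the stored one. Verification should go through bcrypt.compare(), which reads the salt and cost factor from the stored hash. A hand-rolled string comparison on hashes can also leak timing information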
I wouldn’t have caught these issues without manual review. That’s the core problem: AI generates plausible-looking code that may have security vulnerabilities you won’t notice until it’s too late.
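For contrast, here is a minimal sketch of the verification pattern the generated code missed: reuse the stored salt, then compare in constant time. It uses Node's built-in crypto module (scrypt) instead of bcrypt only so the example has no dependencies. That swap, the `salt:hash` storage format, and the function names are my assumptions, not something from the Reddit thread. With the bcrypt package, the equivalent is to call bcrypt.compare() rather than re-hashing.

```javascript
const { randomBytes, scryptSync, timingSafeEqual } = require('node:crypto');

const KEY_LENGTH = 64; // bytes of derived key (assumed value for this sketch)

// Hash a password with a fresh random salt; store salt and hash together.
function hashPassword(plainPassword) {
  const salt = randomBytes(16);
  const derived = scryptSync(plainPassword, salt, KEY_LENGTH);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

// Verify by re-deriving with the STORED salt, then comparing in constant time.
function verifyPassword(plainPassword, stored) {
  const [saltHex, hashHex] = String(stored).split(':');
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, 'hex');
  // Reject malformed records before comparing (timingSafeEqual needs equal lengths).
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(plainPassword, Buffer.from(saltHex, 'hex'), KEY_LENGTH);
  return timingSafeEqual(actual, expected);
}
```

Even with a sketch like this, I would still hand real authentication to a dedicated service. The point is only to show how far the generated version was from a correct check.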
Why AI Struggles With Security
AI coding assistants have specific limitations when it comes to security:
They optimize for functionality, not security. When I ask Codex to “add login,” it produces working login code. But working doesn’t mean secure.
They don’t understand your threat model. AI doesn’t know who your users are, what data you’re protecting, or what attacks you’re worried about.
They can’t reason about system-wide security. Each piece of code is generated in isolation, without considering how it fits into your overall security architecture.
They may use outdated patterns. Training data includes old code with outdated security practices.
What NOT to Trust AI With
Based on the Reddit discussion and my own experience, here are the areas I never delegate to AI:
1. Authentication Implementation
    BAD: "Implement user login with JWT tokens and password hashing"
    GOOD: "Integrate Clerk for authentication"
Authentication has too many edge cases:
- Session management
- Token rotation
- Password reset flows
- Multi-factor authentication
- Rate limiting
- Brute force protection
Use established services: Clerk, Auth0, Supabase Auth, Firebase Auth.
2. Payment Processing
    BAD: "Build a payment system with credit card processing"
    GOOD: "Integrate Stripe checkout"
Payment security includes:
- PCI compliance
- Fraud detection
- Secure card storage
- Webhook verification
- Refund handling
Use Stripe, PayPal, or similar services.
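Webhook verification is the one item in that list you still touch when integrating a provider, so here is a rough sketch of the general idea: recompute an HMAC over the raw request body and compare it in constant time. The signing scheme (HMAC-SHA256 over `timestamp.body`), the field names, and the five-minute tolerance are assumptions for illustration. In a real integration I would use the provider's own SDK helper (for Stripe, `stripe.webhooks.constructEvent`) rather than this code.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Hypothetical scheme: hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)).
function signPayload(secret, timestamp, rawBody) {
  return createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

function verifyWebhook({ secret, timestamp, rawBody, signature, nowSeconds, toleranceSeconds = 300 }) {
  // Reject stale or future-dated events to limit replay attacks.
  if (!Number.isFinite(timestamp) || Math.abs(nowSeconds - timestamp) > toleranceSeconds) {
    return false;
  }
  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), 'hex');
  const received = Buffer.from(String(signature), 'hex');
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The important detail is that the signature covers the raw body exactly as received. Parsing the JSON and re-serializing it before verifying usually breaks the check.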
3. API Key and Secret Management
    // NEVER let AI write code like this
    const apiKey = "sk-proj-xxxxx" // Hardcoded secret!

    // ALWAYS use environment variables
    const apiKey = process.env.STRIPE_SECRET_KEY
    if (!apiKey) {
      throw new Error('STRIPE_SECRET_KEY not configured')
    }
AI might accidentally expose secrets in:
- Log statements
- Error messages
- Debug output
- Git commits
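One way to reduce the risk of secrets leaking through logs and error output is to pass anything you log through a redaction step first. The sketch below is a hypothetical helper, not part of any library. The key pattern and the `[REDACTED]` marker are my own choices, and it only handles plain JSON-like data (objects, arrays, primitives).

```javascript
// Keys whose values should never reach a log line (assumed list, extend as needed).
const SENSITIVE_KEY = /(api[_-]?key|authorization|password|secret|token)/i;

// Return a deep copy of `value` with sensitive fields masked.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEY.test(key) ? [key, '[REDACTED]'] : [key, redact(inner)]
      )
    );
  }
  return value; // primitives pass through unchanged
}
```

Usage would look like `console.log(JSON.stringify(redact(requestContext)))`, where `requestContext` stands for whatever object you were about to log.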
4. Encryption Implementation
Rolling your own encryption is dangerous even for experienced developers. AI-generated encryption code might:
- Use weak algorithms
- Generate predictable random values
- Misuse initialization vectors
- Leak information through error messages
Use libraries and services that handle encryption properly.
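As an illustration of leaning on a vetted primitive instead of inventing one, here is a small sketch using AES-256-GCM from Node's built-in crypto module. It generates a fresh random IV per message and relies on the authentication tag to detect tampering. The function names and the shape of the returned object are assumptions for this example. Key storage and rotation are out of scope here and are usually the harder part.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('node:crypto');

// Encrypt a UTF-8 string with a 32-byte key. Returns everything needed to decrypt.
function encrypt(key, plaintext) {
  const iv = randomBytes(12); // fresh 96-bit IV for every message, never reused
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt and authenticate. Throws if the ciphertext, IV, or tag was modified.
function decrypt(key, { iv, ciphertext, tag }) {
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

When reviewing generated encryption code, the two things I look for first are where the IV comes from and whether the authentication tag is actually checked on decrypt.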
5. Access Control Logic
    // AI might generate code like this
    if (user.role === 'admin') {
      return allData // Too broad!
    }

    // What you actually need
    if (user.role === 'admin' && hasPermission(user, 'read:all')) {
      return filterByOrganization(allData, user.orgId)
    }
Access control requires understanding your business rules, data relationships, and compliance requirements.
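To make that concrete, here is a self-contained sketch of what helpers like `hasPermission` and `filterByOrganization` might look like. The role-to-permission table, the `read:own` permission, and the record fields (`orgId`, `ownerId`) are hypothetical. Your real rules have to come from your own business requirements.

```javascript
// Hypothetical role table; a Map avoids surprises from inherited object keys.
const ROLE_PERMISSIONS = new Map([
  ['admin', ['read:all']],
  ['member', ['read:own']],
]);

function hasPermission(user, permission) {
  return (ROLE_PERMISSIONS.get(user.role) ?? []).includes(permission);
}

function filterByOrganization(records, orgId) {
  return records.filter((record) => record.orgId === orgId);
}

// Every read is scoped to the caller's organization first, then narrowed by permission.
function listRecords(user, allData) {
  const inOrg = filterByOrganization(allData, user.orgId);
  if (hasPermission(user, 'read:all')) return inOrg;
  if (hasPermission(user, 'read:own')) return inOrg.filter((record) => record.ownerId === user.id);
  return []; // unknown roles get nothing by default
}
```

The design choice worth copying is the order: tenant scoping happens before any role check, so even an admin check that is too permissive cannot cross organization boundaries.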
What IS Safe to Delegate
Not everything is security-critical. I happily let AI handle:
UI Components - React components, styling, responsive layouts
Non-sensitive CRUD operations - Blog posts, comments, public content
Business logic (with review) - Validation rules, calculations, workflows
Tests - Unit tests, integration tests, test fixtures
Documentation - README files, API docs, comments
Refactoring - Code cleanup, extraction, naming improvements
The Multi-Layer Review Approach
From the Reddit discussion, I learned a practical approach:
"I also sometimes use the mid-tier models like Codex Medium or Sonnet tocomplete a task, then ask Codex High or Opus to review the changes, inaddition to manually code reviewing."This creates multiple layers of review:
    +-------------------+     +-------------------+     +-------------------+
    |   Codex Medium    | --> |    Codex High     | --> |   Manual Review   |
    |   (writes code)   |     |  (reviews code)   |     | (security check)  |
    +-------------------+     +-------------------+     +-------------------+
For security-critical paths, I add one more layer:
    +-------------------+
    |  Security Audit   |
    | (Snyk, npm audit) |
    +-------------------+
Tools for Security Auditing
The Reddit thread mentioned tools that catch vulnerabilities:
Snyk - Scans code and dependencies for known vulnerabilities
    # Install and run Snyk
    npm install -g snyk
    snyk test
npm audit - Checks for vulnerable dependencies
    npm audit
    npm audit fix
OWASP ZAP - Web application security scanner
These tools catch issues AI might introduce:
- Known vulnerable dependencies
- Common security misconfigurations
- Exposed secrets in code
- Missing security headers
A Practical Security Workflow
When I use Codex for any feature, I follow this workflow:
Step 1: Identify Security Sensitivity
    Is this code handling:
    - User credentials? -> Use auth service
    - Payment data? -> Use payment service
    - Sensitive data? -> Manual review required
    - Access control? -> Manual review required
    - Public data only? -> Safe to delegate
Step 2: Delegate or Build
For non-sensitive code, I let Codex write it. For security-critical code, I:
- Research the recommended approach
- Use established services or libraries
- Have Codex help with integration, not implementation
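The triage from Step 1 is simple enough to write down as a lookup, which I find useful as a reminder in PR templates or scripts. This is only a sketch: the category labels are names I made up for the five questions above, and treating anything unrecognized as "manual review required" is my own conservative default.

```javascript
// Hypothetical category labels mapped to the handling from the Step 1 list.
const HANDLING = new Map([
  ['user-credentials', 'use auth service'],
  ['payment-data', 'use payment service'],
  ['sensitive-data', 'manual review required'],
  ['access-control', 'manual review required'],
  ['public-data', 'safe to delegate'],
]);

// Return the distinct handling rules for everything a feature touches.
function triage(categories) {
  if (categories.length === 0) return ['manual review required'];
  const actions = new Set(
    categories.map((category) => HANDLING.get(category) ?? 'manual review required')
  );
  // A feature is only safe to delegate if nothing stricter applies.
  if (actions.size > 1) actions.delete('safe to delegate');
  return [...actions];
}
```

For example, a feature tagged with both public data and payment data comes back as "use payment service" only, because the stricter rule wins.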
Step 3: Review with Stronger Model
After Codex completes a task, I ask a stronger model to review:
    Prompt: "Review this code for security issues, focusing on:
    - Input validation
    - Authentication/authorization
    - Data exposure
    - Error handling that might leak information"
Step 4: Run Security Tools
    # Check for vulnerabilities
    snyk test
    npm audit

    # Check for secrets in code
    git diff --staged | grep -iE "api_key|password|secret|token"
Step 5: Manual Review for Critical Paths
For authentication, payments, and data access code, I always read every line myself.
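The grep in Step 4 flags every line that mentions a word like "token", which gets noisy quickly. A slightly stricter sketch is to scan only added lines in the diff for things that look like assigned string literals or bearer tokens. The two patterns and the function name below are assumptions of mine. This is a rough filter, not a replacement for a dedicated secret scanner.

```javascript
// Hypothetical patterns: a quoted value assigned to a sensitive-looking name,
// and a literal bearer token.
const SECRET_PATTERNS = [
  { name: 'assignment', regex: /(api[_-]?key|password|secret|token)\s*[:=]\s*['"][^'"]{8,}['"]/i },
  { name: 'bearer', regex: /Bearer\s+[A-Za-z0-9._-]{8,}/ },
];

// Scan unified diff text and report suspicious ADDED lines (1-based line numbers).
function findSecretsInDiff(diffText) {
  const findings = [];
  diffText.split('\n').forEach((line, index) => {
    // Only added lines matter; skip the "+++ b/file" header lines.
    if (!line.startsWith('+') || line.startsWith('+++')) return;
    for (const { name, regex } of SECRET_PATTERNS) {
      if (regex.test(line)) findings.push({ line: index + 1, pattern: name });
    }
  });
  return findings;
}
```

In practice I would feed it the output of `git diff --staged` from a pre-commit script and fail the commit when the findings array is not empty.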
Common Mistakes I’ve Made
Mistake 1: Implementing custom auth
I once asked Codex to implement password reset. The code worked, but it didn’t handle:
- Token expiration
- Rate limiting reset attempts
- Logging for audit trails
- Email template security
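To show what just the first of those gaps involves, here is a sketch of expiring, single-use reset tokens using Node's built-in crypto module. The 15-minute lifetime, the in-memory Map standing in for a database table, and the function names are all assumptions. It does nothing about rate limiting, audit logging, or email handling, which is part of why I stopped building this myself.

```javascript
const { createHash, randomBytes } = require('node:crypto');

const TOKEN_TTL_MS = 15 * 60 * 1000; // assumed lifetime: 15 minutes
const resetTokens = new Map(); // tokenHash -> { userId, expiresAt }; stand-in for a DB table

const sha256 = (value) => createHash('sha256').update(value).digest('hex');

// Create a random token; only its hash is stored, the raw token goes to the user.
function issueResetToken(userId, now = Date.now()) {
  const token = randomBytes(32).toString('hex');
  resetTokens.set(sha256(token), { userId, expiresAt: now + TOKEN_TTL_MS });
  return token;
}

// Return the userId for a valid token, or null. A token can be redeemed once.
function redeemResetToken(token, now = Date.now()) {
  const key = sha256(token);
  const entry = resetTokens.get(key);
  if (!entry) return null;
  resetTokens.delete(key); // single use, even if it turns out to be expired
  return now <= entry.expiresAt ? entry.userId : null;
}
```

Storing only the hash means a leaked database table does not hand out working reset links.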
Now I use Clerk or Auth0 for everything auth-related.
Mistake 2: Hardcoded secrets in AI-generated code
Codex once added an API key directly in the code because I didn’t specify environment variables. It looked like:
    const response = await fetch('https://api.example.com', {
      headers: { 'Authorization': 'Bearer sk-test-1234' }
    })
I now always specify: “Use environment variables for all secrets.”
Mistake 3: Trusting AI to understand access control
I asked Codex to “add admin functionality.” It created routes that checked if a user was admin, but didn’t verify they could access specific resources. Admin from Organization A could see Organization B’s data.
Security Checklist Before Committing AI-Generated Code
    ## Secrets Check
    - [ ] No API keys in code
    - [ ] No passwords in code
    - [ ] No tokens in code
    - [ ] .env file is in .gitignore

    ## Input Validation
    - [ ] All user inputs are validated
    - [ ] SQL queries use parameters (not string concatenation)
    - [ ] File uploads are validated and limited

    ## Access Control
    - [ ] Sensitive routes are protected
    - [ ] User can only access their own data
    - [ ] Admin routes check admin status

    ## Dependencies
    - [ ] npm audit shows no critical vulnerabilities
    - [ ] Dependencies are up to date
Summary
In this post, I explained what parts of your application you should never delegate to AI coding assistants like Codex. Authentication, payment processing, encryption, and access control are too important to trust to AI-generated code.
Instead:
- Use established third-party services for auth (Clerk, Auth0) and payments (Stripe)
- Run security audits with tools like Snyk
- Have stronger models review code changes
- Always manually review security-critical implementations
AI coding assistants are powerful for many tasks. But security requires human judgment, established best practices, and defense in depth that AI simply can’t provide on its own.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- Reddit Discussion: Best way to use Codex
- Clerk Authentication
- Auth0 Documentation
- Snyk Security Scanner
- OWASP Top 10 Security Risks
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!