
Should LLM-Assisted Code Be Allowed in Critical Infrastructure? A Governance Framework

The Problem

I saw a GitHub petition recently that caught my attention. A Node.js core contributor filed a formal protest after a 19,000-line LLM-generated pull request attempted to modify Node.js filesystem internals.

The PR would fundamentally change how Node.js handles file operations—code that powers millions of applications worldwide. A single bug could cascade to countless downstream users.

But here’s what struck me most: the debate wasn’t really about AI-generated code. It was about review capacity versus change scope.

When I read through the Reddit discussion that followed, one comment stood out:

“A 19k LoC PR is too big whether it is AI or human-written. Enforce maximum PR sizes, test coverage, code style, and security. From there it won’t matter if it’s AI or not.” (132 votes)

The community understands that the real issue isn’t the origin of the code. It’s whether anyone can actually review it.

What’s Really at Stake

Critical infrastructure projects like Node.js, OpenSSL, and the Linux kernel have disproportionate impact. A vulnerability in any of these affects millions of systems.

The Node.js PR #61478 I mentioned demonstrates the tension between innovation velocity and code quality assurance. The contributor wanted to add a virtual file system feature. That’s valuable. But the approach—dumping 19k lines of code at once—overwhelmed reviewers regardless of whether those lines came from an AI or a human.

I see three core problems:

Problem 1: Size, Not Source

A 19k-line PR is problematic because:

  • No single reviewer can fully comprehend it
  • Edge cases get missed
  • Maintenance burden grows sharply, because nobody fully understands what was merged
  • A clean rollback becomes much harder once other changes build on top of it

Problem 2: Detection Impossibility

One commenter put it bluntly:

“There’s literally no consistent way of knowing whether a PR was AI-assisted. Can contributors use GitHub Copilot? Where do you draw the line?” (16 votes)

Banning AI assistance is unenforceable. Copilot, ChatGPT, and local models are everywhere. Contributors will use them regardless of policy.

Problem 3: Quality Gate Confusion

“The code origin shouldn’t matter. If it passes review, it passes review. We already have a quality gate—the review process. Humans write bad, unmaintainable code all the time.” (27 votes)

The reviewer’s job isn’t to judge where code came from. It’s to verify correctness, maintainability, and security. These checks should apply equally to all contributions.

A Pragmatic Governance Framework

Rather than banning LLM-assisted code, I believe projects should enforce robust governance. Here’s a framework that addresses the real issues:

Governance Pipeline
+--------------------+     +--------------------+     +--------------------+
|    Contribution    |     |     Automated      |     |    Human Review    |
|     Guidelines     |---->|   Quality Gates    |---->|     & Approval     |
+--------------------+     +--------------------+     +--------------------+
          |                          |                          |
          v                          v                          v
- Max PR size              - Test coverage            - Disclosure review
- Disclosure req           - Code style               - Domain expertise
- Test requirements        - Security scans           - Maintenance plan

Policy 1: Size Limits, Not Source Bans

The most effective approach I’ve seen caps PRs at reasonable sizes:

.github/pull-request-policy.yml
pr_limits:
  default:
    max_additions: 1000
    max_files_changed: 20
    require_tests: true
  critical_paths:
    # Globs that count as critical; stricter limits apply to any PR touching them
    paths:
      - "lib/fs/**"
      - "lib/crypto/**"
      - "src/tcp_wrap.cc"
    max_additions: 500
    require_domain_expert: true
    require_design_doc: true

For critical modules like filesystem or crypto, the limits are even stricter. Large architectural changes require design documents and multiple reviewers before code is even written.
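To make the size policy concrete, here is a minimal sketch of how a CI step might apply those limits to a list of changed files. It is an illustration under stated assumptions, not an existing tool: the `ChangedFile` type, the function names, and the constants are hypothetical and simply mirror the example policy values.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical limits mirroring the example policy values.
CRITICAL_GLOBS = ["lib/fs/**", "lib/crypto/**", "src/tcp_wrap.cc"]
DEFAULT_MAX_ADDITIONS = 1000
DEFAULT_MAX_FILES = 20
CRITICAL_MAX_ADDITIONS = 500


@dataclass
class ChangedFile:
    path: str
    additions: int


def touches_critical_path(files):
    """Return True if any changed file matches a critical glob."""
    # fnmatchcase lets '*' match '/' as well, so 'lib/fs/**' covers nested files.
    return any(
        fnmatchcase(f.path, pattern) for f in files for pattern in CRITICAL_GLOBS
    )


def check_pr_size(files):
    """Return a list of human-readable violations (empty if the PR is within limits)."""
    total_additions = sum(f.additions for f in files)
    critical = touches_critical_path(files)
    max_additions = CRITICAL_MAX_ADDITIONS if critical else DEFAULT_MAX_ADDITIONS

    violations = []
    if total_additions > max_additions:
        violations.append(
            f"{total_additions} added lines exceeds the limit of {max_additions}"
        )
    if len(files) > DEFAULT_MAX_FILES:
        violations.append(
            f"{len(files)} changed files exceeds the limit of {DEFAULT_MAX_FILES}"
        )
    return violations
```

Note that the check never asks who or what wrote the lines. A 600-line change under `lib/fs/` fails the stricter cap, while the same 600 lines in documentation pass the default one.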

Policy 2: Mandatory AI Disclosure

One comment captured this well:

“A disclosure requirement makes more sense than a ban—helps reviewers calibrate how hard to push on understanding the code vs just checking correctness.” (7 votes)

Here’s a disclosure template that works:

PR Template - AI Disclosure Section
## AI Assistance Disclosure
- [ ] This PR includes AI-generated code
- Estimated AI contribution: ___%
- AI tools used: [Copilot / ChatGPT / Claude / Other]
### Understanding Confirmation
I have reviewed all AI-generated code and can explain:
- [ ] The purpose of each generated section
- [ ] How edge cases are handled
- [ ] Why this approach was chosen over alternatives

The key is no penalty for honest disclosure. Reviewers just need to calibrate their approach.
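A disclosure section is only useful if someone checks that it was filled in. The sketch below shows one way a bot could read the checkboxes from a PR description that uses the template above. The function name and return shape are assumptions for illustration, and it only reports what is missing; it does not reject anything.

```python
import re

SECTION_HEADING = "## AI Assistance Disclosure"
CHECKED_BOX = re.compile(r"^- \[[xX]\] (.+)$", re.MULTILINE)
UNCHECKED_BOX = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)


def parse_disclosure(pr_body):
    """Summarize the disclosure section of a PR description.

    Returns None when the section is missing entirely.
    """
    start = pr_body.find(SECTION_HEADING)
    if start == -1:
        return None

    # The section runs until the next second-level heading, or to the end.
    end = pr_body.find("\n## ", start + len(SECTION_HEADING))
    section = pr_body[start:] if end == -1 else pr_body[start:end]

    checked = CHECKED_BOX.findall(section)
    unchecked = UNCHECKED_BOX.findall(section)
    ai_used = any("AI-generated code" in item for item in checked)

    return {
        "ai_used": ai_used,
        # Confirmations only matter when AI use has been disclosed.
        "unconfirmed": unchecked if ai_used else [],
    }
```

A reviewer-facing bot could post the `unconfirmed` items as a gentle reminder, which keeps the process in line with the no-penalty principle.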

Policy 3: Enhanced Review Requirements

For critical paths, I recommend:

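| Requirement      | Standard Code | Critical Path             |
|------------------|---------------|---------------------------|
| Test coverage    | 80%           | 90%+                      |
| Reviewers        | 1             | 2+ with domain expertise  |
| Design doc       | Optional      | Required for >500 LoC     |
| Security scan    | Automated     | Automated + manual review |
| Performance test | Optional      | Required                  |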
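These requirements can also be expressed as a small lookup so tooling and humans agree on what a given PR needs. This is a rough sketch, assuming the reviewer row means one reviewer for standard code and at least two for critical paths; the `ReviewRequirements` type and `requirements_for` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRequirements:
    min_coverage_percent: int
    min_reviewers: int
    needs_domain_expert: bool
    needs_design_doc: bool
    needs_manual_security_review: bool
    needs_performance_test: bool


def requirements_for(is_critical_path, lines_changed):
    """Map a PR's risk profile to the review requirements listed above."""
    if not is_critical_path:
        # Standard code: design doc and performance test are optional.
        return ReviewRequirements(
            min_coverage_percent=80,
            min_reviewers=1,
            needs_domain_expert=False,
            needs_design_doc=False,
            needs_manual_security_review=False,
            needs_performance_test=False,
        )
    return ReviewRequirements(
        min_coverage_percent=90,
        min_reviewers=2,
        needs_domain_expert=True,
        # Design doc is required only above 500 changed lines.
        needs_design_doc=lines_changed > 500,
        needs_manual_security_review=True,
        needs_performance_test=True,
    )
```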

Policy 4: Quality Gates That Matter

Automated checks should catch what humans miss:

ci-pipeline.yml
quality_gates:
  - name: test_coverage
    threshold: 90
    critical_paths: true
  - name: security_scan
    tools: [snyk, dependabot, codeql]
  - name: performance_regression
    benchmark_comparison: main
  - name: compatibility_check
    node_versions: [18, 20, 22]
    os: [linux, macos, windows]

These gates apply to all code. AI-generated or human-written—the checks don’t care.
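As a rough illustration of what "the checks don't care" means in practice, the sketch below evaluates two of the gates from measured numbers alone. The `GateResult` type, the function names, and the 5% performance tolerance are assumptions made for this example, not part of any real CI product.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def coverage_gate(coverage_percent, threshold=90.0):
    """Pass when measured coverage meets the threshold."""
    return GateResult(
        name="test_coverage",
        passed=coverage_percent >= threshold,
        detail=f"{coverage_percent:.1f}% measured, {threshold:.1f}% required",
    )


def performance_gate(baseline_ms, candidate_ms, tolerance=0.05):
    """Pass when the candidate is at most `tolerance` slower than main.

    The 5% default tolerance is an assumption for illustration.
    """
    slowdown = (candidate_ms - baseline_ms) / baseline_ms
    return GateResult(
        name="performance_regression",
        passed=slowdown <= tolerance,
        detail=f"{slowdown:+.1%} versus main",
    )


def failed_gates(results):
    """Return the names of failing gates; an empty list means all gates passed."""
    return [r.name for r in results if not r.passed]
```

Nothing in the inputs records how the code was written, so the same thresholds apply to every contribution.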

Why This Matters for Different Stakeholders

For Open Source Projects

Trust is everything. Downstream users need confidence that:

  1. Code has been properly reviewed
  2. Edge cases are handled
  3. Security vulnerabilities are caught
  4. Maintenance won’t be a nightmare

A 19k-line PR from any source undermines that trust.

For Enterprise Adoption

Enterprises using Node.js or OpenSSL in production have real concerns:

  • Compliance requirements demand code provenance
  • Security vulnerabilities have real financial costs
  • Audit trails must exist for regulatory reasons

A governance framework can meet these requirements through review records, disclosure, and sign-off history, without having to trace which tool produced each line.

For the AI Development Community

This sets precedent for responsible AI tooling. Rather than fighting AI adoption, we’re encouraging:

  • Tools that enhance review, not bypass it
  • Sustainable AI-assisted development practices
  • Transparency about AI involvement

Common Mistakes I See

When projects try to address this issue, they often make these errors:

Mistake 1: Banning AI Assistance Entirely

This is unenforceable. Copilot is embedded in editors. ChatGPT is a tab away. Local models run offline. Contributors will use AI regardless of policy, and a ban just drives it underground.

Mistake 2: Treating All PRs Equally

A 50-line bug fix is not the same as a 19k-line architectural overhaul. Critical paths need higher scrutiny. New contributors may need more guidance. Context matters.

Mistake 3: Skipping Review for “Clean” Code

Well-formatted code with passing tests is not the same as correct logic. AI-generated code can be syntactically perfect but semantically wrong. Review must verify that someone actually understands the change, not just that the checks pass.

Mistake 4: Ignoring the Disclosure Problem

Without honest disclosure, reviewers can’t calibrate. Trust requires transparency. Creating a culture of shame around AI use only hurts the project.

A Practical Example

Let’s say a contributor wants to add a new stream API method to Node.js. Here’s how the governance framework would work:

Step 1: Check Contribution Guidelines

The contributor reads that stream API changes require design docs for changes over 200 lines.

Step 2: Submit Design Doc

Before writing code, they submit an RFC explaining the change, its impact, and alternative approaches.

Step 3: Get Feedback

Domain experts review the design, not the implementation. They identify edge cases, security concerns, and compatibility issues early.

Step 4: Implement in Phases

A change on the scale of the 19k-line mega-PR becomes a series of roughly 20 to 40 smaller PRs of 500-1000 lines each, all following the approved design.
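As a quick sanity check on how many phases a large change needs, here is a minimal sketch that splits a total line count into PRs under a size cap. The function name is hypothetical, and real splits would follow module boundaries rather than raw line counts.

```python
def split_into_prs(total_lines, max_lines_per_pr):
    """Return the size of each PR when a change is split under a line cap."""
    if max_lines_per_pr <= 0:
        raise ValueError("max_lines_per_pr must be positive")
    full, remainder = divmod(total_lines, max_lines_per_pr)
    sizes = [max_lines_per_pr] * full
    if remainder:
        # The last PR carries whatever is left over.
        sizes.append(remainder)
    return sizes


# 19,000 lines at a 1000-line cap needs 19 PRs; at a 500-line cap, 38.
print(len(split_into_prs(19_000, 1000)), len(split_into_prs(19_000, 500)))
```

The count matters for planning: dozens of reviewable PRs is a schedule that maintainers can agree to up front in the design doc.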

Step 5: Each PR Goes Through Quality Gates

  • Automated tests pass
  • Coverage exceeds 90%
  • Security scan clean
  • Performance benchmarks stable

Step 6: Human Review

Reviewers can actually understand each PR. They focus on edge cases, maintainability, and whether the implementation matches the approved design.

The result: 19k lines of code still get merged, but they’re reviewed properly.

Why Process Over Origin Works

The governance approach has several advantages over an AI ban:

  1. Enforceable: You can measure PR size, test coverage, and reviewer sign-off
  2. Fair: Applies equally to all contributors regardless of their tools
  3. Practical: Addresses the real problem (review capacity) directly
  4. Future-proof: Works regardless of how AI tools evolve

Most importantly, it focuses reviewers on what matters: correctness, maintainability, and security—not where the code came from.

Summary

In this post, I explored how critical infrastructure projects should handle LLM-generated code. The key point is that governance policies (PR size limits, mandatory disclosure, comprehensive test coverage, and rigorous code review standards) address the real issues without trying to enforce AI bans that cannot be verified.

Rather than banning LLM-assisted code, projects should ensure every line of code is understood by a human before it ships. The origin doesn’t matter. The review does.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can contact me by email.

