
Why Windows Treats .com Files as MS-DOS Applications

Problem: “Porn in Conda”

I was searching through my Anaconda directory and Windows Explorer showed something alarming:

www.youporn.com
Type: MS-DOS Application
Location: C:\anaconda3\Lib\site-packages\protego\tests

My antivirus started flagging it. Windows was recommending I run this MS-DOS application. I panicked.

I opened the file in Notepad. It wasn’t an executable at all:

User-agent: *
Disallow: /login
Disallow: /signup

It was just a robots.txt test fixture from the Protego Python library. The library uses real domain names for testing robots.txt parsing.

But why did Windows think this text file was an MS-DOS application?

The answer: 40 years of legacy.

The .com Extension: A Brief History

The .com extension isn’t about websites. It’s about “command” files.

In 1974, Gary Kildall’s CP/M operating system (later sold by his company, Digital Research) used .com for command files. These were simple executables:

  • Raw machine code loaded at a fixed memory address (0x0100)
  • No file header or metadata
  • Maximum size: 64KB
  • Faster load times than more complex formats

When Microsoft released MS-DOS in 1981 (based on Seattle Computer Products’ 86-DOS, which was modeled on CP/M), it inherited the .com format from CP/M. DOS supported two executable types:

.com files:

  • Simple binary images
  • Loaded at offset 0x100, directly after the 256-byte Program Segment Prefix (PSP)
  • Size limited to 65,280 bytes (one 64KB segment minus the PSP)
  • No relocation data

.exe files:

  • Complex format with headers
  • Relocation data for flexible memory positioning
  • Not limited to a single 64KB segment
  • Slower to load (header parsing and relocation)

Early DOS developers preferred .com for small utilities:

  • Smaller memory footprint (critical with 640KB RAM limit)
  • Faster loading (no header parsing or relocation)
  • Simpler compilation process

The .com format was a staple of the DOS era, especially for small tools. Windows inherited this legacy.

How Windows Determines File Types

Command Prompt and other command-line tools use the PATHEXT environment variable to decide which extensions count as executable when you type a command name without an extension:

PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

The order matters. When you type program in Command Prompt, it looks in the current directory and then in each PATH directory, and in each directory it tries:

1. program.com
2. program.exe
3. program.bat
4. program.cmd
... (rest of PATHEXT)

The .com extension comes first. This isn’t random: it mirrors the order in which MS-DOS’s COMMAND.COM tried extensions (.com, then .exe, then .bat), so scripts written for DOS keep resolving commands the same way.
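To make the lookup order concrete, here is a minimal Python sketch of it. It is a simplified model under stated assumptions: resolve_command is a hypothetical helper, it only mimics the basic rule (current directory first, then each PATH entry, trying every PATHEXT extension in order within each directory), and it ignores built-in commands, quoting, and names typed with an explicit extension.

```python
from pathlib import Path

# Default PATHEXT value quoted in this article.
DEFAULT_PATHEXT = ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC"


def resolve_command(name, cwd, path_dirs, pathext=DEFAULT_PATHEXT):
    """Return the first file a bare command name resolves to, or None.

    Hypothetical, simplified model of Command Prompt's lookup: directories
    are searched in order (cwd first, then PATH), and within each directory
    every PATHEXT extension is tried in order before moving on.
    """
    # Lower-case the extensions; Windows file names are case-insensitive.
    extensions = [ext.lower() for ext in pathext.split(";") if ext]
    for directory in [cwd, *path_dirs]:
        for ext in extensions:
            candidate = Path(directory) / (name + ext)
            if candidate.is_file():
                return candidate
    return None
```

If one directory contains both tool.com and tool.exe, the sketch returns tool.com, matching the PATHEXT order above. Note that directory order wins over extension order: a tool.exe in the current directory beats a tool.com further down the PATH.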

Windows also uses registry-based file associations:

HKEY_CLASSES_ROOT\.com
(default) = "comfile"
PerceivedType = "application"

The .com key maps the extension to the comfile ProgID, and the comfile key in turn supplies the “MS-DOS Application” type label and icon that Windows Explorer displays.
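On Windows you can inspect this mapping yourself with Python’s standard winreg module. A minimal sketch, assuming the usual layout: describe_extension is a hypothetical helper that reads the extension key’s default value (the ProgID) and then the ProgID key’s default value, which is usually the type label. Explorer may prefer a FriendlyTypeName value when one is present, so treat the result as indicative rather than exact. Off Windows it simply returns None.

```python
import sys


def describe_extension(ext):
    """Look up how Windows classifies a file extension.

    Reads the default value of HKEY_CLASSES_ROOT\\<ext> (the ProgID), then
    the default value of HKEY_CLASSES_ROOT\\<ProgID> (usually the type label).
    Returns (prog_id, type_label), or None off Windows or if a key is missing.
    """
    if sys.platform != "win32":
        return None
    import winreg  # standard library, but only available on Windows

    try:
        prog_id = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, ext)
        if not prog_id:
            return None
        type_label = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, prog_id)
    except OSError:
        return None
    return prog_id, type_label
```

Calling describe_extension(".com") on a default install should return the comfile ProgID together with its label.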

The problem: Explorer labels files by their extension, not their contents.

The Namespace Collision

Here’s where history clashes with the modern web:

  • 1985: The .com top-level domain is established
  • 1986: MS-DOS 3.2 is released (.com executables are still everywhere)
  • 2024: Developers use domain names as test file names

Web developers create test files named after domains:

  • www.example.com (robots.txt test fixture)
  • test.google.com (scraping test data)
  • production.cloudflare.com (config file)

Windows sees the .com extension and thinks “DOS executable.”
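The collision comes down to how an extension is defined: it is simply the text after the last dot. Python’s pathlib.PureWindowsPath applies the same rule, so a quick check shows that a domain-named fixture has a .com suffix exactly like a real program would. looks_executable_by_name is a hypothetical helper for illustration; like Explorer, it judges the name and never reads the content.

```python
from pathlib import PureWindowsPath

# Default PATHEXT value quoted in this article.
DEFAULT_PATHEXT = ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC"


def looks_executable_by_name(filename, pathext=DEFAULT_PATHEXT):
    """Judge a file the way an extension-only check does: by suffix alone."""
    suffix = PureWindowsPath(filename).suffix.upper()
    return suffix in pathext.upper().split(";")


for name in ["www.youporn.com", "www.example.com.txt", "notes.txt"]:
    print(name, PureWindowsPath(name).suffix, looks_executable_by_name(name))
# www.youporn.com .com True
# www.example.com.txt .txt False
# notes.txt .txt False
```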

Real-World Impact

This isn’t just theoretical. The Protego Python library includes www.youporn.com as a test fixture because that domain has an interesting robots.txt configuration. It’s a legitimate choice for testing.

But on Windows:

  • Explorer shows it as “MS-DOS Application”
  • Search results recommend running the file
  • Antivirus flags suspicious executables
  • Version control systems show wrong file types
  • Security scanners detect “potential threats”

The file is harmless text. Windows just misclassifies it.

What Makes a .com File Actually Executable?

Here’s the technical reality:

Real .com executables:

  • Binary data (machine code)
  • First bytes are executable x86 instructions
  • File size of at most 65,280 bytes
  • Mostly non-printable bytes; any readable text is limited to embedded strings such as messages

Text files with .com extension:

  • ASCII/UTF-8 readable text
  • First bytes are text characters
  • Can be any size
  • Human-readable content

Explorer doesn’t validate this. It checks the extension, not the content.
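You can check the content yourself. A minimal heuristic sketch, assuming these rules: a file that starts with the MZ signature is an EXE-format program, a file with no NUL bytes that decodes as UTF-8 is treated as text, and anything else is reported as binary (possibly raw DOS machine code). classify_com_file is a hypothetical name, and the text test is a heuristic, not a guarantee: a text file that happens to begin with “MZ”, or a Latin-1 robots.txt, would be misclassified.

```python
from pathlib import Path


def classify_com_file(path):
    """Heuristically classify a file by content rather than by extension.

    Returns one of:
      "exe-format" - starts with the MZ signature used by DOS/Windows EXEs
      "text"       - contains no NUL bytes and decodes as UTF-8
      "binary"     - anything else (could be raw DOS machine code)
    """
    data = Path(path).read_bytes()
    if data[:2] == b"MZ":
        return "exe-format"
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "text"
        except UnicodeDecodeError:
            pass
    return "binary"
```

Run against the Protego fixture, this would report "text", which is exactly what Notepad showed.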

If you double-click a text .com file, Windows tries to execute it:

Error: www.example.com is not a valid Win32 application

The execution fails. But the misclassification remains.

Security Implications

This legacy behavior creates real security risks:

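Actual threat: Malicious .com executables can run if double-clicked. Windows runs any .com file whose content starts with an EXE (MZ) header as a normal program, so the extension still works for malware on 64-bit Windows. Email attachments with a .com extension have long been a malware vector.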

False confidence: Developers see “MS-DOS Application” for their test files and ignore the warning. Then a real malicious .com file appears, and they dismiss it as “another test file.”

Windows SmartScreen helps by blocking unknown .com downloads. But the classification problem remains.
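One way to counter the false-confidence problem is to make the check automatic instead of relying on people to eyeball Explorer labels. A minimal sketch, assuming your fixtures live under one directory tree: find_suspicious_com_files is a hypothetical helper that walks the tree and returns every .com-named file (any case) whose content does not look like plain text, using the same MZ/NUL/UTF-8 heuristic as above.

```python
from pathlib import Path


def _looks_like_text(data):
    """Heuristic: no MZ signature, no NUL bytes, and valid UTF-8."""
    if data.startswith(b"MZ") or b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_suspicious_com_files(root):
    """Return sorted paths under root that end in .com but are not plain text."""
    suspicious = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".com":
            if not _looks_like_text(path.read_bytes()):
                suspicious.append(path)
    return sorted(suspicious)
```

In a test suite, asserting that find_suspicious_com_files("tests") returns an empty list (the directory name is an assumption) would fail the build if a real binary ever showed up under a domain-style name.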

Practical Solutions

I explored several fixes. Here’s what worked and what didn’t.

Option 1: Rename Test Files (Simplest)

Change domain-named test files:

# Before
www.example.com
# After
www.example.com.txt
www.example.com.fixture
test-www.example.com

Pros: No system changes, works everywhere

Cons: Less realistic test data naming, requires code changes
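If you maintain the fixtures, the rename can be scripted. A minimal sketch, assuming the fixtures sit directly in one directory and that appending .txt is the chosen convention: rename_com_fixtures is hypothetical, defaults to a dry run so the plan can be reviewed first, and refuses to overwrite existing files. It does not update the test code that loads the fixtures; that change is still manual.

```python
from pathlib import Path


def rename_com_fixtures(directory, new_suffix=".txt", dry_run=True):
    """Append new_suffix to every *.com file directly inside directory.

    Returns a list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed.
    """
    plan = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() == ".com":
            target = path.with_name(path.name + new_suffix)
            if target.exists():
                raise FileExistsError(target)  # never clobber an existing file
            plan.append((path, target))
    if not dry_run:
        for old, new in plan:
            old.rename(new)
    return plan
```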

Option 2: Modify PATHEXT (Don’t Do This)

Remove .com from PATHEXT:

set PATHEXT=.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

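Pros: Command Prompt stops resolving bare command names to .com files

Cons:

  • Does not change Explorer’s “MS-DOS Application” label, which comes from the registry association rather than PATHEXT
  • Breaks built-in tools that ship as .com files (typing more, chcp, or format no longer finds them)
  • set only affects the current Command Prompt session; a permanent change means editing the system environment variables
  • Can cause unexpected behavior in legacy tools
  • May revert after Windows updates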

I don’t recommend this.

Option 3: Registry Tweak (Risky)

Modify the file association:

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.com]
"PerceivedType"="text"
"Content Type"="text/plain"

Pros: All .com files show as text

Cons:

  • Affects legitimate .com programs (such as more.com, chcp.com, and format.com in System32)
  • Windows updates may revert changes
  • Requires admin privileges
  • Can break legacy tools

I don’t recommend this either.

Option 4: Document the Quirk (Recommended)

Work around the limitation:

  1. Document in README:
Test fixtures use domain names for realism.
On Windows, these appear as "MS-DOS Application"
due to legacy DOS file handling.
Files are plain text and safe.
  2. Add .gitattributes entries that mark the fixtures as text, with comments (.gitignore lists files Git should not track, so it is the wrong place for fixtures you want committed):
# Test fixtures - domain names for realistic robots.txt testing
# Windows shows these as MS-DOS Application (legacy DOS behavior)
protego/tests/www.example.com text
  3. Educate your team:
  • Explain the .com extension history
  • Show how Windows determines file types
  • Document why test files use domain names

This is the safest approach.

The Root Cause: Backwards Compatibility

Why doesn’t Microsoft fix this?

Backwards compatibility.

Windows still supports:

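  • Running .com files: any .com file whose content starts with an EXE (MZ) header runs as a normal program, and System32 still ships tools named more.com, chcp.com, and format.com
  • MS-DOS and 16-bit Windows applications, but only on 32-bit editions (up to Windows 10) via NTVDM; 64-bit Windows has no NTVDM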
  • Legacy batch scripts with .com assumptions
  • 40-year-old file extension behaviors

Removing .com executable support would break:

  • Thousands of legacy applications
  • Corporate tools still in use
  • Scripts that rely on PATHEXT order
  • Institutional dependencies on DOS tools

Microsoft prioritizes compatibility over fixing edge cases. The .com extension will remain executable for the foreseeable future.

What I Learned

The “Porn in Conda” incident taught me:

  1. File extensions matter more than content on Windows. Explorer labels files by extension and never inspects their format.

  2. Legacy systems run deep. A 1974 CP/M decision still affects Windows 11 in 2024.

  3. Developer choices collide with history. Using domain names as test files makes sense, but conflicts with 40-year-old OS behavior.

  4. Work around, don’t fight. Document the quirk, explain it to your team, move on.

When you see a .com file showing as “MS-DOS Application” in Windows Explorer, remember: you’re seeing history, not malware. Probably.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
