What Is the Difference Between False Positives and False Negatives in Python Type Checkers?

Mar 17, 2026

Problem

I spent a week debugging a production crash that mypy said was impossible. The type checker had approved my code - no errors, no warnings. But at runtime, an IndexError brought down our API.

This is the false negative problem, and I learned it’s far more dangerous than the annoying false positives I’d been dealing with.

What False Positives Look Like

False positives are the noisy errors that make you want to disable the type checker:

from typing import Union

def process(value: Union[int, str]) -> str:
    is_int = isinstance(value, int)
    if is_int:
        return str(value).upper()
    # mypy complains here: "int" has no attribute "upper".
    # But this is actually safe: the int case has already returned.
    return value.upper()

When I ran this through mypy with strict settings, it flagged that last line. mypy does not narrow `value` through the `is_int` variable, so it still treats `value` as possibly an int. The code was correct. This is a false positive: the checker reports an error where none exists.

Annoying? Yes. But I could add a type assertion and move on:

from typing import Union, cast

def process(value: Union[int, str]) -> str:
    is_int = isinstance(value, int)
    if is_int:
        return str(value).upper()
    # cast() only informs the checker; it does nothing at runtime
    return cast(str, value).upper()

The noise was frustrating, but at least I knew there was a potential issue to investigate.
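When I control the code, a cleaner fix than `cast` is to keep the `isinstance` call inside the `if` condition itself, a narrowing pattern that both mypy and pyright follow. A minimal sketch (`process_narrowed` is just an illustrative name):

```python
from typing import Union


def process_narrowed(value: Union[int, str]) -> str:
    # isinstance() directly in the condition is a narrowing form that
    # mypy and pyright both understand, so no cast is needed.
    if isinstance(value, int):
        return str(value).upper()
    # The int case has returned, so the checker treats value as str here.
    return value.upper()


print(process_narrowed(42))     # 42
print(process_narrowed("abc"))  # ABC
```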

What False Negatives Look Like

False negatives are silent killers. The type checker approves code that will crash:

from typing import List

def get_first(items: List[int]) -> int:
    # Type checker says: "Looks good! Returns int as expected"
    # Reality: Will crash with IndexError on empty list
    return items[0]

# This passes type checking but crashes at runtime
result = get_first([])  # IndexError: list index out of range

I had code like this in production. Mypy saw no problems. My tests passed (because they used non-empty lists). But when an empty list came through in production, everything crashed.
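A type checker can only flag what the types describe, and `List[int]` says nothing about length. One way to make this failure visible to the checker (a suggestion, not the fix from my incident) is to put the empty case into the return type. A minimal sketch; `get_first_or_none` is a hypothetical name:

```python
from typing import List, Optional


def get_first_or_none(items: List[int]) -> Optional[int]:
    # The empty case is now part of the return type, so the checker
    # rejects callers that use the result as a plain int without
    # checking for None first.
    return items[0] if items else None


first = get_first_or_none([])
if first is None:
    print("no items")
else:
    print(first + 1)  # narrowed to int in this branch
```

With this signature, writing `get_first_or_none([]) + 1` is reported by both mypy and pyright, because the left operand may be None.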

Another example that slipped past my type checker:

from typing import Dict

def unsafe_access(data: Dict[str, int]) -> int:
    # Type checker assumes the key exists
    # No error reported, but KeyError possible
    return data["count"]

# This type-checks perfectly
result = unsafe_access({"other": 1})  # KeyError: 'count'

No warnings. No errors. Just a runtime crash waiting to happen.
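Two ways to give the checker a chance here, sketched under the assumption that the set of keys is known in advance (`Counts`, `safe_access`, and `count_or_default` are hypothetical names): declare the required key with a `TypedDict`, or, when keys really are unknown, use `.get()` with a default.

```python
from typing import Dict, TypedDict


class Counts(TypedDict):
    # Declaring the key in the type means a dict literal without
    # "count" is rejected at the call site by mypy and pyright.
    count: int


def safe_access(data: Counts) -> int:
    return data["count"]


def count_or_default(data: Dict[str, int]) -> int:
    # For dicts whose keys are genuinely unknown, .get() with a
    # default cannot raise KeyError.
    return data.get("count", 0)


print(safe_access({"count": 3}))       # 3
print(count_or_default({"other": 1}))  # 0
# safe_access({"other": 1})  # rejected by the checker: missing key "count"
```

A `TypedDict` is only enforced statically. At runtime it is a plain dict, so data parsed from JSON or another external source still needs validating before it can be trusted to match the type.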

Why False Negatives Are More Dangerous

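I compared the tradeoff data from the Python typing specification conformance tests. One caveat: these counts measure cases where a checker wrongly reports, or fails to report, an error defined by the typing spec. They do not measure runtime failures like the empty-list IndexError above; a `List[int]` type says nothing about the list's length, so no checker can flag that call.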

Type Checker False Positive vs False Negative Tradeoffs
==========================================================

| Checker  | False Positives | False Negatives |
|----------|-----------------|-----------------|
| zuban    | 10              | 0               |
| pyright  | 15              | 4               |
| pyrefly  | 52              | 21              |
| mypy     | 231             | 76              |
| ty       | 159             | 211             |

One developer's comment on these results summed it up:

“The zero false negatives from Zuban is really impressive. In my experience, false negatives are way more dangerous than false positives in a type checker since they silently let bugs through.”

Here’s why false negatives are worse:

1. Silent Failures

Reported errors, false positives included, scream at you. False negatives stay quiet:

# Reported error (here a real bug, correctly caught): loud and visible
x: int = "string"  # Type checker: ERROR

# False negative: silent and deadly
def risky_operation(data: dict[str, int]) -> int:
    return data["key"]  # Type checker: OK (but KeyError at runtime)

2. False Confidence

When my type checker passes, I trust the code. I skip manual testing. I deploy to production. Then the crash happens.

3. Debugging Difficulty

With false positives, I know there’s an issue to investigate. With false negatives, I only discover the bug when users report crashes.

The Tradeoff in Practice

Type checkers must balance between two extremes:

                    Type Checker Spectrum
==========================================================

STRICT                                    LENIENT
   |                                         |
   v                                         v
More False Positives         More False Negatives
   |                                         |
   v                                         v
Annoying but safe            Quiet but dangerous
   |                                         |
   v                                         v
Catch more bugs              Miss real bugs
   |                                         |
   v                                         v
Add type annotations         Trust and pray

I used to prefer lenient checkers because they were quieter. Now I understand: the noise of false positives is the price of safety.

How I Configure My Checkers Now

I’ve changed my approach to minimize false negatives. For pyright, this is my pyrightconfig.json:

{
  "typeCheckingMode": "strict",
  "reportUnnecessaryTypeIgnoreComment": true,
  "reportMissingImports": true,
  "reportMissingTypeStubs": false
}

The strict mode produces more false positives, but it catches more real bugs.

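For mypy, I enabled stricter settings in mypy.ini. The last three flags are already implied by `strict = True`, so listing them is redundant but harmless: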

[mypy]
strict = True
warn_return_any = True
warn_unused_ignores = True
disallow_untyped_defs = True

Yes, I get more warnings now. But those warnings represent real potential issues, not just noise.
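One concrete gap these settings close: with default settings, mypy does not type-check the body of a function that has no annotations, so a bug inside it is a false negative. `disallow_untyped_defs` (part of `strict`) forces annotations, and once the parameter is annotated the bad operation is reported. A minimal sketch with hypothetical function names:

```python
def add_suffix_untyped(name):
    # No annotations: with default settings mypy skips checking this
    # body, so the str + int bug is not reported. With --strict, mypy
    # instead reports the missing annotations.
    return name + 1


def add_suffix_typed(name: str) -> str:
    # Annotated: both checkers analyze this body and would reject
    # `name + 1`, which pushes toward the correct version below.
    return name + "_1"


try:
    add_suffix_untyped("report")
except TypeError as exc:
    print(f"runtime crash: {exc}")

print(add_suffix_typed("report"))  # report_1
```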

Common Mistakes I Made

1. Using `# type: ignore` Too Much

# Before: Silencing all errors
result = process(data)  # type: ignore
items = transform(result)  # type: ignore
final = combine(items)  # type: ignore

# This creates hidden false negatives!
# One of these might actually be a real error

Now I only use # type: ignore when I’ve verified the code is correct and the checker is wrong:

# After: Only ignore specific, verified false positives
result = process(data)  # type: ignore[arg-type]  # Union type not inferred correctly
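When I know more than the checker, an `assert isinstance(...)` is often a better tool than `# type: ignore`: mypy and pyright both narrow on it, and unlike an ignore comment it fails loudly at runtime if my reasoning is wrong. A minimal sketch; the `Config` shape and `parse_port` are hypothetical:

```python
from typing import Dict, Union

# Hypothetical config shape: values may be ints or strings.
Config = Dict[str, Union[int, str]]


def parse_port(config: Config) -> int:
    port = config["port"]
    # The checker only knows port is int | str. Instead of
    # `return port  # type: ignore[return-value]`, assert the type:
    # both checkers narrow port to int after this line, and a bad
    # config fails here instead of passing the wrong type along.
    assert isinstance(port, int), f"port must be an int, got {port!r}"
    return port


print(parse_port({"port": 8080}))  # 8080
```

One caveat: `assert` statements are removed when Python runs with `-O`. For a check that must always run, an explicit `if not isinstance(port, int): raise TypeError(...)` narrows the type in the same way.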

2. Choosing Based on Popularity

I used mypy because it was popular. But popularity doesn’t mean accuracy:

GitHub Stars vs False Negatives
===================================

mypy:    18k stars, 76 false negatives
pyright: 12k stars, 4 false negatives
zuban:   0.5k stars, 0 false negatives

On the conformance suite, the newer, less popular checkers actually scored better.

3. Not Running Multiple Checkers

I now run both pyright and mypy in CI:

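# .github/workflows/typecheck.yml
name: typecheck
on: [push, pull_request]

jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        # Install the project's own dependencies here too, so imports resolve
        run: pip install pyright mypy
      - name: Run pyright
        # Strict mode comes from typeCheckingMode in pyrightconfig.json
        run: pyright src/
      - name: Run mypy
        run: mypy --strict src/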

Different checkers catch different issues. The overlap provides defense in depth.

What I Check For Now

When evaluating a type checker, I look at:

False negative rate - Most important. Zero is ideal.
False positive rate - Important for developer experience.
Spec conformance - Does it follow the Python typing spec?
Maintenance activity - Is it actively developed?

Based on the conformance data, pyright and zuban lead with only 4 and 0 false negatives respectively.

Summary

False negatives in Python type checkers are significantly more dangerous than false positives. They give a false sense of security while letting real bugs slip through to production.

After my production crash, I changed my approach:

Prioritize checkers with low false negative rates (pyright, zuban)
Accept more false positives as the cost of safety
Run multiple type checkers in CI
Avoid # type: ignore unless I’ve verified the code is correct

The noise of false positives is annoying. But silence from false negatives is deadly.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here is the most important link from this article:

👨‍💻 Pyrefly Blog: Typing Conformance Comparison

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!