Why Does mypy Have Low Typing Spec Conformance Despite High Adoption?

Mar 17, 2026

I ran mypy on my Python project and assumed it was catching all type errors according to the official Python typing specification. Then I discovered that mypy has only 58.3% conformance to the typing spec, passing just 81 of the 139 tests in the official conformance suite.

This seemed alarming. How could the most popular Python type checker miss so much?

The Problem

I was reviewing type checker options for a new project and found these statistics:

Passed:   81/139 tests (58.3%)
False positives:  231 (reporting errors that shouldn't exist)
False negatives:   76 (missing actual errors)
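The percentage is simply passed tests divided by total tests. One detail is worth noticing: there are more false positives (231) than tests (139). That means those counts must be tallied per reported error, not per test. A quick check of both points, using only the figures quoted above:

```python
# Figures quoted above from the conformance suite results
passed, total = 81, 139
false_positives, false_negatives = 231, 76

pass_rate = passed / total * 100
print(f"Pass rate: {pass_rate:.1f}%")  # Pass rate: 58.3%

# More false positives than tests means a single test can contribute
# several mismatched errors, so these are per-error counts.
print(false_positives > total)  # True
```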

This raised an immediate question: Should I be worried? Is mypy unreliable?

The numbers suggest mypy passes only a little more than half of the official typing specification tests. Meanwhile, pyright scores significantly higher on conformance.

But then I saw this comment from a developer:

“Most of my projects are using Django, Pydantic and can be only used with mypy”

This highlights a tension: mypy has the best framework support, but lower spec conformance. Which matters more?

Why Mypy Has Lower Conformance

The key insight is historical: mypy predates the formal typing specification.

2012: Mypy created (before typing spec existed)
2014: PEP 484 proposed (mypy helped define it)
2015: typing module added to Python 3.5
2020s: Typing spec continues evolving

Mypy wasn’t built to conform to a spec; it helped create Python’s type system. This has two consequences:

Legacy behavior: Mypy maintains backward compatibility with codebases that adopted it early
Spec divergence: As the formal spec evolved, some of mypy’s original behavior came to differ from it

The Adoption Reality

Here’s what matters more than conformance scores:

┌───────────────────┬──────────────┬─────────────────┐
│ Framework/Tool    │ Mypy Support │ Pyright Support │
├───────────────────┼──────────────┼─────────────────┤
│ Django            │ Plugin       │ Limited         │
│ Pydantic          │ Plugin       │ Basic           │
│ SQLAlchemy        │ Plugin       │ Basic           │
│ Dataclasses       │ Full         │ Full            │
│ Third-party stubs │ Extensive    │ Growing         │
└───────────────────┴──────────────┴─────────────────┘

When your Django project needs type checking, mypy’s plugin ecosystem outweighs theoretical conformance scores.

Practical Implications

False Positives: The Noise Problem

Mypy’s 231 false positives mean it reports errors where the spec says there should be none. I encountered this with generic edge cases; the basic pattern they build on looks like this:

from typing import TypeVar, Generic

T = TypeVar('T')

class Container(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value

def process(container: Container[int]) -> int:
    # This basic case type-checks cleanly in mypy; the false positives
    # tend to appear in less common corners of the generics rules
    return container.get() + 1

When this happens, I have two options:

# Option 1: Suppress known false positives
result = some_edge_case()  # type: ignore[arg-type]

# Option 2: Configure mypy to be less strict
# In pyproject.toml (mypy.ini uses a [mypy] section instead):

[tool.mypy]
strict = false
warn_return_any = false
ignore_missing_imports = true

False Negatives: The Missed Bugs

Mypy’s 76 false negatives mean it fails to catch real errors. This is arguably worse than false positives.

from typing import Any

def risky_function(data: Any) -> str:
    # Any turns off checking for `data`, so no conforming checker flags
    # this call; the AttributeError only surfaces at runtime
    return data.nonexistent_method()

# Correct approach: use explicit typing
def safer_function(data: dict[str, int]) -> str:
    return str(data.get("key", 0))
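When the input really can be anything, `object` is a safer annotation than `Any`. It accepts every value, but checkers reject attribute access until you narrow the type, for example with `isinstance`. A minimal sketch; the function name `describe` is hypothetical:

```python
def describe(data: object) -> str:
    # data.nonexistent_method() would be rejected here by mypy and pyright:
    # `object` has no such attribute, unlike `Any`.
    if isinstance(data, dict):
        # Narrowed to dict, so dict operations are allowed
        return f"dict with {len(data)} key(s)"
    if isinstance(data, str):
        # Narrowed to str
        return data.upper()
    return repr(data)


print(describe({"key": 1}))  # dict with 1 key(s)
print(describe("abc"))       # ABC
print(describe(42))          # 42
```

The trade-off is a few extra `isinstance` branches in exchange for every attribute access being checked.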

One mitigation is to run more than one type checker:

# Run mypy for ecosystem compatibility
mypy src/ --python-version=3.11

# Run pyright for spec conformance
pyright src/

# CI pipeline can run both
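If you want to run both checkers from one local script before pushing, a small stdlib helper can run each command and collect the exit codes. This is a sketch. `run_checkers` and `main` are hypothetical helpers, and the sketch assumes mypy is installed in the current interpreter's environment and `pyright` is on your PATH:

```python
import subprocess
import sys


def run_checkers(commands: dict[str, list[str]]) -> dict[str, int]:
    """Run each checker command and return its exit code, keyed by name."""
    results: dict[str, int] = {}
    for name, cmd in commands.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # The executable isn't installed or isn't on PATH
            print(f"== {name}: command not found: {cmd[0]} ==")
            results[name] = 127
            continue
        print(f"== {name} (exit {proc.returncode}) ==")
        print(proc.stdout, end="")  # each checker's own report
        results[name] = proc.returncode
    return results


def main() -> int:
    codes = run_checkers({
        # `python -m mypy` uses the mypy installed for this interpreter
        "mypy": [sys.executable, "-m", "mypy", "src/"],
        "pyright": ["pyright", "src/"],
    })
    # Non-zero if either checker reported problems or was missing
    return 1 if any(codes.values()) else 0
```

Call it from a script with `raise SystemExit(main())`. Both checkers exit non-zero when they find errors, so the combined exit code works as a simple gate.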

When to Choose Which Checker

I’ve developed a decision framework:

┌───────────────────────────────────┬─────────────────────┐
│ Your Situation                    │ Recommended Tool    │
├───────────────────────────────────┼─────────────────────┤
│ Django project                    │ mypy + django-stubs │
│ Pydantic-heavy codebase           │ mypy + pydantic     │
│ Pure Python, no frameworks        │ pyright or mypy     │
│ Need strict spec conformance      │ pyright             │
│ Existing mypy codebase            │ stay with mypy      │
│ Starting fresh, no dependencies   │ pyright             │
│ Both available                    │ run both in CI      │
└───────────────────────────────────┴─────────────────────┘
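The table can also be read as an ordered set of rules, with framework requirements checked first. The sketch below encodes that reading. The function and parameter names are hypothetical, and the rule order is my interpretation of the table, not a fixed algorithm:

```python
def recommend_checker(
    uses_django: bool = False,
    uses_pydantic: bool = False,
    existing_mypy_codebase: bool = False,
    needs_strict_conformance: bool = False,
    starting_fresh: bool = False,
) -> str:
    """Map the table's situations to a recommendation, checked in order."""
    # Framework plugins first: they decide which checker works at all
    if uses_django:
        return "mypy + django-stubs"
    if uses_pydantic:
        return "mypy + pydantic"
    # An established mypy setup is usually cheaper to keep than to migrate
    if existing_mypy_codebase:
        return "stay with mypy"
    if needs_strict_conformance or starting_fresh:
        return "pyright"
    # Pure Python, no framework constraints: either checker is fine
    return "pyright or mypy"


print(recommend_checker(uses_django=True))     # mypy + django-stubs
print(recommend_checker(starting_fresh=True))  # pyright
```

The "Both available" row doesn't fit a single answer. Running both in CI can be layered on top of any of these results.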

Configuration for Running Both

pyrightconfig.json:

{
  "typeCheckingMode": "strict",
  "pythonVersion": "3.11"
}

GitHub Actions workflow (e.g. .github/workflows/typecheck.yml):

name: Type Check

on: [push, pull_request]

jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install mypy
        run: pip install mypy

      - name: Run mypy
        run: mypy src/

      - name: Install pyright
        run: npm install -g pyright

      - name: Run pyright
        run: pyright src/

Common Mistakes I’ve Made

Mistake 1: Choosing Based Only on Conformance

I initially thought higher conformance = better tool. This ignores:

Framework plugin support
Existing team expertise
CI/CD integration complexity
Community resources and stubs availability

Mistake 2: Ignoring False Positives

Even a small number of false positives creates noise. Without a strategy, developers start silencing type errors wholesale:

# Don't do this: a bare ignore silences every error on the line
result = problematic_call()  # type: ignore

# Do this instead: suppress only the specific error code
result = problematic_call()  # type: ignore[arg-type]
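mypy can enforce this rule itself. Enabling its optional `ignore-without-code` error code (for example, `enable_error_code = ["ignore-without-code"]` in the config) makes it report bare ignores. For a quick audit without running mypy, a small stdlib scan also works. `find_bare_ignores` and `scan_tree` are hypothetical helpers. The regex only inspects raw line text, so it can also match the same pattern inside a string literal:

```python
import re
from pathlib import Path

# Matches "# type: ignore" NOT followed by "[" (i.e. no error code given)
BARE_IGNORE = re.compile(r"#\s*type:\s*ignore(?!\s*\[)")


def find_bare_ignores(source: str) -> list[int]:
    """Return 1-based line numbers containing a bare `# type: ignore`."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if BARE_IGNORE.search(line)
    ]


def scan_tree(root: str) -> dict[str, list[int]]:
    """Scan every .py file under root and report bare ignores per file."""
    report: dict[str, list[int]] = {}
    for path in Path(root).rglob("*.py"):
        hits = find_bare_ignores(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report


sample = (
    "x = f()  # type: ignore\n"
    "y = g()  # type: ignore[arg-type]\n"
    "z = h()\n"
)
print(find_bare_ignores(sample))  # [1]
```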

Mistake 3: Assuming Spec = Truth

The typing specification is still evolving. Mypy’s behavior often influences future spec versions:

Real-world usage → Tool behavior → Spec updates
      ↑                                    ↓
      └──────── Formalizes what works ←────┘

Pragmatic behavior sometimes precedes formal specification.

Mistake 4: Binary Choice Between Checkers

Many successful projects use both:

# mypy catches Django-specific issues (the Django plugin is enabled via
# plugins = ["mypy_django_plugin.main"] in the config file)
mypy src/

# pyright catches spec-conformance issues
pyright src/

# Different tools catch different problems
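To see how much the two tools actually overlap on your own code, compare where each one reports errors. The sketch below parses mypy's default `path:line: error: message  [code]` output. For pyright, it assumes you have already extracted `(file, line)` pairs, for example from its `--outputjson` report, rather than hard-coding pyright's text format. `parse_mypy`, `compare`, and the sample data are hypothetical:

```python
import re

Location = tuple[str, int]

# mypy's default error line: "src/app.py:12: error: message  [code]"
MYPY_LINE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+): error: ")


def parse_mypy(output: str) -> set[Location]:
    """Extract (file, line) locations of mypy errors, skipping notes/summary."""
    locations: set[Location] = set()
    for raw in output.splitlines():
        m = MYPY_LINE.match(raw)
        if m:
            locations.add((m["file"], int(m["line"])))
    return locations


def compare(mypy_locs: set[Location], pyright_locs: set[Location]) -> dict[str, set[Location]]:
    """Split error locations into those found by one tool or by both."""
    return {
        "mypy_only": mypy_locs - pyright_locs,
        "pyright_only": pyright_locs - mypy_locs,
        "both": mypy_locs & pyright_locs,
    }


mypy_output = (
    'src/app.py:3: error: Incompatible return value type (got "str", expected "int")  [return-value]\n'
    'src/app.py:9: error: Name "foo" is not defined  [name-defined]\n'
    'Found 2 errors in 1 file (checked 1 source file)\n'
)
pyright_locations = {("src/app.py", 3), ("src/models.py", 14)}  # made-up sample

result = compare(parse_mypy(mypy_output), pyright_locations)
print(result["both"])          # {('src/app.py', 3)}
print(result["mypy_only"])     # {('src/app.py', 9)}
print(result["pyright_only"])  # {('src/models.py', 14)}
```

The two tools may report paths differently (relative vs. absolute), so normalize both sides, e.g. with `Path.resolve()`, before comparing. The simple regex also assumes paths without colons, so Windows drive letters would need a tweak.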

Why This Matters for Real Projects

The conformance debate raises a philosophical question:

“If 50%+ of the ecosystem fixes code according to mypy’s behavior, does conformance in theory matter?”

Practical considerations:

1. Working with existing stack (Django, Pydantic, etc.)
2. Catching actual bugs in your codebase
3. Developer experience and CI noise
4. Spec conformance (important but not primary)

What I Do Now

For new projects:

Check framework requirements first - If using Django or Pydantic, mypy is often the only viable option
Start with strict mode - Enable strict = true in mypy or typeCheckingMode: strict in pyright
Suppress specific errors, not categories - Use targeted # type: ignore[specific-error]
Consider both checkers - Run mypy for ecosystem support, pyright for conformance
Monitor updates - Both tools actively improve conformance

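[tool.mypy]
python_version = "3.11"
strict = true
# strict already enables the next three; listing them documents intent
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
plugins = ["mypy_django_plugin.main", "pydantic.mypy"]

# django-stubs needs your settings module ("myproject.settings" is a placeholder)
[tool.django-stubs]
django_settings_module = "myproject.settings"

[[tool.mypy.overrides]]
module = "third_party.*"  # placeholder: list your untyped dependencies here
ignore_missing_imports = true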

Related Standards and Resources

PEP 484: The original type hints specification that mypy helped shape
PEP 560: Core support for typing module and generic types
PEP 612: Parameter specification variables (improved by mypy feedback)
Typeshed: Shared type stub repository that benefits both checkers

Summary

Mypy’s 58.3% conformance reflects its role as Python’s pioneering type checker, not a failure. It shaped the spec before the spec existed.

The real question isn’t “which checker has higher conformance?” but “which checker works for my project’s constraints?”

For Django or Pydantic projects, mypy’s ecosystem support outweighs conformance scores. For spec-critical code, pyright complements mypy. The best approach often uses both.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are the most important links from this article, along with further resources on this topic:

👨‍💻 Pyrefly Blog: Typing Conformance Comparison

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!