Why Does mypy Have Low Typing Spec Conformance Despite High Adoption?

I ran mypy on my Python project and assumed it was catching all type errors according to the official Python typing specification. Then I discovered mypy only has 58.3% conformance to the typing spec—passing just 81 out of 139 tests in the official conformance suite.

This seemed alarming. How could the most popular Python type checker miss so much?

The Problem

I was reviewing type checker options for a new project and found these statistics:

Mypy Conformance Results
Passed: 81/139 tests (58.3%)
False positives: 231 (reporting errors that shouldn't exist)
False negatives: 76 (missing actual errors)
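
The headline percentage follows directly from the pass count. A quick check in Python, using only the figures quoted above (the variable names are mine):

```python
# Figures quoted in the conformance results above.
passed = 81
total = 139

pass_rate = passed / total * 100
failed = total - passed

print(f"Pass rate: {pass_rate:.1f}%")  # Pass rate: 58.3%
print(f"Tests not passed: {failed}")   # Tests not passed: 58
```

Both 231 and 76 are larger than the 58 tests that did not pass, so those counts presumably tally individual diagnostics rather than test files. That reading is my inference; the results themselves do not say so.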

This raised an immediate question: Should I be worried? Is mypy unreliable?

The numbers show mypy passing just over half of the official conformance tests. Meanwhile, pyright scores significantly higher on the same suite.

But then I saw this comment from a developer:

“Most of my projects are using Django, Pydantic and can be only used with mypy”

This highlights a tension: mypy has the best framework support, but lower spec conformance. Which matters more?

Why Mypy Has Lower Conformance

The key insight is historical: mypy predates the formal typing specification.

Timeline
2012: Mypy created (before typing spec existed)
2014: PEP 484 proposed (mypy helped define it)
2015: typing module added to Python 3.5
2020s: Typing spec continues evolving

Mypy wasn’t built to conform to a spec—it helped create Python’s type system. This creates two consequences:

  1. Legacy behavior: Mypy maintains backward compatibility with codebases that adopted it early
  2. Spec divergence: As the formal spec evolved, mypy’s original behavior sometimes differs

The Adoption Reality

Here’s what matters more than conformance scores:

Ecosystem Dependencies
┌───────────────────┬──────────────┬─────────────────┐
│ Framework/Tool    │ Mypy Support │ Pyright Support │
├───────────────────┼──────────────┼─────────────────┤
│ Django            │ Plugin       │ Limited         │
│ Pydantic          │ Plugin       │ Basic           │
│ SQLAlchemy        │ Plugin       │ Basic           │
│ Dataclasses       │ Full         │ Full            │
│ Third-party stubs │ Extensive    │ Growing         │
└───────────────────┴──────────────┴─────────────────┘

When your Django project needs type checking, mypy’s plugin ecosystem outweighs theoretical conformance scores.

Practical Implications

False Positives: The Noise Problem

Mypy’s 231 false positives mean it reports errors that shouldn’t exist. I encountered this with generic edge cases:

mypy_false_positive.py
from typing import TypeVar, Generic

T = TypeVar('T')


class Container(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value


def process(container: Container[int]) -> int:
    # Illustrative only: this basic case type-checks cleanly in mypy;
    # the false positives arise in less common generic constructs
    return container.get() + 1

When this happens, I have two options:

handling_false_positives.py
# Option 1: Suppress known false positives
result = some_edge_case()  # type: ignore[arg-type]

# Option 2: Configure mypy to be less strict
# In mypy.ini or pyproject.toml:

pyproject.toml (mypy config)
[tool.mypy]
strict = false
warn_return_any = false
ignore_missing_imports = true

False Negatives: The Missed Bugs

Mypy’s 76 false negatives mean it fails to catch real errors. This is arguably worse than false positives.

mypy_false_negative.py
from typing import Any


def risky_function(data: Any) -> str:
    # `Any` disables checking, so no checker reports this call;
    # a missing method only surfaces at runtime
    return data.nonexistent_method()


# Correct approach: use explicit typing
def safer_function(data: dict[str, int]) -> str:
    return str(data.get("key", 0))
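
One caveat about `Any`: a value annotated that way is exempt from checking by design, so errors that slip through it are not specific to mypy. A minimal sketch of the alternative, annotating unknown input as `object` and narrowing it before use (the function names here are hypothetical):

```python
from typing import Any


def length_unchecked(data: Any) -> int:
    # `Any` switches checking off: type checkers accept any attribute
    # access or call here, so a wrong argument only fails at runtime.
    return len(data.strip())


def length_checked(data: object) -> int:
    # `object` keeps checking on: the value must be narrowed before use,
    # and a checker rejects `data.strip()` outside the isinstance branch.
    if isinstance(data, str):
        return len(data.strip())
    raise TypeError(f"expected str, got {type(data).__name__}")


print(length_checked("  hello  "))  # 5

try:
    length_unchecked(42)  # accepted statically, fails at runtime
except AttributeError as exc:
    print(f"runtime failure: {exc}")
```

The narrowed version costs a few extra lines, but a wrong argument is reported where it is passed instead of wherever the value is eventually used.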

The solution here is using multiple type checkers:

running_multiple_checkers.sh
# Run mypy for ecosystem compatibility
mypy src/ --python-version=3.11
# Run pyright for spec conformance
pyright src/
# CI pipeline can run both
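
If a shell script stops at the first failing command, the second checker never runs. A small Python wrapper can run every checker and then report a combined status. This is a rough sketch: `run_checkers` is a hypothetical helper, and the demo uses stand-in commands so it runs on a machine where neither tool is installed.

```python
import subprocess
import sys


def run_checkers(commands: dict[str, list[str]]) -> dict[str, int]:
    """Run every command, even after a failure, and collect exit codes."""
    results: dict[str, int] = {}
    for name, argv in commands.items():
        completed = subprocess.run(argv, check=False)
        results[name] = completed.returncode
    return results


# Stand-in commands so the sketch runs anywhere; in a real project these
# would be ["mypy", "src/"] and ["pyright", "src/"].
demo = {
    "passes": [sys.executable, "-c", "raise SystemExit(0)"],
    "fails": [sys.executable, "-c", "raise SystemExit(1)"],
}

exit_codes = run_checkers(demo)
print(exit_codes)  # {'passes': 0, 'fails': 1}

# A CI job would exit non-zero if any checker reported errors.
overall = 1 if any(exit_codes.values()) else 0
print(f"overall exit status: {overall}")
```

Collecting all exit codes before deciding means one run shows the findings of both tools, which is the point of running two checkers in the first place.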

When to Choose Which Checker

I’ve developed a decision framework:

Type Checker Selection
┌─────────────────────────────────┬─────────────────────┐
│ Your Situation                  │ Recommended Tool    │
├─────────────────────────────────┼─────────────────────┤
│ Django project                  │ mypy + django-stubs │
│ Pydantic-heavy codebase         │ mypy + pydantic     │
│ Pure Python, no frameworks      │ pyright or mypy     │
│ Need strict spec conformance    │ pyright             │
│ Existing mypy codebase          │ stay with mypy      │
│ Starting fresh, no dependencies │ pyright             │
│ Both available                  │ run both in CI      │
└─────────────────────────────────┴─────────────────────┘
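
The table does not say which row wins when several apply, for example a Django project that also wants strict conformance. Here is one way to make the precedence explicit. The function name, the parameters, and the ordering of the checks are my assumptions, not part of the table.

```python
def recommend_checker(
    *,
    uses_django: bool = False,
    pydantic_heavy: bool = False,
    existing_mypy_codebase: bool = False,
    needs_strict_conformance: bool = False,
) -> str:
    """Return a recommendation from the selection table.

    Assumed precedence: framework constraints first, then existing
    investment in mypy, then conformance needs.
    """
    if uses_django:
        return "mypy + django-stubs"
    if pydantic_heavy:
        return "mypy + pydantic"
    if existing_mypy_codebase:
        return "stay with mypy"
    if needs_strict_conformance:
        return "pyright"
    # Starting fresh with no framework dependencies.
    return "pyright"


print(recommend_checker(uses_django=True, needs_strict_conformance=True))
# mypy + django-stubs
print(recommend_checker())
# pyright
```

In the first call the framework constraint wins, and the "run both in CI" row is how I would cover the conformance requirement on top of it.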

Configuration for Running Both

pyrightconfig.json
{
  "typeCheckingMode": "strict",
  "pythonVersion": "3.11"
}

GitHub Actions (both checkers)
name: Type Check
on: [push, pull_request]
jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install mypy
        run: pip install mypy
      - name: Run mypy
        run: mypy src/
      - name: Install pyright
        run: npm install -g pyright
      - name: Run pyright
        run: pyright src/

Common Mistakes I’ve Made

Mistake 1: Choosing Based Only on Conformance

I initially thought higher conformance = better tool. This ignores:

  • Framework plugin support
  • Existing team expertise
  • CI/CD integration complexity
  • Community resources and stubs availability

Mistake 2: Ignoring False Positives

231 false positives create noise. Without a strategy, developers start ignoring all type errors:

bad_practice.py
# Don't do this - too broad
# type: ignore
# Do this instead - specific suppression
result = problematic_call() # type: ignore[arg-type]
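
To keep broad suppressions from creeping back in, a small script can list the lines that use `# type: ignore` without an error code. This is a deliberately naive sketch: it matches text with a regular expression, so it will also match the phrase inside string literals, and `find_bare_ignores` is a hypothetical helper name.

```python
import re

# Matches "# type: ignore" that is NOT followed by an error code in brackets.
BARE_IGNORE = re.compile(r"#\s*type:\s*ignore(?!\[)")


def find_bare_ignores(source: str) -> list[int]:
    """Return 1-based line numbers containing an unqualified type: ignore."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if BARE_IGNORE.search(line)
    ]


sample = """\
a = f()  # type: ignore
b = g()  # type: ignore[arg-type]
c = h()
"""
print(find_bare_ignores(sample))  # [1]
```

As far as I know, mypy also ships an optional `ignore-without-code` error code that reports these itself, which is the better choice when mypy is already part of the pipeline.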

Mistake 3: Assuming Spec = Truth

The typing specification is still evolving. Mypy’s behavior often influences future spec versions:

Spec Evolution Cycle
Real-world usage → Tool behavior → Spec updates
       ↑                                ↓
       └───── Formalizes what works ←───┘

Pragmatic behavior sometimes precedes formal specification.

Mistake 4: Binary Choice Between Checkers

Many successful projects use both:

dual_checker_workflow.sh
# mypy catches Django-specific issues; the plugin is enabled in the config
# file (plugins = ["mypy_django_plugin.main"]), not on the command line
mypy src/
# pyright catches spec-conformance issues
pyright src/
# Different tools catch different problems
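
To see which problems each tool catches, the two reports can be compared by location. The sketch below assumes mypy's default text format (`path:line: error: message  [code]`) and assumes that `pyright --outputjson` emits a `generalDiagnostics` list with `file`, `severity` and a zero-based `range.start.line`. Both helper functions are hypothetical and the sample reports are hand-written, so check the field names against real output before relying on this.

```python
import json
import re

# Assumed mypy line format: "path:line: error: message  [code]"
MYPY_LINE = re.compile(r"^(?P<path>[^:\n]+):(?P<line>\d+): error:")


def mypy_locations(output: str) -> set[tuple[str, int]]:
    """Collect (path, line) pairs from mypy's text output."""
    found: set[tuple[str, int]] = set()
    for text in output.splitlines():
        match = MYPY_LINE.match(text)
        if match:
            found.add((match["path"], int(match["line"])))
    return found


def pyright_locations(output_json: str) -> set[tuple[str, int]]:
    """Collect (path, line) pairs from an assumed pyright JSON report."""
    report = json.loads(output_json)
    return {
        # Convert the assumed zero-based line to mypy's one-based numbering.
        (diag["file"], diag["range"]["start"]["line"] + 1)
        for diag in report.get("generalDiagnostics", [])
        if diag.get("severity") == "error"
    }


# Hand-written sample reports, not real tool output.
mypy_out = """\
src/app.py:10: error: Incompatible return value type  [return-value]
src/app.py:22: error: Argument 1 has incompatible type  [arg-type]
"""
pyright_out = json.dumps({
    "generalDiagnostics": [
        {"file": "src/app.py", "severity": "error", "message": "example",
         "range": {"start": {"line": 21, "character": 4}}},
        {"file": "src/models.py", "severity": "error", "message": "example",
         "range": {"start": {"line": 4, "character": 0}}},
    ]
})

from_mypy = mypy_locations(mypy_out)
from_pyright = pyright_locations(pyright_out)

print("both:", sorted(from_mypy & from_pyright))
print("only mypy:", sorted(from_mypy - from_pyright))
print("only pyright:", sorted(from_pyright - from_mypy))
```

On real output the paths will probably need normalizing first, since the two tools do not necessarily report them in the same form.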

Why This Matters for Real Projects

The conformance debate raises a philosophical question:

“If 50%+ of the ecosystem fixes code according to mypy’s behavior, does conformance in theory matter?”

Practical considerations:

Real-World Priorities
1. Working with existing stack (Django, Pydantic, etc.)
2. Catching actual bugs in your codebase
3. Developer experience and CI noise
4. Spec conformance (important but not primary)

What I Do Now

For new projects:

  1. Check framework requirements first - If using Django or Pydantic, mypy is often the only viable option
  2. Start with strict mode - Enable strict = true in mypy or typeCheckingMode: strict in pyright
  3. Suppress specific errors, not categories - Use targeted # type: ignore[specific-error]
  4. Consider both checkers - Run mypy for ecosystem support, pyright for conformance
  5. Monitor updates - Both tools actively improve conformance

recommended_mypy_config.toml
[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
plugins = ["mypy_django_plugin.main", "pydantic.mypy"]

[[tool.mypy.overrides]]
module = "third_party.*"
ignore_missing_imports = true

  • PEP 484: The original type hints specification that mypy helped shape
  • PEP 560: Core support for typing module and generic types
  • PEP 612: Parameter specification variables (improved by mypy feedback)
  • Typeshed: Shared type stub repository that benefits both checkers

Summary

Mypy’s 58.3% conformance reflects its role as Python’s pioneering type checker, not a failure. It shaped the spec before the spec existed.

The real question isn’t “which checker has higher conformance?” but “which checker works for my project’s constraints?”

For Django or Pydantic projects, mypy’s ecosystem support outweighs conformance scores. For spec-critical code, pyright complements mypy. The best approach often uses both.
