Why Does mypy Have Low Typing Spec Conformance Despite High Adoption?
I ran mypy on my Python project and assumed it was catching all type errors according to the official Python typing specification. Then I discovered mypy only has 58.3% conformance to the typing spec—passing just 81 out of 139 tests in the official conformance suite.
This seemed alarming. How could the most popular Python type checker miss so much?
The Problem
I was reviewing type checker options for a new project and found these statistics:
- Passed: 81/139 tests (58.3%)
- False positives: 231 (reporting errors that shouldn't exist)
- False negatives: 76 (missing actual errors)
This raised an immediate question: Should I be worried? Is mypy unreliable?
The numbers suggest mypy is barely passing half the official typing specification tests. Meanwhile, pyright scores significantly higher on conformance.
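As a quick sanity check on the headline figure, the pass rate can be recomputed from the two counts quoted above. This is only a minimal sketch; the function name `pass_rate` is my own and not part of any conformance tooling.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts quoted in this article: 81 passed out of 139 tests.
print(pass_rate(81, 139))  # 58.3
```

The false-positive and false-negative counts do not enter this ratio; it only measures how many test files pass outright.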
But then I saw this comment from a developer:
“Most of my projects are using Django, Pydantic and can be only used with mypy”
This highlights a tension: mypy has the best framework support, but lower spec conformance. Which matters more?
Why Mypy Has Lower Conformance
The key insight is historical: mypy predates the formal typing specification.
- 2012: Mypy created (before the typing spec existed)
- 2014: PEP 484 proposed (mypy helped define it)
- 2015: typing module added in Python 3.5
- 2020s: Typing spec continues evolving
Mypy wasn’t built to conform to a spec; it helped create Python’s type system. This creates two consequences:
- Legacy behavior: Mypy maintains backward compatibility with codebases that adopted it early
- Spec divergence: As the formal spec evolved, mypy’s original behavior sometimes differs
The Adoption Reality
Here’s what matters more than conformance scores:
┌───────────────────┬──────────────┬─────────────────┐
│ Framework/Tool    │ Mypy Support │ Pyright Support │
├───────────────────┼──────────────┼─────────────────┤
│ Django            │ Plugin       │ Limited         │
│ Pydantic          │ Plugin       │ Basic           │
│ SQLAlchemy        │ Plugin       │ Basic           │
│ Dataclasses       │ Full         │ Full            │
│ Third-party stubs │ Extensive    │ Growing         │
└───────────────────┴──────────────┴─────────────────┘
When your Django project needs type checking, mypy’s plugin ecosystem outweighs theoretical conformance scores.
Practical Implications
False Positives: The Noise Problem
Mypy’s 231 false positives mean it reports errors that shouldn’t exist. I encountered this with generic edge cases:

    from typing import TypeVar, Generic

    T = TypeVar('T')

    class Container(Generic[T]):
        def __init__(self, value: T) -> None:
            self.value = value

        def get(self) -> T:
            return self.value

    def process(container: Container[int]) -> int:
        # In more intricate generic code, mypy might flag issues that are valid per spec
        return container.get() + 1

When this happens, I have two options:

    # Option 1: Suppress known false positives
    result = some_edge_case()  # type: ignore[arg-type]

    # Option 2: Configure mypy to be less strict
    # In pyproject.toml (mypy.ini uses a [mypy] section header instead):
    [tool.mypy]
    strict = false
    warn_return_any = false
    ignore_missing_imports = true

False Negatives: The Missed Bugs
Mypy’s 76 false negatives mean it fails to catch real errors. This is arguably worse than false positives.

    from typing import Any

    def risky_function(data: Any) -> str:
        # Any switches attribute checking off, so this call is not flagged
        return data.nonexistent_method()  # Fails at runtime, yet passes the checker

    # Correct approach: use explicit typing
    def safer_function(data: dict[str, int]) -> str:
        return str(data.get("key", 0))

The solution here is using multiple type checkers:

    # Run mypy for ecosystem compatibility
    mypy src/ --python-version=3.11

    # Run pyright for spec conformance
    pyright src/

    # CI pipeline can run both

When to Choose Which Checker
I’ve developed a decision framework:
┌─────────────────────────────────┬─────────────────────┐
│ Your Situation                  │ Recommended Tool    │
├─────────────────────────────────┼─────────────────────┤
│ Django project                  │ mypy + django-stubs │
│ Pydantic-heavy codebase         │ mypy + pydantic     │
│ Pure Python, no frameworks      │ pyright or mypy     │
│ Need strict spec conformance    │ pyright             │
│ Existing mypy codebase          │ stay with mypy      │
│ Starting fresh, no dependencies │ pyright             │
│ Both available                  │ run both in CI      │
└─────────────────────────────────┴─────────────────────┘
Configuration for Running Both
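Before wiring anything into CI, the same idea can be tried locally. The following is a minimal sketch of a runner that invokes both checkers and fails if either one fails. The helper names (`run_command`, `run_checkers`, `overall_exit_code`) are my own, and the commented usage assumes that `mypy` and `pyright` are installed and on PATH.

```python
import subprocess
from typing import Callable, Sequence

Command = Sequence[str]


def run_command(command: Command) -> int:
    """Run one checker command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def run_checkers(
    commands: Sequence[Command],
    runner: Callable[[Command], int] = run_command,
) -> dict[str, int]:
    """Run every command, even after a failure, and collect the exit codes."""
    return {" ".join(command): runner(command) for command in commands}


def overall_exit_code(results: dict[str, int]) -> int:
    """Return 0 only when every checker succeeded."""
    return 0 if all(code == 0 for code in results.values()) else 1


# Hypothetical usage (assumes both tools are installed):
# results = run_checkers([["mypy", "src/"], ["pyright", "src/"]])
# raise SystemExit(overall_exit_code(results))
```

Running every checker before deciding the exit code means one run shows the findings of both tools, instead of stopping at the first failure. The `runner` parameter exists only so the logic can be exercised without the real tools.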
Pyright settings (for example in pyrightconfig.json):

    {
      "typeCheckingMode": "strict",
      "pythonVersion": "3.11"
    }

GitHub Actions workflow:

    name: Type Check

    on: [push, pull_request]

    jobs:
      typecheck:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: '3.11'

          - name: Install mypy
            run: pip install mypy

          - name: Run mypy
            run: mypy src/

          - name: Install pyright
            run: npm install -g pyright

          - name: Run pyright
            run: pyright src/

Common Mistakes I’ve Made
Mistake 1: Choosing Based Only on Conformance
I initially thought higher conformance = better tool. This ignores:
- Framework plugin support
- Existing team expertise
- CI/CD integration complexity
- Community resources and stubs availability
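To keep those factors from getting lost, the decision table from earlier can be written down as a small function. This is only a sketch: the `Project` class, its field names, and the order in which the rules are checked are my own assumptions, not something either tool provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical description of a project's constraints."""

    uses_django: bool = False
    pydantic_heavy: bool = False
    existing_mypy_codebase: bool = False
    needs_strict_conformance: bool = False


def recommend_checker(project: Project) -> str:
    """Map project constraints to a row of the decision table."""
    # Assumed priority: framework constraints first, then existing setup,
    # then conformance needs.
    if project.uses_django:
        return "mypy + django-stubs"
    if project.pydantic_heavy:
        return "mypy + pydantic"
    if project.existing_mypy_codebase:
        return "stay with mypy"
    if project.needs_strict_conformance:
        return "pyright"
    return "pyright or mypy"


print(recommend_checker(Project(uses_django=True)))  # mypy + django-stubs
```

Conformance is checked last on purpose here, which mirrors the point of this section: it is one input among several, not the deciding one.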
Mistake 2: Ignoring False Positives
231 false positives create noise. Without a strategy, developers start ignoring all type errors:

    # Don't do this - too broad
    result = problematic_call()  # type: ignore

    # Do this instead - specific suppression
    result = problematic_call()  # type: ignore[arg-type]

Mistake 3: Assuming Spec = Truth
The typing specification is still evolving. Mypy’s behavior often influences future spec versions:

    Real-world usage → Tool behavior → Spec updates
           ↑                                    ↓
           └──────── Formalizes what works ←────┘

Pragmatic behavior sometimes precedes formal specification.
Mistake 4: Binary Choice Between Checkers
Many successful projects use both:

    # mypy catches Django-specific issues
    # (the plugin is enabled in the mypy config file: plugins = ["mypy_django_plugin.main"])
    mypy src/

    # pyright catches spec-conformance issues
    pyright src/

    # Different tools catch different problems

Why This Matters for Real Projects
The conformance debate raises a philosophical question:
“If 50%+ of the ecosystem fixes code according to mypy’s behavior, does conformance in theory matter?”
Practical considerations:
1. Working with existing stack (Django, Pydantic, etc.)
2. Catching actual bugs in your codebase
3. Developer experience and CI noise
4. Spec conformance (important but not primary)
What I Do Now
For new projects:
- Check framework requirements first: if using Django or Pydantic, mypy is often the only viable option
- Start with strict mode: enable strict = true in mypy or typeCheckingMode: strict in pyright
- Suppress specific errors, not categories: use targeted # type: ignore[specific-error] comments
- Consider both checkers: run mypy for ecosystem support, pyright for conformance
- Monitor updates: both tools actively improve conformance
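For the point about targeted suppressions, a small script can flag blanket ignores before they pile up. This is a rough sketch under my own assumptions: the name `find_bare_ignores` and the regular expression are illustrative, and the pattern works line by line on text, so it will also match the phrase inside strings or documentation. As far as I know, mypy can enforce the same rule itself through its optional `ignore-without-code` error code.

```python
import re

# Matches "# type: ignore" that is NOT followed by an error-code list
# such as "[arg-type]".
BARE_IGNORE = re.compile(r"#\s*type:\s*ignore\b(?!\[)")


def find_bare_ignores(source: str) -> list[int]:
    """Return 1-based line numbers that contain a bare `# type: ignore`."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BARE_IGNORE.search(line)
    ]


sample = (
    "a = f()  # type: ignore\n"
    "b = g()  # type: ignore[arg-type]\n"
    "c = h()\n"
)
print(find_bare_ignores(sample))  # [1]
```

Only the first line of the sample is reported, because the second one names the error code it suppresses.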

    [tool.mypy]
    python_version = "3.11"
    strict = true
    warn_return_any = true
    warn_unused_configs = true
    disallow_untyped_defs = true
    plugins = ["mypy_django_plugin.main", "pydantic.mypy"]

    [[tool.mypy.overrides]]
    module = "third_party.*"
    ignore_missing_imports = true

Related Knowledge
- PEP 484: The original type hints specification that mypy helped shape
- PEP 560: Core support for typing module and generic types
- PEP 612: Parameter specification variables (improved by mypy feedback)
- Typeshed: Shared type stub repository that benefits both checkers
Summary
Mypy’s 58.3% conformance reflects its role as Python’s pioneering type checker, not a failure. It shaped the spec before the spec existed.
The real question isn’t “which checker has higher conformance?” but “which checker works for my project’s constraints?”
For Django or Pydantic projects, mypy’s ecosystem support outweighs conformance scores. For spec-critical code, pyright complements mypy. The best approach often uses both.