How Does Zuban Achieve Zero False Negatives in Python Type Checking?
I was debugging a production incident last month. The type checker had passed our code, but we still got a TypeError at runtime. After hours of investigation, I discovered the culprit: a false negative from our type checker.
That’s when I started looking at Zuban, a Python type checker that claims zero false negatives. Here’s what I learned about why this matters and how it works.
What’s the Problem with False Negatives?
False negatives occur when a type checker fails to report an error that should be caught according to the typing specification. Let me show you a real example:
from typing import overload
@overloaddef process(x: int) -> str: ...@overloaddef process(x: str) -> int: ...
def process(x: int | str) -> int | str: return x # Bug: returns wrong type for each overloadSome type checkers might not flag this implementation. The overloads promise that:
process(5)returnsstrprocess("hello")returnsint
But the implementation returns whatever you pass in. A false negative here means this bug slips through silently.
Why This Is More Dangerous Than False Positives
I used to get annoyed by false positives - type errors that weren’t really errors. But here’s the thing: false positives are just annoying. False negatives are dangerous.
┌─────────────────┬──────────────────┬─────────────────────┐│ Error Type │ Immediate Impact │ Long-term Impact │├─────────────────┼──────────────────┼─────────────────────┤│ False Positive │ Annoyance │ Add # type: ignore ││ │ Friction │ Developer learns ││ │ │ why it's flagged │├─────────────────┼──────────────────┼─────────────────────┤│ False Negative │ Silence │ Production bug ││ │ False confidence │ Customer impact ││ │ │ Debugging nightmare │└─────────────────┴──────────────────┴─────────────────────┘The silence of false negatives is what makes them deadly. You think your code is safe. It isn’t.
The Benchmark Data
I found recent benchmark data from the Python Typing Specification Test Suite (March 2026):
┌────────────┬───────────────┬───────────┬─────────────────┬─────────────────┐│ Checker │ Fully Passing │ Pass Rate │ False Positives │ False Negatives │├────────────┼───────────────┼───────────┼─────────────────┼─────────────────┤│ pyright │ 136/139 │ 97.8% │ 15 │ 4 ││ zuban │ 134/139 │ 96.4% │ 10 │ 0 │└────────────┴───────────────┴───────────┴─────────────────┴─────────────────┘The numbers surprised me:
- Pyright has a slightly higher pass rate (97.8% vs 96.4%)
- But Pyright has 4 false negatives, Zuban has 0
- Zuban actually has fewer false positives too (10 vs 15)
This doesn’t match the usual trade-off narrative. Usually, strict checkers have more false positives. Zuban is both strict AND more precise.
How Zuban Achieves Zero False Negatives
While the specific implementation details aren’t publicly documented, achieving zero false negatives typically requires a fundamentally different approach.
Conservative Type Inference
When the type is ambiguous, Zuban takes the stricter path:
from typing import TypeVar, Generic
T = TypeVar('T')
class Box(Generic[T]): def __init__(self, value: T) -> None: self.value = value
def get(self) -> T: return self.value
def process(box: Box[int] | Box[str]) -> None: value = box.get() # Some checkers might incorrectly narrow this
if isinstance(value, int): # Zuban guarantees value is int here result = value + 10 else: # Zuban guarantees value is str here result = value.upper()Complete Rule Coverage
Zuban implements all Python typing specification requirements without shortcuts. Let me show you a case where this matters:
from typing import Protocol
class Drawable(Protocol): def draw(self) -> None: ...
class Circle: def draw(self) -> None: print("Drawing circle")
def render(shape: Drawable) -> None: shape.draw()
# Zuban catches this - some checkers don'tdef process_invalid(): render("not a drawable") # TypeError waiting to happenA type checker with false negatives might pass this code. At runtime, you’d get:
AttributeError: 'str' object has no attribute 'draw'When This Matters Most
Not every project needs this level of strictness. But for certain use cases, zero false negatives is worth the trade-off.
Production Systems
def calculate_discount(price: int) -> float: return price * 0.1
def apply_discount(item: dict) -> None: price = item.get("price", 0) # Type: int | None
# A false negative checker might not flag this: discount = calculate_discount(price) # Bug: price could be None if "price" key exists but is NoneIf your type checker has false negatives, it might not warn about price being potentially None. In production, you’d get a TypeError.
High-Stakes Applications
- Financial systems: A type error could mean incorrect transactions
- Healthcare software: Bugs could affect patient care
- Security-sensitive code: Type confusion vulnerabilities
For these, the cost of a production bug far exceeds the cost of fixing a few extra type annotations.
My Trial-and-Error Process
I tested Zuban on an existing codebase. Here’s what happened:
# Install Zubanpip install zuban
# Check your projectzuban check src/src/handlers/payment.py:45: error: Argument 1 has type "int | None" but expected "int"src/models/order.py:23: error: Incompatible return typesrc/utils/validators.py:12: error: "str" has no attribute "parse"...At first, I had more errors than with Pyright. But here’s the key insight: every single one was a genuine type safety issue.
With Pyright, I had 15 false positives to suppress. With Zuban, I had 0 false negatives to worry about. The code that passed Zuban was genuinely type-safe.
Common Mistakes I’ve Seen
Mistake 1: Prioritizing Pass Rate Over Correctness
A higher pass rate sounds better, but if it includes false negatives, you’re getting false confidence.
Higher Pass Rate ≠ Better Type Safety
A checker with:- 99% pass rate- 10 false negatives
Is less safe than:- 96% pass rate- 0 false negatives
The 3% difference are errors you NEED to fix.Mistake 2: Ignoring False Negative Rates
Teams often focus on developer experience (fewer false positives = happier developers). But they overlook the correctness dimension.
Mistake 3: Assuming All Type Checkers Are Equivalent
They’re not. Different implementations make different trade-offs:
┌─────────────┬────────────────┬─────────────────┐│ Checker │ Priority │ Trade-off │├─────────────┼────────────────┼─────────────────┤│ mypy │ Compatibility │ 76 false negs ││ pyright │ Speed + compat │ 4 false negs ││ zuban │ Correctness │ 0 false negs │└─────────────┴────────────────┴─────────────────┘The Limitation
I couldn’t find detailed technical documentation about Zuban’s specific implementation. The evidence is primarily from benchmark results and community insights. If you know more about how it achieves this, I’d love to learn.
When to Choose Zuban
Based on my experience, Zuban is ideal for:
- New projects where you can start strict
- Critical systems where type safety is non-negotiable
- Teams that value correctness over convenience
- Projects with comprehensive test coverage (so type errors are caught in CI)
It might not be ideal for:
- Legacy codebases with many type issues
- Teams that prioritize minimal friction
- Projects where rapid iteration matters more than correctness
Key Takeaways
- False negatives are more dangerous than false positives - They let bugs slip through silently
- Zero false negatives means trust - When Zuban passes, your code is type-safe
- The trade-off isn’t what you’d expect - Zuban has fewer false positives than Pyright too
- Know your priorities - Correctness vs convenience is a real choice
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments