Is Mojo Faster Than Python? 78-119x Speedup on Numerical Computing Benchmarks

Mar 11, 2026

Is Mojo actually faster than Python for numerical computing? I ran the benchmarks to find out.

The short answer: a 78-119x speedup over CPython on standard numerical benchmarks. That puts Mojo in the same performance tier as Numba, Cython, and Rust PyO3, with one catch: it's a new language with a young ecosystem, not an add-on to Python.

The Benchmark Numbers

I tested Mojo on two Benchmarks Game problems, n-body and spectral-norm, using the optimization ladder study published in March 2026.

n-body:

| Approach        | Time    | Speedup | What It Costs              |
|-----------------|---------|---------|----------------------------|
| CPython 3.14    | 1,242ms | 1.0x    | Baseline                   |
| PyPy            | 98ms    | 13x     | Runtime swap, testing deps |
| Numba @njit     | 22ms    | 56x     | Decorator + NumPy arrays   |
| Mojo            | 16ms    | 78x     | New language, ecosystem    |
| Cython          | 10ms    | 124x    | C knowledge, silent traps  |
| Rust PyO3       | 11ms    | 113x    | Learning Rust              |

spectral-norm:

| Approach        | Time     | Speedup | What It Costs              |
|-----------------|----------|---------|----------------------------|
| CPython 3.14    | 14,046ms | 1.0x    | Baseline                   |
| GraalPy         | 212ms    | 66x     | Runtime swap, Python 3.12  |
| Numba           | 104ms    | 135x    | Decorator + NumPy arrays   |
| Mojo            | 118ms    | 119x    | New language, ecosystem    |
| Rust PyO3       | 91ms     | 154x    | Learning Rust              |
| NumPy (BLAS)    | 27ms     | 520x    | Matrix ops only            |

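On n-body, Mojo (78x) beats Numba (56x) but trails Cython (124x) and Rust PyO3 (113x). On spectral-norm, it (119x) trails both Numba (135x) and Rust PyO3 (154x). Not the fastest, but competitive with the compiled solutions.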
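The Speedup column is simply the CPython time divided by each approach's time. A quick sanity check in Python, with the times copied from the two tables above, reproduces the reported figures after rounding:

```python
# Times in milliseconds, copied from the n-body and spectral-norm tables.
NBODY_MS = {"CPython 3.14": 1242, "PyPy": 98, "Numba @njit": 22,
            "Mojo": 16, "Cython": 10, "Rust PyO3": 11}
SPECTRAL_MS = {"CPython 3.14": 14046, "GraalPy": 212, "Numba": 104,
               "Mojo": 118, "Rust PyO3": 91, "NumPy (BLAS)": 27}


def speedups(times_ms, baseline="CPython 3.14"):
    """Return each approach's speedup over the baseline, rounded to a whole number."""
    base = times_ms[baseline]
    return {name: round(base / t) for name, t in times_ms.items()}


if __name__ == "__main__":
    print(speedups(NBODY_MS))
    print(speedups(SPECTRAL_MS))
```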

Why Mojo Is Fast

Mojo isn’t just “optimized Python.” It’s a completely different architecture built on MLIR (Multi-Level Intermediate Representation).

```
Python:                          Mojo:
-------                          -----
Source Code                      Source Code
    |                                |
    v                                v
Bytecode                         MLIR IR
    |                                |
    v                          +-----+-----+
Interpreter                    |     |     |
    |                         CPU   GPU  Other
    v                         Code  Code  Code
Runtime Dispatch
```

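Mojo compiles your code to native machine code (through MLIR and, for CPUs, LLVM). CPython compiles it to bytecode and then interprets that bytecode one instruction at a time.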
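You can inspect the CPython half of this diagram with the standard-library `dis` module, which shows the bytecode the interpreter steps through. The `add` function is just an illustration, and exact opcode names vary between CPython versions (for example `BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on):

```python
import dis


def add(a, b):
    # One line of Python source...
    return a + b


# ...becomes several bytecode instructions that the interpreter executes
# one by one, checking the operand types at runtime.
dis.dis(add)

# The same instructions as data: a list of opcode names.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```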

1. MLIR-Native from Day One

MLIR is an LLVM compiler-infrastructure project that grew out of Google's TensorFlow work and is now used across ML compiler stacks. Modular describes Mojo as the first language built from the ground up on MLIR.

From the Mojo documentation:

“Mojo is designed from the ground up to support heterogeneous hardware - the Mojo compiler makes no assumptions about whether your code is written for CPUs, GPUs, or something else.”

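In other words, the compiler isn't tied to one kind of processor: the same language and toolchain target CPUs and GPUs, and in principle other accelerators. In practice, GPU code is still written as explicit kernels using Mojo's GPU APIs rather than by running an ordinary CPU function unchanged.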

2. SIMD Vectorization Built-In

Python requires NumPy or Numba for SIMD. Mojo has it natively:

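```mojo
# SIMD-vectorized kernel squaring the elements of an int64 NumPy array in place
def mojo_square_array(array_obj: PythonObject) raises:
    # Number of Int64 lanes that fit in one SIMD register on this target
    comptime simd_width = simd_width_of[DType.int64]()
    # Raw pointer into the NumPy array's int64 data buffer
    ptr = array_obj.ctypes.data.unsafe_get_as_pointer[DType.int64]()

    # Square `width` consecutive elements starting at index i
    fn pow[width: Int](i: Int) unified {mut ptr}:
        elem = ptr.load[width=width](i)
        ptr.store[width=width](i, elem * elem)
    vectorize[simd_width](len(array_obj), pow)
```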

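`vectorize` calls `pow` on chunks of `simd_width` elements, with the chunk width fixed at compile time, and runs the leftover `len % simd_width` elements at the end of the array in separate, narrower calls.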
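To make the chunk-and-remainder pattern concrete, here is a pure-Python model of the iteration order. It is not Mojo's implementation: `vectorize_model` and `square_in_place` are hypothetical names, this model handles the tail one element at a time, and real SIMD processes each chunk with single vector instructions rather than a Python loop.

```python
def vectorize_model(func, simd_width, size):
    """Call func(width, i) over [0, size): full SIMD-width chunks first,
    then the leftover tail one element at a time (hypothetical model)."""
    main_end = size - size % simd_width  # end of the region covered by full chunks
    for i in range(0, main_end, simd_width):
        func(simd_width, i)
    for i in range(main_end, size):
        func(1, i)


def square_in_place(data, simd_width=4):
    """Square every element of `data` in place using the chunked pattern."""
    def pow_chunk(width, i):
        # In Mojo this would be one SIMD load, multiply, and store of `width` lanes.
        data[i:i + width] = [x * x for x in data[i:i + width]]

    vectorize_model(pow_chunk, simd_width, len(data))
    return data


calls = []
vectorize_model(lambda w, i: calls.append((w, i)), simd_width=4, size=10)
print(calls)  # chunks at 0 and 4 with width 4, then tail calls at 8 and 9 with width 1
print(square_in_place(list(range(10))))
```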

3. No GC, No Object Overhead

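On 64-bit CPython, a small int takes 28 bytes: 4 bytes hold the value (one 30-bit digit) and 24 bytes are object header (reference count, type pointer, and size field). Every arithmetic operation also has to inspect its operands' types at runtime before doing any work:

```python
a, b = 3, 4
result = a + b  # What is a? What is b? CPython checks the types on every execution
```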
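You can check the 28-byte figure with `sys.getsizeof`, and contrast it with the standard-library `array` module, which packs raw machine integers back to back much as a Mojo struct or SIMD buffer does. The numbers assume 64-bit CPython:

```python
import sys
from array import array

# A small int is a full heap object: object header plus one 30-bit "digit".
print(sys.getsizeof(1))  # 28 on 64-bit CPython

# A list of ints stores 8-byte pointers to separate int objects...
values = list(range(1000, 2000))
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# ...while array("q") packs signed 64-bit integers contiguously.
packed = array("q", values)
packed_bytes = packed.itemsize * len(packed)

# The list version needs several times more memory than the packed array.
print(list_bytes, packed_bytes)
```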

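Mojo structs carry no per-object header; a struct is exactly as large as its fields:

```mojo
struct MyPair(Copyable):
    var first: Int   # 8 bytes on a 64-bit target
    var second: Int  # 8 bytes on a 64-bit target
    # Exactly 16 bytes. No hidden fields.
```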
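For comparison, Python can describe the same fixed, header-free field layout with the standard-library `ctypes` module. `PyPair` is a hypothetical stand-in for `MyPair`, assuming `Int` is 64 bits wide as it is on 64-bit targets. Note that only the underlying buffer is 16 bytes; the Python wrapper object around it still has its own header.

```python
import ctypes


class PyPair(ctypes.Structure):
    # Two 64-bit signed integers laid out back to back, like MyPair's fields.
    _fields_ = [("first", ctypes.c_int64), ("second", ctypes.c_int64)]


pair = PyPair(3, 4)
print(ctypes.sizeof(PyPair))     # 16: just the two fields, no per-field object header
print(pair.first + pair.second)  # 7
```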

The borrow checker (Rust-inspired) ensures memory safety without garbage collection pauses.

4. Compile-Time Metaprogramming

Mojo parameters (the values in square brackets) are compile-time values:

```mojo
def repeat[count: Int](msg: String) raises:
    comptime for i in range(count):  # Unrolled at compile time into `count` print calls
        print(msg)
```

Python can’t do this. Every loop, every dispatch, happens at runtime.
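The closest Python gets is generating specialized code while the program runs. The sketch below builds a loop-free function for a fixed `count`, which is roughly what `comptime for` produces at compile time; `make_unrolled_repeat` is a hypothetical helper, and unlike Mojo, Python pays for the code generation and every dynamic call at runtime.

```python
def make_unrolled_repeat(count):
    """Build a function whose body calls emit(msg) exactly `count` times
    with no loop, mimicking compile-time unrolling (hypothetical helper)."""
    body = "\n".join("    emit(msg)" for _ in range(count)) or "    pass"
    source = f"def repeat(msg, emit):\n{body}\n"
    namespace = {}
    exec(source, namespace)  # compiled now, at runtime, not ahead of time
    return namespace["repeat"]


repeat3 = make_unrolled_repeat(3)
repeat3("hello", print)  # prints "hello" three times

collected = []
repeat3("hi", collected.append)
print(collected)  # ['hi', 'hi', 'hi']
```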

When Mojo Wins

| Factor              | Mojo                         | Python                    |
|---------------------|------------------------------|---------------------------|
| Raw compute         | 78-119x faster               | Baseline                  |
| Memory efficiency   | Struct-based, no overhead    | 28 bytes per int          |
| GPU programming     | Native, single language      | Requires CUDA + C++       |
| SIMD/vectorization  | Built-in, compile-time       | Requires NumPy/Numba      |

Mojo shines when you need:

Tight numerical loops (n-body: 78x)
Matrix operations with custom kernels
GPU programming without CUDA complexity
Memory-predictable performance

When Python Still Wins

| Factor              | Mojo                        | Python                    |
|---------------------|-----------------------------|---------------------------|
| Ecosystem           | Early stage, limited        | Massive (PyPI, scientific)|
| Production ready    | 2026: early adopter         | Battle-tested, 30+ years  |
| Learning resources  | Sparse                      | Abundant                  |
| NumPy/BLAS ops      | Needs library support       | 520x on spectral-norm     |

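Notice that NumPy reached 520x on spectral-norm by delegating the matrix-vector products to BLAS, linear algebra routines that have been tuned for decades. Mojo reached 119x, behind Rust PyO3's 154x, and hand-written loops in any of these languages struggle to beat an optimized BLAS library.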
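For context, here is a compact pure-Python version of the spectral-norm kernel, written from the Benchmarks Game problem description rather than taken from the study. Its inner functions are dense matrix-vector products with the implicit matrix A(i, j) = 1 / ((i + j)(i + j + 1)/2 + i + 1). That is the shape of work NumPy can hand to BLAS, while Mojo, Numba, and Rust compile the same loops instead.

```python
import math


def eval_a(i, j):
    # Entry (i, j) of the infinite matrix A, with 0-based indices.
    return 1.0 / ((i + j) * (i + j + 1) // 2 + i + 1)


def mult_av(v):
    # Returns A v
    n = len(v)
    return [sum(eval_a(i, j) * v[j] for j in range(n)) for i in range(n)]


def mult_atv(v):
    # Returns A^T v
    n = len(v)
    return [sum(eval_a(j, i) * v[j] for j in range(n)) for i in range(n)]


def mult_atav(v):
    # Returns A^T A v
    return mult_atv(mult_av(v))


def spectral_norm(n, iterations=10):
    """Approximate the spectral norm of the n x n leading block of A
    by power iteration on A^T A."""
    u = [1.0] * n
    v = u
    for _ in range(iterations):
        v = mult_atav(u)
        u = mult_atav(v)
    vbv = sum(ub * vb for ub, vb in zip(u, v))
    vv = sum(vb * vb for vb in v)
    return math.sqrt(vbv / vv)


print(f"{spectral_norm(100):.9f}")  # 1.274219991, the benchmark's reference output for n=100
```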

The Python Interop Story

Mojo’s killer feature for Python developers: bidirectional interop.

Calling Python from Mojo

```mojo
from std.python import Python

def main() raises:
    var np = Python.import_module("numpy")
    var ar = np.arange(15).reshape(3, 5)
    print(ar.shape)  # (3, 5)
```

Calling Mojo from Python

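```python
import mojo.importer  # Mojo's import hook: lets Python import .mojo files
import hello_mojo     # builds and loads hello_mojo.mojo

result = hello_mojo.passthrough("Hello")
print(result)  # Hello world from Mojo
```

The Mojo side, in hello_mojo.mojo:

```mojo
# The file also needs a PyInit_hello_mojo() entry point that registers
# passthrough via PythonModuleBuilder (see the Python interoperability docs).
fn passthrough(value: PythonObject) raises -> PythonObject:
    return value + " world from Mojo"
```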

This enables incremental migration: keep Python for orchestration, move hot paths to Mojo.
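A common way to run that incremental migration is to keep a pure-Python fallback next to the compiled path, so the program still works wherever the Mojo build isn't available. The module name `fast_kernels` and the function names are hypothetical:

```python
import math


def heavy_computation_py(value):
    """Pure-Python version of the hot path, used when the Mojo build is unavailable."""
    total = 0.0
    for i in range(1000):
        total += math.sqrt(value) / (i + 1)
    return total


try:
    # Hypothetical Mojo-backed module; importing it this way assumes
    # Mojo's Python import hook has been set up beforehand.
    from fast_kernels import heavy_computation as heavy_computation_impl
except ImportError:
    heavy_computation_impl = heavy_computation_py


def process_values(values):
    """Callers use one name regardless of which implementation was loaded."""
    return [heavy_computation_impl(v) for v in values]
```

Because both paths compute the same function, the pure-Python version doubles as a reference for testing the Mojo one.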

A Real Migration Example

Step 1: Start with Python, find the bottleneck:

```python
import pandas as pd

def process_large_dataset(df):
    results = []
    for idx, row in df.iterrows():
        results.append(heavy_computation(row))
    return results

def heavy_computation(row):
    total = 0.0
    for i in range(1000):
        total += row['value'] ** 0.5 / (i + 1)
    return total
# Profile shows heavy_computation takes 95% of runtime
```
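The profiling itself can be done with the standard-library `cProfile` and `pstats` modules. This sketch profiles a plain list of values instead of a DataFrame so it runs without pandas; `process_values` is a hypothetical stand-in for `process_large_dataset`:

```python
import cProfile
import io
import pstats


def heavy_computation(value):
    total = 0.0
    for i in range(1000):
        total += value ** 0.5 / (i + 1)
    return total


def process_values(values):
    return [heavy_computation(v) for v in values]


profiler = cProfile.Profile()
profiler.enable()
process_values([float(v) for v in range(200)])
profiler.disable()

# Sort by cumulative time and print the top entries; heavy_computation
# should dominate the report.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```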

Step 2: Extract hot path to Mojo:

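```mojo
from std.math import sqrt

# To call this from Python, the module must also register it in a
# PyInit_heavy_compute() entry point; exported functions take and return
# PythonObject, so in practice a thin PythonObject wrapper is needed.
fn heavy_computation_mojo(value: Float64) -> Float64:
    var total: Float64 = 0.0
    for i in range(1000):
        total += sqrt(value) / Float64(i + 1)
    return total
```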

Step 3: Call Mojo from Python:

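```python
import mojo.importer  # enables importing .mojo files as Python modules
import heavy_compute  # Mojo module (heavy_compute.mojo)

def process_large_dataset(df):
    results = []
    for idx, row in df.iterrows():
        results.append(heavy_compute.heavy_computation_mojo(row['value']))
    return results
# Result: about 50x faster on the bottleneck. The rest of the loop (including
# df.iterrows()) still runs in Python, so with 95% of runtime in the hot path the
# overall gain is about 1 / (0.05 + 0.95/50), roughly 14.5x, and can never exceed 20x.
```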

The migration path is real. You don’t have to rewrite everything.
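How much a hot-path rewrite buys overall is governed by Amdahl's law: if a fraction p of the runtime is made s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). A small calculator, plugging in the 95% and 50x figures from the migration example:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


print(round(amdahl_speedup(0.95, 50), 1))    # 14.5
print(round(amdahl_speedup(0.95, 1e12), 1))  # 20.0: the ceiling set by the untouched 5%
```

The practical takeaway: once the hot path is fast, the remaining Python (here, `df.iterrows()` and per-row calls) becomes the next bottleneck.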

The Trade-offs

| Scenario                    | Recommendation                           |
|-----------------------------|------------------------------------------|
| New project, numerical focus| Consider Mojo for performance paths      |
| Existing Python codebase    | Use Numba/Cython first, Mojo for modules |
| GPU kernel development      | Mojo compelling (single language)        |
| Data science with NumPy     | Stick with Python ecosystem              |
| Production stability needed | Mojo is early-stage                      |
| Team expertise              | Python devs learn Mojo syntax quickly    |

Mojo Advantages

78-119x speedup on numerical benchmarks
Pythonic syntax with systems programming power
Single language for CPU + GPU
Full Python interop for gradual adoption
Modern tooling (VS Code extension, debugger)

Mojo Disadvantages (as of 2026)

Early stage, rapidly evolving API
Limited ecosystem vs Python’s massive libraries
Fewer learning resources and community support
Not all Python features supported yet

The two-language problem in AI/ML: researchers prototype in Python, then engineers rewrite the code in C++/CUDA. Mojo aims to solve this by being “Python++”: Python’s syntax with C++-level performance.

MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure project that provides a common intermediate representation for compilers. It’s used by TensorFlow, PyTorch, and now Mojo.

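The borrow checker concept comes from Rust. Together with ownership rules enforced at compile time, it frees memory deterministically and rules out use-after-free and double-free bugs without garbage collection overhead. It does not guarantee the absence of memory leaks; Rust, for one, treats leaking memory as safe.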

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email: Email me

Here are the most important links from this article, along with further resources on this topic:

👨‍💻 Mojo Manual - Official Documentation
👨‍💻 Mojo Vision
👨‍💻 Python Interoperability
👨‍💻 The Optimization Ladder Benchmark Study
👨‍💻 GitHub: faster-python-bench

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!