
Is Mojo Faster Than Python? 78-119x Speedup on Numerical Computing Benchmarks

Is Mojo actually faster than Python for numerical computing? I ran the benchmarks to find out.

The short answer: a 78-119x speedup over CPython on two standard numerical benchmarks. That puts Mojo in the same performance tier as Cython and Rust PyO3, with one catch: it is a new language with a young ecosystem, not a Python extension.

The Benchmark Numbers

I tested Mojo on two problems from the Computer Language Benchmarks Game (n-body and spectral-norm), using the optimization ladder study published in March 2026.

N-body Simulation Results (500K iterations)
| Approach | Time | Speedup | What It Costs |
|-----------------|---------|---------|----------------------------|
| CPython 3.14 | 1,242ms | 1.0x | Baseline |
| PyPy | 98ms | 13x | Runtime swap, testing deps |
| Numba @njit | 22ms | 56x | Decorator + NumPy arrays |
| Mojo | 16ms | 78x | New language, ecosystem |
| Cython | 10ms | 124x | C knowledge, silent traps |
| Rust PyO3 | 11ms | 113x | Learning Rust |

Spectral-norm Results (N=2000, matrix operations)
| Approach | Time | Speedup | What It Costs |
|-----------------|----------|---------|----------------------------|
| CPython 3.14 | 14,046ms | 1.0x | Baseline |
| GraalPy | 212ms | 66x | Runtime swap, Python 3.12 |
| Numba | 104ms | 135x | Decorator + NumPy arrays |
| Mojo | 118ms | 119x | New language, ecosystem |
| Rust PyO3 | 91ms | 154x | Learning Rust |
| NumPy (BLAS) | 27ms | 520x | Matrix ops only |

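On n-body, Mojo (78x) lands between Numba (56x) and Rust PyO3 (113x); on spectral-norm (119x) it trails both Numba (135x) and Rust PyO3 (154x). Not the fastest, but in the same range as the other compiled approaches.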
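
Each Speedup figure is simply the CPython time divided by that approach's time. A quick Python check against the spectral-norm table above (the times are copied from the table; rounding to whole numbers is my assumption about how the table was produced):

```python
# Speedup = baseline time / approach time, rounded to a whole number.
SPECTRAL_NORM_MS = {
    "CPython 3.14": 14_046,
    "GraalPy": 212,
    "Numba": 104,
    "Mojo": 118,
    "Rust PyO3": 91,
    "NumPy (BLAS)": 27,
}


def speedups(times_ms: dict[str, float], baseline: str) -> dict[str, int]:
    """Return each approach's speedup over the baseline entry."""
    base = times_ms[baseline]
    return {name: round(base / t) for name, t in times_ms.items()}


for name, factor in speedups(SPECTRAL_NORM_MS, "CPython 3.14").items():
    print(f"{name:>14}: {factor}x")
```

Sorting the result makes the ordering explicit: on this benchmark Mojo sits below Numba and Rust PyO3, and well below NumPy's BLAS path.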

Why Mojo Is Fast

Mojo isn’t just “optimized Python.” It’s a completely different architecture built on MLIR (Multi-Level Intermediate Representation).

Mojo vs Python Architecture
Python:                Mojo:
-------                -----
Source Code            Source Code
     |                      |
     v                      v
Bytecode                 MLIR IR
     |                      |
     v                +-----+-----+
Interpreter           |     |     |
     |               CPU   GPU   TPU
     v               Code  Code  Code
Runtime Dispatch
Mojo compiles to native machine code for the target hardware. CPython compiles source to bytecode, then interprets that bytecode at run time.

1. MLIR-Native from Day One

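MLIR is a compiler infrastructure, now part of the LLVM project, that originated at Google for TensorFlow and is also used in the PyTorch ecosystem. Modular describes Mojo as the first language built from the ground up on MLIR.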

From the Mojo documentation:

“Mojo is designed from the ground up to support heterogeneous hardware - the Mojo compiler makes no assumptions about whether your code is written for CPUs, GPUs, or something else.”

In other words, the compiler is designed so that one language can target CPUs, GPUs, and other accelerators, instead of requiring a separate kernel language for each device.

2. SIMD Vectorization Built-In

Python requires NumPy or Numba for SIMD. Mojo has it natively:

simd_vectorization.mojo
# SIMD-vectorized kernel squaring array elements in place
# (assumes vectorize from the algorithm module and simd_width_of from sys are imported)
def mojo_square_array(array_obj: PythonObject) raises:
    comptime simd_width = simd_width_of[DType.int64]()
    # Raw pointer into the NumPy array's int64 buffer
    ptr = array_obj.ctypes.data.unsafe_get_as_pointer[DType.int64]()
    # Square `width` consecutive elements starting at index i
    fn pow[width: Int](i: Int) unified {mut ptr}:
        elem = ptr.load[width=width](i)
        ptr.store[width=width](i, elem * elem)
    vectorize[simd_width](len(array_obj), pow)

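The vectorize function calls pow over the array in chunks of simd_width elements, a width fixed at compile time, so each call loads, squares, and stores a full SIMD vector; leftover elements at the end of the array are handled with narrower calls.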
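
To make the chunking concrete, here is a pure-Python model of what a vectorize-style loop roughly does: walk the array in steps of a fixed width, then finish the leftover tail one element at a time. The names vectorize_sketch and square_in_place and the width of 4 are hypothetical, chosen for illustration; this is not Mojo's implementation, and real SIMD executes each chunk as one vector instruction, which plain Python cannot do.

```python
from typing import Callable


def vectorize_sketch(width: int, size: int, func: Callable[[int, int], None]) -> None:
    """Hypothetical model of a vectorize-style loop (not Mojo's API).

    Calls func(i, width) for each full chunk of `width` elements,
    then func(i, 1) for each leftover tail element.
    """
    main_end = size - size % width
    for i in range(0, main_end, width):
        func(i, width)  # one "SIMD" call covering `width` lanes
    for i in range(main_end, size):
        func(i, 1)  # scalar tail


def square_in_place(data: list[int]) -> list[tuple[int, int]]:
    """Square every element; return the (start, width) of each call made."""
    calls: list[tuple[int, int]] = []

    def square_chunk(i: int, width: int) -> None:
        calls.append((i, width))
        for j in range(i, i + width):  # a real SIMD unit does these lanes at once
            data[j] = data[j] * data[j]

    vectorize_sketch(4, len(data), square_chunk)
    return calls


values = list(range(10))
print(square_in_place(values))  # chunk starts and widths
print(values)
```

With 10 elements and a width of 4, the model makes two full-width calls and two scalar tail calls.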

3. No GC, No Object Overhead

A small Python int takes 28 bytes on 64-bit CPython: a 24-byte object header (reference count, type pointer, size field) plus a 4-byte digit holding the value. Every operation on it dispatches through the type stored in that header.

python_dispatch.py
result = a + b # What is a? What is b? Runtime check every time
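
You can see this from the CPython side with the standard dis module: a single generic add instruction (BINARY_OP since Python 3.11, BINARY_ADD before) handles ints, floats, strings, and lists alike, because the interpreter resolves the operand types when the instruction runs. The function name add is just illustrative.

```python
import dis


def add(a, b):
    return a + b


# One generic add instruction, regardless of what a and b turn out to be.
add_ops = [
    ins.opname
    for ins in dis.get_instructions(add)
    if ins.opname in ("BINARY_OP", "BINARY_ADD")
]
print(add_ops)

# The type-dependent behavior is decided when the instruction executes.
print(add(2, 3))        # int addition
print(add(2.5, 0.5))    # float addition
print(add("ab", "cd"))  # string concatenation
print(add([1], [2]))    # list concatenation
```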

Mojo structs have no per-object header; they are laid out as just their fields:

mojo_struct.mojo
struct MyPair(Copyable):
    var first: Int
    var second: Int
# Exactly 16 bytes (two 64-bit Ints). No hidden fields.
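
For a rough Python-side comparison, the standard ctypes module can describe the same packed layout as MyPair, while sys.getsizeof reports the cost of an ordinary int object. CPairSketch is a hypothetical name, the figures assume a standard 64-bit CPython build, and getsizeof counts only the object itself.

```python
import ctypes
import sys


class CPairSketch(ctypes.Structure):
    """Two 64-bit ints laid out back to back, like the Mojo MyPair struct."""

    _fields_ = [("first", ctypes.c_int64), ("second", ctypes.c_int64)]


pair_bytes = ctypes.sizeof(CPairSketch)  # just the two 8-byte fields
int_bytes = sys.getsizeof(1)             # header + one digit on 64-bit CPython

print(pair_bytes, int_bytes)
```

Two plain Python ints already cost more than three times the whole packed pair, before counting any container that holds them.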

The borrow checker (Rust-inspired) ensures memory safety without garbage collection pauses.

4. Compile-Time Metaprogramming

Mojo parameters are compile-time variables:

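compile_time.mojo
def repeat[count: Int](msg: String) raises:
    comptime for i in range(count):  # Unrolled at compile time: count is a compile-time parameter
        print(msg)

def main() raises:
    repeat[3]("Hello")  # count=3 is fixed when this call is compiled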

Python can’t do this. Every loop, every dispatch, happens at runtime.

When Mojo Wins

Mojo Advantages
| Factor | Mojo | Python |
|---------------------|------------------------------|---------------------------|
| Raw compute | 78-119x faster | Baseline |
| Memory efficiency | Struct-based, no overhead | 28 bytes per int |
| GPU programming     | Native, single language      | CUDA C++ or wrappers (Numba, CuPy) |
| SIMD/vectorization | Built-in, compile-time | Requires NumPy/Numba |

Mojo shines when you need:

  • Tight numerical loops (n-body: 78x)
  • Matrix operations with custom kernels
  • GPU programming without CUDA complexity
  • Memory-predictable performance

When Python Still Wins

Python Advantages
| Factor | Mojo | Python |
|---------------------|-----------------------------|---------------------------|
| Ecosystem | Early stage, limited | Massive (PyPI, scientific)|
| Production ready | 2026: early adopter | Battle-tested, 30+ years |
| Learning resources | Sparse | Abundant |
| NumPy/BLAS ops | Needs library support | 520x on spectral-norm |

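Notice that NumPy achieved 520x on spectral-norm by handing the matrix-vector products to BLAS, a decades-old interface backed by heavily tuned native libraries (reference BLAS is Fortran; OpenBLAS and MKL add hand-optimized kernels). Mojo reached 119x, close to Rust's 154x, but in this benchmark neither hand-written loop version came near the tuned BLAS path.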
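
For reference, here is roughly what a NumPy version of spectral-norm looks like. The benchmark defines A[i, j] = 1 / ((i + j)(i + j + 1)/2 + i + 1), runs 10 rounds of power iteration on A^T A, and reports sqrt(u·v / v·v); NumPy sends each @ matrix-vector product to BLAS. This is a minimal sketch assuming NumPy is installed, not the exact program the study benchmarked, and building the full N×N matrix trades memory for speed.

```python
import numpy as np


def spectral_norm(n: int, iterations: int = 10) -> float:
    """Approximate the spectral norm of the Benchmarks Game matrix A (n x n)."""
    i = np.arange(n).reshape(-1, 1)  # row indices as a column vector
    j = np.arange(n).reshape(1, -1)  # column indices as a row vector
    ij = i + j
    a = 1.0 / (ij * (ij + 1) / 2 + i + 1)  # broadcasting builds the full matrix

    u = np.ones(n)
    for _ in range(iterations):
        v = a.T @ (a @ u)  # v = A^T A u; each @ is a BLAS matrix-vector product
        u = a.T @ (a @ v)  # u = A^T A v
    return float(np.sqrt(u @ v / (v @ v)))


print(f"{spectral_norm(100):.9f}")
```

The Python-level loop runs only 10 times; all the O(N²) work happens inside BLAS, which is why this path beats every hand-written loop in the table.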

The Python Interop Story

Mojo’s killer feature for Python developers: bidirectional interop.

Calling Python from Mojo

python_from_mojo.mojo
from std.python import Python

def main() raises:
    var np = Python.import_module("numpy")
    var ar = np.arange(15).reshape(3, 5)
    print(ar.shape)  # (3, 5)

Calling Mojo from Python

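python_calling_mojo.py
# Assumes Mojo's Python import hook is enabled so hello_mojo.mojo is importable
import hello_mojo
result = hello_mojo.passthrough("Hello")
print(result)  # Hello world from Mojo
hello_mojo.mojo
from std.python import PythonObject

# Also needs an exported PyInit_hello_mojo() that registers passthrough
# with PythonModuleBuilder (omitted here for brevity)
fn passthrough(value: PythonObject) raises -> PythonObject:
    return value + " world from Mojo"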

This enables incremental migration: keep Python for orchestration, move hot paths to Mojo.

A Real Migration Example

Step 1: Start with Python, find the bottleneck:

data_pipeline.py
import pandas as pd

def process_large_dataset(df):
    results = []
    for idx, row in df.iterrows():
        results.append(heavy_computation(row))
    return results

def heavy_computation(row):
    total = 0.0
    for i in range(1000):
        total += row['value'] ** 0.5 / (i + 1)
    return total

# Profile shows heavy_computation takes 95% of runtime
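
One way to produce a profile like that is the standard-library cProfile module. This sketch swaps the DataFrame for a plain list of dicts so it runs without pandas; that substitution, the helper process_rows, and the 200-row input are assumptions for illustration. The point is reading cumulative time per function.

```python
import cProfile
import io
import pstats


def heavy_computation(row):
    total = 0.0
    for i in range(1000):
        total += row["value"] ** 0.5 / (i + 1)
    return total


def process_rows(rows):
    # Stand-in for the iterrows() loop: same per-row call pattern, no pandas needed.
    return [heavy_computation(row) for row in rows]


rows = [{"value": float(k)} for k in range(200)]

profiler = cProfile.Profile()
profiler.enable()
results = process_rows(rows)
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
print(report.getvalue())
```

In the report, compare heavy_computation's cumulative time with the caller's total to get the share of runtime it accounts for.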

Step 2: Extract hot path to Mojo:

heavy_compute.mojo
from std.math import sqrt

fn heavy_computation_mojo(value: Float64) -> Float64:
    var total: Float64 = 0.0
    for i in range(1000):
        total += sqrt(value) / Float64(i + 1)
    return total

Step 3: Call Mojo from Python:

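data_pipeline_optimized.py
import heavy_compute  # Mojo module

def process_large_dataset(df):
    results = []
    for idx, row in df.iterrows():
        # Assumes heavy_compute exposes a Python-callable wrapper that converts
        # the PythonObject argument to Float64 before calling heavy_computation_mojo
        results.append(heavy_compute.heavy_computation_mojo(row['value']))
    return results

# Result: 50x speedup on the bottleneck. With 95% of runtime there,
# Amdahl's law caps the overall gain at about 14.5x.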
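
The profile in Step 1 bounds what Step 3 can deliver overall. By Amdahl's law, if a fraction p of runtime is sped up by a factor s, the whole program speeds up by 1 / ((1 - p) + p / s). A quick check with p = 0.95 and s = 50, the figures quoted in this example:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime becomes s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


print(round(amdahl_speedup(0.95, 50), 1))            # 95% of runtime, 50x faster
print(round(amdahl_speedup(0.95, float("inf")), 1))  # upper bound: hot path costs nothing
```

So with 95% of the time in heavy_computation, even an infinitely fast replacement caps the pipeline at 20x; after that, the row-by-row iterrows() loop in the remaining 5% becomes the next bottleneck.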

The migration path is real. You don’t have to rewrite everything.

The Trade-offs

Decision Matrix
| Scenario | Recommendation |
|-----------------------------|------------------------------------------|
| New project, numerical focus | Consider Mojo for performance paths |
| Existing Python codebase | Use Numba/Cython first, Mojo for modules |
| GPU kernel development | Mojo compelling (single language) |
| Data science with NumPy | Stick with Python ecosystem |
| Production stability needed | Mojo is early-stage |
| Team expertise | Python devs learn Mojo syntax quickly |

Mojo Advantages

  • 78-119x speedup on numerical benchmarks
  • Pythonic syntax with systems programming power
  • Single language for CPU + GPU
  • Full Python interop for gradual adoption
  • Modern tooling (VSCode extension, debugger)

Mojo Disadvantages (as of 2026)

  • Early stage, rapidly evolving API
  • Limited ecosystem vs Python’s massive libraries
  • Fewer learning resources and community support
  • Not all Python features supported yet

The two-language problem in AI/ML: researchers prototype in Python, and engineers rewrite in C++/CUDA. Mojo aims to solve this by being “Python++”: Python’s syntax with C++ performance.

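MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure project, part of LLVM, for defining intermediate representations at multiple levels of abstraction (as composable "dialects") and lowering between them. It’s used in the TensorFlow and PyTorch ecosystems, and now by Mojo.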

The borrow checker concept comes from Rust. It enforces ownership rules at compile time, preventing use-after-free and double-free errors without garbage collection overhead.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can email me.

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
