Why CPython's 10% Frame Pointer Overhead Vanished: The Bytecode Dispatch Story

When I first read about PEP 831 and its frame pointer proposal, I stumbled upon a curious historical footnote: CPython used to have a ~10% performance overhead when frame pointers were enabled. That’s unusual. Most C programs see only ~2% overhead with frame pointers. Why was Python such an outlier? And more importantly, how did it get fixed?

The 10% Problem

Frame pointer overhead is a well-known tradeoff. Enabling frame pointers (-fno-omit-frame-pointer) makes stack walking and profiling easier, but it reserves one CPU register for the frame pointer, reducing available registers for other optimizations. For most C programs, this costs around 2% performance.
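One quick, heuristic way to see whether a given interpreter was built with frame pointers is to look at the compiler flags recorded at build time. The sketch below assumes those flags are exposed through sysconfig's CFLAGS variable, which is not guaranteed on every platform; the helper name frame_pointers_enabled is made up for illustration.

```python
import sysconfig
from typing import Optional


def frame_pointers_enabled(cflags: Optional[str]) -> Optional[bool]:
    """Guess from a CFLAGS string whether frame pointers were kept.

    Returns True or False when one of the two flags is present, and None
    when neither appears (the compiler default then applies, which varies
    by compiler, target, and optimization level).
    """
    if not cflags:
        return None
    verdict = None
    for flag in cflags.split():
        # When both flags appear, GCC and Clang honor the last one.
        if flag == "-fno-omit-frame-pointer":
            verdict = True
        elif flag == "-fomit-frame-pointer":
            verdict = False
    return verdict


if __name__ == "__main__":
    # CFLAGS is recorded by CPython's build and may be None (e.g. on Windows).
    print(frame_pointers_enabled(sysconfig.get_config_var("CFLAGS")))
```

A None result only means the flags say nothing either way; it does not prove that frame pointers are absent.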

But CPython was different. Historical benchmarks showed ~10% overhead, roughly five times the typical cost. This made frame pointer adoption impractical for earlier Python versions.

┌─────────────────────────────────────────────────────────────────┐
│                     FRAME POINTER OVERHEAD                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Typical C Program     ████████░░░░░░░░░░░░░░░░░░░░  ~2%        │
│                                                                 │
│  CPython (Historical)  ████████████████████████████  ~10%       │
│                                                                 │
│  CPython (Current)     ████████░░░░░░░░░░░░░░░░░░░░  ~2%        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

I wanted to understand why CPython was so unusual.

The Culprit: Bytecode Dispatch

The answer lies in CPython’s eval loop—the heart of Python’s execution engine. This function, _PyEval_EvalFrameDefault, is responsible for executing Python bytecode. It’s structured as a massive switch-case statement with over 100 opcode handlers.

Conceptual structure of the eval loop
PyObject* _PyEval_EvalFrameDefault(PyFrameObject *f) {
    // Simplified sketch: the real signature and body differ.
    // This function is huge (thousands of lines), and the
    // compiler sees it as one enormous function.
    for (;;) {
        // Fetch the next opcode (elided), then dispatch on it.
        switch (opcode) {
        case LOAD_CONST:
            // handle loading constants
            break;
        case BINARY_ADD:
            // handle addition
            break;
        case BINARY_SUBTRACT:
            // handle subtraction
            break;
        // ... ~100 more opcode cases
        }
    }
}
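The opcodes that those case labels stand for can be inspected from Python itself. Here is a minimal sketch using the standard library's dis module; add and opcode_names are illustrative names, and the exact output depends on the Python version you run it on.

```python
import dis


def opcode_names(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


def add(a, b):
    return a + b


if __name__ == "__main__":
    # The names vary by version: addition is BINARY_ADD on older
    # releases and BINARY_OP on newer ones.
    print(opcode_names(add))
    # dis.opmap maps every opcode name known to this interpreter to its number.
    print(len(dis.opmap), "opcodes defined")
```

Each name printed here corresponds to one handler that the eval loop has to dispatch to.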

When the compiler optimizes with -fomit-frame-pointer, it can treat the frame pointer register as one more general-purpose register. The eval loop's structure (its massive size, the frequent jumps between cases, the specific patterns of local variable access) appears to have made it lean on that extra register far more than typical C code does.

The result? The compiler generated code that was unusually sensitive to frame pointer presence. Enabling frame pointers disrupted its optimization strategy in ways that rippled through the entire eval loop.

The Timeline

FRAME POINTER OVERHEAD IN CPYTHON
    2020                     2023                     2025
     │                        │                        │
     │ ~10% overhead          │ Eval loop              │ ~2% overhead
     │ (outlier status)       │ restructured           │ (normalized)
     │                        │ for PEP 659            │
     ▼                        ▼                        ▼
┌─────────┐            ┌─────────────┐          ┌─────────────┐
│ Problem │ ─────────▶ │ Coincidental│ ───────▶ │ Resolved    │
│ Known   │            │ Fix         │          │             │
└─────────┘            └─────────────┘          └─────────────┘
                              │ No targeted effort
                              │ to fix the 10%
                       ┌──────┴──────┐
                       │ PEP 659     │
                       │ (Adaptive   │
                       │ Interpreter)│
                       └─────────────┘

How It Got Fixed (Without Trying)

Here’s the surprising part: there was never a targeted effort to fix the 10% overhead. It resolved itself as a side effect of other work.

PEP 659 introduced the adaptive interpreter to CPython. This required significant restructuring of the eval loop—splitting it into smaller functions, changing how opcodes are dispatched, and generally modernizing the codebase.

How restructuring changed the generated code
// Before restructuring:
// - One massive function
// - Compiler struggled with optimization assumptions
// - Frame pointer omission had unpredictable effects
// After restructuring:
// - Better function boundaries
// - Compiler generates more predictable code
// - Base pointer naturally used where appropriate
// - Frame pointer cost normalized to ~2%

After this restructuring, the compiler started generating code that naturally used a base pointer in ways that aligned with its optimization assumptions. The eval loop was no longer an outlier.

As one commenter noted in the PEP 831 discussion: “AFAICT there never was a targeted effort to fix this, it just resulted from other works.”

Why This Matters

This historical outlier was actually a blocker for earlier frame pointer adoption efforts. If CPython had tried to enable frame pointers in 2020, the 10% overhead would have been unacceptable.

┌──────────────────────────────────────────────────────────────────┐
│                         DECISION MATRIX                          │
├─────────────────────┬────────────────────┬───────────────────────┤
│                     │ With 10% Overhead  │ With 2% Overhead      │
├─────────────────────┼────────────────────┼───────────────────────┤
│ Enable Frame Pointer│ Unacceptable cost  │ Acceptable tradeoff   │
│ for Profiling       │ (rejected)         │ (proposed in PEP 831) │
├─────────────────────┼────────────────────┼───────────────────────┤
│ Performance Impact  │ Significant        │ Minimal               │
│                     │ regression         │ regression            │
├─────────────────────┼────────────────────┼───────────────────────┤
│ Debugging Benefit   │ Not worth cost     │ Worth the cost        │
└─────────────────────┴────────────────────┴───────────────────────┘

By the time PEP 831 was proposed, the problem had already disappeared. The eval loop restructuring that shipped with Python 3.11+ (as part of the adaptive interpreter work) had resolved the outlier status.

Common Mistake: Citing Old Benchmarks

If you’re researching Python performance or frame pointers today, be careful about historical data. Articles and discussions from before 2023 might still reference the 10% overhead. That information is now outdated.

Benchmark comparison
# Historical benchmark (before Python 3.11)
# CPython with frame pointers: ~10% slower than without
# Typical C program with frame pointers: ~2% slower
# Current benchmark (Python 3.11+)
# CPython with frame pointers: ~2% slower (matches typical)
# The outlier status is resolved

The 10% number is a historical artifact, not a current concern.

The Lesson

Sometimes the best fix is the one you didn’t plan. CPython’s frame pointer overhead was a real problem with a surprising resolution: it fixed itself as a side effect of unrelated optimization work.

This is a good reminder that performance characteristics can change significantly across versions. When evaluating whether to adopt something like frame pointers, always benchmark against your actual target version—historical data may no longer apply.
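As a starting point for that kind of measurement, here is a minimal sketch built on the standard library's timeit module. The workload function is a hypothetical stand-in for your own code, and the idea is to run the same script once under each interpreter build (with and without frame pointers) and compare the times. For serious measurements a dedicated suite such as pyperformance is likely a better choice.

```python
import timeit


def workload():
    """Hypothetical interpreter-bound workload; substitute your own code."""
    total = 0
    for i in range(100_000):
        total += i % 7
    return total


def best_time(func, repeat=5, number=5):
    """Return the fastest of several timing runs, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


def overhead_percent(baseline_s, candidate_s):
    """Slowdown of candidate relative to baseline, as a percentage."""
    return (candidate_s / baseline_s - 1.0) * 100.0


if __name__ == "__main__":
    # Run this script once under each build, then feed the two printed
    # times to overhead_percent(time_without_fp, time_with_fp).
    print(f"best of 5: {best_time(workload):.4f} s")
```

Taking the minimum of several runs reduces noise from other processes, but small differences in the 1-2% range still need many repetitions on a quiet machine before they mean anything.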

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to contact me, you can reach me by email.

