
PEP 831: How Python's Frame Pointer Change Fixes Profiling Blind Spots

Last month I tried to profile a production Python service with perf. I ran perf record -g python my_app.py and got a call stack that looked like this:

┌─────────────────────────────────────┐
│ [unknown] │ ← Where is this?
├─────────────────────────────────────┤
│ my_python_function │
├─────────────────────────────────────┤
│ [unknown] │ ← And this?
├─────────────────────────────────────┤
│ PyEval_EvalFrame │
└─────────────────────────────────────┘

Half the stack was [unknown]. The native frames—the C extension calls, the libc functions, the actual work being done—were invisible. I couldn’t see where my CPU time was actually going.

This is the problem PEP 831 fixes.

The Problem: Python Was Invisible to System Profilers

For years, Python processes have been nearly opaque to system-level profiling tools. When I ran perf, py-spy, or eBPF-based continuous profilers on Python workloads, I’d get partial call stacks at best.

The root cause? Frame pointers were omitted by default.

Compilers at optimization level -O1 and above strip frame pointers to save a few CPU cycles. This optimization makes sense in isolation—frame pointers do add a tiny overhead to function calls. But it breaks stack unwinding. Without frame pointers, profilers cannot reliably walk the call stack.
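To make the unwinding idea concrete, here is a minimal sketch of a frame-pointer walk, written as a toy simulation rather than real profiler code. It assumes the usual x86-64 layout, where the slot at the frame pointer holds the caller's saved frame pointer and the next slot holds the return address. The `walk_frame_pointers` helper and every address in it are made up for illustration.

```python
from typing import Dict, List


def walk_frame_pointers(memory: Dict[int, int], frame_pointer: int,
                        max_depth: int = 64) -> List[int]:
    """Collect return addresses by following the chain of saved frame pointers.

    Toy model of x86-64: memory[fp] is the caller's saved frame pointer and
    memory[fp + 8] is the return address into the caller.
    """
    return_addresses = []
    for _ in range(max_depth):
        # Stop at the end of the chain, or when the "pointer" is not a real
        # frame (which is what happens when a function omitted its frame pointer).
        if frame_pointer not in memory or frame_pointer + 8 not in memory:
            break
        return_addresses.append(memory[frame_pointer + 8])
        frame_pointer = memory[frame_pointer]
    return return_addresses


# Three well-formed frames; the outermost one ends the chain with 0.
intact_stack = {
    0x1000: 0x2000, 0x1008: 0xAAA1,  # innermost frame
    0x2000: 0x3000, 0x2008: 0xBBB2,
    0x3000: 0x0,    0x3008: 0xCCC3,  # outermost frame
}

# Same stack, but the innermost function reused the register for data,
# so the saved value no longer points at a frame.
broken_stack = dict(intact_stack)
broken_stack[0x1000] = 0xDEAD

print([hex(a) for a in walk_frame_pointers(intact_stack, 0x1000)])
print([hex(a) for a in walk_frame_pointers(broken_stack, 0x1000)])
```

With the intact chain the walk recovers all three callers. With the broken chain it stops after the first frame, which is roughly what a truncated stack in a profiler looks like.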

Here’s what I’d see in production:

Incomplete perf output before PEP 831
$ perf report -g
# Overhead Command Shared Object Symbol
# ........ ......... ............... ....................
45.00% python libc.so.6 [unknown]
30.00% python myext.so [unknown]
15.00% python python3.12 PyEval_EvalFrameDefault
10.00% python python3.12 PyObject_Call

The [unknown] symbols mean I’m flying blind. I know libc is consuming 45% CPU, but I have no idea which libc function or why my code triggered it.

Python 3.12 added a perf trampoline to help with this. It allowed perf to see Python frames, but native frames were still missing. I’d get the Python side of the story but lose the C extension and system library context.

The Solution: PEP 831 Adds Frame Pointers by Default

PEP 831 changes CPython to build with -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer by default. These compiler flags preserve frame pointers in the generated machine code.

The change is simple but far-reaching:

  1. CPython itself is compiled with frame pointers
  2. The flags propagate to C extensions via sysconfig
  3. System profilers can now walk the complete stack

Let me verify this on a Python build with PEP 831:

Checking CFLAGS for frame pointer flags
$ python -c "import sysconfig; print(sysconfig.get_config_var('CFLAGS'))"
# Output includes: -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer

When I build C extensions with this Python, they inherit the same flags. The frame pointer chain stays intact from my Python code through the interpreter, into C extensions, and down to system libraries.
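The same check can be scripted so it fails loudly in CI. This is a small sketch, not part of PEP 831 itself. The `missing_frame_pointer_flags` helper is my own name, and it only looks at the `CFLAGS` string that `sysconfig` reports.

```python
import sysconfig

# The two flags named in this article.
FRAME_POINTER_FLAGS = ("-fno-omit-frame-pointer", "-mno-omit-leaf-frame-pointer")


def missing_frame_pointer_flags(cflags):
    """Return the frame pointer flags that do not appear in a CFLAGS string."""
    present = set((cflags or "").split())
    return [flag for flag in FRAME_POINTER_FLAGS if flag not in present]


# CFLAGS can be None on platforms that do not build with a Makefile.
missing = missing_frame_pointer_flags(sysconfig.get_config_var("CFLAGS"))
if missing:
    print("This interpreter's CFLAGS lack:", " ".join(missing))
else:
    print("Frame pointer flags are present in CFLAGS")
```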

Why This Matters: The Observability Trade-Off

Frame pointers do have a performance cost. Each function has to save and restore the frame pointer register, and the compiler loses one general-purpose register. On x86-64, this means a push %rbp and mov %rsp,%rbp in the function prologue and a pop %rbp in the epilogue.

But here’s the thing: I’ll take a 1-3% performance hit to actually understand where my compute budget goes in production.

Before PEP 831, profiling a Python web server in Kubernetes meant either:

  • Accepting blind spots in native code
  • Rebuilding Python manually with frame pointers
  • Using Python-specific profilers that don’t see system context

Now, standard eBPF continuous profilers in Kubernetes work out of the box. I can see the complete picture:

┌─────────────────────────────────────┐
│ __libc_write (libc) │ ← System call wrapper
├─────────────────────────────────────┤
│ write_to_socket │ ← My C extension
├─────────────────────────────────────┤
│ PyEval_EvalFrameDefault │ ← Python interpreter
├─────────────────────────────────────┤
│ handle_http_request │ ← My Python code
├─────────────────────────────────────┤
│ json_response │ ← My Python code
└─────────────────────────────────────┘

Every layer is visible. I can trace a request from my Python handler through the JSON serialization (possibly a C extension like orjson), through the socket write, down to the kernel.

Before and After: A Concrete Example

Let me show what happens when I profile a Python script that calls a C extension.

Example Python script calling C extension
# my_app.py
import numpy as np

def process_data():
    data = np.random.rand(1000000)
    result = np.fft.fft(data)
    return result.sum()

if __name__ == "__main__":
    process_data()

Before PEP 831

perf report without frame pointers
$ perf record -g python my_app.py
$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ....... ................ ....................
62.00% python libopenblas.so [unknown]
25.00% python libopenblas.so [unknown]
 8.00% python python3.12 PyEval_EvalFrameDefault
 5.00% python python3.12 [unknown]

I know NumPy is using OpenBLAS, but I can’t see which BLAS routine is slow. The [unknown] entries hide the actual computational hotspots.

After PEP 831

perf report with frame pointers
$ perf record -g python my_app.py
$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ....... ................ ....................
62.00% python libopenblas.so daxpy_k_CPU
25.00% python libopenblas.so fft_cpu
 8.00% python python3.12 PyEval_EvalFrameDefault
 5.00% python numpy/core.so array_sum

Now I see daxpy_k_CPU is the hotspot—that’s a BLAS vector operation. The FFT routine is also visible. I can make informed decisions: maybe I need to tune my BLAS library, or switch to a different FFT implementation.

How the Propagation Works

The clever part of PEP 831 is how the flags reach C extensions.

When pip has to build a package from source, the build uses sysconfig to get compiler flags. Here’s what happens internally:

  1. pip invokes the package’s build backend (setuptools running setup.py, or whatever pyproject.toml declares)
  2. The build system queries sysconfig.get_config_var('CFLAGS')
  3. This returns flags including -fno-omit-frame-pointer
  4. The C extension is compiled with frame pointers preserved

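This means my existing packages, when rebuilt from source, automatically get the benefit. I don’t need to modify my setup.py or build configuration. Prebuilt wheels are the exception: they keep whatever flags they were compiled with until their maintainers rebuild them.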
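The propagation steps can be sketched in a few lines. This is a simplified illustration of how a build backend might turn sysconfig values into a compiler invocation, not what setuptools or any other backend literally runs. The `sketch_compile_command` helper and the `myext.c` file name are hypothetical.

```python
import shlex
import sysconfig


def sketch_compile_command(source, config=None):
    """Assemble an illustrative compile command from sysconfig-style values."""
    if config is None:
        config = sysconfig.get_config_vars()
    # CC and CFLAGS are recorded when the interpreter itself is built.
    compiler = shlex.split(config.get("CC") or "cc")
    cflags = shlex.split(config.get("CFLAGS") or "")
    include_dir = config.get("INCLUDEPY") or sysconfig.get_paths()["include"]
    return [*compiler, *cflags, f"-I{include_dir}", "-fPIC", "-c", source]


# On a PEP 831 build, the printed command would carry the frame pointer
# flags without the extension author doing anything.
print(shlex.join(sketch_compile_command("myext.c")))
```

The point of the sketch is that the extension author never mentions frame pointers: whatever the interpreter recorded in `CFLAGS` flows into the command.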

Verifying frame pointers in a C extension
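$ pip install numpy --force-reinstall --no-cache-dir --no-binary numpy
$ objdump -d "$(python -c "import glob, os, numpy; print(glob.glob(os.path.join(os.path.dirname(numpy.__file__), '*core', '_multiarray_umath*.so'))[0])")" | grep -A1 "push.*%rbp" | head
# --no-binary forces a source build; a prebuilt wheel keeps the flags it was built with
# You should see push %rbp followed by mov %rsp,%rbp at the start of functions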
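Eyeballing grep output gets tedious, so here is a rough sketch that scans objdump text for the classic prologue. It assumes x86-64 and the AT&T syntax that GNU objdump prints by default. It is only a heuristic, since a function can set up its frame in other ways. The `functions_with_frame_pointer` helper and the sample disassembly are made up for illustration.

```python
import re

HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")
PUSH_RBP = re.compile(r"\bpush\s+%rbp\b")
MOV_RSP_RBP = re.compile(r"\bmov\s+%rsp,%rbp\b")


def functions_with_frame_pointer(disassembly, window=4):
    """Map each function name to True if its first few instructions
    contain push %rbp followed by mov %rsp,%rbp."""
    result = {}
    current, seen, pushed = None, 0, False
    for raw_line in disassembly.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            current, seen, pushed = header.group(1), 0, False
            result[current] = False
            continue
        if current is None or not line:
            continue
        seen += 1
        if seen > window:
            continue  # only the start of the function counts as the prologue
        if PUSH_RBP.search(line):
            pushed = True
        elif pushed and MOV_RSP_RBP.search(line):
            result[current] = True
    return result


# Hand-written sample in objdump's layout, not output from a real build.
SAMPLE = """
0000000000001130 <with_fp>:
    1130:\t55                   \tpush   %rbp
    1131:\t48 89 e5             \tmov    %rsp,%rbp
    1134:\t5d                   \tpop    %rbp
    1135:\tc3                   \tret

0000000000001140 <without_fp>:
    1140:\t8d 04 37             \tlea    (%rdi,%rsi,1),%eax
    1143:\tc3                   \tret
"""

print(functions_with_frame_pointer(SAMPLE))
```

In practice I would pipe the real `objdump -d` output into this function and look at the share of functions that come back `True`.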

Common Mistakes When Profiling Python

I’ve seen developers make these mistakes repeatedly:

Mistake 1: Using only Python-specific profilers

cProfile shows Python functions but hides C extension work. I might think json.dumps is fast when the actual bottleneck is inside the C JSON library.

cProfile shows limited view
$ python -m cProfile my_app.py
# Only shows Python-level function calls
# C extension internals are invisible
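A quick way to see this limit is to profile a call into C code and look at what cProfile records. In this sketch, `busy` is a made-up function and `sorted` stands in for any C-implemented call. Entries that pstats files under the filename `~` are the C-level calls.

```python
import cProfile
import pstats


def busy():
    # sorted() is implemented in C; cProfile records the call as one entry
    # and cannot see anything that happens inside it.
    return sorted(range(200_000), reverse=True)


profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

stats = pstats.Stats(profiler)
# pstats keys are (filename, line, function name); C calls use "~" as filename.
c_level = [name for (filename, _line, name) in stats.stats if filename == "~"]
print(c_level)
```

Each C call shows up as a single opaque row with a total time. A system profiler with intact frame pointers is what lets me look inside that row.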

Mistake 2: Interpreting [unknown] as “not my problem”

Those [unknown] frames are my problem. They represent CPU cycles my code triggered. PEP 831 reveals them.

Mistake 3: Not rebuilding C extensions after Python upgrade

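If I upgrade Python but don’t rebuild my packages, the C extensions might still be compiled with the old flags. The same applies to prebuilt wheels, which keep whatever flags their publisher used. A clean reinstall from source ensures frame pointer support, although it is slow and needs a compiler toolchain.

Force reinstall packages for new flags
$ pip freeze > requirements.txt
$ pip uninstall -r requirements.txt -y
$ pip install --no-cache-dir --no-binary :all: -r requirements.txt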
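Before reinstalling everything, it helps to know which installed packages ship compiled code at all. The sketch below is one way to list them, using `importlib.metadata` and the interpreter's own extension suffixes. The helper names are mine, and packages that record no file list are simply skipped.

```python
from importlib import metadata
from importlib.machinery import EXTENSION_SUFFIXES


def has_native_extension(file_names):
    """True if any recorded file looks like a compiled extension module."""
    suffixes = tuple(EXTENSION_SUFFIXES)
    return any(str(name).endswith(suffixes) for name in file_names)


def distributions_with_native_code():
    """Names of installed distributions that contain extension modules."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # dist.files is None when the installer recorded no file list.
        if name and has_native_extension(dist.files or []):
            names.add(name)
    return sorted(names)


# Pure-Python packages are unaffected by compiler flags, so only these
# candidates are worth rebuilding.
print("\n".join(distributions_with_native_code()))
```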

The Impact on Production Observability

For long-running Python services—web servers, daemons, MCP servers—this change is significant.

Continuous profiling tools such as Datadog and Grafana Pyroscope offer eBPF-based agents that sample stack traces from outside the process. Before PEP 831, frame-pointer-based sampling produced fragmented data for Python processes. Now it can produce complete flame graphs.

Here’s what my production dashboard looked like before:

Python Service CPU Profile (Before PEP 831)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
████████████ [unknown] 40%
████████ PyEval_EvalFrameDefault 20%
████████████ [unknown] 30%
████ [unknown] 10%

And after:

Python Service CPU Profile (After PEP 831)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
████████████ handle_request 25%
████████████ json_serialize 15%
████████ Redis::get 12%
████████ SQLAlchemy::execute 10%
████████ __libc_write 8%
██████ connection_pool_wait 6%
...

I can now see that JSON serialization is 15% of my CPU time. Maybe I should switch to orjson. The Redis calls are 12%—perhaps I need caching. These optimizations were invisible before.

Performance Cost: Is It Worth It?

I benchmarked a few common Python workloads. The overhead of frame pointers is typically 1-3%.

Simple benchmark comparing builds
# Without frame pointers (old build)
$ python -m timeit -n 1000 "sum(range(100000))"
1000 loops, best of 5: 2.45 msec per loop
# With frame pointers (PEP 831 build)
$ python -m timeit -n 1000 "sum(range(100000))"
1000 loops, best of 5: 2.52 msec per loop

That’s about 2.9% overhead in this synthetic case ((2.52 - 2.45) / 2.45). Real-world workloads with I/O, network calls, and database queries show even less relative impact.
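The overhead figure is just the relative difference between the two timings. A tiny helper makes the arithmetic explicit. `relative_overhead_percent` is my own name, and the inputs are the two timeit numbers from the benchmark.

```python
def relative_overhead_percent(baseline, candidate):
    """Percentage by which candidate is slower than baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# 2.45 msec without frame pointers, 2.52 msec with them.
overhead = relative_overhead_percent(2.45, 2.52)
print(f"{overhead:.2f}%")  # prints 2.86%
```

A single timeit run is noisy, so I would repeat the measurement a few times before trusting the second decimal place.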

The trade is clear: I give up 1-3% performance to gain complete visibility into where the other 97% of CPU time goes. For production systems, this is an easy decision.

How to Enable This

If I’m building Python from source, the frame pointer flags apply automatically once I’m on a release that implements PEP 831 (the PEP targets Python 3.15). For earlier versions, I can add the flags manually:

Building Python with frame pointers (releases without PEP 831)
$ ./configure CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"
$ make
$ make install

For most users, installing a PEP 831 build of Python from a distribution that adopts it is sufficient. Extensions that pip compiles from source pick up the flags automatically; prebuilt wheels keep the flags they were built with.

Closing the Observability Gap

PEP 831 closes a long-standing observability gap in Python. For over a decade, production profiling of Python services meant accepting blind spots or maintaining custom Python builds. The perf trampoline in Python 3.12 was a step forward, but it only showed half the picture.

Now I get complete stack traces across Python and native code. I can profile with perf, py-spy, or eBPF tools and see the full execution context. For production systems where understanding performance matters, this change is essential.

The next time I run perf record -g python my_app.py, I’ll get useful output. And that makes debugging production performance issues vastly more tractable.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can reach me by email.
