How Does Asyncio Concurrency Differ from Threading in Python?

Apr 24, 2026

I thought the GIL made threading “safe” - until my counter returned the wrong value. Then I switched to asyncio and wondered why my async function froze everything. Turns out, Python’s concurrency models are fundamentally different, and understanding that difference saved me from countless production bugs.

The Confusion: Threading vs Asyncio

I had three misconceptions that caused real problems:

Misconception 1: “The GIL prevents race conditions”

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Race condition!

# With 4 threads, expected: 4,000,000
# Actual result: 3,721,584 (varies each run!)

Misconception 2: “Asyncio automatically makes everything concurrent”

async def handler():
    result = heavy_computation()  # Blocks the entire event loop!
    return result

Misconception 3: “Threading and asyncio are interchangeable”

I chose threading for 10,000 concurrent connections. At roughly 8MB of stack reserved per thread, that added up to about 80GB of virtual memory.

The Core Difference

After much debugging, I finally understood the fundamental distinction:

┌─────────────────────────────────────────────────────────────────┐
│                     CONCURRENCY MODELS                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  THREADING (Preemptive)          ASYNCIO (Cooperative)         │
│  ─────────────────────           ───────────────────────        │
│                                                                 │
│  ┌─────────┐   ┌─────────┐       ┌─────────────────────────┐   │
│  │ Thread1 │   │ Thread2 │       │    Event Loop (1)       │   │
│  │  ●──────┼───┼──────●  │       │                         │   │
│  └────┬────┘   └────┬────┘       │  ┌───┐  ┌───┐  ┌───┐   │   │
│       │             │            │  │ C1│  │ C2│  │ C3│   │   │
│       ▼             ▼            │  └─┬─┘  └─┬─┘  └─┬─┘   │   │
│  ┌─────────────────────┐         │    │      │      │      │   │
│  │   OS Scheduler      │         │    │      │      │      │   │
│  │  (decides when)     │         │    ▼      ▼      ▼      │   │
│  └─────────────────────┘         │  await  await  await     │   │
│       │                          │   (you decide when)     │   │
│       ▼                          └─────────────────────────┘   │
│  ┌─────────────────────┐                                       │
│  │        GIL          │         NO GIL CONTENTION            │
│  │  (only 1 at a time) │         (only 1 thread exists)       │
│  └─────────────────────┘                                       │
│                                                                 │
│  Switch: ANY bytecode           Switch: ONLY at await          │
│  Memory: ~8MB per thread        Memory: ~KB per coroutine      │
│  Locks: REQUIRED                Locks: OFTEN optional           │
└─────────────────────────────────────────────────────────────────┘

Threading: CPython can hand control to another thread between almost any two bytecode instructions, and the OS scheduler decides which thread runs next. This is “preemptive” - your code is interrupted without asking for it.

Asyncio: The event loop switches coroutines ONLY at explicit await points. This is “cooperative” - your code cooperates by yielding control.
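
To see the cooperative model in action, here is a minimal sketch. The worker names and the shared log list are illustrative only. Each worker appends two entries back to back and then awaits, so the recorded order shows where the event loop was allowed to switch.

```python
import asyncio

async def worker(name, log):
    log.append(f"{name}:step1")
    log.append(f"{name}:step2")  # No await since step1, so nothing can run in between
    await asyncio.sleep(0)       # Explicit yield point: the event loop may switch here
    log.append(f"{name}:done")

async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both steps of A land in the log before B starts, and B only gets its turn once A reaches the await.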

Why This Matters: The Switching Point

The key insight is where switches happen:

THREADING:
┌──────────────────────────────────────────────────────────────┐
│ Bytecode execution (Thread 1)                                │
│ LOAD_GLOBAL    ◄── OS can switch HERE                        │
│ LOAD_CONST     ◄── OS can switch HERE                        │
│ INPLACE_ADD    ◄── OS can switch HERE (race condition!)      │
│ STORE_GLOBAL   ◄── OS can switch HERE                        │
│ LOAD_GLOBAL    ◄── OS can switch HERE                        │
└──────────────────────────────────────────────────────────────┘

ASYNCIO:
┌──────────────────────────────────────────────────────────────┐
│ Bytecode execution (Coroutine 1)                             │
│ LOAD_GLOBAL    ◄── Cannot switch                             │
│ LOAD_CONST     ◄── Cannot switch                             │
│ INPLACE_ADD    ◄── Cannot switch (SAFE!)                     │
│ STORE_GLOBAL   ◄── Cannot switch                             │
│ AWAIT          ◄── ONLY HERE can switch to another coroutine │
└──────────────────────────────────────────────────────────────┘

This means with asyncio, I know EXACTLY when context switches happen. With threading, switches can occur anywhere.
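
If you want to check the instruction sequence on your own interpreter, the standard dis module can list it. This is a small sketch: the opnames helper is my own, and the exact opcode names depend on the CPython version (the add shows up as INPLACE_ADD on older releases and as BINARY_OP from 3.11 on). What stays the same is that loading and storing the global are separate instructions.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

def opnames(func):
    """Return the bytecode instruction names for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    for name in opnames(increment):
        print(name)
```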

The Race Condition Experiment

I ran this experiment to truly understand the difference:

Threading Version (Race Condition)

import threading

counter = 0
lock = threading.Lock()

def increment_unsafe():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Read, modify, write - can interleave!

def increment_safe():
    global counter
    for _ in range(1_000_000):
        with lock:  # Explicit lock required
            counter += 1

# Test unsafe
counter = 0
threads = [threading.Thread(target=increment_unsafe) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Unsafe: {counter:,}/4,000,000")  # Usually wrong! (:, adds the thousands separators)

# Test safe
counter = 0
threads = [threading.Thread(target=increment_safe) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Safe: {counter:,}/4,000,000")  # Correct: 4,000,000

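Output (the unsafe count changes on every run; on recent CPython versions this exact loop can even print the correct total, which still does not make it thread-safe):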

Unsafe: 2,847,293/4,000,000
Safe: 4,000,000/4,000,000

Asyncio Version (No Race Condition)

import asyncio

counter = 0  # No lock needed!

async def increment(name):
    global counter
    for i in range(1_000_000):
        # This block is ATOMIC - no switch possible
        counter += 1

        # Only here can another coroutine run
        if i % 100_000 == 0:
            await asyncio.sleep(0)  # Yield point

async def main():
    await asyncio.gather(
        increment("C1"), increment("C2"),
        increment("C3"), increment("C4"),
    )
    print(f"Result: {counter:,}/4,000,000")  # Always correct!

asyncio.run(main())

Output:

Result: 4,000,000/4,000,000

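No lock needed because counter += 1 contains no await, so no other coroutine can run in the middle of it - only await statements yield control. The guarantee ends as soon as an await sits between reading and writing shared state.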
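
Asyncio code can still lose updates when an await lands between the read and the write. The sketch below is illustrative: the function names and the state dictionary are mine. It forces that situation with asyncio.sleep(0), then repeats the same work under an asyncio.Lock.

```python
import asyncio

async def unsafe_increment(state):
    current = state["counter"]      # Read
    await asyncio.sleep(0)          # Yield point between read and write
    state["counter"] = current + 1  # Write back a value that may be stale

async def safe_increment(state, lock):
    async with lock:                # Only one coroutine at a time in this block
        current = state["counter"]
        await asyncio.sleep(0)
        state["counter"] = current + 1

async def run_demo(n=100):
    unsafe = {"counter": 0}
    await asyncio.gather(*(unsafe_increment(unsafe) for _ in range(n)))

    safe = {"counter": 0}
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(safe, lock) for _ in range(n)))
    return unsafe["counter"], safe["counter"]

if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

In the unsafe version every coroutine reads 0 before any of them writes, so the final count is 1 instead of 100. With the lock the count reaches 100.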

When I Chose Wrong (And How I Fixed It)

Mistake 1: Threading for High Concurrency

I tried to handle 10,000 concurrent downloads with threading:

import threading
import urllib.request

def download(url):
    return urllib.request.urlopen(url).read()

urls = ["https://example.com"] * 10000

# BAD: 10,000 threads = ~80GB virtual memory!
threads = [threading.Thread(target=download, args=(url,))
           for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread stack: ~8MB reserved (a common Linux default)
# 10,000 × 8MB = 80GB virtual memory

Fix: Use asyncio for high concurrency:

import asyncio
import aiohttp

async def download(session, url):
    async with session.get(url) as response:
        return await response.read()

async def main():
    urls = ["https://example.com"] * 10000
    async with aiohttp.ClientSession() as session:
        # 10,000 coroutines on a single thread
        # Each coroutine: ~KB instead of a ~8MB thread stack
        await asyncio.gather(
            *[download(session, url) for url in urls]
        )

asyncio.run(main())

Mistake 2: Asyncio with Blocking Code

I used requests (blocking) in an async function:

import requests  # Blocking library!

async def fetch(url):
    # This BLOCKS the entire event loop!
    # All other coroutines freeze
    response = requests.get(url)
    return response.json()

# Everything stops while this runs

Fix 1: Use async-compatible library:

import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.json()

Fix 2: Offload to executor:

import asyncio
import requests

async def fetch(url):
    # get_running_loop() is the supported way to get the loop inside a coroutine
    loop = asyncio.get_running_loop()
    # Run blocking code in thread pool
    response = await loop.run_in_executor(
        None,  # Default executor (a thread pool)
        requests.get, url
    )
    return response.json()
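
On Python 3.9 and later, asyncio.to_thread is a shorter way to do the same offloading. The sketch below uses time.sleep as a stand-in for a blocking call such as requests.get; blocking_io is a hypothetical helper, not part of any library. It also times the run so you can see the calls overlap.

```python
import asyncio
import time

def blocking_io(delay):
    """Stand-in for a blocking call such as requests.get (hypothetical helper)."""
    time.sleep(delay)
    return delay

async def main(n=3, delay=0.2):
    start = time.perf_counter()
    # asyncio.to_thread runs each call in the default thread pool
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_io, delay) for _ in range(n))
    )
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

Three 0.2 second calls should finish in roughly 0.2 seconds rather than 0.6, because the event loop stays free while the worker threads sleep.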

Mistake 3: CPU-bound Work in Asyncio

I expected asyncio to speed up CPU-intensive calculations:

import asyncio

async def cpu_heavy():
    total = 0
    for i in range(10_000_000):
        total += i ** 2  # Pure CPU work
        if i % 100_000 == 0:
            await asyncio.sleep(0)  # Yield, but doesn't help!
    return total

# Still single-threaded! No parallelism for CPU work

Fix: Use multiprocessing for CPU-bound:

from concurrent.futures import ProcessPoolExecutor
import asyncio

def cpu_heavy():  # Regular function
    total = 0
    for i in range(10_000_000):
        total += i ** 2
    return total

async def main():
    loop = asyncio.get_running_loop()
    # Each process has its own GIL!
    with ProcessPoolExecutor() as pool:
        results = await asyncio.gather(
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
        )
    return results

# The guard is needed on platforms that spawn worker processes (Windows, macOS)
if __name__ == "__main__":
    print(asyncio.run(main()))

The Decision Matrix

I created this flowchart to decide which approach to use:

┌─────────────────────────────────────────────────────────────────┐
│                    CONCURRENCY DECISION TREE                    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  CPU-bound or   │
                    │   I/O-bound?    │
                    └────────┬────────┘
                             │
              ┌──────────────┴──────────────┐
              ▼                             ▼
        CPU-bound                      I/O-bound
              │                             │
              ▼                             ▼
    ┌─────────────────┐           ┌─────────────────┐
    │ Multiprocessing │           │ Concurrency OK? │
    │ (separate GIL)  │           └────────┬────────┘
    └─────────────────┘                    │
                                   ┌──────┴──────┐
                                   ▼             ▼
                              Yes               No
                                   │             │
                                   ▼             ▼
                          ┌────────────────┐  ┌────────────────┐
                          │  >100 tasks?   │  │ Multiprocessing│
                          └────────┬───────┘  └────────────────┘
                                   │
                        ┌──────────┴──────────┐
                        ▼                     ▼
                      Yes                    No
                        │                     │
                        ▼                     ▼
               ┌────────────────┐    ┌────────────────┐
               │ Async library  │    │ Async library  │
               │ available?     │    │ available?     │
               └────────┬───────┘    └────────┬───────┘
                        │                     │
                 ┌──────┴──────┐       ┌──────┴──────┐
                 ▼             ▼       ▼             ▼
               Yes            No     Yes            No
                 │             │       │             │
                 ▼             ▼       ▼             ▼
            ┌─────────┐  ┌────────┐  ┌─────────┐  ┌────────┐
            │ ASYNCIO │  │ ASYNCIO│  │ ASYNCIO │  │THREAD- │
            │         │  │+executor│  │         │  │  ING   │
            └─────────┘  └────────┘  └─────────┘  └────────┘
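
The main branches of the tree can also be written as a small function. This is only a sketch: choose_model is a hypothetical helper, the 100-task threshold is the rule of thumb from the chart, and the “Concurrency OK?” branch is left out for brevity.

```python
def choose_model(cpu_bound: bool, tasks: int, async_lib_available: bool) -> str:
    """Hypothetical helper mirroring the main branches of the decision tree."""
    if cpu_bound:
        return "multiprocessing"
    if async_lib_available:
        return "asyncio"
    # No async library: wrap blocking calls for large task counts, else plain threads
    return "asyncio + executor" if tasks > 100 else "threading"

if __name__ == "__main__":
    print(choose_model(cpu_bound=False, tasks=10_000, async_lib_available=False))
```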

Memory Comparison

This table shows why asyncio scales better:

Tasks	Threading (Memory)	Asyncio (Memory)	Ratio
100	~800 MB	~5 MB	160:1
1,000	~8 GB	~50 MB	160:1
10,000	~80 GB	~500 MB	160:1
100,000	~800 GB	~5 GB	160:1

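Each thread reserves a stack of about 8MB (a common Linux default), so the threading column is reserved virtual memory rather than memory actually in use. The asyncio column assumes roughly 50KB per task, which is where the 160:1 ratio comes from; a bare coroutine needs only kilobytes.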
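
The per-task figure is easy to estimate on your own machine. The sketch below uses tracemalloc to measure the Python-level memory of idle tasks that are all parked on one event. The waiter and measure names are mine, and the result depends on the Python version and on what each task actually holds, so treat it as a rough lower bound.

```python
import asyncio
import tracemalloc

async def waiter(event):
    await event.wait()  # Park until the event is set

async def measure(n=10_000):
    event = asyncio.Event()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(waiter(event)) for _ in range(n)]
    await asyncio.sleep(0)  # Let every task run up to its await
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    event.set()             # Release the tasks and clean up
    await asyncio.gather(*tasks)
    return (after - before) / n

if __name__ == "__main__":
    print(f"~{asyncio.run(measure()):.0f} bytes per idle task")
```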

The GIL Reality Check

Both threading and asyncio are limited by the GIL for CPU-bound work:

┌────────────────────────────────────────────────────────────────┐
│                    GIL BEHAVIOR                                │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  THREADING + CPU-BOUND:                                        │
│  ┌─────────────────────────────────────────┐                  │
│  │ Thread 1: ████████░░░░░░░░████████░░░░ │ (GIL contention) │
│  │ Thread 2: ░░░░░░░░████████░░░░░░░░████ │                  │
│  │ Thread 3: ░░░░████████░░░░░░░░████████░ │                  │
│  └─────────────────────────────────────────┘                  │
│  Result: No speedup, just overhead                           │
│                                                                │
│  THREADING + I/O-BOUND:                                        │
│  ┌─────────────────────────────────────────┐                  │
│  │ Thread 1: ████████░░░░░░░░░░░░░░░░░░░░ │                  │
│  │ Thread 2: ░░░░░░░░████████░░░░░░░░░░░░ │ (I/O releases   │
│  │ Thread 3: ░░░░░░░░░░░░░░░░████████░░░ │  GIL)           │
│  └─────────────────────────────────────────┘                  │
│  Result: Effective concurrency                               │
│                                                                │
│  ASYNCIO:                                                     │
│  ┌─────────────────────────────────────────┐                  │
│  │ Single Thread: █████░░░░░██████░░░░░░░ │                  │
│  │                 │      │      │         │                  │
│  │                 ▼      ▼      ▼         │                  │
│  │              await   await  await       │                  │
│  └─────────────────────────────────────────┘                  │
│  Result: No GIL contention (only 1 thread)                   │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Practical Guidelines

After all my mistakes, here’s my decision process:

Use Threading When:

Fewer than 100 concurrent I/O operations
Using blocking libraries (requests, sqlite3, etc.)
Working with existing synchronous codebases
Simple scripts where asyncio overhead isn’t justified

Use Asyncio When:

More than 100 concurrent connections
HTTP APIs, websockets, chat servers
Async-compatible libraries available (aiohttp, aiopg, etc.)
Predictable race-condition behavior matters
Memory efficiency is critical

Use Multiprocessing When:

CPU-bound work (calculations, image processing)
True parallelism needed
Each task is independent

Common Patterns

Pattern 1: Threading with Bounded Pool

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    return urllib.request.urlopen(url).read()

urls = [...]  # 1000 URLs

# Limit to 50 concurrent threads
with ThreadPoolExecutor(max_workers=50) as executor:
    results = list(executor.map(fetch, urls))

Pattern 2: Asyncio with Rate Limiting

import asyncio
import aiohttp

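class RateLimiter:
    def __init__(self, rate_per_second):
        self.interval = 1.0 / rate_per_second
        self.next_time = 0.0  # Earliest time the next request may start

    async def acquire(self):
        now = asyncio.get_running_loop().time()
        # Reserve a slot BEFORE sleeping; otherwise every waiting coroutine
        # computes the same wait time and they all fire together
        start = max(now, self.next_time)
        self.next_time = start + self.interval
        if start > now:
            await asyncio.sleep(start - now)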

async def fetch(session, url, limiter):
    await limiter.acquire()  # Rate limit
    async with session.get(url) as response:
        return await response.text()

async def main():
    limiter = RateLimiter(10)  # 10 requests/second
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url, limiter) for url in urls]
        results = await asyncio.gather(*tasks)
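
Rate limiting controls how often requests start, not how many are in flight at once. For the asyncio counterpart of the bounded thread pool in Pattern 1, asyncio.Semaphore is a common choice. The sketch below is illustrative: limited and the stats dictionary are my own names, and asyncio.sleep stands in for the real request. It also tracks the peak number of concurrent jobs.

```python
import asyncio

async def limited(sem, stats, job_id):
    async with sem:  # At most `limit` coroutines are inside this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # Stand-in for the real request
        stats["active"] -= 1
    return job_id

async def main(n_jobs=50, limit=5):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited(sem, stats, i) for i in range(n_jobs))
    )
    return results, stats["peak"]

if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(f"{len(results)} jobs done, peak concurrency {peak}")
```

The counters need no lock because there is no await between reading and updating them.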

Pattern 3: Hybrid Asyncio + Multiprocessing

import asyncio
from concurrent.futures import ProcessPoolExecutor

import aiohttp

def cpu_intensive(data):
    # Pure CPU work - runs in separate process
    return sum(x ** 2 for x in data)

async def fetch_and_process(url, pool):
    # pool: a ProcessPoolExecutor created once by the caller and shared,
    # since starting a new pool for every URL is expensive
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.json()  # Assumed to be a list of numbers

    # Offload CPU work to process pool
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        pool, cpu_intensive, data
    )
    return result

Summary Table

Aspect	Threading	Asyncio	Multiprocessing
Threads	Multiple OS threads	Single thread	Multiple processes
Switching	Preemptive (OS)	Cooperative (await)	Preemptive (OS), across processes
GIL	Limits CPU-bound work	No contention (single thread)	One GIL per process
Memory/task	~8 MB stack reserved	~KB	~MB per process
Locks needed	Yes	Often no	Inter-process only
Best for	I/O, blocking libs	High concurrency I/O	CPU-bound
Race conditions	Unpredictable	Predictable	Per-process

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.
