
How Does Asyncio Concurrency Differ from Threading in Python?

I thought the GIL made threading “safe” - until my counter returned the wrong value. Then I switched to asyncio and wondered why my async function froze everything. Turns out, Python’s concurrency models are fundamentally different, and understanding that difference saved me from countless production bugs.

The Confusion: Threading vs Asyncio

I had three misconceptions that caused real problems:

Misconception 1: “The GIL prevents race conditions”

counter_race.py
counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Race condition!

# With 4 threads, expected: 4,000,000
# Actual result: 3,721,584 (varies each run!)

Misconception 2: “Asyncio automatically makes everything concurrent”

blocking_async.py
async def handler():
    result = heavy_computation()  # Blocks the entire event loop!
    return result

Misconception 3: “Threading and asyncio are interchangeable”

I chose threading for 10,000 concurrent connections. With a default stack of about 8MB per thread, the process reserved roughly 80GB of virtual memory.

The Core Difference

After much debugging, I finally understood the fundamental distinction:

CONCURRENCY MODELS

                THREADING (Preemptive)          ASYNCIO (Cooperative)
                ──────────────────────          ─────────────────────
Runs as         Thread1, Thread2, ...           Event Loop (1) running C1, C2, C3
Scheduler       OS Scheduler (decides when)     await (you decide when)
GIL             Only 1 at a time                NO GIL CONTENTION (only 1 thread exists)
Switch          ANY bytecode                    ONLY at await
Memory          ~8MB per thread                 ~KB per coroutine
Locks           REQUIRED                        OFTEN optional

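Threading: Threads are scheduled preemptively. CPython asks the running thread to give up the GIL on a timer (every 5 ms by default, see sys.getswitchinterval()), and the OS scheduler decides which thread runs next. The exact switch points vary between CPython versions, so treat every gap between two bytecode instructions as a possible switch. This is “preemptive” - your code is interrupted without being asked.

Asyncio: The event loop switches coroutines ONLY at explicit await points (including the implicit ones in async with and async for). This is “cooperative” - your code cooperates by yielding control.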
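To make the cooperative model concrete, here is a minimal sketch (the names worker and log are my own, purely for illustration). Each coroutine appends a "start" and an "end" entry with no await in between, then yields with asyncio.sleep(0). Since the event loop can only switch at that await, the two entries of one step should always sit next to each other in the log, no matter how many coroutines run.

```python
import asyncio


async def worker(name, log):
    for step in range(3):
        # No await between these two appends, so no other coroutine
        # can run in between them.
        log.append((name, step, "start"))
        log.append((name, step, "end"))
        # The only point where the event loop may switch to another coroutine.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    for entry in asyncio.run(main()):
        print(entry)
```

The same two appends in two threads would give no such guarantee: a thread switch could land between "start" and "end".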

Why This Matters: The Switching Point

The key insight is where switches happen:

switching_comparison.txt
THREADING:
┌──────────────────────────────────────────────────────────────┐
│ Bytecode execution (Thread 1)                                │
│   LOAD_GLOBAL  ◄── OS can switch HERE                        │
│   LOAD_CONST   ◄── OS can switch HERE                        │
│   INPLACE_ADD  ◄── OS can switch HERE (race condition!)      │
│   STORE_GLOBAL ◄── OS can switch HERE                        │
│   LOAD_GLOBAL  ◄── OS can switch HERE                        │
└──────────────────────────────────────────────────────────────┘
ASYNCIO:
┌──────────────────────────────────────────────────────────────┐
│ Bytecode execution (Coroutine 1)                             │
│   LOAD_GLOBAL  ◄── Cannot switch                             │
│   LOAD_CONST   ◄── Cannot switch                             │
│   INPLACE_ADD  ◄── Cannot switch (SAFE!)                     │
│   STORE_GLOBAL ◄── Cannot switch                             │
│   await ...    ◄── ONLY HERE can switch to another coroutine │
└──────────────────────────────────────────────────────────────┘
(Opcode names as shown by CPython 3.10 and earlier; 3.11+ reports BINARY_OP instead of INPLACE_ADD.)

This means with asyncio, I know EXACTLY when context switches happen. With threading, switches can occur anywhere.
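You can check the "several instructions" claim yourself with the standard dis module. The sketch below only lists the opcode names CPython generates for counter += 1. The exact list differs between CPython versions, so I am not quoting an output here, but the load of the global and the store back to it show up as separate instructions.

```python
import dis


def increment():
    global counter
    counter += 1


# Collect the opcode names CPython generated for increment().
# The function is never called, so `counter` does not need to exist.
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]

if __name__ == "__main__":
    print(opnames)
```

Anything that runs between the load and the store can overwrite the value that was just read, which is exactly the lost-update race.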

The Race Condition Experiment

I ran this experiment to truly understand the difference:

Threading Version (Race Condition)

threading_race.py
import threading

counter = 0
lock = threading.Lock()

def increment_unsafe():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Read, modify, write - can interleave!

def increment_safe():
    global counter
    for _ in range(1_000_000):
        with lock:  # Explicit lock required
            counter += 1

# Test unsafe
counter = 0
threads = [threading.Thread(target=increment_unsafe) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Unsafe: {counter:,}/4,000,000")  # May fall short of 4,000,000!

# Test safe
counter = 0
threads = [threading.Thread(target=increment_safe) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Safe: {counter:,}/4,000,000")  # Correct: 4,000,000

Output:

threading_output.txt
Unsafe: 2,847,293/4,000,000
Safe: 4,000,000/4,000,000

Asyncio Version (No Race Condition)

asyncio_safe.py
import asyncio

counter = 0  # No lock needed!

async def increment(name):
    global counter
    for i in range(1_000_000):
        # This block is ATOMIC - no switch possible
        counter += 1
        # Only here can another coroutine run
        if i % 100_000 == 0:
            await asyncio.sleep(0)  # Yield point

async def main():
    await asyncio.gather(
        increment("C1"), increment("C2"),
        increment("C3"), increment("C4"),
    )
    print(f"Result: {counter:,}/4,000,000")  # Always correct!

asyncio.run(main())

Output:

asyncio_output.txt
Result: 4,000,000/4,000,000

No lock needed because counter += 1 cannot be interrupted by another coroutine - only await statements yield control.
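That guarantee only holds while there is no await between the read and the write. The sketch below (function names are my own) puts an await in the middle of the read-modify-write, which brings the lost-update problem back even on a single thread, and then fixes it with asyncio.Lock. This is why I describe locks in asyncio as "often optional" rather than "never needed".

```python
import asyncio

counter = 0


async def unsafe_increment():
    global counter
    current = counter          # read
    await asyncio.sleep(0)     # yield: other coroutines run here
    counter = current + 1      # write back a possibly stale value


async def safe_increment(lock):
    global counter
    async with lock:           # only one coroutine inside at a time
        current = counter
        await asyncio.sleep(0)
        counter = current + 1


async def run_unsafe(n):
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(n)))
    return counter


async def run_safe(n):
    global counter
    counter = 0
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(lock) for _ in range(n)))
    return counter


if __name__ == "__main__":
    print("Unsafe:", asyncio.run(run_unsafe(100)))  # fewer than 100: updates were lost
    print("Safe:  ", asyncio.run(run_safe(100)))    # 100
```

The difference from threading is that the risky spot is visible in the source: it is the line with await on it.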

When I Chose Wrong (And How I Fixed It)

Mistake 1: Threading for High Concurrency

I tried to handle 10,000 concurrent downloads with threading:

threading_memory.py
import threading
import urllib.request

def download(url):
    return urllib.request.urlopen(url).read()

urls = ["https://example.com"] * 10_000

# BAD: one thread per URL. Once started, 10,000 threads = ~80GB virtual memory!
threads = [threading.Thread(target=download, args=(url,)) for url in urls]
# Each thread stack: ~8MB (the usual Linux default)
# 10,000 × 8MB = 80GB virtual memory

Fix: Use asyncio for high concurrency:

asyncio_memory.py
import asyncio
import aiohttp

async def download(session, url):
    async with session.get(url) as response:
        return await response.read()

async def main():
    urls = ["https://example.com"] * 10000
    async with aiohttp.ClientSession() as session:
        # 10,000 coroutines: megabytes in total, not gigabytes
        # Each coroutine: ~KB
        await asyncio.gather(
            *[download(session, url) for url in urls]
        )

asyncio.run(main())

Mistake 2: Asyncio with Blocking Code

I used requests (blocking) in an async function:

blocking_event_loop.py
import requests  # Blocking library!

async def fetch(url):
    # This BLOCKS the entire event loop!
    # All other coroutines freeze
    response = requests.get(url)
    return response.json()

# Everything stops while this runs

Fix 1: Use async-compatible library:

aiohttp_fix.py
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.json()

Fix 2: Offload to executor:

executor_fix.py
import asyncio
import requests

async def fetch(url):
    # get_running_loop() is the preferred call inside a coroutine
    loop = asyncio.get_running_loop()
    # Run blocking code in thread pool
    response = await loop.run_in_executor(
        None,  # Default executor (a thread pool)
        requests.get, url
    )
    return response.json()
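To see the difference offloading makes, here is a small experiment that uses time.sleep as a stand-in for a blocking call like requests.get. The helper names (blocking_io, ticker, run) are my own. A ticker coroutine records how often the event loop gets to run while the "request" is in flight. asyncio.to_thread (Python 3.9+) is a shorthand for handing the call to the default thread pool.

```python
import asyncio
import time


def blocking_io(delay):
    # Stand-in for a blocking call such as requests.get().
    time.sleep(delay)
    return delay


async def ticker(ticks, stop):
    # Records a timestamp every time the event loop lets it run.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run(offload):
    ticks, stop = [], asyncio.Event()
    tick_task = asyncio.create_task(ticker(ticks, stop))
    await asyncio.sleep(0)  # let the ticker start
    if offload:
        await asyncio.to_thread(blocking_io, 0.2)  # loop stays free
    else:
        blocking_io(0.2)                           # loop is frozen
    stop.set()
    await tick_task
    return len(ticks)


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(run(offload=False)))
    print("ticks while offloaded:        ", asyncio.run(run(offload=True)))
```

When the call blocks the loop, the ticker barely runs at all. When the call is offloaded, it keeps ticking for the whole 0.2 seconds.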

Mistake 3: CPU-bound Work in Asyncio

I expected asyncio to speed up CPU-intensive calculations:

async_cpu.py
import asyncio

async def cpu_heavy():
    total = 0
    for i in range(10_000_000):
        total += i ** 2  # Pure CPU work
        if i % 100_000 == 0:
            await asyncio.sleep(0)  # Yield, but doesn't help!
    return total

# Still single-threaded! No parallelism for CPU work

Fix: Use multiprocessing for CPU-bound:

multiprocessing_fix.py
from concurrent.futures import ProcessPoolExecutor
import asyncio

def cpu_heavy():  # Regular function
    total = 0
    for i in range(10_000_000):
        total += i ** 2
    return total

async def main():
    loop = asyncio.get_running_loop()
    # Each process has its own GIL!
    with ProcessPoolExecutor() as pool:
        results = await asyncio.gather(
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
            loop.run_in_executor(pool, cpu_heavy),
        )
    return results

# The guard is required where worker processes are spawned (Windows, macOS)
if __name__ == "__main__":
    print(asyncio.run(main()))

The Decision Matrix

I created this flowchart to decide which approach to use:

CONCURRENCY DECISION TREE

CPU-bound or I/O-bound?
├── CPU-bound ──► Multiprocessing (separate GIL)
└── I/O-bound ──► Concurrency OK?
    ├── No ──► Multiprocessing
    └── Yes ──► >100 tasks?
        ├── Yes ──► Async library available?
        │   ├── Yes ──► ASYNCIO
        │   └── No ──► ASYNCIO + executor
        └── No ──► Async library available?
            ├── Yes ──► ASYNCIO
            └── No ──► THREADING
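The flowchart can also be written down as a small function. This is only a sketch of my own rule of thumb: choose_model and its parameter names are made up for illustration, and the 100-task threshold is the rough cut-off from the chart, not a hard limit.

```python
def choose_model(cpu_bound, concurrency_ok=True, tasks=1, async_library=False):
    """Return the concurrency model the flowchart would pick."""
    if cpu_bound:
        return "multiprocessing"      # separate GIL per process
    if not concurrency_ok:
        return "multiprocessing"
    if async_library:
        return "asyncio"              # same answer above and below 100 tasks
    if tasks > 100:
        return "asyncio + executor"   # no async library, but too many tasks for threads
    return "threading"


if __name__ == "__main__":
    print(choose_model(cpu_bound=True))
    print(choose_model(cpu_bound=False, tasks=10_000, async_library=True))
    print(choose_model(cpu_bound=False, tasks=10_000, async_library=False))
    print(choose_model(cpu_bound=False, tasks=20, async_library=False))
```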

Memory Comparison

This table shows why asyncio scales better:

Tasks      Threading (Memory)    Asyncio (Memory)    Ratio
100        ~800 MB               ~5 MB               160:1
1,000      ~8 GB                 ~50 MB              160:1
10,000     ~80 GB                ~500 MB             160:1
100,000    ~800 GB               ~5 GB               160:1

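Each thread reserves a stack of about 8MB (the usual Linux default, and mostly virtual memory until it is actually touched). Each coroutine needs only kilobytes. The threading column is therefore address space reserved, not RAM in use.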
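If you want a number for your own machine, the following sketch estimates the per-task cost with tracemalloc. The helper names (idle, bytes_per_task) are my own. It only counts memory allocated through Python's allocator, so treat the result as a rough lower bound for a trivial task, not a benchmark.

```python
import asyncio
import tracemalloc


async def idle(release):
    # A task that starts, then waits until it is released.
    await release.wait()


async def bytes_per_task(n):
    release = asyncio.Event()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(idle(release)) for _ in range(n)]
    await asyncio.sleep(0)  # let every task start and suspend
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    release.set()           # wake all tasks so they can finish
    await asyncio.gather(*tasks)
    return (after - before) / n


if __name__ == "__main__":
    per_task = asyncio.run(bytes_per_task(10_000))
    print(f"~{per_task:.0f} bytes per suspended task")
```

Real handlers hold buffers, sockets and local variables on top of this, which is why real-world per-coroutine figures come out higher.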

The GIL Reality Check

Both threading and asyncio are limited by the GIL for CPU-bound work:

gil_behavior.txt
┌────────────────────────────────────────────────────────────────┐
│ GIL BEHAVIOR │
├────────────────────────────────────────────────────────────────┤
│ │
│ THREADING + CPU-BOUND: │
│ ┌─────────────────────────────────────────┐ │
│ │ Thread 1: ████████░░░░░░░░████████░░░░ │ (GIL contention) │
│ │ Thread 2: ░░░░░░░░████████░░░░░░░░████ │ │
│ │ Thread 3: ░░░░████████░░░░░░░░████████░ │ │
│ └─────────────────────────────────────────┘ │
│ Result: No speedup, just overhead │
│ │
│ THREADING + I/O-BOUND: │
│ ┌─────────────────────────────────────────┐ │
│ │ Thread 1: ████████░░░░░░░░░░░░░░░░░░░░ │ │
│ │ Thread 2: ░░░░░░░░████████░░░░░░░░░░░░ │ (I/O releases │
│ │ Thread 3: ░░░░░░░░░░░░░░░░████████░░░ │ GIL) │
│ └─────────────────────────────────────────┘ │
│ Result: Effective concurrency │
│ │
│ ASYNCIO: │
│ ┌─────────────────────────────────────────┐ │
│ │ Single Thread: █████░░░░░██████░░░░░░░ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ await await await │ │
│ └─────────────────────────────────────────┘ │
│ Result: No GIL contention (only 1 thread) │
│ │
└────────────────────────────────────────────────────────────────┘
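The first panel is easy to reproduce. The sketch below runs the same CPU-bound function four times sequentially and then on four threads, and prints both timings. The function names are my own. On a standard CPython build with the GIL I would expect the two numbers to be close, and free-threaded builds will behave differently, so no expected output is quoted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_heavy(n):
    # Pure Python arithmetic: holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs, n):
    return [cpu_heavy(n) for _ in range(jobs)]


def run_threaded(jobs, n):
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(cpu_heavy, [n] * jobs))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_s = timed(run_sequential, 4, 2_000_000)
    _, threaded_s = timed(run_threaded, 4, 2_000_000)
    print(f"sequential: {sequential_s:.2f}s  threaded: {threaded_s:.2f}s")
```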

Practical Guidelines

After all my mistakes, here’s my decision process:

Use Threading When:

  • Fewer than 100 concurrent I/O operations
  • Using blocking libraries (requests, sqlite3, etc.)
  • Working with existing synchronous codebases
  • Simple scripts where asyncio overhead isn’t justified

Use Asyncio When:

  • More than 100 concurrent connections
  • HTTP APIs, websockets, chat servers
  • Async-compatible libraries available (aiohttp, aiopg, etc.)
  • Predictable race-condition behavior matters
  • Memory efficiency is critical

Use Multiprocessing When:

  • CPU-bound work (calculations, image processing)
  • True parallelism needed
  • Each task is independent

Common Patterns

Pattern 1: Threading with Bounded Pool

bounded_threading.py
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    return urllib.request.urlopen(url).read()

urls = [...]  # 1000 URLs

# Limit to 50 concurrent threads
with ThreadPoolExecutor(max_workers=50) as executor:
    results = list(executor.map(fetch, urls))

Pattern 2: Asyncio with Rate Limiting

asyncio_rate_limit.py
import asyncio
import aiohttp

class RateLimiter:
    def __init__(self, rate_per_second):
        self.interval = 1.0 / rate_per_second
        self.next_time = 0.0  # Earliest time the next request may start

    async def acquire(self):
        now = asyncio.get_running_loop().time()
        # Reserve a slot BEFORE sleeping. There is no await between the read
        # and the write, so concurrent callers queue up one interval apart
        # instead of all waking at the same moment.
        slot = max(now, self.next_time)
        self.next_time = slot + self.interval
        if slot > now:
            await asyncio.sleep(slot - now)

async def fetch(session, url, limiter):
    await limiter.acquire()  # Rate limit
    async with session.get(url) as response:
        return await response.text()

async def main(urls):
    limiter = RateLimiter(10)  # 10 requests/second
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url, limiter) for url in urls]
        return await asyncio.gather(*tasks)
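Rate limiting controls how often requests start. If you want the asyncio counterpart of a bounded thread pool instead, meaning a cap on how many requests are in flight at once, asyncio.Semaphore does that. In this sketch asyncio.sleep stands in for the real request, and the state dictionary exists only to record the peak concurrency. Both are assumptions for the demo.

```python
import asyncio


async def fetch(i, semaphore, state):
    async with semaphore:                  # at most `limit` coroutines inside
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)          # stand-in for the real request
        state["active"] -= 1
        return i


async def main(n=200, limit=50):
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fetch(i, semaphore, state) for i in range(n))
    )
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(f"{len(results)} results, peak concurrency {peak}")
```

Note that the bookkeeping on state needs no lock: there is no await between reading and writing the counters.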

Pattern 3: Hybrid Asyncio + Multiprocessing

hybrid_approach.py
import asyncio
import aiohttp
from concurrent.futures import ProcessPoolExecutor

def cpu_intensive(data):
    # Pure CPU work - runs in separate process
    return sum(x ** 2 for x in data)

async def fetch_and_process(url, pool):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.json()  # Assumed to be a list of numbers
    # Offload CPU work to the shared process pool
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, cpu_intensive, data)

async def main(urls):
    # Create the pool once; starting new processes for every request is expensive
    with ProcessPoolExecutor() as pool:
        return await asyncio.gather(
            *(fetch_and_process(url, pool) for url in urls)
        )

Summary Table

Aspect           Threading            Asyncio                  Multiprocessing
Threads          Multiple OS threads  Single thread            Multiple processes
Switching        Preemptive (OS)      Cooperative (await)      N/A
GIL              Limited              Not applicable           Bypassed
Memory/task      ~8 MB                ~KB                      ~MB per process
Locks needed     Yes                  Often no                 Inter-process only
Best for         I/O, blocking libs   High concurrency I/O     CPU-bound
Race conditions  Unpredictable        Predictable              Per-process

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.
