How Does Asyncio Concurrency Differ from Threading in Python?
I thought the GIL made threading “safe” - until my counter returned the wrong value. Then I switched to asyncio and wondered why my async function froze everything. Turns out, Python’s concurrency models are fundamentally different, and understanding that difference saved me from countless production bugs.
The Confusion: Threading vs Asyncio
I had three misconceptions that caused real problems:
Misconception 1: “The GIL prevents race conditions”

    counter = 0

    def increment():
        global counter
        for _ in range(1_000_000):
            counter += 1  # Race condition!

    # With 4 threads, expected: 4,000,000
    # Actual result: 3,721,584 (varies each run!)

Misconception 2: “Asyncio automatically makes everything concurrent”

    async def handler():
        result = heavy_computation()  # Blocks the entire event loop!
        return result

Misconception 3: “Threading and asyncio are interchangeable”

I chose threading for 10,000 concurrent connections. Memory usage exploded to 80GB.
The Core Difference
After much debugging, I finally understood the fundamental distinction:

    ┌─────────────────────────────────────────────────────────────────┐
    │                       CONCURRENCY MODELS                        │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  THREADING (Preemptive)          ASYNCIO (Cooperative)          │
    │  ──────────────────────          ─────────────────────          │
    │                                                                 │
    │  ┌─────────┐   ┌─────────┐       ┌─────────────────────────┐    │
    │  │ Thread1 │   │ Thread2 │       │     Event Loop (1)      │    │
    │  │  ●──────┼───┼──────●  │       │                         │    │
    │  └────┬────┘   └────┬────┘       │  ┌───┐   ┌───┐   ┌───┐  │    │
    │       │             │            │  │ C1│   │ C2│   │ C3│  │    │
    │       ▼             ▼            │  └─┬─┘   └─┬─┘   └─┬─┘  │    │
    │  ┌─────────────────────┐         │    │       │       │    │    │
    │  │    OS Scheduler     │         │    │       │       │    │    │
    │  │   (decides when)    │         │    ▼       ▼       ▼    │    │
    │  └─────────────────────┘         │  await   await   await  │    │
    │             │                    │    (you decide when)    │    │
    │             ▼                    └─────────────────────────┘    │
    │  ┌─────────────────────┐                                        │
    │  │         GIL         │         NO GIL CONTENTION              │
    │  │ (only 1 at a time)  │         (only 1 thread exists)         │
    │  └─────────────────────┘                                        │
    │                                                                 │
    │  Switch: ANY bytecode            Switch: ONLY at await          │
    │  Memory: ~8MB per thread         Memory: ~KB per coroutine      │
    │  Locks: REQUIRED                 Locks: OFTEN optional          │
    └─────────────────────────────────────────────────────────────────┘

Threading: The OS scheduler can switch threads between bytecode instructions, at points your code does not choose. This is “preemptive” - the OS preempts your code.
Asyncio: The event loop switches coroutines ONLY at explicit await points. This is “cooperative” - your code cooperates by yielding control.
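To make the cooperative model concrete, here is a minimal sketch (the `worker` coroutine and the `events` log are names made up for illustration). Two coroutines write to a shared list, and the order of entries shows that control only changes hands at the `await`:

```python
import asyncio

events = []  # shared log; its order reveals where switches happened

async def worker(name):
    events.append(f"{name}: step 1")
    events.append(f"{name}: step 2")  # no await between steps 1 and 2
    await asyncio.sleep(0)            # explicit yield point
    events.append(f"{name}: step 3")

async def main():
    await asyncio.gather(worker("A"), worker("B"))

asyncio.run(main())
for line in events:
    print(line)
```

Steps 1 and 2 of each worker always stay together; the other worker only gets a turn once `await asyncio.sleep(0)` is reached.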
Why This Matters: The Switching Point
The key insight is where switches happen:
THREADING:

    ┌──────────────────────────────────────────────────────────────┐
    │ Bytecode execution (Thread 1)                                │
    │   LOAD_GLOBAL   ◄── OS can switch HERE                       │
    │   LOAD_FAST     ◄── OS can switch HERE                       │
    │   INPLACE_ADD   ◄── OS can switch HERE (race condition!)     │
    │   STORE_FAST    ◄── OS can switch HERE                       │
    │   LOAD_GLOBAL   ◄── OS can switch HERE                       │
    └──────────────────────────────────────────────────────────────┘

ASYNCIO:

    ┌──────────────────────────────────────────────────────────────┐
    │ Bytecode execution (Coroutine 1)                             │
    │   LOAD_GLOBAL   ◄── Cannot switch                            │
    │   LOAD_FAST     ◄── Cannot switch                            │
    │   INPLACE_ADD   ◄── Cannot switch (SAFE!)                    │
    │   STORE_FAST    ◄── Cannot switch                            │
    │   AWAIT         ◄── ONLY HERE can switch to another coroutine│
    └──────────────────────────────────────────────────────────────┘

This means with asyncio, I know EXACTLY when context switches happen. With threading, switches can occur anywhere.
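If you want to see the separate read, add, and write steps for yourself, the standard `dis` module can list them. This is only a sketch: the exact opcode names depend on the CPython version (for example, the add step shows up as `INPLACE_ADD` on older releases and `BINARY_OP` on 3.11+), so treat the output as illustrative.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1  # one line of Python, several bytecode instructions

# Collect the opcode names for the function body
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

The load of `counter` and the store back to it are distinct instructions, which is why a thread switch between them can lose an update.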
The Race Condition Experiment
I ran this experiment to truly understand the difference:
Threading Version (Race Condition)

    import threading

    counter = 0
    lock = threading.Lock()

    def increment_unsafe():
        global counter
        for _ in range(1_000_000):
            counter += 1  # Read, modify, write - can interleave!

    def increment_safe():
        global counter
        for _ in range(1_000_000):
            with lock:  # Explicit lock required
                counter += 1

    # Test unsafe
    counter = 0
    threads = [threading.Thread(target=increment_unsafe) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Unsafe: {counter:,}/4,000,000")  # Wrong!

    # Test safe
    counter = 0
    threads = [threading.Thread(target=increment_safe) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"Safe: {counter:,}/4,000,000")  # Correct: 4,000,000

Output:

    Unsafe: 2,847,293/4,000,000
    Safe: 4,000,000/4,000,000

Note: on recent CPython versions the unsafe loop may happen to print the correct total, because the interpreter switches threads at fewer points. The code is still not thread-safe.

Asyncio Version (No Race Condition)

    import asyncio

    counter = 0  # No lock needed!

    async def increment(name):
        global counter
        for i in range(1_000_000):
            # This block is ATOMIC - no switch possible
            counter += 1

            # Only here can another coroutine run
            if i % 100_000 == 0:
                await asyncio.sleep(0)  # Yield point

    async def main():
        await asyncio.gather(
            increment("C1"),
            increment("C2"),
            increment("C3"),
            increment("C4"),
        )
        print(f"Result: {counter:,}/4,000,000")  # Always correct!

    asyncio.run(main())

Output:

    Result: 4,000,000/4,000,000

No lock needed because counter += 1 cannot be interrupted by another coroutine - only await statements yield control.
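The flip side is worth a quick sketch: as soon as an `await` sits between the read and the write, asyncio can lose updates too, and an `asyncio.Lock` brings the safety back. The function names below are invented for this illustration, and the `await asyncio.sleep(0)` stands in for any real I/O call.

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter        # read
    await asyncio.sleep(0)   # yield point between read and write
    counter = current + 1    # writes back a possibly stale value

async def safe_increment(lock):
    global counter
    async with lock:         # lock is held across the await
        current = counter
        await asyncio.sleep(0)
        counter = current + 1

async def run(n=100):
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(n)))
    unsafe_total = counter

    counter = 0
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(lock) for _ in range(n)))
    return unsafe_total, counter

unsafe_total, safe_total = asyncio.run(run())
print(f"unsafe: {unsafe_total}/100, safe: {safe_total}/100")
```

The difference from threading is that the dangerous spot is visible in the source: it is exactly where the `await` is.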
When I Chose Wrong (And How I Fixed It)
Mistake 1: Threading for High Concurrency
I tried to handle 10,000 concurrent downloads with threading:

    import threading
    import urllib.request

    urls = ["https://example.com"] * 10000

    def download(url):
        return urllib.request.urlopen(url).read()

    # BAD: 10,000 threads = ~80GB virtual memory!
    threads = [threading.Thread(target=download, args=(url,)) for url in urls]

    # Each thread stack: ~8MB
    # 10,000 × 8MB = 80GB virtual memory

Fix: Use asyncio for high concurrency:

    import asyncio
    import aiohttp

    async def download(session, url):
        async with session.get(url) as response:
            return await response.read()

    async def main():
        urls = ["https://example.com"] * 10000
        async with aiohttp.ClientSession() as session:
            # 10,000 coroutines = ~50MB total
            # Each coroutine: ~KB
            await asyncio.gather(
                *[download(session, url) for url in urls]
            )

    asyncio.run(main())

Mistake 2: Asyncio with Blocking Code
I used requests (blocking) in an async function:

    import requests  # Blocking library!

    async def fetch(url):
        # This BLOCKS the entire event loop!
        # All other coroutines freeze
        response = requests.get(url)
        return response.json()

    # Everything stops while this runs

Fix 1: Use async-compatible library:

    import aiohttp

    async def fetch(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.json()

Fix 2: Offload to executor:

    import asyncio
    import requests

    async def fetch(url):
        loop = asyncio.get_running_loop()
        # Run blocking code in thread pool
        response = await loop.run_in_executor(
            None,  # Default executor
            lambda: requests.get(url)
        )
        return response.json()

Mistake 3: CPU-bound Work in Asyncio
I expected asyncio to speed up CPU-intensive calculations:

    import asyncio

    async def cpu_heavy():
        total = 0
        for i in range(10_000_000):
            total += i ** 2  # Pure CPU work
            if i % 100_000 == 0:
                await asyncio.sleep(0)  # Yield, but doesn't help!
        return total

    # Still single-threaded! No parallelism for CPU work

Fix: Use multiprocessing for CPU-bound:

    from concurrent.futures import ProcessPoolExecutor
    import asyncio

    def cpu_heavy():  # Regular function
        total = 0
        for i in range(10_000_000):
            total += i ** 2
        return total

    async def main():
        loop = asyncio.get_running_loop()
        # Each process has its own GIL!
        with ProcessPoolExecutor() as pool:
            results = await asyncio.gather(
                loop.run_in_executor(pool, cpu_heavy),
                loop.run_in_executor(pool, cpu_heavy),
                loop.run_in_executor(pool, cpu_heavy),
                loop.run_in_executor(pool, cpu_heavy),
            )
        return results

    # Guard needed so worker processes can import this module safely
    if __name__ == "__main__":
        print(asyncio.run(main()))

The Decision Matrix
I created this flowchart to decide which approach to use:

    CONCURRENCY DECISION TREE

    CPU-bound or I/O-bound?
    ├── CPU-bound ──► Multiprocessing (separate GIL)
    └── I/O-bound ──► Concurrency OK?
        ├── No ──► Multiprocessing
        └── Yes ──► >100 tasks?
            ├── Yes ──► Async library available?
            │   ├── Yes ──► ASYNCIO
            │   └── No ──► ASYNCIO + executor
            └── No ──► Async library available?
                ├── Yes ──► ASYNCIO
                └── No ──► THREADING

Memory Comparison
This table shows why asyncio scales better:
| Tasks | Threading (Memory) | Asyncio (Memory) | Ratio |
|---|---|---|---|
| 100 | ~800 MB | ~5 MB | 160:1 |
| 1,000 | ~8 GB | ~50 MB | 160:1 |
| 10,000 | ~80 GB | ~500 MB | 160:1 |
| 100,000 | ~800 GB | ~5 GB | 160:1 |
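Each thread reserves a stack of ~8MB (mostly virtual memory that is only touched as needed). Each coroutine needs only kilobytes. Treat these figures as rough estimates, not measurements.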
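Rather than trusting my estimates, you can measure the coroutine side yourself. The sketch below uses the standard `tracemalloc` module to approximate the Python-level memory held by idle tasks. The `idle` and `measure` names are made up for this example, and the number you get will vary with Python version and with what each task actually does.

```python
import asyncio
import tracemalloc

async def idle():
    await asyncio.sleep(0.1)  # stand-in for waiting on I/O

async def measure(n):
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tasks = [asyncio.create_task(idle()) for _ in range(n)]
    await asyncio.sleep(0)    # let every task start and reach its await
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    await asyncio.gather(*tasks)  # clean up before the loop closes
    return (after - before) / n

bytes_per_task = asyncio.run(measure(10_000))
print(f"~{bytes_per_task / 1024:.1f} KiB per idle task")
```

This only counts allocations Python itself tracks, so it is a lower bound for real workloads that hold buffers or connections per task.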
The GIL Reality Check
Both threading and asyncio are limited by the GIL for CPU-bound work:

    ┌────────────────────────────────────────────────────────────────┐
    │                          GIL BEHAVIOR                          │
    ├────────────────────────────────────────────────────────────────┤
    │                                                                │
    │  THREADING + CPU-BOUND:                                        │
    │  ┌─────────────────────────────────────────┐                   │
    │  │ Thread 1: ████████░░░░░░░░████████░░░░  │ (GIL contention)  │
    │  │ Thread 2: ░░░░░░░░████████░░░░░░░░████  │                   │
    │  │ Thread 3: ░░░░████████░░░░░░░░████████░ │                   │
    │  └─────────────────────────────────────────┘                   │
    │  Result: No speedup, just overhead                             │
    │                                                                │
    │  THREADING + I/O-BOUND:                                        │
    │  ┌─────────────────────────────────────────┐                   │
    │  │ Thread 1: ████████░░░░░░░░░░░░░░░░░░░░  │                   │
    │  │ Thread 2: ░░░░░░░░████████░░░░░░░░░░░░  │ (I/O releases     │
    │  │ Thread 3: ░░░░░░░░░░░░░░░░████████░░░   │  GIL)             │
    │  └─────────────────────────────────────────┘                   │
    │  Result: Effective concurrency                                 │
    │                                                                │
    │  ASYNCIO:                                                      │
    │  ┌─────────────────────────────────────────┐                   │
    │  │ Single Thread: █████░░░░░██████░░░░░░░  │                   │
    │  │                  │       │       │      │                   │
    │  │                  ▼       ▼       ▼      │                   │
    │  │                await   await   await    │                   │
    │  └─────────────────────────────────────────┘                   │
    │  Result: No GIL contention (only 1 thread)                     │
    │                                                                │
    └────────────────────────────────────────────────────────────────┘

Practical Guidelines
After all my mistakes, here’s my decision process:
Use Threading When:
- Fewer than 100 concurrent I/O operations
- Using blocking libraries (requests, sqlite3, etc.)
- Working with existing synchronous codebases
- Simple scripts where asyncio overhead isn’t justified
Use Asyncio When:
- More than 100 concurrent connections
- HTTP APIs, websockets, chat servers
- Async-compatible libraries available (aiohttp, aiopg, etc.)
- Predictable race-condition behavior matters
- Memory efficiency is critical
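One thing I would add for the "many connections" case: even with asyncio you usually want a cap on how many operations are in flight at once. A minimal sketch with `asyncio.Semaphore` follows. The `worker` function, the `state` dict, and the limit of 100 are illustrative assumptions, and the short sleep stands in for a real network call.

```python
import asyncio

async def worker(sem, state):
    async with sem:  # at most `limit` workers get past this line at once
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.001)  # stand-in for a network call
        state["active"] -= 1

async def main(total=1000, limit=100):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(worker(sem, state) for _ in range(total)))
    return state["peak"]

peak = asyncio.run(main())
print(f"peak concurrency: {peak}")
```

All 1,000 coroutines exist at once, which is cheap, but only a bounded number touch the network at the same time.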
Use Multiprocessing When:
- CPU-bound work (calculations, image processing)
- True parallelism needed
- Each task is independent
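The guidelines above can be condensed into a small helper. This is only a sketch of my rule of thumb: `choose_model` and its parameters are hypothetical names, and the threshold of 100 is the same rough cutoff used in the lists, not a hard limit.

```python
def choose_model(cpu_bound, concurrent_tasks=1, async_library_available=False):
    """Return a rough recommendation following the guidelines above."""
    if cpu_bound:
        return "multiprocessing"        # true parallelism, separate GILs
    if async_library_available:
        return "asyncio"                # I/O-bound with async libraries
    if concurrent_tasks > 100:
        return "asyncio + executor"     # many tasks, blocking libraries
    return "threading"                  # few tasks, blocking libraries

print(choose_model(cpu_bound=True))
print(choose_model(cpu_bound=False, concurrent_tasks=10_000,
                   async_library_available=True))
print(choose_model(cpu_bound=False, concurrent_tasks=20))
```

It is deliberately crude; real decisions also depend on the existing codebase and on how much shared state the tasks touch.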
Common Patterns
Pattern 1: Threading with Bounded Pool

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    def fetch(url):
        return urllib.request.urlopen(url).read()

    urls = [...]  # 1000 URLs

    # Limit to 50 concurrent threads
    with ThreadPoolExecutor(max_workers=50) as executor:
        results = list(executor.map(fetch, urls))

Pattern 2: Asyncio with Rate Limiting

    import asyncio
    import aiohttp

    class RateLimiter:
        def __init__(self, rate_per_second):
            self.interval = 1.0 / rate_per_second
            self.next_time = 0.0  # Earliest time the next request may start

        async def acquire(self):
            now = asyncio.get_running_loop().time()
            # Reserve a slot BEFORE sleeping, so concurrent callers queue up
            # one interval apart instead of all waking at the same moment
            start = max(now, self.next_time)
            self.next_time = start + self.interval
            if start > now:
                await asyncio.sleep(start - now)

    async def fetch(session, url, limiter):
        await limiter.acquire()  # Rate limit
        async with session.get(url) as response:
            return await response.text()

    async def main(urls):
        limiter = RateLimiter(10)  # 10 requests/second
        async with aiohttp.ClientSession() as session:
            tasks = [fetch(session, url, limiter) for url in urls]
            results = await asyncio.gather(*tasks)
        return results

Pattern 3: Hybrid Asyncio + Multiprocessing

    import asyncio
    import aiohttp
    from concurrent.futures import ProcessPoolExecutor

    def cpu_intensive(data):
        # Pure CPU work - runs in separate process
        # Assumes data is a list of numbers
        return sum(x ** 2 for x in data)

    async def fetch_and_process(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                data = await response.json()

        # Offload CPU work to process pool
        # (for repeated calls, create the pool once and reuse it)
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            result = await loop.run_in_executor(
                pool, cpu_intensive, data
            )
        return result

Summary Table
| Aspect | Threading | Asyncio | Multiprocessing |
|---|---|---|---|
| Threads | Multiple OS threads | Single thread | Multiple processes |
| Switching | Preemptive (OS) | Cooperative (await) | Preemptive (OS, per process) |
| GIL | Limits CPU-bound work | No contention (single thread) | Bypassed (one GIL per process) |
| Memory/task | ~8 MB | ~KB | ~MB per process |
| Locks needed | Yes | Often no | Inter-process only |
| Best for | I/O, blocking libs | High concurrency I/O | CPU-bound |
| Race conditions | Unpredictable | Predictable | Per-process |