How to Build a High-Throughput Web Scraper with Python aiohttp: Concurrent Requests Guide

Apr 20, 2026

Purpose

I needed to scrape thousands of URLs for a data collection project. Using the synchronous requests library, each URL took about 1 second. Scraping 1000 URLs would take over 16 minutes. I wanted to speed this up using async HTTP requests.

Environment

Python 3.11
aiohttp for async HTTP
asyncio for concurrent execution
Ubuntu 22.04

The Problem with Sequential Scraping

I started with a simple scraper using requests:

import requests
import time

def fetch(url):
    try:
        response = requests.get(url, timeout=10)
        return response.text
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return None

urls = ["https://example.com/page1", "https://example.com/page2"] * 50  # 100 URLs

start = time.time()
for url in urls:
    fetch(url)
print(f"Time: {time.time() - start:.2f}s")

I ran this:

python sequential_scraper.py

Output:

Time: 100.52s

100 URLs took 100 seconds. Each request blocked while waiting for the response.

The aiohttp Solution

I rewrote the scraper using aiohttp with concurrent requests:

import aiohttp
import asyncio
import time

async def fetch(session, url):
    """Fetch a single URL with timeout and error handling."""
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
            return await response.text()
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return None

async def main(urls):
    """Fetch all URLs concurrently with connection limits."""
    connector = aiohttp.TCPConnector(limit_per_host=100)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

urls = ["https://example.com/page1", "https://example.com/page2"] * 50  # 100 URLs

start = time.time()
results = asyncio.run(main(urls))
print(f"Fetched {len([r for r in results if r])} pages")
print(f"Time: {time.time() - start:.2f}s")

Running the async scraper:

python async_scraper.py

Output:

Fetched 100 pages
Time: 2.15s

100 URLs completed in about 2 seconds instead of 100. That's roughly a 47x speedup (100.52s / 2.15s).

How It Works

The key components:

ClientSession - One session for all requests. Connection pooling keeps TCP connections open between requests.
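TCPConnector - Controls concurrency. limit_per_host=100 allows up to 100 simultaneous connections to any single host. The connector's separate limit parameter (default 100) caps the total across all hosts. Lower values are gentler on the target server.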
asyncio.gather() - Runs all fetch tasks concurrently. Instead of waiting for one request at a time, all requests start together.
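ClientTimeout - Prevents stalled requests. total=10 caps the whole request, from connecting to reading the body, at 10 seconds. After that the request fails with a timeout error, which fetch() catches and logs.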

Sequential (requests):
┌────┐ ┌────┐ ┌────┐ ┌────┐
│ 1s │ │ 1s │ │ 1s │ │ 1s │ ... = 100s total
└────┘ └────┘ └────┘ └────┘

Async (aiohttp):
┌────┐
│ 1s │ (all 100 requests start together)
└────┘ = ~2s total
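
The timing model in the diagram can be reproduced without any network access. The following is a minimal sketch, not the scraper itself: fake_fetch and the 0.05s delay are hypothetical stand-ins for a real request, and asyncio.sleep models the time spent waiting on the server.

```python
import asyncio
import time

DELAY = 0.05  # hypothetical stand-in for network latency, in seconds

async def fake_fetch(i):
    """Pretend to fetch page i; the sleep models waiting on the network."""
    await asyncio.sleep(DELAY)
    return i

async def run_sequential(n):
    # Await each "request" before starting the next one.
    return [await fake_fetch(i) for i in range(n)]

async def run_concurrent(n):
    # Start every "request" first, then wait for all of them together.
    return await asyncio.gather(*(fake_fetch(i) for i in range(n)))

def timed(coro_fn, n):
    """Run coro_fn(n) in a fresh event loop and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return results, time.perf_counter() - start

seq_results, seq_time = timed(run_sequential, 20)
con_results, con_time = timed(run_concurrent, 20)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential run pays the delay once per item, while the concurrent run pays it roughly once in total. Real requests add connection setup and server-side limits, so actual speedups will be smaller than this idealized model suggests.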

Common Mistakes

I made several mistakes before getting this working:

1. Creating session per request:

# BAD: New session for each request (slow!)
async def fetch(url):
    async with aiohttp.ClientSession() as session:  # New session every time!
        async with session.get(url) as response:
            return await response.text()

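This wastes connections. Each session owns its own connection pool, so a new session per request means a fresh TCP (and TLS) handshake every time and no connection reuse. Create one session and reuse it.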

2. No connection limits:

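# BAD: Relying on default limits, can overwhelm servers
async with aiohttp.ClientSession() as session:  # Default: 100 connections total, no per-host cap!
    tasks = [fetch(session, url) for url in urls]

The default connector allows 100 simultaneous connections in total (limit=100) and sets no cap per host (limit_per_host=0), so all 100 can hit the same server at once. Servers may throttle or block you for opening too many connections. Set limits explicitly with TCPConnector.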
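
Connector limits cap open connections. A complementary technique, not used in my scraper above, is to cap how many fetch tasks are in flight at all with asyncio.Semaphore. The following is a rough sketch under stated assumptions: MAX_IN_FLIGHT, limited_fetch, and the stats dictionary are hypothetical names, and asyncio.sleep stands in for the real session.get() call.

```python
import asyncio

MAX_IN_FLIGHT = 5  # hypothetical cap; tune it for the site being scraped

async def limited_fetch(sem, stats, i):
    """Simulated fetch that only runs while holding a semaphore slot."""
    async with sem:  # waits here once MAX_IN_FLIGHT tasks are already inside
        stats["current"] += 1
        stats["peak"] = max(stats["peak"], stats["current"])
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        stats["current"] -= 1
        return i

async def crawl(n):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"current": 0, "peak": 0}  # tracks how many tasks run at once
    results = await asyncio.gather(
        *(limited_fetch(sem, stats, i) for i in range(n))
    )
    return results, stats["peak"]

results, peak = asyncio.run(crawl(50))
print(f"{len(results)} done, peak in flight: {peak}")
```

All 50 tasks are created up front, but the semaphore lets only MAX_IN_FLIGHT of them past the async with line at any moment. In a real scraper this would sit alongside the TCPConnector limits rather than replace them.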

3. No timeout:

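# BAD: Stalled requests wait out the default 5-minute timeout
async with session.get(url) as response:  # No explicit timeout!
    return await response.text()

aiohttp's default is ClientTimeout(total=300), so a hanging request holds its connection slot for up to five minutes. Always set an explicit ClientTimeout sized to your workload.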
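
To see what a deadline buys you without needing a slow server, here is a small sketch using the standard library's asyncio.wait_for. It is only an illustration of the effect: in the scraper itself, ClientTimeout is the right tool. stalled_request and fetch_with_deadline are hypothetical names.

```python
import asyncio

async def stalled_request():
    """Simulate a server that accepts the request and then never answers."""
    await asyncio.sleep(3600)
    return "body"

async def fetch_with_deadline(seconds):
    try:
        # wait_for cancels the inner coroutine once the deadline passes.
        return await asyncio.wait_for(stalled_request(), timeout=seconds)
    except asyncio.TimeoutError:
        # The stall becomes an ordinary, catchable error.
        return None

result = asyncio.run(fetch_with_deadline(0.05))
print(result)
```

The stalled call is abandoned after 0.05 seconds and the function returns None, the same shape of result that fetch() in the scraper returns on failure.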

4. No error handling:

# BAD: One failure crashes entire scraper
async with session.get(url) as response:
    return await response.text()  # Raises exception on failure

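By default, asyncio.gather() propagates the first exception to the caller, so a single failed URL throws away every other result. Catch errors inside fetch() with try/except, or pass return_exceptions=True to gather().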
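
As an alternative to try/except inside each fetch, asyncio.gather() accepts return_exceptions=True, which returns exceptions as ordinary results instead of raising them. The following is a minimal sketch with a simulated failure pattern: flaky and its every-third-item error are hypothetical, chosen only to make the behavior visible.

```python
import asyncio

async def flaky(i):
    """Simulated fetch that fails for every third item."""
    await asyncio.sleep(0.01)
    if i % 3 == 0:
        raise ValueError(f"simulated failure for item {i}")
    return f"page-{i}"

async def main():
    # Exceptions come back in the result list, in the same order as the inputs.
    outcomes = await asyncio.gather(
        *(flaky(i) for i in range(6)), return_exceptions=True
    )
    pages = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return pages, errors

pages, errors = asyncio.run(main())
print(f"{len(pages)} succeeded, {len(errors)} failed")
```

Separating successes from failures after the fact makes it easy to retry or log only the failed items, at the cost of an isinstance check on every result.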

Summary

In this post, I showed how to build a high-throughput web scraper with aiohttp. The key points are: use one ClientSession for all requests, set connection limits with TCPConnector, add timeouts, and handle errors. This approach reduced my scraping time for 100 URLs from about 100 seconds to about 2 seconds.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 aiohttp Documentation
👨‍💻 aiohttp ClientSession Guide

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!