Requests vs aiohttp: Sync vs Async HTTP for Python Scraping
Compare Python's requests (synchronous) with aiohttp (asynchronous) for web scraping. Learn when async HTTP matters and how to choose.
Option A
Requests
Synchronous HTTP Library
Best for: Simple scripts and small projects
Difficulty: Easy
Concurrency: One request at a time
Pros
- Simplest HTTP library in Python
- Excellent documentation
- Huge ecosystem of adapters
- Session management built-in
Cons
- Synchronous — one request at a time
- Slow for large-scale scraping
- No built-in concurrency
- Can't leverage async/await
Option B
aiohttp
Asynchronous HTTP Library
Best for: High-volume concurrent scraping
Difficulty: Moderate
Concurrency: Many requests simultaneously
Pros
- Concurrent requests (often 10-100x throughput on I/O-bound scraping)
- Low memory overhead per connection
- Native async/await support
- Connection pooling built-in
Cons
- Requires understanding of asyncio
- Debugging async code is harder
- Some libraries aren't async-compatible
- More boilerplate than requests
The Verdict
Start with requests for scripts under 100 pages. Switch to aiohttp when speed matters — scraping 1,000+ pages, or when each page takes time to respond. The async complexity pays off quickly at scale.
The Speed Difference
The fundamental difference: requests waits for each response before sending the next request. aiohttp sends many requests and handles responses as they arrive.
requests (synchronous):
Page1 ──── Page2 ──── Page3 ──── Page4 ──── Total: 8 seconds
aiohttp (async, 4 concurrent):
Page1 ──┐
Page2 ──┤
Page3 ──┼── Total: 2 seconds
Page4 ──┘
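The timeline above can be reproduced with nothing but the standard library: replace each HTTP request with asyncio.sleep and time both strategies. A minimal sketch (fake_fetch is a stand-in for a real request; 0.1s stands in for the 2s page in the diagram):

import asyncio
import time

async def fake_fetch(page: int, delay: float = 0.1) -> int:
    # Stand-in for an HTTP request: just wait, then return the page number
    await asyncio.sleep(delay)
    return page

async def sequential(n: int) -> float:
    start = time.perf_counter()
    for i in range(n):
        await fake_fetch(i)  # one at a time, like requests
    return time.perf_counter() - start

async def concurrent(n: int) -> float:
    start = time.perf_counter()
    # All coroutines run at once; total time is roughly one delay, not n
    await asyncio.gather(*(fake_fetch(i) for i in range(n)))
    return time.perf_counter() - start

seq = asyncio.run(sequential(4))   # roughly 4 x 0.1s
con = asyncio.run(concurrent(4))   # roughly 0.1s total
print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")

The 4x gap here is exactly the 8s-vs-2s picture above; it widens with the number of pages.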
Code Comparison
Requests (Synchronous)
import requests
from bs4 import BeautifulSoup

results = []
for i in range(1, 101):
    response = requests.get(f"https://example.com/page/{i}")
    soup = BeautifulSoup(response.text, "lxml")
    results.append(soup.select_one("h1").text)

# ~200 seconds for 100 pages (2s each)
aiohttp (Asynchronous)
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def fetch(session, url):
    async with session.get(url) as response:
        html = await response.text()
        soup = BeautifulSoup(html, "lxml")
        return soup.select_one("h1").text

async def main():
    semaphore = asyncio.Semaphore(10)  # limit concurrent requests
    async with aiohttp.ClientSession() as session:
        async def limited_fetch(url):
            async with semaphore:
                return await fetch(session, url)

        urls = [f"https://example.com/page/{i}" for i in range(1, 101)]
        results = await asyncio.gather(*[limited_fetch(u) for u in urls])
        return results

results = asyncio.run(main())
# ~20 seconds for 100 pages (10 concurrent)
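One debugging pain the cons list hints at: by default, a single failed request makes asyncio.gather raise and discard every other result. A stdlib-only sketch of collecting successes and failures together (flaky_fetch is a stand-in for a real request that fails on one URL):

import asyncio

async def flaky_fetch(url: str) -> str:
    # Stand-in for a real request; fails for one particular URL
    await asyncio.sleep(0.01)
    if "page/3" in url:
        raise ValueError(f"simulated failure for {url}")
    return f"html of {url}"

async def main() -> list:
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
    # return_exceptions=True keeps one failure from sinking the batch:
    # exceptions come back as values in the results list instead of raising
    return await asyncio.gather(
        *(flaky_fetch(u) for u in urls), return_exceptions=True
    )

results = asyncio.run(main())
pages = [r for r in results if not isinstance(r, Exception)]
errors = [r for r in results if isinstance(r, Exception)]
print(f"{len(pages)} succeeded, {len(errors)} failed")  # 4 succeeded, 1 failed

The same return_exceptions=True flag works unchanged with the aiohttp version above, so failed pages can be logged and retried rather than crashing the whole run.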
When to Use Each
| Scenario | Use |
|---|---|
| Quick script, < 50 pages | requests |
| Learning web scraping | requests |
| 100+ pages from one site | aiohttp |
| Multiple sites concurrently | aiohttp |
| Scraping behind login | requests (simpler session handling) |
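For the login row above, requests.Session keeps cookies and headers across requests, so you authenticate once and scrape freely afterwards. A minimal sketch; the URL paths and form field names are hypothetical placeholders for whatever the target site actually uses:

import requests

def scrape_with_login(base_url: str, username: str, password: str) -> str:
    # One Session reuses cookies and the underlying connection pool
    with requests.Session() as session:
        session.headers.update({"User-Agent": "my-scraper/1.0"})
        # The session stores any cookies the login response sets,
        # so later requests on this session are authenticated
        session.post(
            f"{base_url}/login",
            data={"username": username, "password": password},
            timeout=10,
        )
        response = session.get(f"{base_url}/account/orders", timeout=10)
        response.raise_for_status()
        return response.text

Doing the same in aiohttp works (ClientSession also persists cookies) but takes more ceremony, which is why requests wins this row.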
Alternative: httpx
httpx offers both sync and async APIs with a requests-compatible interface:
import httpx

# Sync (like requests)
response = httpx.get("https://example.com")

# Async (must run inside an async function)
async with httpx.AsyncClient() as client:
    response = await client.get("https://example.com")
It's a good middle ground if you want the option to go async later without rewriting everything.