
Async Web Scraping: Speed Up Python Scraping with asyncio

advanced

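Async scraping uses asynchronous programming (Python's asyncio) to send multiple HTTP requests concurrently instead of waiting for each one to complete before starting the next. For I/O-bound jobs this can speed up scraping by roughly 10-50x, though the actual gain depends on server latency, how many requests you allow in flight at once, and any rate limits the site enforces.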

Why Async?

With synchronous scraping, you wait for each page to download before requesting the next:

code
Page 1 (2s) → Page 2 (2s) → Page 3 (2s) = 6 seconds total

With async scraping, you start all the requests at once, so their waiting times overlap:

code
Page 1 (2s) ↘
Page 2 (2s) → All done in ~2 seconds
Page 3 (2s) ↗
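
The timing difference in the two diagrams can be reproduced without touching a network. The sketch below is a minimal illustration, assuming every page takes the same fixed delay, simulated here with asyncio.sleep. The names fake_fetch, run_sequential, and run_concurrent are hypothetical helpers made up for this example.

```python
import asyncio
import time


async def fake_fetch(page: int, delay: float) -> int:
    """Stand-in for an HTTP request: waits `delay` seconds, returns the page number."""
    await asyncio.sleep(delay)
    return page


async def run_sequential(pages: list[int], delay: float) -> float:
    """Await each page before starting the next; returns elapsed seconds."""
    start = time.perf_counter()
    for page in pages:
        await fake_fetch(page, delay)
    return time.perf_counter() - start


async def run_concurrent(pages: list[int], delay: float) -> float:
    """Start every page at once and wait for all of them; returns elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_fetch(page, delay) for page in pages))
    return time.perf_counter() - start


if __name__ == "__main__":
    pages = [1, 2, 3]
    print(f"sequential: {asyncio.run(run_sequential(pages, 0.2)):.2f}s")
    print(f"concurrent: {asyncio.run(run_concurrent(pages, 0.2)):.2f}s")
```

The sequential run should take about three times the per-page delay, while the concurrent run should take about one delay, because the three waits overlap.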

Basic Async Scraping

python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def fetch(session, url):
    """Download one page and return its URL and <h1> text."""
    async with session.get(url) as response:
        html = await response.text()
    soup = BeautifulSoup(html, "lxml")  # requires the lxml package
    heading = soup.select_one("h1")
    # select_one returns None when the page has no <h1>
    title = heading.text if heading else None
    return {"url": url, "title": title}


async def main():
    urls = [f"https://example.com/page/{i}" for i in range(1, 101)]

    # One shared session reuses connections across all requests
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

    for result in results:
        print(result)


asyncio.run(main())

Rate Limiting with Semaphore

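Don't blast a server with 1,000 concurrent requests. Use a semaphore to cap how many requests are in flight at once. Note that this limits concurrency, not requests per second: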

python
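import asyncio

# At most 10 requests in flight at once.
# On Python older than 3.10, create this inside the running coroutine instead.
semaphore = asyncio.Semaphore(10)


async def fetch_limited(session, url):
    # Wait for a free slot before sending the request
    async with semaphore:
        async with session.get(url) as response:
            return await response.text()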
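
To see the cap in action without a real server, the sketch below counts how many simulated requests are running at the same moment. It is an illustration under stated assumptions: limited_job and measure_peak are hypothetical names, and asyncio.sleep stands in for the network wait.

```python
import asyncio


async def limited_job(semaphore: asyncio.Semaphore, state: dict, delay: float) -> None:
    """Simulated request that records how many jobs run at the same time."""
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(delay)  # stand-in for waiting on the server
        state["active"] -= 1


async def measure_peak(total_jobs: int, limit: int, delay: float = 0.01) -> int:
    """Run `total_jobs` simulated requests and return the peak concurrency observed."""
    semaphore = asyncio.Semaphore(limit)  # created inside the running event loop
    state = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(limited_job(semaphore, state, delay) for _ in range(total_jobs))
    )
    return state["peak"]


if __name__ == "__main__":
    print(asyncio.run(measure_peak(total_jobs=50, limit=10)))
```

All 50 tasks are created immediately, but the observed peak should never exceed the semaphore's limit. Creating the semaphore inside the coroutine also sidesteps event-loop binding problems on older Python versions.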

When to Use Async Scraping

  • Scraping hundreds or thousands of pages from the same site
  • Pages are mostly I/O-bound (waiting for server response)
  • You need to finish faster without adding more machines

When NOT to Use Async

  • Small scraping jobs (under 50 pages) — not worth the complexity
  • CPU-bound processing (use multiprocessing instead)
  • Sites with strict rate limiting (async won't help if you're limited to 1 req/sec)
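
The trade-offs in the two lists above come down to simple arithmetic. The function below is a back-of-the-envelope sketch, not a benchmark. It assumes every request takes the same time and that requests go out in full waves. estimate_seconds and its parameters are hypothetical names chosen for this example.

```python
import math
from typing import Optional


def estimate_seconds(
    pages: int,
    latency: float,
    concurrency: int = 1,
    max_rate: Optional[float] = None,
) -> float:
    """Rough wall-clock estimate for fetching `pages` pages.

    latency:     seconds per request (assumed constant)
    concurrency: requests allowed in flight at once (1 = synchronous)
    max_rate:    server-imposed cap in requests per second, if any
    """
    # Requests go out in waves of `concurrency`; each wave takes one latency.
    waves = math.ceil(pages / concurrency)
    estimate = waves * latency
    if max_rate is not None:
        # A rate cap sets a floor that no amount of concurrency can beat.
        estimate = max(estimate, pages / max_rate)
    return estimate


if __name__ == "__main__":
    print(estimate_seconds(1000, 2.0))                                # synchronous
    print(estimate_seconds(1000, 2.0, concurrency=10))                # 10 in flight
    print(estimate_seconds(1000, 2.0, concurrency=10, max_rate=1.0))  # 1 req/sec cap
```

Under these assumptions, 1,000 pages at 2 seconds each drop from 2,000 seconds to 200 seconds with 10 requests in flight, but a 1 req/sec cap holds the job at 1,000 seconds however much concurrency you add.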

Async vs. Multiprocessing vs. Threading

Approach         | Best For                    | Overhead
Async (asyncio)  | I/O-bound (HTTP requests)   | Low
Threading        | I/O-bound (simpler API)     | Medium
Multiprocessing  | CPU-bound (data processing) | High
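
For comparison, the threading approach can be sketched with the standard library's concurrent.futures module. This is a minimal illustration that assumes a blocking fetch function. blocking_fetch and fetch_all_threaded are hypothetical names, and time.sleep stands in for a synchronous HTTP call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(page: int, delay: float = 0.05) -> str:
    """Stand-in for a blocking HTTP call made with a synchronous client."""
    time.sleep(delay)  # the thread waits here, letting other threads run
    return f"page-{page}"


def fetch_all_threaded(pages: list[int], workers: int = 5) -> list[str]:
    """Fetch pages on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(blocking_fetch, pages))


if __name__ == "__main__":
    print(fetch_all_threaded([1, 2, 3]))
```

ProcessPoolExecutor in the same module offers the same map interface, which is one way to move CPU-bound processing onto multiple processes.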

