What Is Rate Limiting? Respecting Server Limits When Scraping
Rate limiting is the practice of controlling the frequency of requests sent to a server. In web scraping, it means adding deliberate delays between requests to avoid overwhelming the target server or triggering anti-bot defenses.
Why Rate Limiting Matters
Sending requests too fast causes two problems:
1. You get blocked: servers detect rapid-fire requests and ban your IP.
2. You harm the site: excessive requests can slow down or even crash the server.
Implementing Rate Limits in Python
```python
import time
import random
import requests

urls = ["https://example.com/page/1", "https://example.com/page/2", ...]

for url in urls:
    response = requests.get(url)
    # Random delay between 1 and 3 seconds
    time.sleep(random.uniform(1, 3))
```
Strategies
- Fixed delay: `time.sleep(2)` between every request. Simple but predictable.
- Random delay: `time.sleep(random.uniform(1, 3))`. More human-like.
- Exponential backoff: double the delay after each error. Good for handling 429 responses.
- Adaptive rate: speed up when the site responds fast, slow down when you see errors.
```python
import time
import random
import requests

# Exponential backoff: double the delay after each 429, reset on success.
# `urls` is the list defined in the previous example.
delay = 1
for url in urls:
    for attempt in range(5):  # give up on a URL after 5 tries
        response = requests.get(url)
        if response.status_code == 429:
            delay *= 2  # double the delay, then retry the same URL
            time.sleep(delay)
        else:
            delay = 1  # reset on success
            time.sleep(random.uniform(0.5, 1.5))
            break
```
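The adaptive strategy can be sketched as a small helper class. This is a minimal illustration, not a standard API: the class name, thresholds, and multipliers are all assumptions you would tune for a real target site.

```python
import time
import random

class AdaptiveLimiter:
    """Shrink the delay while responses are fast, grow it on errors.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def __init__(self, min_delay=0.5, max_delay=30.0):
        self.delay = 1.0
        self.min_delay = min_delay
        self.max_delay = max_delay

    def record_success(self, response_time):
        # Fast responses suggest headroom: gently reduce the delay.
        if response_time < 0.5:
            self.delay = max(self.min_delay, self.delay * 0.9)

    def record_error(self):
        # Any error (429, 5xx, timeout): back off sharply.
        self.delay = min(self.max_delay, self.delay * 2)

    def wait(self):
        # Jitter the sleep so request timing stays irregular.
        time.sleep(random.uniform(0.5 * self.delay, 1.5 * self.delay))
```

In a scraping loop you would time each `requests.get`, call `record_success` or `record_error` accordingly, and call `wait()` before the next request.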
How to Know Your Limits
- Check the site's `robots.txt` for `Crawl-delay` directives
- Watch for 429 (Too Many Requests) status codes
- Monitor response times: if they spike, you're pushing too hard
- Start slow and gradually increase speed
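The `Crawl-delay` check can be done with the standard library's `urllib.robotparser`. The sketch below parses a sample `robots.txt` from a string so it runs offline; in practice you would call `parser.set_url(...)` and `parser.read()` against the live site, and the `"my-scraper"` user agent is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; a real scraper would fetch the live file
# with parser.set_url("https://example.com/robots.txt"); parser.read()
robots_txt = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Returns the delay in seconds for the matching user-agent group,
# falling back to the "*" group; None if no directive exists.
delay = parser.crawl_delay("my-scraper")
```

If `crawl_delay` returns a value, use it as your minimum sleep between requests.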
Concurrent Requests with Rate Limits
When using async scraping (e.g., asyncio + aiohttp), use a semaphore to limit concurrent requests:
```python
import asyncio

semaphore = asyncio.Semaphore(5)  # max 5 concurrent requests
```
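The snippet above can be fleshed out into a runnable sketch. To keep it self-contained, `asyncio.sleep` stands in for the real network call (the aiohttp equivalent is shown in a comment), and the URLs are placeholders:

```python
import asyncio

async def fetch(semaphore, url):
    async with semaphore:  # blocks once all 5 slots are taken
        # Stand-in for the real request; with aiohttp it would be:
        #   async with session.get(url) as resp:
        #       return await resp.text()
        await asyncio.sleep(0.05)
        return url

async def main():
    semaphore = asyncio.Semaphore(5)  # max 5 requests in flight
    urls = [f"https://example.com/page/{i}" for i in range(20)]
    # gather preserves input order even though completion order varies
    return await asyncio.gather(*(fetch(semaphore, u) for u in urls))

results = asyncio.run(main())
```

Creating the semaphore inside `main()` ensures it binds to the event loop that `asyncio.run` starts; you can also combine the semaphore with a per-request delay to cap both concurrency and request rate.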