What Is a User-Agent? How It Affects Web Scraping
A User-Agent is an HTTP request header that identifies the client making a request, typically the browser name and version, the rendering engine, and the operating system. Websites use User-Agent strings to serve different content and to detect automated scrapers.
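To make the structure of the header concrete, here is a minimal sketch that splits a User-Agent string into its product/version tokens and its parenthesised comments. The function name `parse_user_agent` and the two regular expressions are illustrative assumptions, not part of any library, and real-world strings can be messier than this handles.

```python
import re

# Matches "product/version" tokens such as "Chrome/120.0.0.0".
PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")
# Matches parenthesised comments such as "(Windows NT 10.0; Win64; x64)".
COMMENT_RE = re.compile(r"\(([^)]*)\)")


def parse_user_agent(ua: str) -> dict:
    """Split a User-Agent string into product tokens and comments (hypothetical helper)."""
    comments = COMMENT_RE.findall(ua)
    # Drop the comments first so text inside parentheses is not read as a product.
    without_comments = COMMENT_RE.sub(" ", ua)
    products = PRODUCT_RE.findall(without_comments)
    return {"products": products, "comments": comments}


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
parsed = parse_user_agent(ua)
print(parsed["products"])  # browser and engine tokens with their versions
print(parsed["comments"])  # operating system and platform details
```

In this string the operating system lives in the first comment, while the browser itself is the `Chrome/...` token; the leading `Mozilla/5.0` is a compatibility token that nearly every browser sends.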
Why User-Agent Matters for Scraping
Many websites check the User-Agent header and block requests that look like scripts rather than browsers. By default, Python's requests library sends python-requests/ followed by its version number (for example, python-requests/2.31.0), which is an obvious giveaway.
import requests
# This gets blocked on many sites
response = requests.get("https://example.com")
# User-Agent: python-requests/2.31.0
# This works much better
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
response = requests.get("https://example.com", headers=headers)
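To illustrate why the default header is such a giveaway, the sketch below shows the kind of naive check a site might run on incoming requests. It is an assumption for illustration only: the function name `looks_like_script` and the prefix list are hypothetical, and real bot detection usually combines many more signals than the User-Agent.

```python
# Default User-Agent prefixes sent by some common HTTP tools (illustrative, not exhaustive).
SCRIPT_UA_PREFIXES = (
    "python-requests/",
    "python-urllib/",
    "curl/",
    "wget/",
    "go-http-client/",
)


def looks_like_script(user_agent):
    """Return True if the User-Agent is missing or starts with a known tool prefix."""
    if not user_agent:
        # Real browsers always send a User-Agent, so an empty one is suspicious.
        return True
    return user_agent.strip().lower().startswith(SCRIPT_UA_PREFIXES)


browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(looks_like_script("python-requests/2.31.0"))  # True
print(looks_like_script(browser_ua))                # False
```

A check this simple is defeated by setting a browser-like header, which is why sites that care about scraping tend to look at the rest of the request as well.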
Common User-Agent Strings
| Browser | User-Agent |
|---|---|
| Chrome (Windows) | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 |
| Chrome (Mac) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 |
| Firefox (Windows) | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0 |
| Safari (Mac) | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15 |
Rotating User-Agents
For large-scale scraping, rotate User-Agents so that your requests don't all share a single, easily fingerprinted client string:
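import random

import requests

# Full User-Agent strings, including the "(KHTML, like Gecko)" token Chrome sends
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
url = "https://example.com"  # placeholder target
# Pick a User-Agent at random for each request
headers = {"User-Agent": random.choice(user_agents)}
response = requests.get(url, headers=headers)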
Important: Consistency Matters
When rotating User-Agents, keep other headers consistent with the User-Agent you're sending. A Chrome User-Agent with Firefox-style Accept headers is a red flag. Match the full header set to the browser you're impersonating.
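One way to keep headers consistent is to rotate whole header profiles rather than bare User-Agent strings. The sketch below assumes two hypothetical profiles, `chrome` and `firefox`, and a helper named `pick_headers`. The Accept and Accept-Language values approximate what those browser versions send for page navigations, so treat them as assumptions and copy the real values from your own browser's developer tools before relying on them.

```python
import random

# Hypothetical header profiles; each value set belongs to one browser family.
HEADER_PROFILES = {
    "chrome": {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/avif,image/webp,image/apng,*/*;q=0.8,"
            "application/signed-exchange;v=b3;q=0.7"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    },
    "firefox": {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
            "Gecko/20100101 Firefox/121.0"
        ),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/avif,image/webp,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.5",
    },
}


def pick_headers():
    """Return a copy of one complete profile so headers never mix browsers."""
    profile = random.choice(list(HEADER_PROFILES.values()))
    return dict(profile)  # copy, so callers can modify it safely


headers = pick_headers()
print(headers["User-Agent"])
```

The resulting dictionary can be passed as `headers=headers` to `requests.get`. A fuller profile would go further: Chromium-based browsers also send `Sec-CH-UA` client hint headers that Firefox does not, so a Chrome profile without them can still look inconsistent to a strict check.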