Python Requests for Web Scraping: Headers, Sessions & Cookies
The requests library is where most Python web scraping starts. Before you reach for Playwright or Scrapy, you should know how to make HTTP requests properly — with sessions, headers, cookies, and error handling.
This guide covers everything you need to use requests effectively for scraping.
Basic GET and POST Requests
import requests
# GET request — fetching a page
response = requests.get("https://httpbin.org/get")
print(response.status_code) # 200
print(response.text) # the response body
# POST request — submitting data
response = requests.post("https://httpbin.org/post", data={"key": "value"})
print(response.json()) # parsed JSON response
Most scraping uses GET. You'll use POST when submitting forms or interacting with APIs that expect it.
Setting Headers and User Agents
Bare requests without headers are the easiest way to get blocked. Every request you send has a default user agent that screams "I'm a script."
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
}
response = requests.get("https://example.com", headers=headers)
At minimum, always set a realistic User-Agent. The other headers make your requests look more like a real browser.
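You can verify what a request will actually send without touching the network by preparing it first. A quick sketch (example.com is just a placeholder):

```python
import requests

# Build a request but don't send it; prepare() produces the exact
# headers that would go over the wire.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0",
    "Accept-Language": "en-US,en;q=0.5",
}
prepared = requests.Request("GET", "https://example.com", headers=headers).prepare()

print(prepared.headers["User-Agent"])  # the UA string we set, not the default
```

This is handy for debugging: if a site rejects you, inspecting the prepared request shows exactly what it saw.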
Using Sessions for Cookies
A Session object persists cookies across requests — exactly like a browser does. This is essential for sites that require login or track state.
session = requests.Session()
# First request sets cookies
session.get("https://example.com")
# Subsequent requests automatically include those cookies
response = session.get("https://example.com/dashboard")
# You can also set default headers for the session
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0",
})
# All requests through this session now use these headers
response = session.get("https://example.com/api/data")
Sessions also reuse TCP connections, making multiple requests to the same host faster.
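Sessions also work as context managers, which closes the pooled connections when you're done. A minimal sketch:

```python
import requests

# Using a Session in a with-block ensures its connection pool
# is released when the block exits.
with requests.Session() as session:
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0",
    })
    # every session.get()/post() inside this block reuses the pool
    print(session.headers["User-Agent"])
```

For long-running scrapers this matters: unclosed sessions leak sockets.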
Handling Redirects
By default, requests follows redirects automatically. Sometimes you want to control this.
# Follow redirects (default behavior)
response = requests.get("https://example.com/old-page")
print(response.url) # shows the final URL after redirects
# Disable redirects to inspect them manually
response = requests.get("https://example.com/old-page", allow_redirects=False)
print(response.status_code) # 301 or 302
print(response.headers["Location"]) # where it wants to redirect
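When requests does follow redirects, it records each intermediate response in response.history. A self-contained way to see both behaviors is a throwaway local server (the handler below is purely illustrative):

```python
import http.server
import threading
import requests

class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Toy handler: /old redirects to /new, which returns a page."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"final page")

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

followed = requests.get(f"{base}/old")
print(followed.status_code)                       # 200
print([r.status_code for r in followed.history])  # [301]

manual = requests.get(f"{base}/old", allow_redirects=False)
print(manual.status_code)                         # 301
print(manual.headers["Location"])                 # /new

server.shutdown()
```

Checking response.history is a quick way to spot silent redirects to login or consent pages.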
Timeouts and Retries
Never make a request without a timeout. Without one, your scraper can hang forever on an unresponsive server.
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Simple timeout
response = requests.get("https://example.com", timeout=10) # 10 seconds
# Retry strategy for production scrapers
session = requests.Session()
retries = Retry(
    total=3,  # retry up to 3 times
    backoff_factor=1,  # exponentially increasing delays between retries
    status_forcelist=[429, 500, 502, 503, 504],  # retry on these status codes
)
session.mount("https://", HTTPAdapter(max_retries=retries))
session.mount("http://", HTTPAdapter(max_retries=retries))
# This will automatically retry on server errors
response = session.get("https://example.com/api/data", timeout=10)
The retry adapter handles flaky servers and rate limiting automatically. The backoff_factor adds exponential delays between retries.
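You can confirm which adapter (and therefore which retry policy) a session will use for a given URL without sending anything, via Session.get_adapter. A small offline sketch:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
session.mount("https://", HTTPAdapter(max_retries=retries))

# The mount prefix decides which adapter handles a URL.
adapter = session.get_adapter("https://example.com/api/data")
print(adapter.max_retries.total)             # 3
print(adapter.max_retries.status_forcelist)  # [429, 500, 502, 503, 504]
```

This is worth checking once in tests: a typo in the mount prefix silently falls back to the default adapter with no retries.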
POST Requests for Form Submission
Some sites require form submissions to access data. Use POST with the form fields:
# Form data (application/x-www-form-urlencoded)
response = requests.post("https://example.com/search", data={
    "query": "python web scraping",
    "page": 1,
})
# JSON data (application/json) — common for APIs
response = requests.post("https://example.com/api/search", json={
    "query": "python web scraping",
    "filters": {"category": "tutorials"},
})
Use data= for traditional form submissions and json= for API endpoints.
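Preparing (not sending) the two variants shows what actually differs on the wire: the body encoding and the Content-Type header. The URLs here are placeholders:

```python
import requests

# data= produces a form-encoded body; json= serializes to JSON
# and sets the matching Content-Type automatically.
form = requests.Request("POST", "https://example.com/search",
                        data={"query": "python web scraping"}).prepare()
api = requests.Request("POST", "https://example.com/api/search",
                       json={"query": "python web scraping"}).prepare()

print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(api.headers["Content-Type"])   # application/json
print(form.body)                     # query=python+web+scraping
```

If an API rejects your POST, mismatched Content-Type is one of the first things to rule out.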
Downloading Files
Downloading images, PDFs, or other files is straightforward:
# Download a file in chunks
with requests.get("https://example.com/report.pdf", stream=True) as response:
    response.raise_for_status()
    with open("report.pdf", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
The stream=True parameter prevents loading the entire file into memory, which matters for large files. Using the response as a context manager also releases the connection back to the pool once the download finishes.
Response Handling
Different endpoints return different formats. Here's how to handle each:
response = requests.get("https://example.com/page")
# HTML content — pass to BeautifulSoup
html = response.text
# JSON response — parse directly
data = response.json()
# Binary content (images, PDFs)
binary = response.content
# Check encoding
print(response.encoding) # utf-8, ISO-8859-1, etc.
# Force encoding if auto-detection fails
response.encoding = "utf-8"
html = response.text
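To see what forcing the encoding actually changes, here's an offline sketch that fakes a response body by setting the private _content attribute (fine for illustration, not something to rely on in real code):

```python
import requests

response = requests.Response()
response._content = "café".encode("utf-8")  # pretend these bytes arrived

response.encoding = "ISO-8859-1"  # a wrong guess garbles multi-byte characters
print(response.text)              # café

response.encoding = "utf-8"       # forcing the right encoding fixes it
print(response.text)              # café
```

This mojibake pattern (Ã© where é should be) is the classic symptom of a wrong encoding guess.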
| Property | Returns | Use Case |
|---|---|---|
| .text | String (decoded) | HTML pages |
| .json() | Dict/List | API responses |
| .content | Bytes | Files, images |
| .status_code | Integer | Error checking |
| .headers | Dict | Content-Type, cookies |
Error Handling Patterns
Production scrapers need proper error handling. Here's the pattern I use:
import requests
import time
def fetch_page(url, session, max_retries=3):
    """Fetch a URL with error handling and manual retry logic."""
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=10)
            response.raise_for_status()  # raises HTTPError for 4xx/5xx
            return response
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                print(f"HTTP error {e.response.status_code} for {url}")
                return None  # other HTTP errors aren't worth retrying
        except requests.exceptions.ConnectionError:
            print(f"Connection failed for {url}. Retrying...")
            time.sleep(1)
        except requests.exceptions.Timeout:
            print(f"Timeout for {url}. Retrying...")
    return None  # all retries exhausted
Always call raise_for_status() to surface HTTP errors. A 403 or 500 response doesn't raise anything on its own, so it's easy to miss if you only catch connection-level exceptions.
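To see what raise_for_status() does in isolation, you can build a Response by hand (no network; setting attributes directly is just for illustration):

```python
import requests

# A hand-built 404 response, as if a server had returned it
response = requests.Response()
response.status_code = 404
response.url = "https://example.com/missing"

try:
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    # the exception carries the response that triggered it
    print(e.response.status_code)  # 404
```

For a 2xx response, raise_for_status() simply returns without raising, so it's safe to call unconditionally.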
What's Next
The requests library handles 80% of scraping tasks. For JavaScript-rendered pages, you'll need Playwright. For large-scale scraping, you'll want proxy rotation and concurrent requests.
The Master Web Scraping course builds on these fundamentals with real-world projects that put them all together.