How to Bypass Anti-Bot Detection: Cloudflare, DataDome & More
If you've tried scraping anything beyond simple blogs, you've run into anti-bot detection. The 403 Forbidden response, the Cloudflare challenge page, the page that loads fine in your browser but comes back empty in your script.
This guide covers how modern anti-bot systems work and how to get past them.
How Anti-Bot Systems Detect Scrapers
Anti-bot systems combine multiple signals to tell humans apart from bots. Here's what they look at.
1. TLS Fingerprinting
This is the most important detection method in 2026, and most tutorials still don't cover it.
When your scraper connects to a website via HTTPS, it performs a TLS handshake. The specific ciphers, extensions, and parameters your client sends create a unique "fingerprint." Python's requests library has a TLS fingerprint that looks nothing like a real browser.
The fix: use curl_cffi, which impersonates real browser TLS fingerprints:
from curl_cffi import requests

# Impersonate Chrome 120
response = requests.get(
    "https://protected-site.com",
    impersonate="chrome120",
)
print(response.status_code)  # 200!
This one change gets you past roughly 90% of anti-bot systems. Most of them lean on TLS fingerprinting as the primary check.
2. HTTP/2 Fingerprinting
Similar to TLS fingerprinting, but at the HTTP protocol level. Anti-bot systems analyze your HTTP/2 settings frames, header order, and priority signals.
Standard Python HTTP libraries send HTTP/1.1 by default, or send HTTP/2 with settings that scream "I'm a bot."
curl_cffi handles this too. It sends HTTP/2 frames that match real browsers exactly.
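You can verify what fingerprint you're actually presenting before pointing a scraper at a target. Here's a minimal sketch using tls.peet.ws, a third-party fingerprint echo service; the endpoint and its JSON layout are assumptions that may change, so inspect the raw response if the keys come back empty:

import json

from curl_cffi import requests

# tls.peet.ws echoes back the TLS and HTTP/2 fingerprint it observes.
# The JSON keys below are assumptions about the service's current output;
# print resp.json() yourself if they are missing.
resp = requests.get("https://tls.peet.ws/api/all", impersonate="chrome120")
data = resp.json()
print(data.get("tls", {}).get("ja3"))                   # TLS fingerprint
print(data.get("http2", {}).get("akamai_fingerprint"))  # HTTP/2 fingerprint

Run it once with impersonation and once with a plain HTTP client, and compare the output to a real browser visiting the same page.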
3. JavaScript Challenges
Cloudflare Turnstile, DataDome's JS challenge, and similar systems require your client to execute JavaScript. They inject a script that:
1. Checks browser APIs (canvas, WebGL, fonts)
2. Measures mouse movement and timing
3. Generates a token that must be sent with subsequent requests
For complex challenges, use a stealth browser:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
        viewport={"width": 1920, "height": 1080},
    )
    page = context.new_page()

    # Navigate and let the challenge complete
    page.goto("https://protected-site.com")
    page.wait_for_load_state("networkidle")

    # Now extract your data
    content = page.content()
4. Behavioral Analysis
Advanced systems track:
- Request patterns: do you hit pages in an order no human would?
- Timing: are your requests suspiciously evenly spaced?
- Session behavior: do you visit the homepage before hitting product pages?
The simplest countermeasure for the timing signal is randomized delays:

import random
import time

def human_delay():
    """Random delay between 1-4 seconds"""
    time.sleep(random.uniform(1.0, 4.0))

# Between each request
human_delay()
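Randomized delays don't fix navigation order, though. Here's a sketch of a warm-up flow that visits the homepage before the pages you actually want; human_delay is the helper defined above, and the URLs and category path are hypothetical:

from curl_cffi import requests

session = requests.Session(impersonate="chrome120")

def warm_up(base_url):
    """Visit the homepage first so the session history looks human."""
    session.get(base_url)  # land on the homepage like a real visitor
    human_delay()
    session.get(f"{base_url}/products")  # hypothetical category page
    human_delay()

warm_up("https://protected-site.com")
response = session.get("https://protected-site.com/product/123")  # hypothetical target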
Anti-Bot Systems: A Quick Reference
Cloudflare
Detects via TLS fingerprinting, JavaScript challenges (Turnstile), rate limiting, and IP reputation. Difficulty: medium to hard.
curl_cffi with impersonate="chrome120" handles most Cloudflare-protected sites. For Turnstile challenges, you may need a headless browser for the initial challenge, then reuse the cf_clearance cookie.
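Here's a sketch of that handoff: solve the challenge once in a browser, lift the cf_clearance cookie, then continue with fast curl_cffi requests. The wait condition is site-dependent, and the clearance cookie is typically tied to your IP and user agent, so keep both consistent between the two clients:

from curl_cffi import requests
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://protected-site.com")
    page.wait_for_load_state("networkidle")  # let the challenge finish
    cookies = {c["name"]: c["value"] for c in page.context.cookies()}
    browser.close()

# Reuse the clearance token for lightweight requests
session = requests.Session(impersonate="chrome120")
response = session.get(
    "https://protected-site.com",
    cookies={"cf_clearance": cookies.get("cf_clearance", "")},
)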
DataDome
Detects via TLS fingerprinting, JavaScript fingerprinting, behavioral analysis, and device fingerprinting. Difficulty: hard.
Residential proxies + curl_cffi + proper headers. DataDome is aggressive about flagging datacenter IPs.
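A minimal sketch of routing curl_cffi through a residential proxy; the proxy URL and credentials are placeholders for whatever your provider issues:

from curl_cffi import requests

# Placeholder residential proxy endpoint
proxy = "http://user:pass@residential.example-provider.com:8000"

session = requests.Session(impersonate="chrome120")
response = session.get(
    "https://protected-site.com",
    proxies={"http": proxy, "https": proxy},
)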
PerimeterX (HUMAN)
Detects via JavaScript challenges, behavioral analysis, and sensor data collection. Difficulty: hard.
Stealth Playwright with realistic mouse movements. PerimeterX leans on behavioral signals more than most.
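Here's a sketch of feeding the behavioral sensors some mouse activity in Playwright. The waypoints are arbitrary, and real evasion usually needs more varied, curved paths:

import random
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://protected-site.com")

    # steps= makes Playwright interpolate intermediate positions,
    # so the cursor glides instead of teleporting
    for _ in range(5):
        page.mouse.move(
            random.randint(100, 800),
            random.randint(100, 600),
            steps=25,
        )
        page.wait_for_timeout(random.randint(200, 800))

    page.mouse.wheel(0, 400)  # scroll down like a reader
    content = page.content()
    browser.close()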
Akamai Bot Manager
Detects via TLS fingerprinting, HTTP/2 fingerprinting, sensor data, and cookie validation. Difficulty: very hard.
Full browser automation with residential proxies. Akamai's sensor data collection is the most sophisticated of the bunch.
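A sketch of combining the two, with Playwright launched through a residential proxy. The server address and credentials are placeholders:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={
            "server": "http://residential.example-provider.com:8000",  # placeholder
            "username": "user",
            "password": "pass",
        },
    )
    page = browser.new_page()
    page.goto("https://protected-site.com")
    page.wait_for_load_state("networkidle")
    content = page.content()
    browser.close()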
Essential Headers for Every Request
No matter which anti-bot system you're dealing with, always send realistic headers:
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Ch-Ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}
Missing or inconsistent headers are among the easiest ways to get flagged. In particular, the User-Agent must match the browser you impersonate at the TLS layer: a Chrome TLS fingerprint paired with a Firefox User-Agent string is an immediate giveaway.
The Proxy Factor
Even with perfect TLS impersonation and headers, you'll get blocked if you send too many requests from the same IP. Proxies are non-negotiable for serious scraping:
- Datacenter proxies are cheap and fast, but many sites block known datacenter IP ranges
- Residential proxies cost more, but use real ISP IPs that look like regular users
- Mobile proxies are the hardest to block since mobile IPs are shared among many users
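Here's a simple rotation sketch with curl_cffi; the pool entries are placeholders for your provider's endpoints:

import random
from curl_cffi import requests

# Placeholder proxy pool
PROXIES = [
    "http://user:pass@proxy1.example-provider.com:8000",
    "http://user:pass@proxy2.example-provider.com:8000",
    "http://user:pass@proxy3.example-provider.com:8000",
]

def get_with_rotation(url):
    """Pick a random proxy from the pool for each request."""
    proxy = random.choice(PROXIES)
    return requests.get(
        url,
        impersonate="chrome120",
        proxies={"http": proxy, "https": proxy},
    )

response = get_with_rotation("https://protected-site.com")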
Putting It All Together
Here's a template that combines all the techniques above:
from curl_cffi import requests
import random
import time

session = requests.Session(impersonate="chrome120")

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def scrape_page(url):
    # Human-like pause before every request
    time.sleep(random.uniform(1.5, 3.5))
    response = session.get(url, headers=headers)
    if response.status_code == 403:
        print(f"Blocked on {url} - try rotating proxy")
        return None
    return response.text
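Usage is a plain loop over your target URLs (the paths here are hypothetical):

urls = [
    "https://protected-site.com/product/1",  # hypothetical targets
    "https://protected-site.com/product/2",
]
for url in urls:
    html = scrape_page(url)
    if html:
        print(f"Got {len(html)} bytes from {url}")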
This gets you past most anti-bot systems. For the toughest sites, combine it with browser automation and residential proxies.
Want to Go Deeper?
Chapter 11 of the Master Web Scraping course covers anti-bot evasion with hands-on exercises against real protected sites, including TLS impersonation, cookie extraction workflows, and stealth browser setups.