How to Scrape Dynamic Websites with Playwright in Python
14 min read · by Nabeel


Most websites in 2026 are built with React, Vue, or similar frameworks. They load data after the initial page load using JavaScript. If you try to scrape them with requests and BeautifulSoup, you'll get an empty page.

Playwright solves this. It runs a real browser, executes JavaScript, and gives you the fully rendered page. Here's how to use it effectively.

Why Static Scraping Fails on Modern Sites

When you use requests.get() on a JavaScript-heavy site, you get the raw HTML before any JS runs. For a React app, that's usually just an empty root <div>.

```python
import requests
from bs4 import BeautifulSoup

# This returns almost nothing useful on a JS-rendered site
response = requests.get("https://example-spa.com/products")
soup = BeautifulSoup(response.text, "lxml")
print(soup.select(".product-card"))  # [] — empty list
```

The product data loads via JavaScript after the page renders. You need a browser to execute that JS.

Setting Up Playwright

Install Playwright and its browser binaries:

```bash
pip install playwright
playwright install chromium
```

Chromium is the lightest option. You don't need Firefox or WebKit unless you're testing cross-browser behavior.

Basic Playwright Scraping

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com/products")

    # Wait for the product cards to actually render
    page.wait_for_selector(".product-card")

    # Now extract the data
    cards = page.query_selector_all(".product-card")
    for card in cards:
        name = card.query_selector(".name").inner_text()
        price = card.query_selector(".price").inner_text()
        print(f"{name}: {price}")

    browser.close()
```

The key line is wait_for_selector. It pauses execution until the target element appears in the DOM. Without it, you'll scrape before the data loads.

Waiting for Dynamic Content

Different sites need different waiting strategies:

```python
# Wait for a specific element
page.wait_for_selector(".product-card", timeout=10000)

# Wait for network requests to finish
page.wait_for_load_state("networkidle")

# Wait for a specific response from an API (glob matches the full URL)
page.wait_for_response("**/api/products")

# Custom wait — poll until a condition is true
page.wait_for_function("document.querySelectorAll('.item').length > 10")
```

networkidle waits until there are no network requests for 500ms. It works well for most SPAs but can be slow on sites with analytics pings.
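If analytics pings are what keeps networkidle from settling, one workaround is to abort requests to known tracking hosts before navigating. A minimal sketch, assuming an illustrative (not exhaustive) host list:

```python
# Abort requests to common tracking hosts so "networkidle" isn't
# held open by analytics pings (host list is illustrative)
ANALYTICS_HOSTS = ("google-analytics.com", "doubleclick.net", "hotjar.com")

def block_analytics(route, request):
    if any(host in request.url for host in ANALYTICS_HOSTS):
        route.abort()
    else:
        route.continue_()
```

Register it with page.route("**/*", block_analytics) before calling page.goto, the same pattern used for blocking images in the performance section below.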

Scraping Infinite Scroll Pages

Infinite scroll is everywhere — social feeds, product listings, search results. The trick is to scroll, wait for new content, and repeat.

```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url, item_selector, max_items=100):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(item_selector)

        items = []
        last_count = 0

        while len(items) < max_items:
            # Scroll to bottom
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)  # Wait for new content

            elements = page.query_selector_all(item_selector)
            items = [el.inner_text() for el in elements]

            # Stop if no new items loaded
            if len(items) == last_count:
                break
            last_count = len(items)

        browser.close()
        return items[:max_items]
```

Intercepting Network Requests (The Smart Approach)

This is the technique most tutorials skip, and it's the most powerful. Instead of parsing the rendered HTML, intercept the API calls the page makes and grab the raw JSON.

```python
from playwright.sync_api import sync_playwright
import json

products = []

def handle_response(response):
    """Capture API responses containing product data."""
    if "/api/products" in response.url and response.status == 200:
        data = response.json()
        products.extend(data["items"])

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Listen for API responses before navigating
    page.on("response", handle_response)

    page.goto("https://example-spa.com/products")
    page.wait_for_load_state("networkidle")

    browser.close()

# products now has clean JSON data — no HTML parsing needed
print(json.dumps(products[:3], indent=2))
```

This gives you structured data directly. No CSS selectors, no brittle HTML parsing. If the site uses an API internally (most SPAs do), this is the way.
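One practical wrinkle: the same endpoint often responds several times (pagination, retries), so the captured list can contain duplicates. A small post-processing sketch, assuming each item carries an id field; adjust the key to whatever the API you are capturing actually uses:

```python
def dedupe_items(items, key="id"):
    """Drop duplicate records captured across repeated API responses,
    keeping the first occurrence of each key value."""
    seen = set()
    unique = []
    for item in items:
        value = item.get(key)
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique
```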

Handling Single-Page Applications

SPAs don't reload the page when you navigate. Clicking a link changes the URL and fetches data, but Playwright handles this fine — you just need to wait for the new content.

python
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com")

# Click through to a category page page.click("a[href='/electronics']") page.wait_for_selector(".product-card")

# Click into a product detail page.click(".product-card >> nth=0") page.wait_for_selector(".product-detail")

title = page.inner_text("h1.product-title") price = page.inner_text(".product-price") print(f"{title}: {price}")

browser.close()

Performance Tips

Playwright is slower than requests because it runs an actual browser. Here's how to speed things up:

```python
# Block images, CSS, and fonts to speed up loading
def block_unnecessary(route, request):
    if request.resource_type in ["image", "stylesheet", "font", "media"]:
        route.abort()
    else:
        route.continue_()

page.route("**/*", block_unnecessary)
```

Other speed improvements:

  • Use headless=True (default, but worth mentioning)
  • Reuse browser instances across pages instead of launching a new browser each time
  • Use page.wait_for_selector instead of networkidle when possible
  • Set a viewport size to prevent responsive breakpoint re-renders
| Technique | Speed impact |
| --- | --- |
| Block images/CSS | 2-3x faster |
| Reuse browser | Saves 1-2s per page |
| Targeted waits | 30-50% faster than networkidle |
| Headless mode | 10-20% faster than headed |
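The browser-reuse tip can be sketched as a helper that drives one already-open page across many URLs; launching the browser is the expensive part, so do it once. This is a hypothetical helper, not a Playwright API:

```python
def scrape_many(page, urls, selector):
    """Visit each URL with one already-open Playwright page instead of
    launching a fresh browser per URL, and collect the target text."""
    results = {}
    for url in urls:
        page.goto(url)
        page.wait_for_selector(selector)
        results[url] = page.inner_text(selector)
    return results
```

Call it with the page from a single p.chromium.launch() and close the browser once at the end.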

When to Use Playwright vs Requests

Use requests + BeautifulSoup when:

  • The page content is in the initial HTML
  • You need to scrape thousands of pages quickly
  • The site has a hidden API you can call directly

Use Playwright when:

  • Content loads via JavaScript after page load
  • You need to click buttons, fill forms, or scroll
  • The site requires browser-like behavior to avoid detection
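A quick way to decide: fetch the page once with requests and check whether your target selector already appears in the raw HTML. A sketch of that check (the selector is whatever you plan to scrape):

```python
from bs4 import BeautifulSoup

def needs_browser(html, selector):
    """Return True when the target selector is absent from the raw HTML,
    which usually means the content is rendered client-side."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector)) == 0
```

Feed it requests.get(url).text; if it returns True, reach for Playwright.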

What's Next

Dynamic scraping is where things get interesting. Once you're comfortable with Playwright, the next steps are handling pagination patterns, managing proxies for large-scale scraping, and intercepting APIs to skip HTML parsing entirely.

The Master Web Scraping course covers all of this with hands-on projects you build from scratch.

Want the full course?

This blog post is just a taste. The Master Web Scraping course covers 16 in-depth chapters from beginner to expert.
