How to Scrape Dynamic Websites with Playwright in Python
14 min read · by Nabeel


Most websites in 2026 are built with React, Vue, or similar frameworks. They load data after the initial page load using JavaScript. If you try to scrape them with requests and BeautifulSoup, you'll get an empty page.

Playwright solves this. It runs a real browser, executes JavaScript, and gives you the fully rendered page. Here's how to use it effectively.

Why Static Scraping Fails on Modern Sites

When you use requests.get() on a JavaScript-heavy site, you get the raw HTML before any JS runs. For a React app, that's usually just an empty root <div>.

```python
import requests
from bs4 import BeautifulSoup

# This returns almost nothing useful on a JS-rendered site
response = requests.get("https://example-spa.com/products")
soup = BeautifulSoup(response.text, "lxml")
print(soup.select(".product-card"))  # [] — empty list
```

The product data loads via JavaScript after the page renders. You need a browser to execute that JS.

Setting Up Playwright

Install Playwright and its browser binaries:

```bash
pip install playwright
playwright install chromium
```

Chromium is the lightest option. You don't need Firefox or WebKit unless you're testing cross-browser behavior.

Basic Playwright Scraping

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com/products")

    # Wait for the product cards to actually render
    page.wait_for_selector(".product-card")

    # Now extract the data
    cards = page.query_selector_all(".product-card")
    for card in cards:
        name = card.query_selector(".name").inner_text()
        price = card.query_selector(".price").inner_text()
        print(f"{name}: {price}")

    browser.close()
```

The key line is wait_for_selector. It pauses execution until the target element appears in the DOM. Without it, you'll scrape before the data loads.

Waiting for Dynamic Content

Different sites need different waiting strategies:

```python
# Wait for a specific element
page.wait_for_selector(".product-card", timeout=10000)

# Wait for network requests to finish
page.wait_for_load_state("networkidle")

# Wait for a specific response from an API (glob matches the full URL)
page.wait_for_response("**/api/products")

# Custom wait — poll until a condition is true
page.wait_for_function("document.querySelectorAll('.item').length > 10")
```

networkidle waits until there are no network requests for 500ms. It works well for most SPAs but can be slow on sites with analytics pings.
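If analytics pings are what keeps networkidle from settling, one workaround is to abort requests to known tracking hosts before navigating. A minimal sketch, assuming an illustrative (not exhaustive) host list:

```python
# Abort requests to common tracking hosts so "networkidle" isn't
# held open by analytics pings (host list is illustrative)
ANALYTICS_HOSTS = ("google-analytics.com", "doubleclick.net", "hotjar.com")

def block_analytics(route, request):
    if any(host in request.url for host in ANALYTICS_HOSTS):
        route.abort()
    else:
        route.continue_()
```

Register it with page.route("**/*", block_analytics) before calling page.goto, the same pattern used for blocking images in the performance section below.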

Scraping Infinite Scroll Pages

Infinite scroll is everywhere — social feeds, product listings, search results. The trick is to scroll, wait for new content, and repeat.

```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url, item_selector, max_items=100):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(item_selector)

        items = []
        last_count = 0

        while len(items) < max_items:
            # Scroll to bottom
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)  # Wait for new content

            elements = page.query_selector_all(item_selector)
            items = [el.inner_text() for el in elements]

            # Stop if no new items loaded
            if len(items) == last_count:
                break
            last_count = len(items)

        browser.close()
        return items[:max_items]
```

Intercepting Network Requests (The Smart Approach)

This is the technique most tutorials skip, and it's the most powerful. Instead of parsing the rendered HTML, intercept the API calls the page makes and grab the raw JSON.

```python
from playwright.sync_api import sync_playwright
import json

products = []

def handle_response(response):
    """Capture API responses containing product data."""
    if "/api/products" in response.url and response.status == 200:
        data = response.json()
        products.extend(data["items"])

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Listen for API responses before navigating
    page.on("response", handle_response)

    page.goto("https://example-spa.com/products")
    page.wait_for_load_state("networkidle")

    browser.close()

# products now has clean JSON data — no HTML parsing needed
print(json.dumps(products[:3], indent=2))
```

This gives you structured data directly. No CSS selectors, no brittle HTML parsing. If the site uses an API internally (most SPAs do), this is the way.
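One practical wrinkle: the same endpoint often responds several times (pagination, retries), so the captured list can contain duplicates. A small post-processing sketch, assuming each item carries an id field; adjust the key to whatever the API you are capturing actually uses:

```python
def dedupe_items(items, key="id"):
    """Drop duplicate records captured across repeated API responses,
    keeping the first occurrence of each key value."""
    seen = set()
    unique = []
    for item in items:
        value = item.get(key)
        if value not in seen:
            seen.add(value)
            unique.append(item)
    return unique
```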

Handling Single-Page Applications

SPAs don't reload the page when you navigate. Clicking a link changes the URL and fetches data, but Playwright handles this fine — you just need to wait for the new content.

python
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com")

# Click through to a category page page.click("a[href='/electronics']") page.wait_for_selector(".product-card")

# Click into a product detail page.click(".product-card >> nth=0") page.wait_for_selector(".product-detail")

title = page.inner_text("h1.product-title") price = page.inner_text(".product-price") print(f"{title}: {price}")

browser.close()

Performance Tips

Playwright is slower than requests because it runs an actual browser. Here's how to speed things up:

```python
# Block images, CSS, and fonts to speed up loading
def block_unnecessary(route, request):
    if request.resource_type in ["image", "stylesheet", "font", "media"]:
        route.abort()
    else:
        route.continue_()

page.route("**/*", block_unnecessary)
```

Other speed improvements:

  • Use headless=True (default, but worth mentioning)
  • Reuse browser instances across pages instead of launching a new browser each time
  • Use page.wait_for_selector instead of networkidle when possible
  • Set a viewport size to prevent responsive breakpoint re-renders
| Technique | Speed impact |
| --- | --- |
| Block images/CSS | 2-3x faster |
| Reuse browser | Saves 1-2s per page |
| Targeted waits | 30-50% faster than networkidle |
| Headless mode | 10-20% faster than headed |
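The browser-reuse tip can be sketched as a helper that drives one already-open page across many URLs; launching the browser is the expensive part, so do it once. This is a hypothetical helper, not a Playwright API:

```python
def scrape_many(page, urls, selector):
    """Visit each URL with one already-open Playwright page instead of
    launching a fresh browser per URL, and collect the target text."""
    results = {}
    for url in urls:
        page.goto(url)
        page.wait_for_selector(selector)
        results[url] = page.inner_text(selector)
    return results
```

Call it with the page from a single p.chromium.launch() and close the browser once at the end.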

When to Use Playwright vs Requests

Use requests + BeautifulSoup when:

  • The page content is in the initial HTML
  • You need to scrape thousands of pages quickly
  • The site has a hidden API you can call directly

Use Playwright when:

  • Content loads via JavaScript after page load
  • You need to click buttons, fill forms, or scroll
  • The site requires browser-like behavior to avoid detection
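A quick way to decide: fetch the page once with requests and check whether your target selector already appears in the raw HTML. A sketch of that check (the selector is whatever you plan to scrape):

```python
from bs4 import BeautifulSoup

def needs_browser(html, selector):
    """Return True when the target selector is absent from the raw HTML,
    which usually means the content is rendered client-side."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector)) == 0
```

Feed it requests.get(url).text; if it returns True, reach for Playwright.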

What's Next

Dynamic scraping is where things get interesting. Once you're comfortable with Playwright, the next steps are handling pagination patterns, managing proxies for large-scale scraping, and intercepting APIs to skip HTML parsing entirely.

The Master Web Scraping course covers all of this with hands-on projects you build from scratch.

Want the full course?

This blog post is just a taste. The Master Web Scraping course covers 16 in-depth chapters from beginner to expert.
