How to Scrape Dynamic Websites with Playwright in Python
Most websites in 2026 are built with React, Vue, or similar frameworks. They load data after the initial page load using JavaScript. If you try to scrape them with requests and BeautifulSoup, you'll get an HTML shell with none of the data you actually want.
Playwright solves this. It runs a real browser, executes JavaScript, and gives you the fully rendered page. Here's how to use it effectively.
Why Static Scraping Fails on Modern Sites
When you use `requests.get()` on a JavaScript-heavy site, you get the raw HTML before any JS runs. For a React app, that's usually just an empty root `<div>`.
```python
import requests
from bs4 import BeautifulSoup

# This returns almost nothing useful on a JS-rendered site
response = requests.get("https://example-spa.com/products")
soup = BeautifulSoup(response.text, "lxml")
print(soup.select(".product-card"))  # [] — empty list
```
The product data loads via JavaScript after the page renders. You need a browser to execute that JS.
Setting Up Playwright
Install Playwright and its browser binaries:
```shell
pip install playwright
playwright install chromium
```
Chromium is the lightest option. You don't need Firefox or WebKit unless you're testing cross-browser behavior.
Basic Playwright Scraping
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com/products")

    # Wait for the product cards to actually render
    page.wait_for_selector(".product-card")

    # Now extract the data
    cards = page.query_selector_all(".product-card")
    for card in cards:
        name = card.query_selector(".name").inner_text()
        price = card.query_selector(".price").inner_text()
        print(f"{name}: {price}")

    browser.close()
```
The key line is wait_for_selector. It pauses execution until the target element appears in the DOM. Without it, you'll scrape before the data loads.
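A related pattern worth knowing: many sites show a loading spinner while data fetches, and you can wait for the spinner to *disappear* by passing `state="hidden"` to `wait_for_selector`. The `.spinner` selector and the helper's name below are illustrative, not taken from the example site:

```python
def wait_past_spinner(page, content_selector, spinner_selector=".spinner"):
    """Wait for a loading spinner to go away, then for the real content."""
    # state="hidden" resolves once the element is hidden or removed from the DOM
    page.wait_for_selector(spinner_selector, state="hidden")
    page.wait_for_selector(content_selector)
```

Waiting for the spinner to vanish is often more reliable than waiting for content alone, because the content container may exist in the DOM before its data arrives.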
Waiting for Dynamic Content
Different sites need different waiting strategies:
```python
# Wait for a specific element
page.wait_for_selector(".product-card", timeout=10000)

# Wait for network requests to finish
page.wait_for_load_state("networkidle")

# Wait for a specific response from an API (glob patterns match the full URL)
page.wait_for_response("**/api/products")

# Custom wait — poll until a condition is true
page.wait_for_function("document.querySelectorAll('.item').length > 10")
```
networkidle waits until there are no network requests for 500ms. It works well for most SPAs but can be slow on sites with analytics pings.
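`wait_for_function` is the most flexible of the four, and its polling model is easy to reason about. As a mental model only — this is not Playwright's actual implementation, and `poll_until` is an invented name — the retry-with-deadline shape looks like this in plain Python:

```python
import time

def poll_until(condition, timeout=5.0, interval=0.1):
    """Re-evaluate `condition` until it returns True or `timeout` seconds pass.

    Mirrors the semantics of page.wait_for_function: poll a predicate
    on an interval, raise on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```

Playwright evaluates your JavaScript expression inside the browser (by default on each animation frame), but the loop-check-sleep-or-fail structure is the same.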
Scraping Infinite Scroll Pages
Infinite scroll is everywhere — social feeds, product listings, search results. The trick is to scroll, wait for new content, and repeat.
```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url, item_selector, max_items=100):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(item_selector)

        items = []
        last_count = 0
        while len(items) < max_items:
            # Scroll to bottom
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)  # Wait for new content

            elements = page.query_selector_all(item_selector)
            items = [el.inner_text() for el in elements]

            # Stop if no new items loaded
            if len(items) == last_count:
                break
            last_count = len(items)

        browser.close()
        return items[:max_items]
```
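One thing the function above glosses over: re-querying the whole page after every scroll can collect the same item twice if the site recycles or reorders DOM nodes as you scroll. An order-preserving dedup pass (plain Python, no Playwright involved) keeps the results clean:

```python
def dedupe_keep_order(items):
    """Drop duplicates while preserving first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Example: overlapping results from two scroll passes
batch = ["a", "b", "c", "b", "d"]
print(dedupe_keep_order(batch))  # ['a', 'b', 'c', 'd']
```

Run it once on the final list, or on each batch before appending, depending on how the site behaves.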
Intercepting Network Requests (The Smart Approach)
This is the technique most tutorials skip, and it's the most powerful. Instead of parsing the rendered HTML, intercept the API calls the page makes and grab the raw JSON.
```python
from playwright.sync_api import sync_playwright
import json

products = []

def handle_response(response):
    """Capture API responses containing product data."""
    if "/api/products" in response.url and response.status == 200:
        data = response.json()
        products.extend(data["items"])

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Listen for API responses before navigating
    page.on("response", handle_response)

    page.goto("https://example-spa.com/products")
    page.wait_for_load_state("networkidle")
    browser.close()

# products now has clean JSON data — no HTML parsing needed
print(json.dumps(products[:3], indent=2))
```
This gives you structured data directly. No CSS selectors, no brittle HTML parsing. If the site uses an API internally (most SPAs do), this is the way.
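One caveat: matching on a substring of the URL, as `handle_response` does, can over-match — a request to `/api/products-archive` would also pass the check. A stricter filter parses the URL and compares the path exactly. This helper is an illustrative addition, not part of Playwright:

```python
from urllib.parse import urlparse

def is_product_response(url, status, path="/api/products"):
    """True only for a 200 response whose path matches exactly."""
    return status == 200 and urlparse(url).path == path

print(is_product_response("https://example-spa.com/api/products?page=2", 200))   # True
print(is_product_response("https://example-spa.com/api/products-archive", 200))  # False
```

Inside `handle_response`, you'd call `is_product_response(response.url, response.status)` instead of the `in` substring check.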
Handling Single-Page Applications
SPAs don't reload the page when you navigate. Clicking a link changes the URL and fetches data, but Playwright handles this fine — you just need to wait for the new content.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example-spa.com")

    # Click through to a category page
    page.click("a[href='/electronics']")
    page.wait_for_selector(".product-card")

    # Click into a product detail
    page.click(".product-card >> nth=0")
    page.wait_for_selector(".product-detail")

    title = page.inner_text("h1.product-title")
    price = page.inner_text(".product-price")
    print(f"{title}: {price}")

    browser.close()
```
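Because SPA navigation never fires a full page load, waiting on load state alone can return too early. Playwright's `page.wait_for_url` waits for the route change itself. A small helper — the names and selectors here are illustrative — bundles the click, route-change wait, and content wait:

```python
def click_and_wait(page, link_selector, url_pattern, content_selector):
    """Click an in-app link, wait for the SPA route change, then for content."""
    page.click(link_selector)
    page.wait_for_url(url_pattern)  # glob pattern, e.g. "**/electronics"
    page.wait_for_selector(content_selector)
```

Calling `click_and_wait(page, "a[href='/electronics']", "**/electronics", ".product-card")` replaces the first click-and-wait pair in the example above.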
Performance Tips
Playwright is slower than requests because it runs an actual browser. Here's how to speed things up:
```python
# Block images, CSS, and fonts to speed up loading
def block_unnecessary(route, request):
    if request.resource_type in ["image", "stylesheet", "font", "media"]:
        route.abort()
    else:
        route.continue_()

page.route("**/*", block_unnecessary)
```
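If you prefer an allowlist over a blocklist — safer when a site loads resource types you didn't anticipate — the routing decision can live in a small pure function that's easy to unit-test. The function and constant names here are illustrative:

```python
# Resource types a scraper actually needs; everything else gets aborted.
ALLOWED_TYPES = {"document", "script", "xhr", "fetch"}

def should_abort(resource_type, allowed=ALLOWED_TYPES):
    """Decide whether to abort a request based on its resource type."""
    return resource_type not in allowed

# Wiring it into Playwright's router:
# page.route("**/*", lambda route, request:
#     route.abort() if should_abort(request.resource_type) else route.continue_())

print(should_abort("image"))  # True — blocked
print(should_abort("xhr"))    # False — allowed through
```

Keeping `script`, `xhr`, and `fetch` in the allowlist matters: block those and the JavaScript rendering you switched to Playwright for never happens.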
Other speed improvements:
- Use `headless=True` (the default, but worth stating explicitly)
- Reuse browser instances across pages instead of launching a new browser each time
- Use `page.wait_for_selector` instead of `networkidle` when possible
- Set a viewport size to prevent responsive breakpoint re-renders
| Technique | Speed Impact |
|---|---|
| Block images/CSS | 2-3x faster |
| Reuse browser | Saves 1-2s per page |
| Targeted waits | 30-50% faster than networkidle |
| Headless mode | 10-20% faster than headed |
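The "Reuse browser" row is the easiest win when scraping many URLs: launch once, open a fresh page per URL. A sketch of that pattern, with the extraction step passed in as a callable so the loop stays generic (the function name and signature are illustrative):

```python
def scrape_urls(browser, urls, extract):
    """Reuse one browser: open a page per URL, extract data, close the page."""
    results = []
    for url in urls:
        page = browser.new_page()
        try:
            page.goto(url)
            results.append(extract(page))
        finally:
            page.close()  # close the page, not the browser
    return results
```

Used with Playwright it would look like `scrape_urls(browser, urls, lambda page: page.inner_text("h1"))` inside a single `sync_playwright()` block, so the 1-2 second browser launch cost is paid once instead of per page.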
When to Use Playwright vs Requests
Use requests + BeautifulSoup when:

- The page content is in the initial HTML
- You need to scrape thousands of pages quickly
- The site has a hidden API you can call directly

Use Playwright when:

- Content loads via JavaScript after page load
- You need to click buttons, fill forms, or scroll
- The site requires browser-like behavior to avoid detection
What's Next
Dynamic scraping is where things get interesting. Once you're comfortable with Playwright, the next steps are handling pagination patterns, managing proxies for large-scale scraping, and intercepting APIs to skip HTML parsing entirely.
The Master Web Scraping course covers all of this with hands-on projects you build from scratch.