JavaScript Rendering & Web Scraping: What You Need to Know
JavaScript rendering is when a website uses JavaScript to dynamically load, generate, or modify page content after the initial HTML is delivered. This means the data you want isn't in the raw HTML — it's created by JavaScript running in the browser.
The Problem for Scrapers
When you fetch a JS-rendered page with requests, you get the shell but not the content:
import requests
response = requests.get("https://react-app.com/products")
# The HTML contains: <div id="root"></div>
# All product data is loaded by JavaScript AFTER the page loads
How to Detect JS Rendering
- 1.View Page Source (Ctrl+U): If the data is missing but visible on the page, it's JS-rendered
- 2.Disable JavaScript: In DevTools, disable JS and reload — if the page is empty, it's JS-dependent
- 3.Compare
curlvs. browser:curl https://example.com— if the output is sparse, JS is rendering the content
Strategies for JS-Rendered Sites
Strategy 1: Find the Hidden API (Best)
Most JS-rendered sites fetch data from an API. Check the Network tab in DevTools for XHR/Fetch requests.# Instead of rendering the page, hit the API directly
response = requests.get("https://api.example.com/v1/products?page=1")
data = response.json() # clean, structured data
Strategy 2: Use a Headless Browser
When there's no API to hit, render the page with Playwright.from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch()
page = browser.new_page()
page.goto("https://react-app.com/products")
page.wait_for_selector(".product-card")
content = page.content() # fully rendered HTML
browser.close()
Strategy 3: Intercept Network Requests
Capture the API calls the page makes:api_data = []
def handle_response(response):
if "/api/products" in response.url:
api_data.append(response.json())
page.on("response", handle_response)
page.goto("https://react-app.com/products")
The Decision Tree
- 4.Is data in the HTML source? → Use requests + BeautifulSoup
- 5.Does the site make API calls? → Hit the API directly
- 6.No accessible API? → Use Playwright/headless browser