What Is Selenium? Web Browser Automation for Scraping
Selenium is an open-source browser automation framework originally built for testing web applications. It controls real browsers programmatically and has been widely used for web scraping, especially for JavaScript-heavy websites.
Selenium for Web Scraping
Selenium was the go-to tool for scraping dynamic websites for over a decade. It launches a real browser, navigates pages, and lets you interact with elements just like a human user.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/products")

    # Wait up to 10 seconds for the first product card to appear in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card"))
    )

    products = driver.find_elements(By.CSS_SELECTOR, ".product-card")
    for product in products:
        # Look up the title and price inside each card
        name = product.find_element(By.CSS_SELECTOR, ".title").text
        price = product.find_element(By.CSS_SELECTOR, ".price").text
        print(f"{name}: {price}")
finally:
    # Always close the browser, even if a wait or lookup raised
    driver.quit()
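The explicit wait in the example above is essentially a polling loop: Selenium re-checks the condition at a fixed interval (0.5 seconds by default) until it returns something truthy or the timeout expires. Here is a minimal pure-Python sketch of that idea. It is not Selenium's actual implementation, and `wait_until`, `timeout`, and `poll_interval` are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Simplified stand-in for what an explicit wait does. Hypothetical helper,
    not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Hand back whatever the condition produced (e.g. a list of elements)
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a live driver you could call it as `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, ".product-card"))`, since `find_elements` returns an empty (falsy) list while nothing matches. In real code, prefer `WebDriverWait`, which also handles exceptions such as a not-yet-present element.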
Why Playwright Is Replacing Selenium
| Feature | Selenium | Playwright |
|---|---|---|
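| Auto-wait | Explicit waits (`WebDriverWait`) needed | Built into actions and locators |
| Speed | Generally slower (one HTTP request per command) | Generally faster (single persistent connection) |
| API design | Verbose | Cleaner |
| Browser support | Chrome, Firefox, Edge, Safari | Chromium, Firefox, WebKit |
| Network interception | Limited (CDP/BiDi in Selenium 4) | Full support via request routing |
| Async support | No native async API in the Python bindings | Sync and async APIs |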
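To make the auto-wait row concrete, here is a rough Playwright version of the same product scrape, using the sync API. Treat it as a sketch under assumptions: the URL and the `.product-card`, `.title`, and `.price` selectors are the same placeholders as in the Selenium example, and `scrape_products` and `run` are hypothetical function names.

```python
def scrape_products(page, url="https://example.com/products"):
    """Return (name, price) pairs for every .product-card on the page.

    `page` is expected to be a Playwright sync-API Page.
    """
    page.goto(url)
    cards = page.locator(".product-card")
    # .all() does not wait, so wait once for the first card to render
    cards.first.wait_for()

    results = []
    for card in cards.all():
        # inner_text() waits for the element on its own; no WebDriverWait needed
        name = card.locator(".title").inner_text()
        price = card.locator(".price").inner_text()
        results.append((name, price))
    return results


def run():
    """Launch headless Chromium, scrape, and always close the browser."""
    # Imported here so scrape_products stays importable without Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        try:
            return scrape_products(page)
        finally:
            browser.close()
```

Calling `run()` returns a list of `(name, price)` tuples. Compared with the Selenium version, the only wait is the single `wait_for()` before enumerating the cards; the per-element reads rely on Playwright's built-in waiting.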
When Selenium Still Makes Sense
- Your project already uses Selenium (switching has a cost)
- You need a language Playwright doesn't support well
- You're doing actual browser testing (Selenium's original purpose)
- Your team knows Selenium and the project is time-sensitive
Common Selenium Issues in Scraping
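- ChromeDriver version mismatches: a Chrome update can leave an older, manually installed driver incompatible. Selenium 4.6+ bundles Selenium Manager, which downloads a matching driver automatically
- No auto-wait: you must add explicit waits wherever content loads dynamically
- Easy to detect: default Selenium is trivially fingerprinted, for example through the `navigator.webdriver` flag
- Memory leaks: long-running Selenium sessions eat memory, so periodic browser restarts are a common workaround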
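For the memory issue in particular, one common mitigation is to throw the browser away and start a fresh one every so often. The sketch below shows that pattern under stated assumptions: `RecyclingBrowser`, `driver_factory`, and `max_pages` are hypothetical names, and the default of 50 pages is an arbitrary starting point, not a recommendation from Selenium.

```python
class RecyclingBrowser:
    """Restart the browser after a fixed number of page loads.

    `driver_factory` is any zero-argument callable returning an object with
    `.get(url)` and `.quit()`, for example `webdriver.Chrome`.
    Hypothetical helper for illustration.
    """

    def __init__(self, driver_factory, max_pages=50):
        self._factory = driver_factory
        self._max_pages = max_pages
        self._driver = None
        self._pages_loaded = 0

    def get(self, url):
        """Load `url`, starting a fresh browser first if the current one is used up."""
        if self._driver is None or self._pages_loaded >= self._max_pages:
            self.close()
            self._driver = self._factory()
            self._pages_loaded = 0
        self._driver.get(url)
        self._pages_loaded += 1
        return self._driver

    def close(self):
        """Quit the current browser, if any, releasing its memory."""
        if self._driver is not None:
            self._driver.quit()
            self._driver = None
```

With Selenium installed, `browser = RecyclingBrowser(webdriver.Chrome, max_pages=50)` followed by `driver = browser.get(url)` gives you a normal driver to query. Call `browser.close()` when the crawl finishes. Cookies and other session state are lost on each restart, so this fits stateless crawls best.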
Undetected ChromeDriver
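undetected-chromedriver is a drop-in replacement for Selenium's Chrome driver that patches the ChromeDriver binary to get past basic bot detection, but it's a band-aid: more advanced anti-bot systems can still flag it. For new projects, start with Playwright + stealth plugins instead.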
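If you are stuck with an existing Selenium codebase, the swap is usually small, because undetected-chromedriver exposes a `Chrome` class intended to stand in for `webdriver.Chrome`. A minimal sketch, assuming the third-party `undetected-chromedriver` package is installed; `make_driver` and `use_undetected` are hypothetical names used only for illustration.

```python
def make_driver(use_undetected=False):
    """Return a Chrome driver, optionally via undetected-chromedriver.

    Hypothetical helper: it only shows where the swap happens.
    """
    if use_undetected:
        # Third-party package: pip install undetected-chromedriver
        import undetected_chromedriver as uc

        return uc.Chrome()

    # Stock Selenium driver
    from selenium import webdriver

    return webdriver.Chrome()
```

The rest of the scraping code (`driver.get`, `find_elements`, explicit waits) stays the same either way, which is why this works as a stopgap rather than a redesign.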