CSS Selectors for Web Scraping: Complete Guide
A CSS selector is a pattern that matches specific HTML elements on a web page. In web scraping, CSS selectors are one of the most common ways to locate the data you want to extract from a page's HTML structure.
Essential Selectors for Scraping
| Selector | What it matches | Example |
|---|---|---|
| `.class` | Elements with the given class | `.product-card` |
| `#id` | The element with the given ID | `#main-content` |
| `tag` | All elements of that type | `h2`, `a`, `div` |
| `parent child` | Descendants at any depth | `.card .price` |
| `parent > child` | Direct children only | `ul > li` |
| `[attr=value]` | Elements whose attribute equals the value | `[data-id="123"]` |
| `:nth-child(n)` | Elements that are the nth child of their parent | `tr:nth-child(2)` |
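A quick way to build intuition for the table is to run each selector against a small page and count what it matches. The sketch below does this with BeautifulSoup's `select`; the HTML is made up for illustration, and it adds one extra case (`.product-card > .price`) to show how the child combinator differs from the descendant combinator.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; real pages will be structured differently.
HTML = """
<div id="main-content">
  <div class="product-card" data-id="123">
    <h2>Widget</h2>
    <div class="meta"><span class="price">$10</span></div>
  </div>
  <div class="product-card" data-id="456">
    <h2>Gadget</h2>
    <div class="meta"><span class="price">$25</span></div>
  </div>
  <ul class="results"><li>First</li><li>Second</li></ul>
  <table>
    <tr><td>Name</td></tr>
    <tr><td>Row two</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One example per table row, plus a child-combinator case that matches nothing here.
EXAMPLES = [
    ".product-card",           # class
    "#main-content",           # ID
    "h2",                      # tag
    ".product-card .price",    # descendant at any depth
    ".product-card > .price",  # direct child only: .price sits inside .meta, so no match
    "ul > li",                 # direct children of a <ul>
    '[data-id="123"]',         # attribute equals value
    "tr:nth-child(2)",         # element that is the second child of its parent
]

# Number of elements each selector matches.
counts = {selector: len(soup.select(selector)) for selector in EXAMPLES}

for selector in EXAMPLES:
    first = soup.select_one(selector)
    text = first.get_text(" ", strip=True) if first is not None else None
    print(f"{selector:24} {counts[selector]} match(es), first: {text!r}")
```

Note that `.product-card .price` finds both prices even though they are wrapped in `.meta`, while `.product-card > .price` finds none.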
Using CSS Selectors in Python
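```python
# BeautifulSoup (assumes soup = BeautifulSoup(html, "html.parser"))
soup.select(".product-card .price")  # all matches, as a list of Tags
soup.select_one("h1.title")          # first match, or None

# Playwright (sync API; assumes an open `page`)
page.query_selector_all(".product-card")  # all matches, as a list of ElementHandles
page.query_selector("h1.title")           # first match, or None

# Scrapy (inside a spider callback; ::text is a Scrapy/parsel extension, not standard CSS)
response.css(".product-card .price::text").getall()  # list of text strings
response.css("h1.title::text").get()                 # first string, or None
```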
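The calls above return every match at once. On a listing page it is usually safer to select each item first and then run the inner selectors relative to that item, so each field stays attached to the right record. A minimal BeautifulSoup sketch of that approach follows; the markup, class names, and the helper `text_or_none` are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up listing markup; the first card has no price on purpose.
HTML = """
<div class="product-card" data-product-id="1">
  <h2 class="title">Widget</h2>
</div>
<div class="product-card" data-product-id="2">
  <h2 class="title">Gadget</h2><span class="price">$25</span>
</div>
"""

def text_or_none(parent, selector):
    """Return the stripped text of the first match under `parent`, or None."""
    el = parent.select_one(selector)
    return el.get_text(strip=True) if el is not None else None

def parse_products(html):
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Select each card once, then run the inner selectors relative to that card.
    for card in soup.select(".product-card"):
        products.append({
            "id": card.get("data-product-id"),
            "title": text_or_none(card, ".title"),
            "price": text_or_none(card, ".price"),  # None when the card has no price
        })
    return products

print(parse_products(HTML))
# [{'id': '1', 'title': 'Widget', 'price': None}, {'id': '2', 'title': 'Gadget', 'price': '$25'}]
```

Selecting all `.title` elements and all `.price` elements separately and zipping the two lists would pair "Widget" with Gadget's price here, because the first card has no price.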
Tips for Finding Good Selectors
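1. Use your browser's DevTools: right-click an element, click "Inspect", then right-click the highlighted node in the Elements panel and copy the selector (in Chrome: Copy > Copy selector). Treat the result as a starting point, since generated selectors often depend on element position.
2. Prefer class names over position: `.price` is more stable than `div:nth-child(3) > span`, which breaks as soon as an element is added or removed before it.
3. Use data attributes: `[data-product-id]` is often more reliable than class names, which may change when the site's styling is updated.
4. Test in the console: run `document.querySelectorAll(".your-selector")` in the browser console to check what the selector matches. The console sees the page after JavaScript has run, which can differ from the raw HTML your scraper downloads.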
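To see why tips 2 and 3 matter, the sketch below runs a positional selector, a class selector, and a data-attribute selector against two made-up versions of the same block, where the second version adds a "Sale" badge before the price. The markup and the `data-price` attribute are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Two made-up versions of one product block; AFTER inserts a badge before the price.
BEFORE = """
<div class="product">
  <h2>Widget</h2>
  <p>In stock</p>
  <span class="price" data-price="10.00">$10</span>
</div>
"""
AFTER = """
<div class="product">
  <h2>Widget</h2>
  <p>In stock</p>
  <span class="badge">Sale</span>
  <span class="price" data-price="10.00">$10</span>
</div>
"""

SELECTORS = {
    "positional": "div.product > span:nth-child(3)",  # depends on sibling order
    "class": "div.product > .price",                  # depends on the class name
    "data attribute": "[data-price]",                 # depends on the attribute
}

def first_text(html, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    el = BeautifulSoup(html, "html.parser").select_one(selector)
    return el.get_text(strip=True) if el is not None else None

for name, selector in SELECTORS.items():
    print(f"{name:15} before={first_text(BEFORE, selector)!r} after={first_text(AFTER, selector)!r}")
```

After the badge is added, the positional selector silently returns "Sale" instead of the price, while the class and data-attribute selectors still return "$10".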
Common Scraping Patterns
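These BeautifulSoup calls assume `soup` is a parsed page (or any element within it):

- Extract text: `soup.select_one(".price").text`
- Extract a link: `soup.select_one("a").get("href")`
- Extract an image URL: `soup.select_one("img").get("src")`
- Extract all items in a list: `soup.select("ul.results > li")`

`select_one` returns `None` when nothing matches, so calling `.text` or `.get()` on a missing element raises an `AttributeError`; check the result first on pages where the element may be absent.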
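In practice these patterns are usually combined in a loop over list items, with a check for missing elements and with relative `href`/`src` values resolved to absolute URLs. A minimal sketch, assuming made-up markup and a placeholder base URL; `urljoin` comes from Python's standard library, and the function name `scrape_results` is hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up search-results markup; BASE_URL is a placeholder for the page's own URL.
HTML = """
<ul class="results">
  <li><a href="/items/1">First item</a><img src="img/1.jpg"></li>
  <li><a href="https://example.com/items/2">Second item</a></li>
</ul>
"""
BASE_URL = "https://example.com/search"

def scrape_results(html, base_url):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.select("ul.results > li"):  # all items in the list
        link = li.select_one("a")
        img = li.select_one("img")
        # .get() returns None if the attribute is missing.
        href = link.get("href") if link is not None else None
        src = img.get("src") if img is not None else None
        rows.append({
            "text": link.get_text(strip=True) if link is not None else None,
            # href/src are often relative; resolve them against the page URL.
            "url": urljoin(base_url, href) if href else None,
            "image": urljoin(base_url, src) if src else None,
        })
    return rows

for row in scrape_results(HTML, BASE_URL):
    print(row)
```

`get_text(strip=True)` is used instead of `.text` so surrounding whitespace is removed; absolute URLs such as the second link pass through `urljoin` unchanged.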