What Is XPath? XML Path Language for Web Scraping
XPath (XML Path Language) is a query language for navigating and selecting elements in XML and HTML documents. It uses path-like expressions to traverse the document tree and can select elements based on their position, attributes, or text content.
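To make the three selection styles concrete, here is a minimal sketch using lxml. The sample markup and variable names are invented for illustration and do not come from any real page.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
SAMPLE = """
<html><body>
  <ul id="menu">
    <li class="item">Home</li>
    <li class="item active">Products</li>
    <li class="item">Contact</li>
  </ul>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# By position: the second <li> child of the list (XPath positions start at 1).
second = tree.xpath('//ul[@id="menu"]/li[2]/text()')

# By attribute: any element whose id attribute equals "menu".
menus = tree.xpath('//*[@id="menu"]')

# By text content: the <li> whose text is exactly "Contact".
contact = tree.xpath('//li[text()="Contact"]')

print(second)           # ['Products']
print(menus[0].tag)     # ul
print(contact[0].text)  # Contact
```

Each call returns a list, so an expression that matches nothing yields an empty list rather than raising an error.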
XPath vs. CSS Selectors
Both select elements from the document tree, but XPath can express some queries that CSS selectors cannot:
| Feature | CSS Selectors | XPath |
|---|---|---|
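| Select by class | div.product | //div[@class='product'] (exact attribute match only) |
| Select by text content | Not possible in standard CSS | //a[contains(text(),'Next')] |
| Go up the tree (parent) | Not directly (:has() only where supported) | //span[@class='price']/parent::div |
| Select by position | div:nth-child(2) (second child of its parent) | (//div)[2] (second div in the document) |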
| Readability | Simpler | More verbose |
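The class row in the table hides a subtlety: a CSS class selector matches one token in the class attribute, while an XPath test on @class compares the whole attribute string. The sketch below, on invented sample markup, compares three ways of writing the XPath side. The concat/normalize-space expression is a widely used idiom for token matching, not something this article's source prescribes.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
SAMPLE = """
<html><body>
  <div class="product">A</div>
  <div class="product featured">B</div>
  <div class="byproduct">C</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Exact match: the class attribute must be the string "product" and nothing else.
exact = tree.xpath('//div[@class="product"]/text()')

# Substring match: also catches unrelated classes such as "byproduct".
substring = tree.xpath('//div[contains(@class, "product")]/text()')

# Token match: pads the attribute with spaces so only the whole word matches.
token = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product ")]/text()'
)

print(exact)      # ['A']
print(substring)  # ['A', 'B', 'C']
print(token)      # ['A', 'B']
```

Only the token match behaves like the CSS selector div.product, which is one reason CSS tends to be the simpler choice for class-based selection.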
When XPath Wins
Use XPath when you need to:
- Select elements by their text content
- Navigate to parent elements
- Use complex conditions (AND/OR logic)
- Select elements that CSS selectors can't reach
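```python
# Scrapy (native XPath support); `response` is the object passed to a spider callback
next_url = response.xpath('//a[contains(text(), "Next Page")]/@href').get()
prices = response.xpath('//div[@class="product"]//span[@class="price"]/text()').getall()

# lxml; `page_content` is the HTML (str or bytes) you have already downloaded
from lxml import html

tree = html.fromstring(page_content)
prices = tree.xpath('//span[@class="price"]/text()')
```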
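The snippets above cover text matching. Parent navigation and AND/OR conditions from the same list can be sketched along these lines. The markup, the data-stock attribute, and the variable names are assumptions made up for this example.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
SAMPLE = """
<html><body>
  <div class="product" data-stock="yes"><h3>Lamp</h3><span class="price">19.99</span></div>
  <div class="product" data-stock="no"><h3>Desk</h3><span class="price">149.00</span></div>
  <div class="ad"><span class="price">0.99</span></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# AND: both conditions must hold on the same <div>.
in_stock = tree.xpath('//div[@class="product" and @data-stock="yes"]/h3/text()')

# OR: either condition is enough.
either = tree.xpath(
    '//div[@class="ad" or @data-stock="no"]/span[@class="price"]/text()'
)

# Parent axis: start at each price, step up to a product <div>, then read its <h3>.
names = tree.xpath('//span[@class="price"]/parent::div[@class="product"]/h3/text()')

print(in_stock)  # ['Lamp']
print(either)    # ['149.00', '0.99']
print(names)     # ['Lamp', 'Desk']
```

The last query skips the ad block because its parent div fails the class test, which shows how a condition can be attached to the step that walks up the tree.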
Common XPath Patterns for Scraping
- //div[@class='product']: all div elements whose class attribute is exactly "product"
- //a/@href: the URL (href value) of every link
- //table//tr[position()>1]: table rows, skipping the first row under each parent (usually the header)
- //h2[contains(text(),'Price')]/following-sibling::span: span siblings that follow a heading containing "Price"
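As a rough check of these patterns, the sketch below runs all four against a small invented page with lxml. The markup and names are hypothetical, and real pages will usually need adjusted expressions.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
SAMPLE = """
<html><body>
  <div class="product"><a href="/lamp">Lamp</a></div>
  <div class="product"><a href="/desk">Desk</a></div>
  <h2>Price details</h2>
  <span>19.99</span>
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td>Lamp</td><td>19.99</td></tr>
    <tr><td>Desk</td><td>149.00</td></tr>
  </table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# All divs with class "product".
products = tree.xpath("//div[@class='product']")

# All link URLs.
links = tree.xpath("//a/@href")

# Table rows after the header; a relative XPath then reads the cells of each row.
rows = [row.xpath("./td/text()") for row in tree.xpath("//table//tr[position()>1]")]

# Span siblings following the heading that contains "Price".
after_heading = tree.xpath("//h2[contains(text(),'Price')]/following-sibling::span/text()")

print(len(products))  # 2
print(links)          # ['/lamp', '/desk']
print(rows)           # [['Lamp', '19.99'], ['Desk', '149.00']]
print(after_heading)  # ['19.99']
```

Note the leading dot in ./td/text(): without it, the expression would search from the document root instead of from the current row.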
Recommendation
Start with CSS selectors: they are simpler and cover most everyday selections. Switch to XPath when you need text-based selection, parent traversal, or combined conditions.