What Is XPath? XML Path Language for Web Scraping

intermediate

XPath (XML Path Language) is a query language for navigating and selecting elements in XML and HTML documents. It uses path-like expressions to traverse the document tree and can select elements based on their position, attributes, or text content.

XPath vs. CSS Selectors

Both achieve the same goal — selecting elements — but XPath is more powerful in some scenarios:

Feature	CSS Selectors	XPath
Select by class	`.product`	`//div[@class='product']`
Select by text content	Not possible	`//a[contains(text(),'Next')]`
Go up the tree (parent)	Not possible	`//span[@class='price']/parent::div`
Select by position	`:nth-child(2)`	`(//div)[2]`
Readability	Simpler	More verbose

When XPath Wins

Use XPath when you need to:

•Select elements by their text content
•Navigate to parent elements
•Use complex conditions (AND/OR logic)
•Select elements that CSS selectors can't reach

python

# Scrapy (native XPath support)
response.xpath('//a[contains(text(), "Next Page")]/@href').get()
response.xpath('//div[@class="product"]//span[@class="price"]/text()').getall()
# lxml
from lxml import html
tree = html.fromstring(page_content)
prices = tree.xpath('//span[@class="price"]/text()')

Common XPath Patterns for Scraping

•//div[@class='product'] — all divs with class "product"
•//a/@href — all link URLs
•//table//tr[position()>1] — table rows (skip header)
•//h2[contains(text(),'Price')]/following-sibling::span — span after a heading containing "Price"

Recommendation

Start with CSS selectors — they're simpler and cover 90% of cases. Switch to XPath when you need text-based selection or parent traversal.

What Is XPath? XML Path Language for Web Scraping

XPath vs. CSS Selectors

When XPath Wins

Common XPath Patterns for Scraping

Recommendation

Related Terms

CSS Selector

HTML Parsing

Scrapy

lxml

Learn XPath hands-on