Skip to main content
BETAUnder active development. Some features may not work as expected.

What Is XPath? XML Path Language for Web Scraping

intermediate

XPath (XML Path Language) is a query language for navigating and selecting elements in XML and HTML documents. It uses path-like expressions to traverse the document tree and can select elements based on their position, attributes, or text content.

XPath vs. CSS Selectors

Both achieve the same goal — selecting elements — but XPath is more powerful in some scenarios:

FeatureCSS SelectorsXPath
Select by class.product//div[@class='product']
Select by text contentNot possible//a[contains(text(),'Next')]
Go up the tree (parent)Not possible//span[@class='price']/parent::div
Select by position:nth-child(2)(//div)[2]
ReadabilitySimplerMore verbose

When XPath Wins

Use XPath when you need to:

  • Select elements by their text content
  • Navigate to parent elements
  • Use complex conditions (AND/OR logic)
  • Select elements that CSS selectors can't reach
python
# Scrapy (native XPath support)
response.xpath('//a[contains(text(), "Next Page")]/@href').get()
response.xpath('//div[@class="product"]//span[@class="price"]/text()').getall()

# lxml from lxml import html tree = html.fromstring(page_content) prices = tree.xpath('//span[@class="price"]/text()')

Common XPath Patterns for Scraping

  • //div[@class='product'] — all divs with class "product"
  • //a/@href — all link URLs
  • //table//tr[position()>1] — table rows (skip header)
  • //h2[contains(text(),'Price')]/following-sibling::span — span after a heading containing "Price"

Recommendation

Start with CSS selectors — they're simpler and cover 90% of cases. Switch to XPath when you need text-based selection or parent traversal.

Learn XPath hands-on

This glossary entry covers the basics. The Master Web Scraping course teaches you to use xpath in real projects across 16 in-depth chapters.

Get Instant Access — $19

$ need_help?

We're here for you