
What Is lxml? Fast XML and HTML Parsing in Python

beginner

lxml is a high-performance Python library for processing XML and HTML. It provides a Pythonic API with XPath and CSS selector support, and it is among the fastest HTML parsers available in Python, which makes it a common choice for production web scraping.

Why lxml?

lxml wraps the C libraries libxml2 and libxslt, so it parses significantly faster than Python's built-in, pure-Python html.parser. When you scrape thousands of pages, that difference adds up.

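| Parser      | Speed (relative) | Handles Broken HTML | Install              |
|-------------|------------------|---------------------|----------------------|
| html.parser | 1x (baseline)    | Decent              | Built-in             |
| lxml        | 5-10x faster     | Good                | pip install lxml     |
| html5lib    | 0.2x (slow)      | Best                | pip install html5lib |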
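
The relative speeds in the table depend on the machine, the library versions, and the shape of the pages, so it is worth measuring on your own data. The following is a minimal sketch of such a measurement, assuming BeautifulSoup and lxml are installed. The sample markup, `SAMPLE`, and `time_parser` are hypothetical names made up for this illustration.

```python
import timeit

from bs4 import BeautifulSoup

# Hypothetical sample page: 500 repeated product cards.
CARD = (
    '<div class="product-card">'
    '<h2 class="title">Widget</h2><span class="price">$9.99</span>'
    "</div>"
)
SAMPLE = "<html><body>" + CARD * 500 + "</body></html>"


def time_parser(parser_name, repeats=5):
    """Return the total seconds taken to parse SAMPLE `repeats` times."""
    return timeit.timeit(lambda: BeautifulSoup(SAMPLE, parser_name), number=repeats)


# Time each parser backend on the same input.
timings = {name: time_parser(name) for name in ("html.parser", "lxml")}
baseline = timings["html.parser"]
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s ({baseline / seconds:.1f}x vs html.parser)")
```

If html5lib is installed, adding "html5lib" to the tuple of parser names includes it in the comparison. A synthetic page like this one is only a rough proxy, so real pages may give different ratios.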

Using lxml with BeautifulSoup

The most common pattern is to use lxml as the parser backend for BeautifulSoup:

python
from bs4 import BeautifulSoup

# html holds the page markup as a string (for example, a downloaded response body)
soup = BeautifulSoup(html, "lxml")  # just change the parser name
products = soup.select(".product-card")
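
To see the pattern above run end to end, here is a small sketch with inline markup standing in for a downloaded page. The markup, the `html_text` variable, and the field names in the resulting dictionaries are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html_text = """
<html><body>
  <div class="product-card"><h2 class="title">Widget</h2><span class="price">$9.99</span></div>
  <div class="product-card"><h2 class="title">Gadget</h2><span class="price">$19.50</span></div>
</body></html>
"""

soup = BeautifulSoup(html_text, "lxml")

# Build one record per product card using CSS selectors.
products = []
for card in soup.select(".product-card"):
    products.append({
        "title": card.select_one(".title").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
    })

print(products)
```

Because only the parser name changes, the same code runs with "html.parser" if lxml is not available, just more slowly.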

Using lxml Directly

For maximum performance, use lxml's own API:

python
from lxml import html

tree = html.fromstring(page_content)

# XPath
titles = tree.xpath('//h2[@class="title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')

# CSS Selectors (via cssselect; requires the separate cssselect package)
from lxml.cssselect import CSSSelector
selector = CSSSelector(".product-card .title")
elements = selector(tree)
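
Collecting titles and prices as two separate lists can misalign them when a card lacks one of the fields. One way around this, sketched below, is to select each card first and then run relative XPath queries inside it. The sample markup and the record layout are hypothetical.

```python
from lxml import html

# Hypothetical markup: the second card has no price.
page_content = """
<html><body>
  <div class="product-card"><h2 class="title">Widget</h2><span class="price">$9.99</span></div>
  <div class="product-card"><h2 class="title">Gadget</h2></div>
</body></html>
"""

tree = html.fromstring(page_content)

records = []
for card in tree.xpath('//div[@class="product-card"]'):
    # string(...) returns "" when the relative path matches nothing.
    title = card.xpath('string(.//h2[@class="title"])').strip()
    price = card.xpath('string(.//span[@class="price"])').strip()
    records.append({"title": title, "price": price or None})

print(records)
```

Note that `[@class="title"]` matches only when the class attribute is exactly "title". For elements that carry several classes, a CSS selector or an XPath `contains()` test is usually the safer choice.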

When to Use lxml Directly vs. BeautifulSoup

  • BeautifulSoup + lxml: When you want a friendly API and don't need maximum speed
  • lxml directly: When parsing speed is critical (millions of pages) or you prefer XPath

Installation Note

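lxml depends on the C libraries libxml2 and libxslt. On most systems, pip install lxml installs a prebuilt binary wheel and works without a compiler. If pip has to build lxml from source instead, as can happen on some Linux systems, install the development packages first (libxml2-dev and libxslt-dev on Debian-style distributions).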
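
After installing, a quick check like the one below confirms that lxml imports and shows which C library versions it is using. This is a minimal sketch: `version_report` is a hypothetical helper, while the version tuples it reads are attributes of `lxml.etree`.

```python
from lxml import etree


def version_report():
    """Return dotted version strings for lxml and its underlying C libraries."""
    def dotted(version_tuple):
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2": dotted(etree.LIBXML_VERSION),
        "libxslt": dotted(etree.LIBXSLT_VERSION),
    }


for name, version in version_report().items():
    print(f"{name}: {version}")
```

If the import fails, the installation did not complete, and the build requirements described above are the first thing to check.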
