What Is lxml? Fast XML and HTML Parsing in Python

beginner

lxml is a high-performance Python library for processing XML and HTML. Built on the C libraries libxml2 and libxslt, it offers a Pythonic API with XPath and CSS selector support, and it is among the fastest HTML parsers available in Python, which makes it a common choice for production web scraping.

Why lxml?

lxml does its parsing in C (through libxml2), so it is significantly faster than Python's built-in html.parser, which is written in pure Python. When you scrape thousands of pages, that speed difference adds up.

Parser	Speed (relative)	Handles Broken HTML	Install
html.parser	1x (baseline)	Decent	Built-in
lxml	5-10x faster	Good	`pip install lxml`
html5lib	0.2x (slow)	Best	`pip install html5lib`
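
The relative speeds in the table depend on the machine, the library versions, and the shape of the markup, so it is worth measuring on your own pages. The following is a minimal timing sketch, assuming both beautifulsoup4 and lxml are installed; the `CARD` markup, the page size, and the `time_parser` helper are made up for illustration and are not part of either library.

```python
import timeit

from bs4 import BeautifulSoup

# Synthetic page: 500 repeated product cards (hypothetical markup).
CARD = (
    '<div class="product-card">'
    '<h2 class="title">Item</h2><span class="price">$1</span>'
    "</div>"
)
PAGE = "<html><body>" + CARD * 500 + "</body></html>"


def time_parser(parser_name: str, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) for one parse of PAGE."""
    timer = timeit.Timer(lambda: BeautifulSoup(PAGE, parser_name))
    return min(timer.repeat(repeat=repeats, number=1))


baseline = time_parser("html.parser")
for name in ("html.parser", "lxml"):
    elapsed = time_parser(name)
    ratio = baseline / elapsed
    print(f"{name:12s} {elapsed * 1000:8.2f} ms  ({ratio:.1f}x vs html.parser)")
```

Taking the minimum of several runs reduces noise from other processes. A synthetic page like this one only gives a rough indication, so real scraped pages are a better benchmark input.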

Using lxml with BeautifulSoup

The most common pattern is to use lxml as the parser backend for BeautifulSoup:

python

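from bs4 import BeautifulSoup

# html is the page source as a string; a one-card sample keeps this runnable
html = '<div class="product-card"><h2 class="title">Widget</h2></div>'
soup = BeautifulSoup(html, "lxml")  # just change the parser name
products = soup.select(".product-card")  # list of matching Tag objects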
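
Once the cards are selected, each one can be turned into a record. The sketch below is one way to do that, assuming hypothetical `.title` and `.price` class names inside each card; real pages will use different markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<div class="product-card"><h2 class="title">Widget</h2><span class="price">$9.99</span></div>
<div class="product-card"><h2 class="title">Gadget</h2><span class="price">$24.50</span></div>
"""

soup = BeautifulSoup(html, "lxml")

products = []
for card in soup.select(".product-card"):
    # select_one returns None when nothing matches, so guard before reading text.
    title = card.select_one(".title")
    price = card.select_one(".price")
    products.append({
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    })

print(products)
```

Querying inside each card, rather than collecting all titles and all prices separately, keeps a title paired with its own price even when some cards are missing a field.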

Using lxml Directly

For maximum performance, use lxml's own API:

python

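from lxml import html
from lxml.cssselect import CSSSelector  # requires the separate cssselect package

# page_content is the page source (str or bytes); a small sample is used here
page_content = '<div class="product-card"><h2 class="title">Widget</h2><span class="price">$9.99</span></div>'
tree = html.fromstring(page_content)

# XPath (@class="title" matches only when the attribute is exactly "title")
titles = tree.xpath('//h2[@class="title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')

# CSS selectors (cssselect translates them to XPath)
selector = CSSSelector(".product-card .title")
elements = selector(tree)  # list of matching elements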
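
One practical difference between the two query styles: `@class="title"` in XPath compares the whole attribute string, while a CSS `.title` selector matches any element whose class list contains `title`. The sketch below shows a common XPath idiom for matching a single class token, plus per-card extraction with relative paths. The markup and class names are hypothetical.

```python
from lxml import html

# Hypothetical markup: the second title carries two classes.
page_content = """
<html><body>
<div class="product-card"><h2 class="title">Widget</h2><span class="price">$9.99</span></div>
<div class="product-card"><h2 class="title featured">Gadget</h2><span class="price">$24.50</span></div>
</body></html>
"""

tree = html.fromstring(page_content)

# Exact attribute comparison: misses class="title featured".
exact = tree.xpath('//h2[@class="title"]/text()')

# Token match: pad the normalized class list with spaces, then look for " title ".
token_match = tree.xpath(
    '//h2[contains(concat(" ", normalize-space(@class), " "), " title ")]/text()'
)

# Relative paths (starting with ".") search inside each card only.
rows = [
    (card.findtext(".//h2"), card.findtext('.//span[@class="price"]'))
    for card in tree.xpath('//div[@class="product-card"]')
]

print(exact)
print(token_match)
print(rows)
```

The token-matching expression is verbose, which is one reason some people prefer CSS selectors for class-based queries and keep XPath for structural ones.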

When to Use lxml Directly vs. BeautifulSoup

• BeautifulSoup + lxml: when you want a friendly API and don't need maximum speed
• lxml directly: when parsing speed is critical (millions of pages) or you prefer XPath

Installation Note

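lxml depends on the C libraries libxml2 and libxslt. On most systems, pip install lxml works fine because pip downloads a prebuilt wheel. If pip has to compile lxml from source instead, as can happen on some Linux systems, you may need to install libxml2-dev and libxslt-dev first.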
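
After installing, a quick way to confirm that lxml imports correctly, and to see which versions of the underlying C libraries it is running against, is to print the version tuples that `lxml.etree` exposes. This is a small diagnostic sketch rather than a required step.

```python
from lxml import etree

# Each constant is a tuple of integers, e.g. (major, minor, patch).
print("lxml:   ", ".".join(map(str, etree.LXML_VERSION)))
print("libxml2:", ".".join(map(str, etree.LIBXML_VERSION)))
print("libxslt:", ".".join(map(str, etree.LIBXSLT_VERSION)))
```

If the import itself fails with a compiler or missing-header error during installation, that usually points to the source-build case described above.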

Related Terms

HTML Parsing

BeautifulSoup

XPath

CSS Selector
