What Is Web Crawling? Crawling vs. Scraping Explained

beginner

Web crawling is the automated process of systematically browsing the web by following links from page to page. While web scraping extracts data from specific pages, web crawling discovers and navigates to those pages in the first place.

Crawling vs. Scraping

          Web Crawling      Web Scraping
Goal      Discover pages    Extract data
Action    Follow links      Parse content
Output    List of URLs      Structured data
Scale     Broad             Targeted

In practice, most projects do both: crawl to find pages, then scrape to extract data from each one.

How Web Crawlers Work

  1. Start with one or more seed URLs
  2. Fetch each page
  3. Extract all links from the page
  4. Filter links (same domain, not visited, allowed by robots.txt)
  5. Add new links to the queue
  6. Repeat until the queue is empty
python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

visited = set()
queue = deque(["https://example.com"])  # seed URL

while queue:
    url = queue.popleft()  # first in, first out: breadth-first order
    if url in visited:
        continue
    visited.add(url)

    # A timeout keeps one unresponsive server from stalling the crawl
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "lxml")  # requires the lxml package

    # Extract data from this page
    # ...

    # Find new links to crawl
    for link in soup.select("a[href]"):
        full_url = urljoin(url, link["href"])  # resolve relative links
        if full_url.startswith("https://example.com") and full_url not in visited:
            queue.append(full_url)
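
The same-domain filter from step 4 can be stricter than a string-prefix test, which would also accept a look-alike host such as example.com.evil.org. Below is a minimal sketch using only the standard library. The helper names `normalize` and `same_host` are hypothetical, chosen for illustration; it assumes the crawl is limited to one exact hostname.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def normalize(base_url: str, href: str) -> str:
    """Resolve href against base_url and drop any #fragment."""
    absolute = urljoin(base_url, href)
    return urldefrag(absolute).url


def same_host(url: str, allowed_host: str) -> bool:
    """True only for http(s) URLs whose hostname equals allowed_host exactly."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname == allowed_host


# Usage: normalize first, then filter, before adding a link to the queue.
candidate = normalize("https://example.com/docs/", "page.html#intro")
if same_host(candidate, "example.com"):
    print("would enqueue:", candidate)
```

Dropping the fragment means that /page.html#intro and /page.html#usage count as one page in the visited set, which may cut down on duplicate fetches.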

Crawling Best Practices

  • Respect robots.txt: Check before crawling any domain
  • Deduplicate URLs: Track visited pages to avoid infinite loops
  • Handle pagination: Don't just follow nav links — detect "next page" patterns
  • Set depth limits: Cap how many links away from the seed URL the crawler may go, so endless link chains cannot trap it
  • Use breadth-first: Process pages level by level instead of following one branch to the bottom first
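
Several of these practices can be combined in one loop. The sketch below is one possible arrangement, not a definitive implementation: it checks robots.txt with the standard library's `urllib.robotparser`, deduplicates with a visited set, and stops expanding links at a depth limit. The names `build_robots`, `crawl_bfs`, `get_links`, and the user agent `example-crawler` are hypothetical. To stay self-contained it takes the robots.txt text and the link lookup as inputs rather than making network requests.

```python
from collections import deque
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_bfs(seed, get_links, robots, max_depth=2, user_agent="example-crawler"):
    """Breadth-first crawl that honors robots.txt and a depth limit.

    get_links is a hypothetical callable: URL -> iterable of absolute URLs.
    Returns the visited URLs in the order they were processed.
    """
    visited = set()
    order = []
    queue = deque([(seed, 0)])  # (url, depth) pairs

    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue  # deduplicate
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        visited.add(url)
        order.append(url)

        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page's links
        for link in get_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return order


# Usage with an in-memory link graph standing in for real pages.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
}
rules = build_robots("User-agent: *\nDisallow: /private/")
for page in crawl_bfs("https://example.com/", lambda u: site.get(u, []), rules):
    print(page)
```

In a real crawler, `get_links` would fetch and parse each page, and the robots.txt rules could be loaded from the site with `RobotFileParser.set_url()` followed by `read()`.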

When to Use a Crawler

  • You need all pages from a site (product catalog, directory)
  • You don't know the exact URLs upfront
  • The site doesn't have a sitemap or API
  • You're building a search index
