Is Web Scraping Legal? A Practical Guide for Developers
"Is web scraping legal?" is the first question most people ask. The short answer: it depends on what you scrape, how you scrape it, and where you are.
This guide covers the key legal frameworks, court cases, and practical guidelines you need to know.
The Short Answer
Web scraping is generally legal when:
- You're scraping publicly available data
- You're not bypassing authentication or access controls
- You're not violating Terms of Service in a way that causes harm
- You're respecting rate limits and not damaging the server
- You're not scraping personal data without a legal basis
Key Court Cases
hiQ Labs v. LinkedIn (2022)
The most important scraping case. hiQ scraped public LinkedIn profile data to build workforce analytics products. LinkedIn sent a cease-and-desist and blocked hiQ's access. hiQ sued.
Result: The Ninth Circuit ruled that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act (CFAA). The key phrase is "publicly available": LinkedIn profiles visible without logging in were fair game under the CFAA.
What it means for you: Scraping public data is on stronger legal ground than it used to be. But this ruling binds only courts in the Ninth Circuit (the western US), addresses only the CFAA, and isn't a universal green light.
Clearview AI (Multiple cases, 2020-2024)
Clearview scraped billions of photos from social media to build a facial recognition database. Multiple countries and US states have taken legal action.
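Result: Data protection authorities in several EU countries and the UK issued fines, Australia's privacy regulator ordered Clearview to stop collecting and to delete the images it held, and US lawsuits under BIPA (the Illinois Biometric Information Privacy Act) ended in settlements. The actions cited violations of GDPR, BIPA, and other privacy laws.
What it means for you: Scraping personal data, especially biometric data, for commercial use is high-risk. The data being "public" doesn't make it legal to collect at scale.
Meta v. Bright Data (2024)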
Meta sued Bright Data for scraping Instagram and Facebook data. Bright Data argued the data was publicly available.
Result: The court granted summary judgment for Bright Data. It read Meta's Terms of Service as governing what account holders do while logged in, so scraping behind a login wall (even with a valid account) can violate them. Truly public data, accessible without logging in, was not covered by those terms.
What it means for you: Logging in to scrape is riskier than scraping pages that are publicly accessible without authentication.
robots.txt
The robots.txt file at the root of a website tells crawlers which pages they can and can't access.
```
# Example robots.txt
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /products/
Crawl-delay: 1
```
Checking it is simple:
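```python
import requests

# Fetch robots.txt from the site root; fail loudly on timeouts or HTTP errors
response = requests.get("https://example.com/robots.txt", timeout=10)
response.raise_for_status()
print(response.text)
```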
Always check it. Always respect it when possible.
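Beyond reading the file by eye, Python's standard library can evaluate the rules for you. The sketch below runs `urllib.robotparser` against the example rules shown earlier. The `my-scraper` user agent name and the `example.com` paths are placeholders, and the rules are parsed from an inline string so the example runs offline; a real scraper would more likely load the live file with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Inline copy of the example rules; a real scraper would fetch the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /products/
Crawl-delay: 1
"""

USER_AGENT = "my-scraper"  # hypothetical user agent name


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

for path in ("/products/widget", "/admin/users", "/api/v1/items"):
    url = f"https://example.com{path}"
    verdict = "allowed" if rules.can_fetch(USER_AGENT, url) else "disallowed"
    print(f"{path}: {verdict}")

# Crawl-delay is a non-standard directive; crawl_delay() returns None if absent.
print("crawl delay:", rules.crawl_delay(USER_AGENT))
```

Calling `can_fetch()` before every request keeps the check close to the code that actually makes the request, which is harder to forget than a one-time manual review.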
Terms of Service
Most websites prohibit scraping in their Terms of Service (ToS). The legal weight of these terms varies:
- US: Violating ToS alone generally isn't a criminal offense after the Van Buren v. United States (2021) Supreme Court ruling, which narrowed the CFAA. But it can still support civil claims.
- EU: ToS violations are primarily a contract law issue. Database rights under the EU Database Directive may also apply.
CFAA (United States)
The Computer Fraud and Abuse Act criminalizes "unauthorized access" to computer systems. Before the Van Buren ruling (2021), some companies argued that any ToS violation counted as unauthorized access.
Post-Van Buren, the CFAA applies more narrowly — it's about bypassing technical access barriers (like hacking passwords), not just violating written terms.
You're likely safe under CFAA if you:
- Only access publicly available pages
- Don't bypass login walls or CAPTCHAs
- Don't use stolen credentials
You're at risk under CFAA if you:
- Bypass authentication systems
- Access data you're not authorized to see
- Continue scraping after receiving a cease-and-desist and being IP-blocked
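One way to reflect this distinction in code is to treat access-control responses as a signal to stop, not an obstacle to work around. The sketch below is an assumption about how a scraper might classify responses: the function names, the status-code set, and the login-path hints are all illustrative, and none of this is a legal test.

```python
from urllib.parse import urlparse

# HTTP statuses that signal the server is refusing or gating access.
ACCESS_DENIED_STATUSES = {401, 403}

# Hypothetical path prefixes that suggest a redirect to a login page.
LOGIN_PATH_HINTS = ("/login", "/signin", "/auth")


def is_access_denied(status_code: int, final_url: str) -> bool:
    """Return True when a response suggests an access control was hit."""
    if status_code in ACCESS_DENIED_STATUSES:
        return True
    path = urlparse(final_url).path.lower()
    return any(path.startswith(hint) for hint in LOGIN_PATH_HINTS)


def next_action(status_code: int, final_url: str) -> str:
    """Decide whether the scraper should keep going or stop."""
    return "stop" if is_access_denied(status_code, final_url) else "continue"


print(next_action(200, "https://example.com/products/widget"))
print(next_action(403, "https://example.com/products/widget"))
print(next_action(200, "https://example.com/login?next=/dashboard"))
```

Stopping on these signals, instead of rotating IPs or retrying with credentials, keeps the scraper on the "publicly available pages" side of the list above.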
GDPR (European Union)
If you scrape personal data about people in the EU, GDPR can apply regardless of where you're located.
Key GDPR requirements for scrapers:
- You need a lawful basis to process personal data (consent, legitimate interest, etc.)
- You must respect data subject rights (right to deletion, right to access)
- You need to document what data you collect and why
- Data minimization: only collect what you actually need
Scraping product prices or public statistics? GDPR doesn't apply. Scraping user profiles with names and locations? It does.
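Data minimization is easiest to enforce at the point where records are stored. A minimal sketch, assuming scraped records are plain dictionaries: the field names and the `ALLOWED_FIELDS` allowlist are hypothetical and would depend on what your project actually needs.

```python
# Hypothetical allowlist: only fields the project actually needs.
ALLOWED_FIELDS = frozenset({"product_name", "price", "currency", "in_stock"})


def minimize(record: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Keep only allowlisted fields; everything else is dropped before storage."""
    return {key: value for key, value in record.items() if key in allowed}


# Example scraped record that mixes product data with personal data.
scraped = {
    "product_name": "Widget",
    "price": 19.99,
    "currency": "EUR",
    "in_stock": True,
    "reviewer_name": "Jane Doe",       # personal data, not needed
    "reviewer_location": "Berlin",     # personal data, not needed
}

stored = minimize(scraped)
print(stored)
```

An allowlist is used here instead of a blocklist so that any new field a site adds is dropped by default until someone decides it is needed.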
Public vs Private Data
The distinction between public and private data matters more than any other factor:
| Data Type | Risk Level | Examples |
|---|---|---|
| Public product data | Low | Prices, specs, availability |
| Public content | Low-moderate | Articles, reviews (copyright applies) |
| Public profiles | Moderate | Social media bios, usernames |
| Login-required data | High | Dashboard data, private messages |
| Personal data at scale | High | Names, emails, phone numbers |
Practical Checklist for Legal Scraping
1. Check robots.txt and respect it
2. Read the Terms of Service, at least the sections about automated access
3. Only scrape public pages; don't log in to access data
4. Avoid personal data unless you have a clear legal basis
5. Rate limit your requests so you don't overwhelm the server
6. Don't republish copyrighted content verbatim: facts are fine, creative expression is not
7. Respond to cease-and-desist letters, because ignoring them escalates the situation
8. Document your process so that, if challenged, you can show you acted in good faith
9. Consult a lawyer for commercial projects that scrape at scale
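The rate-limiting item in the checklist can be sketched as a small helper that enforces a minimum gap between requests. This is one simple approach under stated assumptions, not the only one: the class name `MinDelayLimiter` and the 0.2 second delay are illustrative, and a real scraper would pick a delay based on the site's `Crawl-delay` or its own judgment.

```python
import time


class MinDelayLimiter:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_delay has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinDelayLimiter(min_delay_seconds=0.2)
start = time.monotonic()
for page in range(3):
    limiter.wait()
    # A real scraper would issue its HTTP request here.
    print(f"request {page} at +{time.monotonic() - start:.1f}s")
```

Calling `limiter.wait()` immediately before each request keeps the pacing logic in one place, so it cannot be skipped by a code path that forgets to sleep.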
What to Avoid
- Scraping behind login walls without permission
- Ignoring cease-and-desist notices
- Collecting personal data without a GDPR-compliant basis
- Republishing copyrighted content as your own
- Scraping at rates that degrade the target site's performance
- Selling scraped personal data
What's Next
Legal considerations are important, but they shouldn't paralyze you. Most scraping of public product data, prices, and statistics is on solid legal ground. The risks increase when personal data and access controls are involved.
The Master Web Scraping course covers ethical scraping practices alongside the technical skills, so you know exactly where the lines are.