
Storing Scraped Data: CSV, JSON, Databases & More

beginner

Data storage in web scraping refers to how and where you save the extracted data. The choice depends on the data size, structure, and how you plan to use it — from simple CSV files for small projects to databases for large-scale operations.

Storage Options Compared

| Format | Best for | Rough practical scale | Query support |
|---|---|---|---|
| CSV | Simple tabular data | ~1M rows | No (use pandas) |
| JSON | Nested/flexible data | ~100K records | No |
| SQLite | Medium projects | ~10M rows | Full SQL |
| PostgreSQL | Production systems | Very large | Full SQL |
| MongoDB | Varied/nested data | Very large | MongoDB query language |

CSV — The Simple Default

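```python
import csv

# products: a list of dicts with the keys below,
# e.g. [{"name": "Widget", "price": 9.99, "url": "https://example.com/widget"}]
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price", "url"])
    writer.writeheader()        # first row: column names
    writer.writerows(products)  # one CSV row per dict
```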

Pros: Universal, easy to open in Excel, no dependencies beyond Python's standard library.
Cons: No types (every value is read back as a string), no querying, slow for large datasets.
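
To see the "no types" drawback concretely, the sketch below writes two made-up product rows and reads them back with `csv.DictReader`. The float prices come back as strings, so numeric columns have to be converted by hand. The helper names `save_csv` and `load_csv` and the sample data are illustrative assumptions, not part of any library.

```python
import csv

def save_csv(path, rows, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def load_csv(path):
    """Read the CSV back; DictReader returns every value as a str."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Hypothetical scraped items
products = [
    {"name": "Widget", "price": 9.99, "url": "https://example.com/widget"},
    {"name": "Gadget", "price": 24.5, "url": "https://example.com/gadget"},
]
save_csv("products.csv", products, ["name", "price", "url"])

rows = load_csv("products.csv")
print(type(rows[0]["price"]))               # <class 'str'>: the float type was lost
prices = [float(r["price"]) for r in rows]  # convert numeric columns manually
```

If you already use pandas, `pandas.read_csv` infers column types for you, which is why the table above suggests it for querying CSV data.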

JSON — For Nested Data

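```python
import json

# products: a list of dicts, which may contain nested lists and dicts
with open("products.json", "w", encoding="utf-8") as f:
    # indent=2 makes the file human-readable; ensure_ascii=False keeps non-ASCII text as-is
    json.dump(products, f, indent=2, ensure_ascii=False)
```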

Pros: Handles nested structures, widely supported.
Cons: The entire file is loaded into memory (and rewritten to add records), no querying.
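
One common workaround for the memory drawback, not covered in the comparison above, is JSON Lines: one JSON object per line. Each item can be appended as soon as it is scraped, and the file can be read back one record at a time. The sketch below assumes a file named `products.jsonl`; the helpers `append_jsonl` and `iter_jsonl` are hypothetical names.

```python
import json
import os

def append_jsonl(path, item):
    """Append one record as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")

def iter_jsonl(path):
    """Yield records one at a time instead of loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

path = "products.jsonl"
if os.path.exists(path):
    os.remove(path)  # start from an empty file for this demo

# Hypothetical nested items, the kind of data JSON handles well
items = [
    {"name": "Widget", "price": 9.99, "tags": ["sale"], "seller": {"id": 7}},
    {"name": "Gadget", "price": 24.5, "tags": [], "seller": {"id": 3}},
]
for item in items:  # in a real scraper: once per scraped page
    append_jsonl(path, item)

for record in iter_jsonl(path):  # streams line by line
    print(record["name"], record["seller"]["id"])
```

Nested structures survive the round trip unchanged; the trade-off is that the file is no longer a single valid JSON document, so tools expecting one array will not read it directly.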

SQLite — The Sweet Spot

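```python
import sqlite3

conn = sqlite3.connect("scraping.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,                               -- rejects duplicate URLs
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- filled in automatically
    )
""")
# name, price, url: values extracted for one scraped item
# INSERT OR IGNORE silently skips a row whose url is already stored
conn.execute(
    "INSERT OR IGNORE INTO products (name, price, url) VALUES (?, ?, ?)",
    (name, price, url),
)
conn.commit()
conn.close()
```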

Pros: SQL queries, no server needed, handles duplicates, ACID transactions.
Cons: Only one writer at a time, so it is a poor fit for many processes writing concurrently.
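
The sketch below exercises two of the pros listed above: the `UNIQUE` constraint on `url` makes `INSERT OR IGNORE` drop a re-scraped duplicate, and ordinary SQL then answers questions about the data. It uses an in-memory database (`":memory:"`) and made-up rows so it runs anywhere; pass a file name such as `"scraping.db"` to persist the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demo
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

# Hypothetical scrape results; the last row repeats an already-seen URL
scraped = [
    ("Widget", 9.99, "https://example.com/widget"),
    ("Gadget", 24.50, "https://example.com/gadget"),
    ("Widget", 9.99, "https://example.com/widget"),
]
for row in scraped:
    conn.execute(
        "INSERT OR IGNORE INTO products (name, price, url) VALUES (?, ?, ?)", row
    )
conn.commit()

# Plain SQL queries over the stored data
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
cheap = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY price", (20,)
).fetchall()
print(count, cheap)
conn.close()
```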

Best Practices

  • Always deduplicate: Use UNIQUE constraints or check before inserting
  • Add timestamps: Record when each item was scraped
  • Store raw + clean: Keep the original data alongside processed versions
  • Batch inserts: Insert many rows at once instead of one-by-one
  • Use UPSERT: Update existing records instead of failing on duplicates
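
The practices above can be combined in one small SQLite sketch: `url` is the deduplication key, `raw_price` keeps the scraped text next to the cleaned `price`, `scraped_at` records when a row was last written, `executemany` inside one transaction batches the writes, and `INSERT ... ON CONFLICT DO UPDATE` performs the UPSERT. This assumes the SQLite library behind Python's `sqlite3` is version 3.24 or newer (the first release with UPSERT); `clean_price`, `save_batch`, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        url TEXT PRIMARY KEY,  -- deduplication key
        name TEXT,
        price REAL,            -- cleaned value
        raw_price TEXT,        -- original text as scraped
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

def clean_price(raw):
    """Hypothetical cleaner: '$1,299.00' -> 1299.0."""
    return float(raw.replace("$", "").replace(",", ""))

UPSERT = """
    INSERT INTO products (url, name, price, raw_price)
    VALUES (?, ?, ?, ?)
    ON CONFLICT(url) DO UPDATE SET
        name = excluded.name,
        price = excluded.price,
        raw_price = excluded.raw_price,
        scraped_at = CURRENT_TIMESTAMP
"""

def save_batch(items):
    """Insert or update many rows in a single transaction."""
    rows = [(i["url"], i["name"], clean_price(i["raw_price"]), i["raw_price"])
            for i in items]
    with conn:  # commits once at the end, rolls back on an exception
        conn.executemany(UPSERT, rows)

save_batch([
    {"url": "https://example.com/widget", "name": "Widget", "raw_price": "$9.99"},
    {"url": "https://example.com/laptop", "name": "Laptop", "raw_price": "$1,299.00"},
])
# A later run finds a new price for the same URL: the row is updated, not duplicated
save_batch([{"url": "https://example.com/widget", "name": "Widget", "raw_price": "$8.49"}])

rows = conn.execute("SELECT url, price, raw_price FROM products ORDER BY url").fetchall()
print(rows)
```

Keeping `raw_price` means a buggy `clean_price` can be fixed later and re-applied to stored data without re-scraping.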

Learn Data Storage hands-on

This glossary entry covers the basics. The Master Web Scraping course teaches you to use data storage in real projects across 16 in-depth chapters.
