ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline.
Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a free central archive, but they require all archives to be public, and they can't save every type of content.
ArchiveBox is an open-source tool that helps you archive web content on your own (or privately within an organization): save copies of browser bookmarks, preserve evidence for legal cases, back up photos from FB / Insta / Flickr, download your media from YT / Soundcloud / etc., snapshot research papers & academic citations, and more...
➡️ Use ArchiveBox as a command-line package and/or self-hosted web app on Linux, macOS, or in Docker.
📥 You can feed ArchiveBox URLs one at a time, or schedule regular imports from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See input formats for a full list.
💾 It saves snapshots of the URLs you feed it in several redundant formats.
It also detects any content featured inside each webpage & extracts it out into a folder:
- HTML/Generic websites -> HTML, PDF, PNG, WARC, Singlefile
- YouTube/SoundCloud/etc. -> MP3/MP4 + subtitles, description, thumbnail
- News articles -> article body TXT + title, author, featured images
- GitHub/GitLab/etc. links -> git-cloned source code
- and more...
It uses normal filesystem folders to organize archives (no complicated proprietary formats), and offers a CLI + web UI.
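Because a snapshot is just an ordinary folder, you can check which of these formats actually landed on disk with standard shell tools. The sketch below uses a hypothetical helper, `snapshot_formats` (not an ArchiveBox command). It groups files by extension rather than by exact file name, because the names each extractor writes may vary between ArchiveBox versions.

```bash
# snapshot_formats is a hypothetical helper, not part of ArchiveBox.
# Usage: snapshot_formats <snapshot folder>
# Prints "<extension> <count>" for every file that has an extension under
# the folder (searched recursively), sorted by extension.
snapshot_formats() {
    find "$1" -type f -name '*.*' \
        | awk -F. '{ count[tolower($NF)]++ } END { for (ext in count) print ext, count[ext] }' \
        | LC_ALL=C sort
}
```

Point it at any snapshot folder under ./archive/. A line such as "html 2" means two HTML outputs (for example, a plain HTML save and a Singlefile save) exist for that page.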
🏛️ ArchiveBox is used by many professionals and hobbyists who save content off the web, for example:
- backing up browser bookmarks/history, saving FB/Insta/etc. content, shopping lists
- crawling and collecting research, preserving quoted material, fact-checking and review
- evidence collection, hashing & integrity verifying, search, tagging, & review
- collecting AI training sets, feeding analysis / web crawling pipelines

The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats for decades after it goes down.
📦 Get ArchiveBox with docker / apt / brew / pip3 / nix / etc. (see Quickstart below).
```bash
# Get ArchiveBox with Docker or Docker Compose (recommended)
docker run -v $PWD/data:/data -it archivebox/archivebox:dev init --setup

# Or install with your preferred package manager (see Quickstart below for apt, brew, and more)
pip3 install archivebox

# Or use the optional auto setup script to install it
curl -sSL 'https://get.archivebox.io' | sh
```
🔢 Example usage: adding links to archive.
```bash
archivebox add 'https://example.com'                                     # add URLs one at a time
archivebox add < ~/Downloads/bookmarks.json                              # or pipe in URLs in any text-based format
archivebox schedule --every=day --depth=1 https://example.com/rss.xml    # or auto-import URLs regularly on a schedule
```
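`archivebox add` already parses URLs out of piped text, but you may want to preview or deduplicate what will be imported first. This is a minimal sketch that assumes a `grep` supporting `-o` (GNU and BSD grep both do). `extract_urls` is a hypothetical helper name. Its pattern is a rough heuristic rather than a full URL grammar: for example, it keeps a trailing period at the end of a sentence.

```bash
# extract_urls is a hypothetical helper, not part of ArchiveBox.
# Reads arbitrary text on stdin and prints each http(s) URL once, sorted.
# The pattern stops at whitespace, double quotes, and angle brackets; it is
# a heuristic, not a complete URL parser.
extract_urls() {
    grep -Eo 'https?://[^[:space:]"<>]+' | LC_ALL=C sort -u
}
```

Usage: run `extract_urls < notes.txt` to preview the list, then `extract_urls < notes.txt | archivebox add` to import it (notes.txt is a placeholder file name).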
🔢 Example usage: viewing the archived content.
```bash
archivebox server 0.0.0.0:8000            # use the interactive web UI
archivebox list 'https://example.com'     # use the CLI commands (--help for more)
ls ./archive/*/index.json                 # or browse directly via the filesystem
```
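Each snapshot is a plain folder with its own index.json, as the `ls` example shows, so ordinary shell scripts can walk the archive. Below is a minimal sketch of a hypothetical `list_snapshots` helper. It assumes the collection's data directory contains an archive/ folder laid out like the `ls ./archive/*/index.json` example.

```bash
# list_snapshots is a hypothetical helper, not part of ArchiveBox.
# Usage: list_snapshots [data dir]   (defaults to the current directory)
# Prints every snapshot folder under <data dir>/archive that contains an
# index.json, one per line, in glob (alphabetical) order.
list_snapshots() {
    local data_dir="${1:-.}"
    local index
    for index in "$data_dir"/archive/*/index.json; do
        # With no matches the glob stays unexpanded, so confirm the file exists.
        [ -f "$index" ] || continue
        dirname "$index"
    done
}
```

For example, `list_snapshots | wc -l` counts the snapshots in the current collection.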
Contact us if your non-profit institution/org wants to use ArchiveBox professionally.
All our work is open-source and primarily geared towards non-profits.
Support/consulting pays for hosting and funds new ArchiveBox open-source development.
🖥 Supported OSs: Linux/BSD, macOS, Windows (Docker) 👾 CPUs: amd64 (x86_64), arm64 (arm8), arm7 (raspi>=3)
Note: On arm7 the Playwright package is not available, so Chromium must be installed manually if needed.
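The usual way to check which of the CPU labels above applies to a host is `uname -m`. This is a minimal sketch that assumes the common `uname -m` spellings: `x86_64`, `aarch64` on Linux, `arm64` on macOS, and `armv7l` on 32-bit Raspberry Pi OS. The helper name `archivebox_cpu` is hypothetical. It only maps names and does not query Docker image manifests.

```bash
# archivebox_cpu is a hypothetical helper, not part of ArchiveBox.
# Usage: archivebox_cpu [machine]   (defaults to the output of `uname -m`)
# Maps a machine name to the CPU labels in the support list and returns
# non-zero for anything not on that list.
archivebox_cpu() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        armv7*)        echo "arm7 (no playwright: install chromium manually if needed)" ;;
        *)             echo "unsupported"; return 1 ;;
    esac
}
```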
docker-compose (macOS/Linux/Windows) 👈 recommended
👍 Docker Compose is recommended for the easiest install/update UX + best security + all the extras out-of-the-box.
Download the docker-compose.yml file into a new empty directory (can be anywhere).
```bash
mkdir ~/archivebox && cd ~/archivebox
curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml'
docker compose run archivebox init --setup
docker compose up
# completely optional, CLI can always be used without running a server
# docker compose run [-T] archivebox [subcommand] [--args]
```
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
docker run (macOS/Linux/Windows)
```bash
mkdir ~/archivebox && cd ~/archivebox
docker run -v $PWD:/data -it archivebox/archivebox init --setup
docker run -v $PWD:/data -p 8000:8000 archivebox/archivebox
# completely optional, CLI can always be used without running a server
# docker run -v $PWD:/data -it archivebox/archivebox [subcommand] [--args]
```
See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.
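If you run many CLI commands this way, a small shell function saves retyping the volume mount. This is a minimal sketch that assumes Docker is installed and that you call it from the data directory created above. `archivebox_docker` is a hypothetical name. It differs from the commands above in two deliberate ways. First, it adds `--rm` so each one-off CLI container is removed when it exits. Second, it passes `-t` only when stdin is a terminal, because Docker rejects `-t` when input is piped in (e.g. `archivebox_docker add < urls.txt`).

```bash
# archivebox_docker is a hypothetical wrapper, not part of ArchiveBox.
# Runs an ArchiveBox CLI subcommand in Docker with the current directory
# mounted as /data.
archivebox_docker() {
    local tty_flag=""
    # Request a TTY only for interactive use; Docker refuses -t when stdin
    # is a file or a pipe.
    if [ -t 0 ]; then
        tty_flag="-t"
    fi
    # $tty_flag is intentionally unquoted so it disappears when empty.
    docker run --rm -i $tty_flag -v "$PWD:/data" archivebox/archivebox "$@"
}
```

Usage: `archivebox_docker list` or `archivebox_docker add 'https://example.com'`.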
```bash
$ claude mcp add ArchiveBox \
  -- python -m otcore.mcp_server <graph>
```