
github.com/ArchiveBox/ArchiveBox @v0.7.4 sqlite

818 symbols · 3,792 edges · 122 files · 125 documented (15%)
README

ArchiveBox
Open-source self-hosted web archiving.

▶️ Quickstart | Demo | GitHub | Documentation | Info & Motivation | Community

 


ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline.

Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a free central archive, but they require all archives to be public, and they can't save every type of content.

ArchiveBox is an open-source tool that helps you archive web content on your own (or privately within an organization): save copies of browser bookmarks, preserve evidence for legal cases, back up photos from FB / Insta / Flickr, download your media from YT / Soundcloud / etc., snapshot research papers & academic citations, and more...

➡️ Use ArchiveBox as a command-line package and/or self-hosted web app on Linux, macOS, or in Docker.


📥 You can feed ArchiveBox URLs one at a time, or schedule regular imports from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See input formats for a full list.


💾 It saves snapshots of the URLs you feed it in several redundant formats.
It also detects any content featured inside each webpage & extracts it out into a folder:

  • HTML/Generic websites -> HTML, PDF, PNG, WARC, Singlefile
  • YouTube/SoundCloud/etc. -> MP3/MP4 + subtitles, description, thumbnail
  • News articles -> article body TXT + title, author, featured images
  • Github/Gitlab/etc. links -> git cloned source code
  • and more...

It uses normal filesystem folders to organize archives (no complicated proprietary formats), and offers a CLI + web UI.


🏛️ ArchiveBox is used by many professionals and hobbyists who save content off the web, for example:

  • Individuals: backing up browser bookmarks/history, saving FB/Insta/etc. content, shopping lists
  • Journalists: crawling and collecting research, preserving quoted material, fact-checking and review
  • Lawyers: evidence collection, hashing & integrity verifying, search, tagging, & review
  • Researchers: collecting AI training sets, feeding analysis / web crawling pipelines

The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats for decades after it goes down.


Demo | Screenshots | Usage

. . . . . . . . . . . . . . . . . . . . . . . . . . . .

📦  Get ArchiveBox with docker / apt / brew / pip3 / nix / etc. (see Quickstart below).

# Get ArchiveBox with Docker or Docker Compose (recommended)
docker run -v $PWD/data:/data -it archivebox/archivebox:dev init --setup

# Or install with your preferred package manager (see Quickstart below for apt, brew, and more)
pip3 install archivebox

# Or use the optional auto setup script to install it
curl -sSL 'https://get.archivebox.io' | sh

🔢 Example usage: adding links to archive.

archivebox add 'https://example.com'                                   # add URLs one at a time
archivebox add < ~/Downloads/bookmarks.json                            # or pipe in URLs in any text-based format
archivebox schedule --every=day --depth=1 https://example.com/rss.xml  # or auto-import URLs regularly on a schedule
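
To illustrate what "any text-based format" can mean in practice, here is a minimal Python sketch that pulls http(s) URLs out of arbitrary text such as a bookmarks export. The `URL_PATTERN` regex and the `extract_urls` helper are hypothetical names made up for this example; ArchiveBox's own parsers are not shown here and are likely more thorough.

```python
import re

# Hypothetical pattern for illustration only (not ArchiveBox's parser):
# match http(s) URLs up to the next whitespace, quote, or bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_urls(text: str) -> list[str]:
    """Return unique http(s) URLs found in text, in first-seen order."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that is rarely part of a URL
        url = match.group(0).rstrip(".,;")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    '{"bookmarks": [{"href": "https://example.com/a"}, '
    '{"href": "https://example.com/b"}]}\n'
    "See also https://example.com/a."
)
print("\n".join(extract_urls(sample)))
```

The resulting list could be written to a text file, one URL per line, and fed to `archivebox add < urls.txt` in the same way as the bookmarks file above.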

🔢 Example usage: viewing the archived content.

archivebox server 0.0.0.0:8000            # use the interactive web UI
archivebox list 'https://example.com'     # use the CLI commands (--help for more)
ls ./archive/*/index.json                 # or browse directly via the filesystem
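
Because each snapshot lives in an ordinary folder, the `ls ./archive/*/index.json` listing above can also be walked from a script. The sketch below is a minimal example that assumes each `index.json` is a JSON object with `url` and `title` keys. Those field names and the `list_snapshots` helper are assumptions for illustration, not a documented ArchiveBox API.

```python
import json
from pathlib import Path


def list_snapshots(data_dir):
    """Return a list of (folder name, url, title) for each archive/*/index.json."""
    rows = []
    for index_path in sorted(Path(data_dir).glob("archive/*/index.json")):
        with index_path.open(encoding="utf-8") as f:
            entry = json.load(f)
        # "url" and "title" are assumed field names; .get() tolerates their absence
        rows.append((index_path.parent.name, entry.get("url"), entry.get("title")))
    return rows


if __name__ == "__main__":
    # Run from inside the collection directory (the one containing ./archive)
    for folder, url, title in list_snapshots("."):
        print(folder, url, title)
```

Reading the files directly like this needs neither the server nor the CLI, which is the point of keeping the archive in plain folders.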


Key Features

🤝 Professional Integration

Contact us if your non-profit institution/org wants to use ArchiveBox professionally.

  • setup & support, team permissioning, hashing, audit logging, backups, custom archiving etc.
  • for individuals, NGOs, academia, governments, journalism, law, and more...

All our work is open-source and primarily geared towards non-profits.
Support/consulting pays for hosting and funds new ArchiveBox open-source development.


Quickstart

🖥  Supported OSs: Linux/BSD, macOS, Windows (Docker)   👾  CPUs: amd64 (x86_64), arm64 (arm8), arm7 (raspi>=3)

Note: On arm7 the playwright package is not available, so chromium must be installed manually if needed.

✳️  Easy Setup

Docker docker-compose (macOS/Linux/Windows)   👈  recommended

👍 Docker Compose is recommended for the easiest install/update UX + best security + all the extras out-of-the-box.

  1. Install Docker and Docker Compose on your system (if not already installed).
  2. Download the docker-compose.yml file into a new empty directory (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml'
    
  3. Run the initial setup and create an admin user.
    docker compose run archivebox init --setup
    
  4. Optional: Start the server, then log in to the Web UI http://127.0.0.1:8000 ⇢ Admin.
    docker compose up
    # completely optional, CLI can always be used without running a server
    # docker compose run [-T] archivebox [subcommand] [--args]
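
When the one-off CLI form above is driven from a script, it can help to build the command as an argument list. The following is a minimal Python sketch under that assumption; `compose_run_command` is a hypothetical helper for illustration, and it only uses the `docker compose run [-T] archivebox [subcommand] [--args]` shape shown in the comment above.

```python
import shlex


def compose_run_command(subcommand, *args, tty=True):
    """Build the argv for `docker compose run [-T] archivebox [subcommand] [--args]`."""
    argv = ["docker", "compose", "run"]
    if not tty:
        # -T disables pseudo-TTY allocation, e.g. when input is piped in
        argv.append("-T")
    argv += ["archivebox", subcommand, *args]
    return argv


# Print the commands instead of running them, so the sketch has no side effects
print(shlex.join(compose_run_command("add", "https://example.com")))
print(shlex.join(compose_run_command("list", tty=False)))
```

Passing the returned list to `subprocess.run(argv, check=True)` from the directory containing `docker-compose.yml` would execute it; a list avoids shell-quoting problems with URLs.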
    

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.

Docker docker run (macOS/Linux/Windows)

  1. Install Docker on your system (if not already installed).
  2. Create a new empty directory and initialize your collection (can be anywhere).
    mkdir ~/archivebox && cd ~/archivebox
    docker run -v $PWD:/data -it archivebox/archivebox init --setup
    
  3. Optional: Start the server, then log in to the Web UI http://127.0.0.1:8000 ⇢ Admin.
    docker run -v $PWD:/data -p 8000:8000 archivebox/archivebox
    # completely optional, CLI can always be used without running a server
    # docker run -v $PWD:/data -it archivebox/archivebox [subcommand] [--args]
    

See below for more usage examples using the CLI, Web UI, or filesystem/SQL/Python to manage your archive.

curl sh automatic setup script bash

Core symbols (most depended-on inside this repo)

  • stderr · called by 185 · archivebox/config.py
  • get · called by 115 · archivebox/core/views.py
  • filter · called by 44 · archivebox/core/settings.py
  • end · called by 25 · archivebox/logging_util.py
  • bin_path · called by 22 · archivebox/config.py
  • atomic_write · called by 21 · archivebox/system.py
  • run · called by 17 · archivebox/main.py
  • parse_date · called by 15 · archivebox/util.py

Shape

Function 581
Method 160
Class 70
Route 7

Languages

Python 73%
TypeScript 27%

Modules by API surface

  • archivebox/templates/static/jquery.dataTables.min.js · 126 symbols
  • archivebox/templates/static/jquery.min.js · 92 symbols
  • archivebox/index/schema.py · 48 symbols
  • archivebox/logging_util.py · 39 symbols
  • archivebox/core/models.py · 35 symbols
  • archivebox/index/__init__.py · 34 symbols
  • archivebox/core/admin.py · 33 symbols
  • archivebox/config.py · 32 symbols
  • archivebox/util.py · 20 symbols
  • archivebox/cli/tests.py · 19 symbols
  • archivebox/main.py · 18 symbols
  • tests/test_extractors.py · 16 symbols

Dependencies (from manifests, versioned)

  • @postlight/parser 2.2.3 · 1×
  • readability-extractor github:ArchiveBox/re · 1×
  • single-file-cli 2.0.83 · 1×
  • asgiref 3.11.1 · 1×
  • asttokens 3.0.1 · 1×
  • certifi 2026.4.22 · 1×
  • charset-normalizer 3.4.7 · 1×
  • colorama 0.4.6 · 1×
  • croniter 6.2.2 · 1×
  • dateparser 1.2.2 · 1×
  • decorator 5.3.1 · 1×
  • django 3.1.14 · 1×

For agents

$ claude mcp add ArchiveBox \
  -- python -m otcore.mcp_server <graph>
