
github.com/watercrawl/WaterCrawl @v0.12.3 sqlite

1,936 symbols 5,859 edges 363 files 202 documented · 10%
README

Water Crawl


🕷️ WaterCrawl is a powerful web application that uses Python, Django, Scrapy, and Celery to crawl web pages and extract relevant data.

🚀 Quick Start

  1. 🐳 Quick start
  2. 💻 Development (For Contributing)

🐳 Quick start

To build and run WaterCrawl on Docker locally, please follow these steps:

  1. Clone the repository:
     ```bash
     git clone https://github.com/watercrawl/watercrawl.git
     cd watercrawl
     ```

  2. Build and run the Docker containers:
     ```bash
     cd docker
     cp .env.example .env
     docker compose up -d
     ```

  3. Open http://localhost in your browser to access the application.

⚠️ IMPORTANT: If you're deploying on a domain or IP address other than localhost, you MUST update the MinIO configuration in your .env file:

```bash
# Change this from 'localhost' to your actual domain or IP
MINIO_EXTERNAL_ENDPOINT=your-domain.com

# Also update these URLs accordingly
MINIO_BROWSER_REDIRECT_URL=http://your-domain.com/minio-console/
MINIO_SERVER_URL=http://your-domain.com/
```

Failure to update these settings will result in broken file uploads and downloads. For more details, see DEPLOYMENT.md.

Important: Before deploying to production, update the .env file with the appropriate configuration values, and set up and configure the database, MinIO, and any other required services. For more information, please read the Deployment Guide.
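If you prefer to script the MinIO change described in this section, the sketch below rewrites the three MinIO variables for a given domain. It is a hypothetical helper, not something shipped with WaterCrawl: it assumes docker/.env uses plain KEY=value lines, replaces only the MinIO keys the README names, and appends any that are missing. The `scheme` parameter is an illustrative addition for HTTPS deployments.

```python
# Hypothetical helper (not part of WaterCrawl): point the MinIO settings in
# a docker/.env file at a public domain instead of localhost.


def minio_settings(domain: str, scheme: str = "http") -> dict[str, str]:
    """Return the three MinIO variables the README says must change."""
    return {
        "MINIO_EXTERNAL_ENDPOINT": domain,
        "MINIO_BROWSER_REDIRECT_URL": f"{scheme}://{domain}/minio-console/",
        "MINIO_SERVER_URL": f"{scheme}://{domain}/",
    }


def update_env_text(env_text: str, domain: str, scheme: str = "http") -> str:
    """Rewrite KEY=value lines for the MinIO keys; append any that are missing."""
    wanted = minio_settings(domain, scheme)
    seen: set[str] = set()
    out: list[str] = []
    for line in env_text.splitlines():
        # Comment lines keep their leading '#', so they never match a key.
        key = line.split("=", 1)[0].strip()
        if key in wanted:
            out.append(f"{key}={wanted[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append MinIO keys that were not present in the file at all.
    out.extend(f"{k}={v}" for k, v in wanted.items() if k not in seen)
    return "\n".join(out) + "\n"
```

For example, read docker/.env with `pathlib.Path.read_text`, pass the contents and your domain to `update_env_text`, write the result back, and then restart the stack (for instance `docker compose down` followed by `docker compose up -d`) so the containers read the new values.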

💻 Development (For Contributing)

For local development and contribution, please follow our Contributing Guide 🤝

We're Hiring

✨ Features

  • 🕸️ Advanced Web Crawling & Scraping - Crawl websites with highly customizable options for depth, speed, and content targeting
  • 🔍 Powerful Search Engine - Find relevant content across the web with multiple search depths (basic, advanced, ultimate)
  • 🌐 Multi-language Support - Search and crawl content in different languages with country-specific targeting
  • ⚡ Asynchronous Processing - Monitor real-time progress of crawls and searches via Server-Sent Events (SSE)
  • 🔄 REST API with OpenAPI - Comprehensive API with detailed documentation and client libraries
  • 🔌 Rich Ecosystem - Integrations with Dify, N8N, and other AI/automation platforms
  • 🏠 Self-hosted & Open Source - Full control over your data with easy deployment options
  • 📊 Advanced Results Handling - Download and process search results with customizable parameters
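The Asynchronous Processing feature streams crawl and search progress over Server-Sent Events. As a minimal sketch (not WaterCrawl's own client code), the parser below applies the standard SSE framing rules to any iterable of text lines, such as a streaming HTTP response read line by line. It assumes, without confirmation from this page, that each event's data field carries a JSON payload; the endpoint URL and payload fields are not specified here, and `parse_sse` and `json_events` are illustrative names.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from a text/event-stream body.

    'data:' lines accumulate, an optional 'event:' line sets the type,
    lines starting with ':' are comments, and a blank line dispatches
    the event. A trailing event without a blank line is discarded.
    """
    event_type = "message"
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space is not part of the value
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        # 'id' and 'retry' fields are ignored in this sketch.


def json_events(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Decode each event's data as JSON (assumed payload format)."""
    for event_type, data in parse_sse(lines):
        yield event_type, json.loads(data)
```

Keeping the parser separate from the HTTP transport makes it easy to test against canned streams and to reuse with whichever HTTP library you already use.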

Check our API Overview to learn more about these features.

🛠️ Client SDKs

  • Python Client - Full-featured SDK with support for all API endpoints
  • Node.js Client - Complete JavaScript/TypeScript integration
  • Go Client - Full-featured SDK with support for all API endpoints
  • PHP Client - Full-featured SDK with support for all API endpoints
  • 🔜 Rust Client - Coming soon

🔌 Integrations

🔧 Plugins

  • ✅ WaterCrawl plugin
  • ✅ OpenAI Plugin


🔒 Security Disclosure

⚠️ Please avoid posting security issues on GitHub. Instead, send your questions to support@watercrawl.dev and we will provide you with a more detailed answer.

📄 License

This repository is available under the WaterCrawl License, which is essentially MIT with a few additional restrictions.


Made with ❤️ by the WaterCrawl Team

Extension points: exported contracts (how you extend this code)

PaginatedResponse (Interface)
(no doc)
frontend/src/types/common.ts
ApiError (Interface)
(no doc)
frontend/src/types/common.ts
UsageStats (Interface)
(no doc)
frontend/src/types/common.ts
UsageHistory (Interface)
(no doc)
frontend/src/types/common.ts
UsageResponse (Interface)
(no doc)
frontend/src/types/common.ts
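None of the five interfaces above is documented. Because the backend is Django, one plausible reading, which is an assumption not confirmed by this page, is that `PaginatedResponse` mirrors Django REST Framework's default paginated body (`count`, `next`, `previous`, `results`). Under that assumption, a client can walk every page by following `next` until it is null, as in this sketch; `iter_results` is a hypothetical name and the `fetch` callable stands in for whatever HTTP client you use.

```python
from typing import Any, Callable, Iterator, Optional, TypedDict


class PaginatedResponse(TypedDict):
    """Assumed page shape, modeled on Django REST Framework's default pagination."""
    count: int
    next: Optional[str]
    previous: Optional[str]
    results: list[Any]


def iter_results(
    first_url: str, fetch: Callable[[str], PaginatedResponse]
) -> Iterator[Any]:
    """Yield every item across pages by following the 'next' links.

    `fetch` is injected to keep the sketch transport-agnostic: it takes a
    URL and returns the decoded JSON page.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]
```

Because pages are fetched lazily, callers can stop early (for example with `itertools.islice`) without requesting the remaining pages.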

Core symbols: most depended-on inside this repo

get
called by 322
backend/user/views.py
log_event
called by 111
tutorials/Deep Search (Langgraph WaterCrawl LiteLLM )/utils.py
get
called by 67
backend/core/views.py
colorize_text
called by 53
tutorials/Deep Search (Langgraph WaterCrawl LiteLLM )/utils.py
markdown
called by 37
backend/core/views.py
post
called by 34
backend/user/views.py
_debug_print
called by 30
tutorials/Company name and Objective (search filter scrape)/objective_crawler/utils.py
create
called by 29
backend/plan/services.py

Shape

Method 799
Function 591
Class 352
Interface 167
Route 19
Enum 8

Languages

Python 67%
TypeScript 33%

Modules by API surface

backend/core/services.py · 121 symbols
backend/plan/services.py · 99 symbols
backend/user/services.py · 55 symbols
backend/core/views.py · 47 symbols
backend/user/views.py · 45 symbols
backend/core/tests/test_services.py · 44 symbols
backend/core/serializers.py · 43 symbols
backend/user/tests/test_services.py · 37 symbols
backend/common/services.py · 34 symbols
backend/user/serializers.py · 24 symbols
tutorials/Deep Search (Langgraph WaterCrawl LiteLLM )/utils.py · 23 symbols
tutorials/DeepSeekR1_WaterCrawl_live_chat_with_your_website/DeepSeekR1_WaterCrawl_live_chat_with_your_webpage.py · 22 symbols

Dependencies from manifests, versioned

@auth0/auth0-react 2.16.2 · 1×
@docusaurus/core 3.8.0 · 1×
@docusaurus/module-type-aliases 3.8.0 · 1×
@docusaurus/preset-classic 3.8.0 · 1×
@docusaurus/tsconfig 3.8.0 · 1×
@docusaurus/types 3.8.0 · 1×
@eslint/js 9.39.4 · 1×
@headlessui/react 2.2.10 · 1×
@heroicons/react 2.2.0 · 1×
@hookform/resolvers 3.10.0 · 1×
@mdx-js/react 3.1.1 · 1×
@monaco-editor/react 4.7.0 · 1×

Datastores touched

postgres (Database) · 1 repo

For agents

```shell
claude mcp add WaterCrawl \
  -- python -m otcore.mcp_server <graph>
```
