
🕷️ WaterCrawl is a powerful web application that uses Python, Django, Scrapy, and Celery to crawl web pages and extract relevant data.
To build and run WaterCrawl on Docker locally, please follow these steps:
Clone the repository:
```bash
git clone https://github.com/watercrawl/watercrawl.git
cd watercrawl
```
Build and run the Docker containers:
```bash
cd docker
cp .env.example .env
docker compose up -d
```
Access the application by opening http://localhost in your browser.
⚠️ IMPORTANT: If you're deploying on a domain or IP address other than localhost, you MUST update the MinIO configuration in your .env file:
```bash
# Change this from 'localhost' to your actual domain or IP
MINIO_EXTERNAL_ENDPOINT=your-domain.com
# Also update these URLs accordingly
MINIO_BROWSER_REDIRECT_URL=http://your-domain.com/minio-console/
MINIO_SERVER_URL=http://your-domain.com/
```
Failure to update these settings will result in broken file uploads and downloads. For more details, see DEPLOYMENT.md.
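If you would rather script the MinIO change described above than edit the file by hand, a small shell helper along these lines may do it. This is only a sketch: `set_minio_host` is a hypothetical function, not something WaterCrawl ships, and it assumes each of the three keys appears in `.env` on its own uncommented line as `KEY=value`.

```bash
# Hypothetical helper (not part of WaterCrawl): point the three MinIO
# settings in an env file at a new host.
# Usage: set_minio_host <env-file> <host> [scheme]   (scheme defaults to http)
# Assumption: the host contains no '|' or '&' characters, which would
# confuse the sed expressions below.
set_minio_host() {
  local env_file="$1" host="$2" scheme="${3:-http}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Rewrite only the three MinIO keys; every other line passes through unchanged.
  sed \
    -e "s|^MINIO_EXTERNAL_ENDPOINT=.*|MINIO_EXTERNAL_ENDPOINT=${host}|" \
    -e "s|^MINIO_BROWSER_REDIRECT_URL=.*|MINIO_BROWSER_REDIRECT_URL=${scheme}://${host}/minio-console/|" \
    -e "s|^MINIO_SERVER_URL=.*|MINIO_SERVER_URL=${scheme}://${host}/|" \
    "$env_file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the contents back instead of moving the temp file, so the original
  # file's permissions and ownership are kept.
  cat "$tmp" > "$env_file" && rm -f "$tmp"
}
```

From the `docker` directory this could be called as `set_minio_host .env your-domain.com`, followed by `docker compose up -d` so the containers pick up the new values. If a key is missing from the file, the helper leaves the file without it rather than adding it, so check the result before deploying.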
Important: Before deploying to production, ensure that you update the .env file with the appropriate configuration values. Additionally, make sure to set up and configure the database, MinIO, and any other required services. For more information, please read the Deployment Guide.
For local development and contribution, please follow our Contributing Guide 🤝
Check our API Overview to learn more about these features.
⚠️ Please avoid posting security issues on GitHub. Instead, send your questions to support@watercrawl.dev and we will provide you with a more detailed answer.
This repository is available under the WaterCrawl License, which is essentially MIT with a few additional restrictions.
Made with ❤️ by the WaterCrawl Team
$ claude mcp add WaterCrawl \
-- python -m otcore.mcp_server <graph>