
github.com/sentient-agi/OpenDeepSearch @main sqlite

118 symbols · 455 edges · 25 files · 69 documented (58%)
README

🔍OpenDeepSearch: Democratizing Search with Open-source Reasoning Models and Reasoning Agents 🚀



Description 📝

OpenDeepSearch is a lightweight yet powerful search tool designed for seamless integration with AI agents. It enables deep web search and retrieval, optimized for use with Hugging Face's SmolAgents ecosystem.

<img src="https://github.com/sentient-agi/OpenDeepSearch/raw/main/assets/evals.png" alt="Evaluation Results" width="80%"/>
  • Performance: OpenDeepSearch (ODS) performs on par with closed-source search alternatives on single-hop queries such as SimpleQA 🔍.
  • Advanced Capabilities: ODS performs much better than closed-source search alternatives on multi-hop queries such as the FRAMES benchmark 🚀.

Features ✨

  • Semantic Search 🧠: Leverages Crawl4AI and semantic search rerankers (such as Qwen2-7B-instruct and Jina AI) to provide in-depth results
  • Two Modes of Operation ⚡:
    • Default Mode: Quick and efficient search with minimal latency.
    • Pro Mode (Deep Search): More in-depth and accurate results at the cost of additional processing time.
  • Optimized for AI Agents 🤖: Works seamlessly with SmolAgents like CodeAgent.
  • Fast and Lightweight ⚡: Designed for speed and efficiency with minimal setup.
  • Extensible 🔌: Easily configurable to work with different models and APIs.

Installation 📚

To install OpenDeepSearch, run:

pip install -e . #you can also use: uv pip install -e .
pip install -r requirements.txt #you can also use: uv pip install -r requirements.txt

Note: torch must already be installed before you run these commands.

Note: uv accepts the same arguments as pip here and is the recommended option.
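Since torch is a prerequisite, it can be worth checking for it before running the install commands. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of OpenDeepSearch.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    # find_spec returns None when no importable module with this name exists.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("torch"):
        print("torch found")
    else:
        print("torch missing: install it before installing OpenDeepSearch")
```

The check only confirms that the package is discoverable; it does not import torch or verify its version.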

Using PDM (Alternative Package Manager) 📦

You can also use PDM as an alternative package manager for OpenDeepSearch. PDM is a modern Python package and dependency manager supporting the latest PEP standards.

# Install PDM if you haven't already
curl -sSL https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py | python3 -

# Initialize a new PDM project
pdm init

# Install OpenDeepSearch and its dependencies
pdm install

# Activate the virtual environment
eval "$(pdm venv activate)"

PDM offers several advantages:

  • Lockfile support for reproducible installations
  • PEP 582 support (no virtual environment needed)
  • Fast dependency resolution
  • Built-in virtual environment management

Setup

  1. Choose a Search Provider:

    • Option 1: Serper.dev: Get free 2500 credits and add your API key.
      • Visit serper.dev to create an account.
      • Retrieve your API key and store it as an environment variable:

        export SERPER_API_KEY='your-api-key-here'

    • Option 2: SearXNG: Use a self-hosted or public SearXNG instance.
      • Specify the SearXNG instance URL when initializing OpenDeepSearch.
      • Optionally provide an API key if your instance requires authentication:

        export SEARXNG_INSTANCE_URL='https://your-searxng-instance.com'
        export SEARXNG_API_KEY='your-api-key-here' # Optional

  2. Choose a Reranking Solution:

    • Quick Start with Jina: Sign up at Jina AI to get an API key for immediate use
    • Self-hosted Option: Set up Infinity Embeddings server locally with open source models such as Qwen2-7B-instruct
    • For more details on reranking options, see our Rerankers Guide

  3. Set up LiteLLM Provider:

    • Choose a provider from the supported list, including:
      • OpenAI
      • Anthropic
      • Google (Gemini)
      • OpenRouter
      • HuggingFace
      • Fireworks
      • And many more!
    • Set your chosen provider's API key as an environment variable:

        export <PROVIDER>_API_KEY='your-api-key-here' # e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY

    • For OpenAI, you can also set a custom base URL (useful for self-hosted endpoints or proxies):

        export OPENAI_BASE_URL='https://your-custom-openai-endpoint.com'

    • You can set default LiteLLM model IDs for different tasks:

        # General default model (fallback for all tasks)
        export LITELLM_MODEL_ID='openrouter/google/gemini-2.0-flash-001'

        # Task-specific models
        export LITELLM_SEARCH_MODEL_ID='openrouter/google/gemini-2.0-flash-001' # For search tasks
        export LITELLM_ORCHESTRATOR_MODEL_ID='openrouter/google/gemini-2.0-flash-001' # For agent orchestration
        export LITELLM_EVAL_MODEL_ID='gpt-4o-mini' # For evaluation tasks

    • When initializing OpenDeepSearch, you can specify your chosen model using the provider's format (this will override the environment variables):

        search_agent = OpenDeepSearchTool(model_name="provider/model-name") # e.g., "anthropic/claude-3-opus-20240229", 'huggingface/microsoft/codebert-base', 'openrouter/google/gemini-2.0-flash-001'
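The model settings described in the setup steps imply a precedence: an explicit `model_name` wins, then a task-specific variable, then the general `LITELLM_MODEL_ID` fallback. The sketch below illustrates that lookup order. The function `resolve_model_id` and the task labels `"search"`, `"orchestrator"`, and `"eval"` are hypothetical names chosen for this example, and the library's actual resolution logic may differ.

```python
import os
from typing import Mapping, Optional

# Environment variable names taken from the setup notes; the task keys are
# illustrative labels, not identifiers from the OpenDeepSearch code base.
TASK_ENV_VARS = {
    "search": "LITELLM_SEARCH_MODEL_ID",
    "orchestrator": "LITELLM_ORCHESTRATOR_MODEL_ID",
    "eval": "LITELLM_EVAL_MODEL_ID",
}
GENERAL_ENV_VAR = "LITELLM_MODEL_ID"


def resolve_model_id(
    task: str,
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a model ID: explicit argument, then task variable, then general fallback."""
    env = os.environ if env is None else env
    if explicit:
        return explicit  # an explicit model name overrides the environment
    task_var = TASK_ENV_VARS.get(task)
    if task_var and env.get(task_var):
        return env[task_var]  # task-specific setting
    return env.get(GENERAL_ENV_VAR) or None  # general fallback, or None if unset


if __name__ == "__main__":
    print(resolve_model_id("search"))
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.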

Usage

You can use OpenDeepSearch independently or integrate it with SmolAgents for enhanced reasoning and code generation capabilities.

Using OpenDeepSearch Standalone 🔍

from opendeepsearch import OpenDeepSearchTool
import os

# Set environment variables for API keys
os.environ["SERPER_API_KEY"] = "your-serper-api-key-here"  # If using Serper
# Or for SearXNG
# os.environ["SEARXNG_INSTANCE_URL"] = "https://your-searxng-instance.com"
# os.environ["SEARXNG_API_KEY"] = "your-api-key-here"  # Optional

os.environ["OPENROUTER_API_KEY"] = "your-openrouter-api-key-here"
os.environ["JINA_API_KEY"] = "your-jina-api-key-here"

# Using Serper (default)
search_agent = OpenDeepSearchTool(
    model_name="openrouter/google/gemini-2.0-flash-001",
    reranker="jina"
)

# Or using SearXNG
# search_agent = OpenDeepSearchTool(
#     model_name="openrouter/google/gemini-2.0-flash-001",
#     reranker="jina",
#     search_provider="searxng",
#     searxng_instance_url="https://your-searxng-instance.com",
#     searxng_api_key="your-api-key-here"  # Optional
# )

if not search_agent.is_initialized:
    search_agent.setup()

query = "Fastest land animal?"
result = search_agent.forward(query)
print(result)
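The commented-out SearXNG variant above can also be selected at runtime instead of by editing the code. The following sketch builds the provider-related keyword arguments from the environment. The helper `search_provider_kwargs` is hypothetical, and preferring SearXNG when both providers are configured is an assumption made for this example; the keyword names `search_provider`, `searxng_instance_url`, and `searxng_api_key` are the ones used in the example above.

```python
import os
from typing import Dict, Mapping, Optional


def search_provider_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return provider keyword arguments for OpenDeepSearchTool based on the environment."""
    env = os.environ if env is None else env
    instance_url = env.get("SEARXNG_INSTANCE_URL")
    if instance_url:
        # Assumption: a configured SearXNG instance takes priority over Serper.
        kwargs = {"search_provider": "searxng", "searxng_instance_url": instance_url}
        api_key = env.get("SEARXNG_API_KEY")
        if api_key:  # the API key is optional
            kwargs["searxng_api_key"] = api_key
        return kwargs
    if env.get("SERPER_API_KEY"):
        return {}  # Serper is the default provider, so no extra arguments are needed
    raise RuntimeError("Set SERPER_API_KEY or SEARXNG_INSTANCE_URL before creating the tool")
```

The result would then be unpacked into the constructor, for example `OpenDeepSearchTool(model_name="openrouter/google/gemini-2.0-flash-001", reranker="jina", **search_provider_kwargs())`.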

Running the Gradio Demo 🖥️

To try out OpenDeepSearch with a user-friendly interface, simply run:

python gradio_demo.py

This will launch a local web interface where you can test different search queries and modes interactively.

You can customize the demo with command-line arguments:

# Using Serper (default)
python gradio_demo.py --model-name "openrouter/google/gemini-2.0-flash-001" --reranker "jina"

# Using SearXNG
python gradio_demo.py --model-name "openrouter/google/gemini-2.0-flash-001" --reranker "jina" \
  --search-provider "searxng" --searxng-instance "https://your-searxng-instance.com" \
  --searxng-api-key "your-api-key-here"  # Optional

Available options:

  • --model-name: LLM model to use for search
  • --orchestrator-model: LLM model for the agent orchestrator
  • --reranker: Reranker to use (jina or infinity)
  • --search-provider: Search provider to use (serper or searxng)
  • --searxng-instance: SearXNG instance URL (required if using searxng)
  • --searxng-api-key: SearXNG API key (optional)
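To make the relationship between these flags concrete, here is an illustrative `argparse` parser that mirrors the listed options. It is a sketch written for this document, not the parser from `gradio_demo.py`: the function names are hypothetical, the `serper` default follows the "Using Serper (default)" comment, and the remaining options are left without defaults because the source does not state them.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the options listed for the demo (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative demo options")
    parser.add_argument("--model-name", help="LLM model to use for search")
    parser.add_argument("--orchestrator-model", help="LLM model for the agent orchestrator")
    parser.add_argument("--reranker", choices=["jina", "infinity"], help="Reranker to use")
    parser.add_argument(
        "--search-provider",
        choices=["serper", "searxng"],
        default="serper",
        help="Search provider to use",
    )
    parser.add_argument("--searxng-instance", help="SearXNG instance URL")
    parser.add_argument("--searxng-api-key", help="SearXNG API key (optional)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and enforce that SearXNG usage comes with an instance URL."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.search_provider == "searxng" and not args.searxng_instance:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--searxng-instance is required when --search-provider is searxng")
    return args
```

Validating the dependent flag after parsing keeps the rule "instance URL required if using searxng" in one place rather than relying on the caller to check it.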

Core symbols most depended-on inside this repo

create_llm_strategy · called by 3 · src/opendeepsearch/context_scraping/strategy_factory.py
extract_fields · called by 3 · src/opendeepsearch/serp_search/serp_search.py
_get_embeddings · called by 2 · src/opendeepsearch/ranking_models/base_reranker.py
split_text · called by 2 · src/opendeepsearch/ranking_models/chunker.py
print_extraction_result · called by 2 · src/opendeepsearch/context_scraping/extraction_result.py
filter_quality_content · called by 2 · src/opendeepsearch/context_scraping/utils.py
get_wikipedia_content · called by 2 · src/opendeepsearch/context_scraping/utils.py
scrape · called by 2 · src/opendeepsearch/context_scraping/crawl4ai_scraper.py

Shape

Method 62
Function 30
Class 26

Languages

Python 100%

Modules by API surface

src/opendeepsearch/serp_search/serp_search.py · 20 symbols
evals/eval_tasks.py · 9 symbols
src/opendeepsearch/context_scraping/utils.py · 8 symbols
src/opendeepsearch/context_scraping/fast_scraper.py · 8 symbols
src/opendeepsearch/context_building/process_sources_pro.py · 8 symbols
evals/eval_gpt_web.py · 8 symbols
src/opendeepsearch/context_scraping/crawl4ai_scraper.py · 7 symbols
src/opendeepsearch/context_scraping/strategy_factory.py · 6 symbols
src/opendeepsearch/ranking_models/base_reranker.py · 5 symbols
src/opendeepsearch/ods_agent.py · 5 symbols
src/opendeepsearch/context_scraping/basic_web_scraper.py · 5 symbols
src/opendeepsearch/wolfram_tool.py · 4 symbols

Dependencies from manifests, versioned

datasets 3.3.2 · 1×
fasttext-wheel 0.9.2 · 1×
gradio 5.20.1 · 1×
langchain 0.3.19 · 1×
litellm 1.61.20 · 1×
openai 1.65.1 · 1×
pillow 10.4.0 · 1×
smolagents 1.9.2 · 1×
transformers 4.49.0 · 1×
wikipedia-api 0.8.1 · 1×

For agents

$ claude mcp add OpenDeepSearch \
  -- python -m otcore.mcp_server <graph>
