
github.com/algorithmicsuperintelligence/optillm @v0.3.16 sqlite


1,332 symbols 4,737 edges 128 files 888 documented · 67%

README

OptiLLM

🚀 2-10x accuracy improvements on reasoning tasks with zero training

🤗 HuggingFace Space • 📓 Colab Demo • 💬 Discussions

OptiLLM is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.

By spending additional compute at inference time, these techniques can beat frontier models across diverse tasks. A good example of how to combine such techniques is the CePO approach from Cerebras.

✨ Key Features

🎯 Instant Improvements: 2-10x better accuracy on math, coding, and logical reasoning
🔌 Drop-in Replacement: Works with any OpenAI-compatible API endpoint
🧠 20+ Optimization Techniques: From simple best-of-N to advanced MCTS and planning
📦 Zero Training Required: Just proxy your existing API calls through OptiLLM
⚡ Production Ready: Used in production by companies and researchers worldwide
🌍 Multi-Provider: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM

🚀 Quick Start

Get powerful reasoning improvements in 3 simple steps:

# 1. Install OptiLLM
pip install optillm

# 2. Start the server
export OPENAI_API_KEY="your-key-here"
optillm

# 3. Use with any OpenAI client - just change the model name!

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# Add 'moa-' prefix for Mixture of Agents optimization
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # This gives you GPT-4o performance from GPT-4o-mini!
    messages=[{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
)

Before OptiLLM: "x = 1" ❌
After OptiLLM: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅

📊 Proven Results

OptiLLM delivers measurable improvements across diverse benchmarks:

Technique	Base Model	Improvement	Benchmark
MARS	Gemini 2.5 Flash Lite	+30.0 points	AIME 2025 (43.3→73.3)
CePO	Llama 3.3 70B	+18.6 points	Math-L5 (51.0→69.6)
AutoThink	DeepSeek-R1-1.5B	+9.34 points	GPQA-Diamond (21.72→31.06)
LongCePO	Llama 3.3 70B	+13.6 points	InfiniteBench (58.0→71.6)
MOA	GPT-4o-mini	Matches GPT-4	Arena-Hard-Auto
PlanSearch	GPT-4o-mini	+20% pass@5	LiveCodeBench

Full benchmark results below ⬇️

🏗️ Installation

Using pip

pip install optillm
optillm
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto

Using docker

docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest
docker run -p 8000:8000 ghcr.io/algorithmicsuperintelligence/optillm:latest
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto

Available Docker image variants:

Full image (latest): Includes all dependencies for local inference and plugins
Proxy-only (latest-proxy): Lightweight image without local inference capabilities
Offline (latest-offline): Self-contained image with pre-downloaded models (spaCy) for fully offline operation

# Proxy-only (smallest)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-proxy

# Offline (largest, includes pre-downloaded models)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-offline

Install from source

Clone the repository with git and use pip install to set up the dependencies.

git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

🔒 SSL Configuration

OptiLLM supports configuring SSL certificate verification, which is useful when working with self-signed certificates or corporate proxies.

Disable SSL verification (development only):

# Command line
optillm --no-ssl-verify

# Environment variable
export OPTILLM_SSL_VERIFY=false
optillm

Use custom CA certificate:

# Command line
optillm --ssl-cert-path /path/to/ca-bundle.crt

# Environment variable
export OPTILLM_SSL_CERT_PATH=/path/to/ca-bundle.crt
optillm

⚠️ Security Note: Disabling SSL verification is insecure and should only be used in development. For production environments with custom CAs, use --ssl-cert-path instead. See SSL_CONFIGURATION.md for details.
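The two environment variables above can be thought of as resolving to a single `verify` value of the kind HTTP clients such as `requests` or `httpx` accept. The sketch below is illustrative only: the helper name `resolve_ssl_verify` is hypothetical, and both the precedence (a custom CA bundle wins over the on/off switch) and the case-insensitive handling of `false` are assumptions, not OptiLLM's documented behavior.

```python
import os
from typing import Mapping, Union


def resolve_ssl_verify(env: Mapping[str, str] = os.environ) -> Union[bool, str]:
    """Map the OptiLLM SSL environment variables to a `verify`-style value.

    Hypothetical helper. Assumption: a custom CA bundle takes precedence
    over the OPTILLM_SSL_VERIFY switch.
    """
    cert_path = env.get("OPTILLM_SSL_CERT_PATH", "").strip()
    if cert_path:
        # Verify certificates against the custom CA bundle.
        return cert_path
    # Assumption: only the literal value "false" (any case) disables verification.
    flag = env.get("OPTILLM_SSL_VERIFY", "true").strip().lower()
    return flag != "false"


if __name__ == "__main__":
    print(resolve_ssl_verify({"OPTILLM_SSL_VERIFY": "false"}))
    print(resolve_ssl_verify({"OPTILLM_SSL_CERT_PATH": "/path/to/ca-bundle.crt"}))
```

Returning the bundle path rather than a boolean keeps verification enabled when a custom CA is in use, which matches the security note's recommendation for production.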

Implemented techniques

Approach	Slug	Description
MARS (Multi-Agent Reasoning System)	`mars`	Multi-agent reasoning with diverse temperature exploration, cross-verification, and iterative improvement
Cerebras Planning and Optimization	`cepo`	Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques
CoT with Reflection	`cot_reflection`	Implements chain-of-thought reasoning with `<thinking>`, `<reflection>` and `<output>` sections
PlanSearch	`plansearch`	Implements a search algorithm over candidate plans for solving a problem in natural language
ReRead	`re2`	Implements rereading to improve reasoning by processing queries twice
Self-Consistency	`self_consistency`	Implements an advanced self-consistency method
Z3 Solver	`z3`	Utilizes the Z3 theorem prover for logical reasoning
R* Algorithm	`rstar`	Implements the R* algorithm for problem-solving
LEAP	`leap`	Learns task-specific principles from few shot examples
Round Trip Optimization	`rto`	Optimizes responses through a round-trip process
Best of N Sampling	`bon`	Generates multiple responses and selects the best one
Mixture of Agents	`moa`	Combines responses from multiple critiques
Monte Carlo Tree Search	`mcts`	Uses MCTS for decision-making in chat responses
PV Game	`pvg`	Applies a prover-verifier game approach at inference time
Deep Confidence	N/A for proxy	Implements confidence-guided reasoning with multiple intensity levels for enhanced accuracy
CoT Decoding	N/A for proxy	Implements chain-of-thought decoding to elicit reasoning without explicit prompting
Entropy Decoding	N/A for proxy	Implements adaptive sampling based on the uncertainty of tokens during generation
Thinkdeeper	N/A for proxy	Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1
AutoThink	N/A for proxy	Combines query complexity classification with steering vectors to enhance reasoning
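The quick start selects a technique by prefixing the model name with its slug (`moa-gpt-4o-mini`). A minimal sketch of how such a name could be split into a slug and a base model is shown here. The function `split_model_name` and the matching rule are hypothetical and written only for illustration; this is not OptiLLM's actual parser. Matching against a known slug list matters because slugs can contain underscores and model names contain hyphens.

```python
from typing import Optional, Tuple

# Slugs copied from the techniques table above.
KNOWN_SLUGS = (
    "mars", "cepo", "cot_reflection", "plansearch", "re2",
    "self_consistency", "z3", "rstar", "leap", "rto",
    "bon", "moa", "mcts", "pvg",
)


def split_model_name(model: str) -> Tuple[Optional[str], str]:
    """Split '<slug>-<base model>' into (slug, base model).

    Hypothetical helper: returns (None, model) when no known slug prefix
    is present, so plain model names pass through unchanged.
    """
    for slug in KNOWN_SLUGS:
        prefix = slug + "-"
        if model.startswith(prefix):
            return slug, model[len(prefix):]
    return None, model


if __name__ == "__main__":
    print(split_model_name("moa-gpt-4o-mini"))
    print(split_model_name("gpt-4o-mini"))
```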

Implemented plugins

Plugin	Slug	Description
System Prompt Learning	`spl`	Implements what Andrej Karpathy called the third paradigm for LLM learning, which enables the model to acquire problem-solving knowledge and strategies
Deep Think	`deepthink`	Implements a Gemini-like Deep Think approach using inference time scaling for reasoning LLMs
Long-Context Cerebras Planning and Optimization	`longcepo`	Combines planning and divide-and-conquer processing of long documents to enable infinite context
Majority Voting	`majority_voting`	Generates k candidate solutions and selects the most frequent answer through majority voting (default k=6)
MCP Client	`mcp`	Implements the model context protocol (MCP) client, enabling you to use any LLM with any MCP Server
Router	`router`	Uses the optillm-modernbert-large model to route requests to different approaches based on the user prompt
Chain-of-Code	`coc`	Implements a chain of code approach that combines CoT with code execution and LLM based code simulation
Memory	`memory`	Implements a short-term memory layer that lets you use unbounded context length with any LLM
Privacy	`privacy`	Anonymizes PII in the request and restores the original values in the response
Read URLs	`readurls`	Reads all URLs found in the request, fetches the content at the URL and adds it to the context
Execute Code	`executecode`	Enables use of code interpreter to execute python code in requests and LLM generated responses
JSON	`json`	Enables structured outputs using the outlines library, supports pydantic types and JSON schema
GenSelect	`genselect`	Generative Solution Selection - generates multiple candidates and selects the best based on quality criteria
Web Search	`web_search`	Performs Google searches using Chrome automation (Selenium) to gather search results and URLs
Deep Research	`deep_research`	Implements Test-Time Diffusion Deep Researcher (TTD-DR) for comprehensive research reports using iterative refinement
Proxy	`proxy`	Load balancing and failover across multiple LLM providers with health monitoring and round-robin routing
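As an illustration of the `majority_voting` row above, the sketch below draws k candidate answers and returns the most frequent one, using the default k=6 from the table. It is a simplified stand-in, not the plugin's code: `generate` is a hypothetical callable representing one LLM call, and answers are compared as stripped strings, which is an assumption about normalization.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(generate: Callable[[], str], k: int = 6) -> str:
    """Sample k candidate answers and return the most frequent one.

    `generate` is a hypothetical stand-in for a single model call.
    Ties are broken in favor of the answer that was seen first.
    """
    answers: List[str] = [generate().strip() for _ in range(k)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Canned samples in place of real model output.
    samples = iter(["2", "1", "2", "3", "2", "2"])
    print(majority_vote(lambda: next(samples)))
```

Exact string matching only works when answers are short and canonical (a number, a multiple-choice letter); free-form responses would need an answer-extraction step before counting.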

We support all major LLM providers and models for inference. You need to set the correct environment variable and the proxy will pick the corresponding client.

Provider	Required Environment Variables	Additional Notes
OptiLLM	`OPTILLM_API_KEY`	Uses the inbuilt local server for inference, supports logprobs and decoding techniques like `cot_decoding` & `entropy_decoding`
OpenAI	`OPENAI_API_KEY`	You can use this with any OpenAI compatible endpoint (e.g. OpenRouter) by setting the `base_url`
Cerebras	`CEREBRAS_API_KEY`	You can use this for fast inference with supported models, see docs for details
Azure OpenAI	`AZURE_OPENAI_API_KEY`, `AZURE_API_VERSION`, `AZURE_API_BASE`	-
Azure OpenAI (Managed Identity)	`AZURE_API_VERSION`
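The idea that the proxy picks a client based on which environment variable is set can be sketched as a simple lookup. The variable names come from the table above; the function `pick_provider` is hypothetical and the priority order (table order) is an assumption. OptiLLM's real selection logic may differ.

```python
import os
from typing import Mapping, Optional

# (provider, environment variable) pairs from the table above.
# Assumption: providers are checked in table order.
PROVIDER_ENV_VARS = (
    ("OptiLLM", "OPTILLM_API_KEY"),
    ("OpenAI", "OPENAI_API_KEY"),
    ("Cerebras", "CEREBRAS_API_KEY"),
    ("Azure OpenAI", "AZURE_OPENAI_API_KEY"),
)


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose key variable is set and non-empty."""
    for name, var in PROVIDER_ENV_VARS:
        if env.get(var):
            return name
    return None


if __name__ == "__main__":
    print(pick_provider({"CEREBRAS_API_KEY": "your-key-here"}))
```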

You can then run the optillm proxy as follows.

```bash
python optillm.py
2024-09-06 07:57:14,191 - INFO - Starting server with approach: auto
2024-09-06 07:57:14,191 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'base_url': '', 'host': '127.0.0.1'}
 * Serving Flask app 'optillm'
```

Core symbols most depended-on inside this repo

create

called by 119

optillm/plugins/proxy/routing.py

optillm/conversation_logger.py

optillm/plugins/longcepo/utils.py

called by 23

optillm/plugins/web_search_plugin.py

start_conversation

called by 17

optillm/conversation_logger.py

load

called by 17

optillm/plugins/proxy/config.py

Shape

Method 731

Function 398

Class 178

Route 25

Languages

Python 100%

Modules by API surface

optillm/inference.py: 102 symbols
tests/test_mcp_plugin.py: 43 symbols
tests/test_ssl_config.py: 41 symbols
tests/test_compact_plugin.py: 39 symbols
tests/test_batching.py: 39 symbols
tests/test_mars_parallel.py: 37 symbols
optillm/plugins/mcp_plugin.py: 32 symbols
optillm/server.py: 30 symbols
optillm/plugins/spl/strategy.py: 30 symbols
optillm/rstar.py: 27 symbols
tests/test_approaches.py: 25 symbols
tests/test_json_plugin.py: 24 symbols

Dependencies from manifests, versioned

aiohttp 1×
azure-identity 1×
beautifulsoup4 1×
bitsandbytes 1×
cerebras_cloud_sdk 1×
flask 1×
google-cloud-aiplatform 1×
ipykernel 1×
ipython 1×
litellm 1×
lxml 1×
mlx-lm 0.24.0 · 1×

For agents

$ claude mcp add optillm \
  -- python -m otcore.mcp_server <graph>
