
github.com/PrimeIntellect-ai/verifiers @v0.1.14 sqlite

5,544 symbols 21,671 edges 367 files 1,778 documented · 32%
README

Verifiers: Environments for LLM Reinforcement Learning

Documentation · Environments Hub · PRIME-RL



News & Updates

  • [04/17/26] v0.1.12 is released, featuring a new composable Task/Agent/Environment architecture, upstreamed opencode and RLM harnesses/tasksets, major RLMEnv improvements (context dropping, prompt builder, hardened transport), multi-worker env server support, expanded vf-tui capabilities, and richer eval configuration.
  • [03/12/26] v0.1.11 is released, featuring a unified client stack, major RLMEnv and env server reliability improvements, a substantially refined eval TUI, new pass@k and ablation sweep support, and bundled opencode environments.
  • [02/10/26] v0.1.10 is released, featuring OpenEnv and BrowserEnv integrations, resumed evals, improved rollout and token tracking, safer sandbox lifecycle behavior, refreshed workspace setup, and opencode harbor improvements.
  • [01/08/26] v0.1.9 is released, featuring a number of new experimental environment class types, monitor rubrics for automatic metric collection, improved workspace setup flow, improved error handling, bug fixes, and a documentation overhaul.
  • [11/19/25] v0.1.8 is released, featuring a major refactor of the rollout system to use trajectory-based tracking for token-in token-out training across turns, as well as support for truncated or branching rollouts.
  • [11/07/25] Verifiers v0.1.7 is released! This includes an improved quickstart configuration for training with prime-rl, a new included "nano" trainer (vf.RLTrainer, replacing vf.GRPOTrainer), and a number of bug fixes and improvements to the documentation.
  • [10/27/25] A new iteration of the Prime Intellect Environments Program is live!

Overview

Verifiers is our library for creating environments to train and evaluate LLMs.

Environments contain everything required to run and evaluate a model on a particular task:

  • A dataset of task inputs
  • A harness for the model (tools, sandboxes, context management, etc.)
  • A reward function or rubric to score the model's performance

Environments can be used for training models with reinforcement learning (RL), evaluating capabilities, generating synthetic data, experimenting with agent harnesses, and more.

Verifiers is tightly integrated with the Environments Hub, as well as our training framework prime-rl and our Hosted Training platform.

Getting Started

Ensure you have uv installed, as well as the prime CLI tool:

# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login

To set up a new workspace for developing environments, do:

# ~/dev/my-lab
prime lab setup 

This sets up a Python project if needed (with uv init), installs verifiers (with uv add verifiers), creates the recommended workspace structure, and downloads useful starter files:

configs/
├── endpoints.toml      # OpenAI-compatible API endpoint configuration
├── rl/                 # Example configs for Hosted Training
├── eval/               # Example multi-environment eval configs
└── gepa/               # Example configs for prompt optimization
.prime/
└── skills/             # Bundled workflow skills for create/browse/review/eval/GEPA/train/brainstorm
environments/
└── AGENTS.md           # Documentation for AI coding agents
AGENTS.md               # Top-level documentation for AI coding agents
CLAUDE.md               # Claude-specific pointer to AGENTS.md

Alternatively, add verifiers to an existing project:

uv add verifiers && prime lab setup --skip-install

Environments built with Verifiers are self-contained Python modules. To initialize a fresh environment template, do:

prime env init my-env # creates a new template in ./environments/my_env

For OpenEnv integration, use:

prime env init my-openenv --openenv

Then copy your OpenEnv project into environments/my_openenv/proj/ and build the image with:

uv run vf-build my-openenv

Running prime env init my-env creates a new module called my_env with a basic environment template:

environments/my_env/
├── my_env.py           # Main implementation
├── pyproject.toml      # Dependencies and metadata
└── README.md           # Documentation
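
The naming convention in the commands above (environment ID my-env, module directory my_env) can be sketched in a few lines. This is an illustration only: the helper names module_name and expected_files are hypothetical and are not part of the prime CLI or Verifiers, and the sketch assumes the only transformation is replacing hyphens with underscores.

```python
from pathlib import Path


def module_name(env_id: str) -> str:
    """Map an environment ID such as 'my-env' to a module name such as 'my_env'.

    Assumption: hyphens become underscores and nothing else changes.
    """
    return env_id.replace("-", "_")


def expected_files(env_id: str, root: str = "environments") -> list[Path]:
    """List the files shown in the template tree above (hypothetical helper)."""
    name = module_name(env_id)
    base = Path(root) / name
    return [base / f"{name}.py", base / "pyproject.toml", base / "README.md"]


if __name__ == "__main__":
    for path in expected_files("my-env"):
        print(path.as_posix())
```

A check like this can be useful in a workspace script that verifies a freshly initialized template before you start editing it.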

Environment modules should expose a load_environment function which returns an instance of the Environment object, and which can accept custom arguments. For example:

# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
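    # Load a bundled example dataset; rows are expected to carry a 'question' (and an 'answer' for scoring)
    dataset = vf.load_example_dataset(dataset_name)

    async def correct_answer(completion, answer) -> float:
        # Exact-match reward: compare the final message's content with the reference answer
        completion_ans = completion[-1]['content']
        return 1.0 if completion_ans == answer else 0.0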
    rubric = vf.Rubric(funcs=[correct_answer])
    env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
    return env
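
Exact string equality is strict: a trailing newline or extra whitespace in the model's reply scores 0.0. The sketch below runs the same comparison outside of Verifiers, on hand-written chat messages, and adds a hypothetical correct_answer_stripped variant that trims whitespace first. It assumes, as the example above does, that completion is a list of message dicts with a 'content' key; nothing here calls the Verifiers API.

```python
import asyncio


async def correct_answer(completion, answer) -> float:
    # Same logic as the example above: compare the last message verbatim.
    completion_ans = completion[-1]["content"]
    return 1.0 if completion_ans == answer else 0.0


async def correct_answer_stripped(completion, answer) -> float:
    # Hypothetical variant: ignore leading/trailing whitespace on both sides.
    completion_ans = completion[-1]["content"].strip()
    return 1.0 if completion_ans == answer.strip() else 0.0


# Hand-written completions standing in for model output (illustrative only).
exact = [{"role": "assistant", "content": "42"}]
padded = [{"role": "assistant", "content": "42\n"}]

scores = {
    "exact/strict": asyncio.run(correct_answer(exact, "42")),
    "padded/strict": asyncio.run(correct_answer(padded, "42")),
    "padded/stripped": asyncio.run(correct_answer_stripped(padded, "42")),
}
print(scores)
```

Whether to normalize before comparing is a design choice: a strict match rewards precise formatting, while a lenient match avoids penalizing otherwise correct answers.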

For composable environments with reusable tasksets, toolsets, custom programs, or custom harnesses, use the v1 BYO Harness path:

# my_env.py
import verifiers.v1 as vf

def source():
    yield {
        "prompt": [{"role": "user", "content": "Reverse abc."}],
        "answer": "cba",
        "max_turns": 1,
    }

@vf.reward(weight=1.0)
async def contains_answer(task, state) -> float:
    return float(task["answer"] in str(state.get("completion") or ""))

def load_taskset(config: vf.TasksetConfig | None = None):
    return vf.Taskset(source=source, rewards=[contains_answer], config=config)

def load_environment(config: vf.EnvConfig | None = None) -> vf.Env:
    config = config or vf.EnvConfig()
    return vf.Env(taskset=load_taskset(config=config.taskset))
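
The reward logic can be checked without running a model. The sketch below copies source and the body of contains_answer from the example above, drops the @vf.reward decorator so it runs without Verifiers installed, and passes a hand-built state dict. Treating state as a plain dict with a "completion" key is an assumption made for illustration; the real state object may carry more structure. The score_all helper is hypothetical.

```python
import asyncio


def source():
    yield {
        "prompt": [{"role": "user", "content": "Reverse abc."}],
        "answer": "cba",
        "max_turns": 1,
    }


async def contains_answer(task, state) -> float:
    # Substring check; a missing or None completion falls back to "" and scores 0.0.
    return float(task["answer"] in str(state.get("completion") or ""))


async def score_all(states):
    # Pair each task from source() with a hand-written state (illustrative only).
    return [await contains_answer(task, state) for task, state in zip(source(), states)]


hit = asyncio.run(score_all([{"completion": "The reversed string is cba."}]))
miss = asyncio.run(score_all([{"completion": None}]))
print(hit, miss)
```

Because the check is a substring match, a reply that merely mentions "cba" anywhere scores 1.0; a stricter reward would need to parse the final answer out of the completion.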

If no harness is passed, vf.Env uses the base endpoint-backed harness. See BYO Harness for the advanced v1 taskset/harness API. Reusable taskset and harness packages live under verifiers.v1.packages while the v1 API stabilizes, and are re-exported from verifiers.v1 for normal use. For example, Harbor task directories can run through the bundled OpenCode CLI harness with:

env = vf.Env(
    taskset=vf.HarborTaskset(tasks="/path/to/harbor/tasks"),
    harness=vf.OpenCode(),
)

The same environment package is the unit used by evals and prime-rl. The trainer owns model, endpoint, sampling, and rollout count; v1-specific taskset and harness options stay under env.taskset and env.harness:

# configs/rl/my-v1-env.toml
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 100
batch_size = 256
rollouts_per_example = 8

[sampling]
max_tokens = 4096

[[env]]
id = "my-env"

[env.args]
arg1 = "non-th-arg"

[env.harness]
max_turns = 1

[env.taskset.scoring.contains_answer]
weight = 1.0

Then install the environment and start training:

prime env install my-env
uv run prime-rl configs/rl/my-v1-env.toml

To install the environment module into your project, do:

prime env install my-env # installs from ./environments/my_env

To install an environment from the Environments Hub into your project, do:

prime env install primeintellect/math-python

To run a local evaluation with any OpenAI-compatible model, do:

prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally

Evaluations use Prime Inference by default; configure your own API endpoints in ./configs/endpoints.toml.

View local evaluation results in the terminal UI:

prime eval tui

To publish the environment to the Environments Hub, do:

prime env push --path ./environments/my_env

To run an evaluation directly from the Environments Hub, do:

prime eval run primeintellect/math-python

Documentation

Environments - Create datasets, rubrics, and custom multi-turn interaction protocols.

BYO Harness - Build composable v1 taskset/harness environments with custom tools, sandboxes, users, and custom programs.

Evaluation - Evaluate models using your environments.

Training - Train models in your environments with reinforcement learning.

Development - Contributing to verifiers.

API Reference - Understanding the API and data structures.

FAQs - Other frequently asked questions.

Citation

Originally created by Will Brown (@willccbb).

If you use this code in your research, please cite:

@misc{brown_verifiers_2025,
  author       = {William Brown},
  title        = {{Verifiers}: Environments for LLM Reinforcement Learning},
  howpublished = {\url{https://github.com/PrimeIntellect-ai/verifiers}},
  note         = {Commit abcdefg • accessed DD Mon YYYY},
  year         = {2025}
}

Extension points

Exported contracts: how you extend this code.

ActionLogger (Interface) · (no doc) · assets/templates/browserbase/cua/actionExecutor.ts
ActionRequest (Interface) · (no doc) · assets/templates/browserbase/cua/types.ts
Viewport (Interface) · (no doc) · assets/templates/browserbase/cua/types.ts
BrowserState (Interface) · (no doc) · assets/templates/browserbase/cua/types.ts
ActionExecutionResult (Interface) · (no doc) · assets/templates/browserbase/cua/types.ts

Core symbols

Most depended-on inside this repo.

get · called by 1517 · verifiers/scripts/tui.py
append · called by 1005 · verifiers/v1/task.py
get · called by 359 · verifiers/types.py
setdefault · called by 147 · verifiers/v1/task.py
run · called by 118 · verifiers/v1/harness.py
pop · called by 112 · verifiers/v1/task.py
extend · called by 88 · verifiers/v1/task.py
freeze · called by 87 · verifiers/v1/task.py

Shape

Method 2,778
Function 2,156
Class 571
Route 29
Interface 10

Languages

Python 99%
TypeScript 1%

Modules by API surface

verifiers/scripts/tui.py · 323 symbols
tests/test_rlm_env.py · 219 symbols
verifiers/envs/experimental/rlm_env.py · 138 symbols
verifiers/v1/runtime.py · 129 symbols
tests/test_v1_runtime_lifecycle.py · 118 symbols
tests/test_v1_config_extension.py · 117 symbols
verifiers/envs/experimental/composable/tasksets/swe/swe_rebench_v2_log_parsers.py · 104 symbols
tests/test_decorator_ranks.py · 85 symbols
tests/test_save_utils.py · 72 symbols
tests/test_eval_cli.py · 71 symbols
tests/test_harbor_env_mcp.py · 70 symbols
verifiers/envs/integrations/openenv_env.py · 68 symbols

Dependencies

From manifests, versioned.

deepmerge 4.3.1 · 1×
dotenv 16.4.5 · 1×
esbuild 0.27.2 · 1×
fastify 5.0.0 · 1×
postject 1.0.0-alpha.6 · 1×
tsx 4.10.5 · 1×
typescript 5.2.2 · 1×
zod 3.25.76 · 1×
accelerate 1.4.0 · 1×
aiolimiter 1.2.1 · 1×
anthropic 0.78.0 · 1×

For agents

$ claude mcp add verifiers \
  -- python -m otcore.mcp_server <graph>
