
github.com/openreasoner/openr @main sqlite

1,888 symbols · 5,028 edges · 138 files · 149 documented (8%)

README

OpenR: An Open-Source Framework for Advanced Reasoning with Large Language Models

<a href="https://arxiv.org/abs/2410.09671">Technical Report</a>
·
<a href="https://github.com/openreasoner/openr/blob/main/reports/Tutorial-LLM-Reasoning-Wang.pdf">Tutorial</a>
·
<a href="https://github.com/openreasoner/openr">Code</a>
·
<a href="https://openreasoner.github.io/">Docs</a>
·
<a href="https://huggingface.co/datasets/openreasoner/MATH-APS">Dataset</a>
·
<a href="https://huggingface.co/openreasoner/Math-psa">Model Weights</a>
·
<a href="https://github.com/openreasoner/openr/issues">Q&A</a>
·
<a href="https://www.modelscope.cn/studios/modelscope/OpenR_Inference">Inference</a>






 [ <a href="https://github.com/openreasoner/openr/blob/main/README.md">English</a> ][ <a href="https://github.com/openreasoner/openr/blob/main/README_zh.md">Chinese</a> ]


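Table of Contents 📖

News and Updates
Features
TODO
Benchmark
Plots
Datasets and Models
Getting Started
- Installation
- Quick Start
Usage
Join Us
Contact
Q&A Examples
Community
References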

News and Updates

[29/11/2024] Added a demo page on ModelScope. Thanks to @wangxingjun778!
[24/10/2024] OpenR now supports MCTS reasoning (#24)! 🌲
[15/10/2024] Our report is available on arXiv!
[12/10/2024] OpenR has been released! 🚀

Features

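✅ Data generation for process supervision
✅ Online policy training
✅ Training of generative and discriminative process reward models (PRMs)
✅ Multiple search strategies
✅ Test-time computation and scaling laws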
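
Process-supervision data generation follows OmegaPRM. One of its central ideas is to find the first wrong step of an incorrect solution by binary search, using Monte Carlo (MC) rollouts from each prefix to judge whether that prefix can still reach the correct answer. The sketch below covers only that search step and rests on several assumptions: `first_error_step`, `mc_estimate`, and `toy_rollout` are hypothetical names, the rollout is a deterministic stand-in for sampling from a policy model, and a prefix counts as "good" when at least one rollout succeeds. It is not OpenR's implementation; the repo's own code appears to live under data/omegaPRM_v2/.

```python
from typing import Callable


def mc_estimate(prefix_len: int, rollout: Callable[[int, int], bool], num_rollouts: int) -> float:
    """Fraction of rollouts continuing from the first `prefix_len` steps that reach the correct answer."""
    hits = sum(rollout(prefix_len, i) for i in range(num_rollouts))
    return hits / num_rollouts


def first_error_step(num_steps: int, is_good_prefix: Callable[[int], bool]) -> int:
    """Binary-search the 0-based index of the first wrong step.

    Assumptions: the empty prefix is good, the full solution is bad, and goodness is
    monotone (once a prefix is bad, every longer prefix is bad too).
    """
    lo, hi = 0, num_steps  # invariant: prefix of length `lo` is good, prefix of length `hi` is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good_prefix(mid):
            lo = mid  # the error comes after the first `mid` steps
        else:
            hi = mid  # the error is within the first `mid` steps
    return hi - 1  # adding step `hi - 1` turns a good prefix into a bad one


# Hypothetical toy rollout: prefixes of up to 3 steps succeed on every other rollout,
# longer prefixes never do, so the first wrong step has index 3.
def toy_rollout(prefix_len: int, i: int) -> bool:
    return prefix_len <= 3 and i % 2 == 0


idx = first_error_step(6, lambda k: mc_estimate(k, toy_rollout, num_rollouts=8) > 0)
print(idx)  # → 3
```

Binary search needs only O(log n) prefix evaluations instead of one per step, which is what makes per-step labels affordable. With real sampled rollouts the MC estimate is noisy, so a good prefix whose rollouts all fail by chance can be misclassified.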

| Feature | Contents |
| --- | --- |
| ✅ Data generation for process supervision | OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision |
| ✅ Online policy training | RL training: online RL training with PRMs |
| ✅ PRM training | PRM training: Supervised Training for PRMs; generative reward model training: Direct GenRM |
| ✅ Multiple search strategies | Greedy Search; Best-of-N; Beam Search; MCTS; rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers; Critic-MCTS |
| ✅ Test-time Computation and Scaling Law | See benchmark |
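
Several of these search strategies (Best-of-N, beam search, MCTS) use a process reward model to score partial solutions. To illustrate the beam-search case only, here is a minimal model-free sketch: `propose` stands in for the policy model generating candidate next steps, and `score` stands in for a PRM scoring a partial solution. Both callbacks, the function name `prm_beam_search`, and the toy scoring rule are assumptions for illustration; this is not OpenR's search code, which appears to live under reason/guided_search/.

```python
import heapq
from typing import Callable, List, Tuple

Steps = Tuple[str, ...]


def prm_beam_search(
    propose: Callable[[Steps], List[str]],  # stand-in for the policy: candidate next steps
    score: Callable[[Steps], float],        # stand-in for the PRM: score of a partial solution
    is_done: Callable[[Steps], bool],
    beam_width: int,
    max_steps: int,
) -> Tuple[Steps, float]:
    beams: List[Tuple[Steps, float]] = [((), 0.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[Steps, float]] = []
        for steps, s in beams:
            if is_done(steps):
                candidates.append((steps, s))  # finished beams compete unchanged
                continue
            for nxt in propose(steps):
                new = steps + (nxt,)
                candidates.append((new, score(new)))
        # keep the `beam_width` highest-scoring partial solutions
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
        if all(is_done(steps) for steps, _ in beams):
            break
    return max(beams, key=lambda c: c[1])


# Hypothetical toy "PRM": +1 for each step that agrees with a hidden target, -1 otherwise.
TARGET: Steps = ("a", "b", "a")


def toy_score(steps: Steps) -> float:
    return float(sum(1 if s == t else -1 for s, t in zip(steps, TARGET)))


best = prm_beam_search(
    propose=lambda steps: ["a", "b"],
    score=toy_score,
    is_done=lambda steps: len(steps) == len(TARGET),
    beam_width=2,
    max_steps=5,
)
print(best)  # → (('a', 'b', 'a'), 3.0)
```

With `beam_width=1` the same function reduces to PRM-guided greedy search; larger beams trade extra PRM calls for a lower chance of discarding a prefix that only pays off later.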

TODO

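| Feature | TODO (high priority, contributions welcome!) |
| --- | --- |
| 👨‍💻 Data | Reproduce Journey Learning |
| 👨‍💻 RL training | Distributed training; Reinforcement Fine-Tuning (RFT) #80 |
| 👨‍💻 PRM | Larger-scale training; training implementation of GenRM-CoT; Soft-label training #57 |
| 👨‍💻 Inference | Optimize code structure #53; add more reasoning tasks (AIME, etc.) #53; multimodal reasoning #82; reasoning for code generation #68; Dots #75; inference accuracy checks; benchmarking |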

Benchmark

See Benchmark for details!

Plots

[Figures: PRM_Results, Inference_Results]

Datasets and Models

MATH-APS (our released dataset)

MATH-psa (our released process reward model)

Getting Started

Installation

conda create -n open_reasoner python=3.10
conda activate open_reasoner
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -

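Download Base Models

Before running the project, make sure all required base models have been downloaded. The models used in this project include:

Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct
peiyi9979/mistral-7b-sft
peiyi9979/math-shepherd-mistral-7b-prm

For details on downloading from Hugging Face, see the Hugging Face download tutorial.

Before continuing, make sure every model is saved in its own directory as expected by the project setup.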

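Quick Start

Before running inference, edit the following variables in the scripts under reason/llm_service/ to select the base models you want to use:

$MODEL_BASE: the path of the directory where the models are stored.
$POLICY_MODEL_NAME: the name of the policy model you want to use.
$VALUE_MODEL_NAME: the name of the value model you want to use.
$NUM_LM_WORKER: the number of language model (LM) worker processes to launch.
$NUM_RM_WORKER: the number of reward model (RM) worker processes to launch.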
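
Before launching the services, it can help to sanity-check these settings. The sketch below is a hypothetical pre-flight check, not part of OpenR: it assumes each model is stored at `$MODEL_BASE/<model name>`, that the worker counts must be positive integers, and that you export the same values as environment variables. `check_service_config` is an illustrative name; adjust the path rule to however your scripts actually combine these variables.

```python
import os
from typing import Dict, List


def check_service_config(env: Dict[str, str]) -> List[str]:
    """Return the problems found in the service settings; an empty list means they look OK.

    Assumption: models live at os.path.join(MODEL_BASE, <model name>).
    """
    problems: List[str] = []
    base = env.get("MODEL_BASE", "")
    if not os.path.isdir(base):
        problems.append(f"MODEL_BASE is not a directory: {base!r}")
    for key in ("POLICY_MODEL_NAME", "VALUE_MODEL_NAME"):
        name = env.get(key, "")
        if not name:
            problems.append(f"{key} is empty")
        elif not os.path.isdir(os.path.join(base, name)):
            problems.append(f"{key}: no model directory {name!r} under MODEL_BASE")
    for key in ("NUM_LM_WORKER", "NUM_RM_WORKER"):
        value = env.get(key, "")
        if not value.isdigit() or int(value) < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: export the variables in your shell, then run this file.
    for problem in check_service_config(dict(os.environ)):
        print("config problem:", problem)
```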

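Next, we run inference with different techniques.

Start the LM and RM Services

For example, to start the LM and RM services for the Math Shepherd model, run:

sh reason/llm_service/create_service_math_shepherd.sh

To stop the service processes, you can run: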

tmux kill-session -t {Your Session Name} # default is `FastChat`

Usage

Run Inference

⚠️ Make sure the input arguments in the scripts (--LM, --RM) match the variables ($POLICY_MODEL_NAME, $VALUE_MODEL_NAME) of the service processes you launched!

export PYTHONPATH=$(pwd)
sh scripts/eval/cot_greedy.sh

# Method: cot. Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)

sh scripts/eval/cot_rerank.sh

# Method: best_of_n. Average result: ({'majority_vote': 0.782, 
#                                       'prm_min_max': 0.772, 
#                                       'prm_min_vote': 0.792, 
#                                       'prm_last_max': 0.776, 
#                                       'prm_last_vote': 0.792, 
#                                       'total_completion_tokens': 4431.268},)

sh scripts/eval/beam_search.sh

# Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)

sh scripts/eval/vanila_mcts.sh
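
The best_of_n run reports several answer-selection rules side by side. Judging only from their names, they appear to combine majority voting over final answers with PRM step scores reduced either to the minimum over steps (`prm_min_*`) or to the last step's score (`prm_last_*`), and then either pick the single highest-scoring sample (`*_max`) or sum scores per distinct answer (`*_vote`). Each reported number would presumably be the fraction of problems where the selected answer is correct. The sketch below implements this reading; it is an assumption, not OpenR's definition, and `select_answer` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from operator import itemgetter
from typing import Callable, Dict, List, Sequence, Tuple

# One sampled solution: (final answer, PRM score of each reasoning step)
Candidate = Tuple[str, List[float]]


def select_answer(candidates: List[Candidate], method: str) -> str:
    if method == "majority_vote":
        return Counter(ans for ans, _ in candidates).most_common(1)[0][0]
    prefix, how = method.rsplit("_", 1)  # e.g. "prm_min", "vote"
    reducers: Dict[str, Callable[[Sequence[float]], float]] = {
        "prm_min": min,              # a solution is only as good as its weakest step
        "prm_last": itemgetter(-1),  # trust the score of the final step
    }
    if prefix not in reducers or how not in ("max", "vote"):
        raise ValueError(f"unknown method: {method}")
    scored = [(ans, reducers[prefix](steps)) for ans, steps in candidates]
    if how == "max":
        return max(scored, key=itemgetter(1))[0]  # best single sample
    totals: Dict[str, float] = defaultdict(float)
    for ans, s in scored:
        totals[ans] += s  # PRM-weighted vote per distinct answer
    return max(totals, key=totals.get)


# Hypothetical samples for one problem (N = 4)
samples = [
    ("12", [0.9, 0.8, 0.7]),
    ("12", [0.6, 0.5, 0.9]),
    ("15", [0.95, 0.9, 0.85]),
    ("12", [0.4, 0.3, 0.2]),
]
for m in ["majority_vote", "prm_min_max", "prm_min_vote", "prm_last_max", "prm_last_vote"]:
    print(m, select_answer(samples, m))
# → majority_vote 12, prm_min_max 15, prm_min_vote 12, prm_last_max 12, prm_last_vote 12
```

The example shows why the rules can disagree: the single "15" sample has the strongest weakest step, so `prm_min_max` picks it, while every vote-based rule sides with the three "12" samples.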

Run Training

⚠️ Before training, set $dataset_path, $model_name_or_path and $prm_name_or_path in train/mat/scripts/train_llm.sh.

cd train/mat/scripts
bash train_llm.sh
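
The training script performs online RL with a PRM providing feedback. One common way to use per-step PRM scores in policy-gradient training is to treat each step's score as a reward, compute discounted returns, and subtract a baseline to obtain advantages. The sketch below shows only that generic recipe; it is not necessarily what train/mat does, and `step_returns`, `advantages`, and the mean baseline are illustrative choices.

```python
from typing import List


def step_returns(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed from the last step backwards."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(step_rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def advantages(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Returns minus their mean: a simple baseline that reduces gradient variance."""
    rets = step_returns(step_rewards, gamma)
    baseline = sum(rets) / len(rets)
    return [g - baseline for g in rets]


# Hypothetical PRM scores for a 3-step solution, used directly as step rewards.
print(step_returns([0.5, 0.2, 1.0], gamma=0.9))  # ≈ [1.49, 1.1, 1.0]
```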

Run PRM Learning

cd prm/code

# single GPU
python finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \
                                   --train_data_path $TRAIN_DATA_PATH \
                                   --test_data_path $TEST_DATA_PATH


# multi GPU
torchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \
                                             --data_path $YOUR_DATA_FOLDER_PATH \
                                             --datasets both
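
Supervised PRM training is commonly framed as per-step classification: each reasoning step ends with a marker token, and the model learns to predict at that position whether the step is correct. The plain-Python sketch below illustrates this framing. The marker `ки` and the `+`/`-` label tokens follow the convention of the Math-Shepherd PRM as we understand it, but whether finetune_qwen.py uses the same format is an assumption, and `build_prm_example` and `step_bce` are hypothetical helpers. `step_bce` also accepts soft labels in [0, 1], the setting behind the soft-label training item in the TODO list.

```python
import math
from typing import List, Sequence, Tuple, Union

STEP_TAG = "ки"       # assumption: Math-Shepherd-style marker placed after each step
GOOD, BAD = "+", "-"  # assumption: label tokens predicted at each marker


def build_prm_example(question: str, steps: Sequence[str], labels: Sequence[bool]) -> Tuple[str, List[str]]:
    """Append a marker after every step; return the text and one target token per marker."""
    if len(steps) != len(labels):
        raise ValueError("need exactly one label per step")
    text = question + " " + " ".join(f"{step} {STEP_TAG}" for step in steps)
    targets = [GOOD if ok else BAD for ok in labels]
    return text, targets


def step_bce(p_good: Sequence[float], labels: Sequence[Union[bool, float]]) -> float:
    """Mean binary cross-entropy over step markers; labels may be hard (bool) or soft (0..1)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(p_good, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        y = float(y)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(p_good)


text, targets = build_prm_example("1+1*2=?", ["1*2=2", "1+2=3"], [True, True])
print(text)     # → 1+1*2=? 1*2=2 ки 1+2=3 ки
print(targets)  # → ['+', '+']
```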

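Join Us

Every contribution is valuable to the community.

Thank you for your interest in OpenR! 🥰 We are committed to growing the open-source community and warmly welcome contributions. Big or small, your efforts help us grow and improve. Contributions are not limited to code: answering questions, helping others, improving our documentation, and sharing the project are just as impactful.

Please see our contribution guide!

Future Plans

More comprehensive experiments on RL training and search methods
Larger-scale Prove-Verifier models
Support for self-improvement training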

Contact

The OpenR community is maintained by:

Openreasoner Team (openreasoner@gmail.com)

License

OpenR is released under the MIT License.

Citation

If you find our resources helpful, please cite our paper:

@article{wang2024openr,
  title={OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models},
  author={Wang, Jun and Fang, Meng and Wan, Ziyu and Wen, Muning and Zhu, Jiachen and Liu, Anjie and Gong, Ziqin and Song, Yan and Chen, Lei and Ni, Lionel M and others},
  journal={arXiv preprint arXiv:2410.09671},
  year={2024}
}

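Thank you very much!

Q&A Examples

Comparing Process Reward Models (PRMs): Math-psa (Ours) vs. Math-Shepherd

[Figures: QA 1, QA 2]

Verifying Reinforcement Learning (RL) Training

[Figures: QA 3, QA 4]

Exploring Test-time Computation

[Figures: QA 5, QA 6, QA 7]

Community

WeChat group: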

References

Inference-time Computing

[1] Alphazero-like tree-search can guide large language model decoding and training.

[2] Reasoning with language model is planning with world model.

[3] Scaling LLM test-time compute optimally can be more effective than scaling model parameters

[4] Think before you speak: Training language models with pause tokens

From Outcome Supervision to Process Supervision

[1] [Training verifi

Core symbols most depended-on inside this repo

assert_equal · called by 494 · envs/MATH/latex2sympy/tests/context.py
expr · called by 115 · envs/MATH/latex2sympy/gen/PSParser.py
enterRule · called by 57 · envs/MATH/latex2sympy/gen/PSParser.py
exitRule · called by 57 · envs/MATH/latex2sympy/gen/PSParser.py
supexpr · called by 38 · envs/MATH/latex2sympy/gen/PSParser.py
convert_expr · called by 34 · envs/MATH/latex2sympy/latex2sympy2.py
from_str · called by 28 · preprocess/src/data_types/utils.py
copy · called by 27 · envs/base_env.py

Shape

Method 1,232 · Function 469 · Class 162 · Route 25

Languages

Python 100%

Modules by API surface

envs/MATH/latex2sympy/gen/PSParser.py · 780 symbols
envs/MATH/latex2sympy/gen/PSListener.py · 127 symbols
reason/guided_search/tree.py · 41 symbols
envs/MATH/latex2sympy/latex2sympy2.py · 39 symbols
envs/rstar/rstar_utils.py · 38 symbols
data/omegaPRM_v2/omegaprm.py · 37 symbols
reason/llm_service/workers/base_model_worker.py · 34 symbols
train/mat/envs/math/math_env_wrappers.py · 28 symbols
envs/base_env.py · 26 symbols
envs/rstar/eval_src/Evaluator.py · 25 symbols
preprocess/src/preprocessors/math_aps.py · 24 symbols
envs/MATH/verify_utils.py · 23 symbols

Dependencies from manifests, versioned

accelerate 0.34.2 · 1×
antlr4-python3-runtime 4.11.1 · 1×
mpmath 1.3.0 · 1×
sympy 1.12 · 1×
torch 2.4.0 · 1×
transformers 4.44.2 · 1×
vllm 0.6.1.post2 · 1×

For agents

$ claude mcp add openr \
  -- python -m otcore.mcp_server <graph>

