This repo serves as an open effort on instruction-tuning and post-training popular pretrained language models on publicly available datasets. We release this repo and will keep updating it with:
We also support some evaluations natively in the codebase, but these are no longer maintained; we suggest using OLMES instead, which we used for TÜLU 3.
The latest details on open post-training are found in TÜLU 3: Pushing Frontiers in Open Language Model Post-Training.
Please see our first paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources for more thoughts behind this project and our initial findings. Please see our second paper Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 for results using Llama-2 models and direct preference optimization. We are still working on more models. For more recent results involving PPO and DPO please see our third paper Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback.
<img src="https://github.com/allenai/open-instruct/raw/v0.3.0/assets/images/tulu_logo.png" alt="Tülu (a hybrid camel) represents a suite of LLaMa models that we built by fully-finetuning them on a strong mix of datasets." style="width: 20%; min-width: 200px; display: block; margin: auto;">
Try some of the models we train with Open Instruct. You can use the free demo or download them from HuggingFace:
| Stage | Llama 3.1 8B | Llama 3.1 70B | OLMo-2 7B | OLMo-2 13B |
|---|---|---|---|---|
| Base Model | meta-llama/Llama-3.1-8B | meta-llama/Llama-3.1-70B | allenai/OLMo2-7B-1124 | allenai/OLMo-2-13B-1124 |
| SFT | allenai/Llama-3.1-Tulu-3-8B-SFT | allenai/Llama-3.1-Tulu-3-70B-SFT | allenai/OLMo-2-1124-7B-SFT | allenai/OLMo-2-1124-13B-SFT |
| DPO | allenai/Llama-3.1-Tulu-3-8B-DPO | allenai/Llama-3.1-Tulu-3-70B-DPO | allenai/OLMo-2-1124-7B-DPO | allenai/OLMo-2-1124-13B-DPO |
| Final Models (RLVR) | allenai/Llama-3.1-Tulu-3-8B | allenai/Llama-3.1-Tulu-3-70B | allenai/OLMo-2-1124-7B-Instruct | allenai/OLMo-2-1124-13B-Instruct |
| Reward Model (RM) | allenai/Llama-3.1-Tulu-3-8B-RM | (Same as 8B) | allenai/OLMo-2-1124-7B-RM | (Same as 7B) |
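As a rough illustration of how the Llama 3.1 columns of the table above map to HuggingFace model IDs, the sketch below keeps them in a dictionary and looks up a checkpoint by size and stage. The names `TULU3_LLAMA`, `model_id`, and `load_model` are hypothetical helpers for this example, not part of Open Instruct. `load_model` assumes the `transformers` package is installed and is not called here, since it downloads the weights.

```python
# Hypothetical lookup table copied from the Llama 3.1 columns of the table above.
TULU3_LLAMA = {
    "8B": {
        "base": "meta-llama/Llama-3.1-8B",
        "sft": "allenai/Llama-3.1-Tulu-3-8B-SFT",
        "dpo": "allenai/Llama-3.1-Tulu-3-8B-DPO",
        "final": "allenai/Llama-3.1-Tulu-3-8B",
        "rm": "allenai/Llama-3.1-Tulu-3-8B-RM",
    },
    "70B": {
        "base": "meta-llama/Llama-3.1-70B",
        "sft": "allenai/Llama-3.1-Tulu-3-70B-SFT",
        "dpo": "allenai/Llama-3.1-Tulu-3-70B-DPO",
        "final": "allenai/Llama-3.1-Tulu-3-70B",
        # The table lists the 70B reward model as "(Same as 8B)".
        "rm": "allenai/Llama-3.1-Tulu-3-8B-RM",
    },
}


def model_id(size: str, stage: str) -> str:
    """Return the HuggingFace ID for a given size ("8B"/"70B") and stage."""
    try:
        return TULU3_LLAMA[size][stage]
    except KeyError:
        raise ValueError(f"unknown size/stage: {size!r}/{stage!r}") from None


def load_model(name: str):
    """Load tokenizer and model with transformers (downloads weights on first use)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model
```

For example, `load_model(model_id("8B", "final"))` would fetch the final 8B model, assuming enough disk space and memory are available.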
See scripts/eval/ for examples of running them.
Our setup follows our Dockerfile. Note that Open Instruct is a research codebase and does not guarantee backward compatibility.
We use uv for installation and running code. You can install with uv sync.
Git LFS (for running tests): Install Git LFS and run git lfs install before cloning. See CONTRIBUTING.md for details.
Docker installation: You can also use the Dockerfile to build a Docker image. You can build the image with the following command:
docker build . \
--build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
--build-arg GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD) \
-t open_instruct_dev
# if you are internally at AI2, you can create a beaker image like this:
beaker_user=$(beaker account whoami --format json | jq -r '.[0].name')
beaker image delete $beaker_user/open_instruct_dev
beaker image create open_instruct_dev -n open_instruct_dev -w ai2/$beaker_user
If you are internally at AI2, you may launch experiments using our always-up-to-date auto-built image nathanl/open_instruct_auto.
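If you prefer to drive the Docker build from a script, the following minimal sketch assembles the same `docker build` arguments shown above. The helper names (`git_output`, `docker_build_command`, `current_build_command`) are hypothetical and not part of the repo. The sketch only constructs the command; it assumes `git` is on the PATH and that it runs inside the repository checkout when `current_build_command` is called.

```python
import subprocess


def git_output(*args: str) -> str:
    """Run a git subcommand and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def docker_build_command(
    commit: str, branch: str, tag: str = "open_instruct_dev"
) -> list[str]:
    """Build the argument list mirroring the docker build command above."""
    return [
        "docker", "build", ".",
        "--build-arg", f"GIT_COMMIT={commit}",
        "--build-arg", f"GIT_BRANCH={branch}",
        "-t", tag,
    ]


def current_build_command() -> list[str]:
    """Fill in commit and branch from the current checkout (needs a git repo)."""
    commit = git_output("rev-parse", "--short", "HEAD")
    branch = git_output("rev-parse", "--abbrev-ref", "HEAD")
    return docker_build_command(commit, branch)
```

Passing the returned list to `subprocess.run(..., check=True)` would start the build; keeping the arguments as a list avoids shell quoting issues with unusual branch names.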
After setting up the environment, you are ready to launch some experiments. We provide a few examples below. To learn more about how to reproduce the Tulu 3 models, please refer to the Tulu 3 README. The instructions and documentation for Tulu 1 and Tulu 2 are in the Tulu 1 and 2 README.
You can run the following command to get started:
# train an 8B tulu3 model using 8 GPUs
bash scripts/train/tulu3/finetune_8b.sh
OLMo-core SFT: For supported models (OLMo, OLMoE, Qwen3), we recommend the more GPU-efficient OLMo-core SFT implementation. See open_instruct/olmo_core_utils.py for the list of supported models.
# run DPO on an 8B tulu3 model using 8 GPUs
bash scripts/train/tulu3/dpo_8b.sh
We train with open_instruct/grpo_fast.py. Launch via scripts/train/build_image_and_launch.sh, which builds the Beaker image from your current commit and runs the chosen script:
# Single-GPU smoke test on Beaker (small model, fast).
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh
# Two-node 8xGPU run (Qwen2.5-7B on code RLVR).
./scripts/train/build_image_and_launch.sh scripts/train/debug/large_test_script.sh
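For readers new to GRPO with verifiable rewards, the sketch below illustrates the commonly described idea: sample several completions per prompt, score each with a checkable reward, and normalize rewards within the group to get advantages. This is a generic illustration, not the implementation in open_instruct/grpo_fast.py. The function names, the exact-match reward, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def verifiable_reward(prediction: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 on exact match after trimming, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def group_relative_advantages(
    rewards: list[float], eps: float = 1e-8
) -> list[float]:
    """Normalize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "42".
samples = ["42", "41", " 42 ", "forty-two"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive positive advantages and the rest negative ones; when every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.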
We release our scripts for measuring the overlap between instruction tuning datasets and evaluation datasets in ./decontamination. See the README for more details.
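To give a sense of what measuring train-eval overlap can look like, here is a minimal sketch based on shared word n-grams. It is not the method used by the scripts in ./decontamination, which may differ; the function names, whitespace tokenization, and the default n-gram size are all assumptions for illustration.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase whitespace-token n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_rate(train_texts: list[str], eval_texts: list[str], n: int = 8) -> float:
    """Fraction of eval texts sharing at least one n-gram with any train text."""
    if not eval_texts:
        return 0.0
    train_ngrams: set[tuple[str, ...]] = set()
    for text in train_texts:
        train_ngrams |= ngrams(text, n)
    hits = sum(1 for text in eval_texts if ngrams(text, n) & train_ngrams)
    return hits / len(eval_texts)


train = ["the quick brown fox jumps over the lazy dog"]
evals = ["quick brown fox jumps high", "completely different sentence here now"]
rate = overlap_rate(train, evals, n=3)
```

Smaller values of `n` flag more examples but produce more false positives; texts shorter than `n` tokens yield no n-grams and are never flagged.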
When submitting a PR to this repo, we check the core code in open_instruct/ for style with the following:
make style
make quality
Run the tests with uv run pytest.
To automatically run linting and formatting on each commit:
uv add pre-commit --dev
uv run pre-commit install
To run on all files (recommended after initial setup):
uv run pre-commit run --all-files
├── assets/ <- Images, licenses, etc.
├── configs/
│   ├── beaker_configs/ <- AI2 Beaker configs
│   ├── ds_configs/ <- DeepSpeed configs
│   └── train_configs/ <- Training configs
├── decontamination/ <- Scripts for measuring train-eval overlap
├── eval/ <- Evaluation suite for fine-tuned models
├── human_eval/ <- Human evaluation interface (not maintained)
├── open_instruct/ <- Source code (flat)
├── quantize/ <- Scripts for quantization
├── scripts/ <- Core training and evaluation scripts
└── Dockerfile <- Dockerfile
This codebase is licensed under Apache 2.0 as given in LICENSE.
The license for the released V1 models (along with the base model licenses) can be found in assets/model_licenses/tulu_license.txt; replace <MODELNAME> with the actual model name (i.e., the name on HuggingFace).
V2 models are licensed under the low-risk AI2 ImpACT license. See here for more details.
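The `<MODELNAME>` substitution in the V1 license template can be scripted. The sketch below is a hypothetical helper, not part of the repo: `fill_license` does the replacement on a string, and `write_model_license` applies it to a template file. The model name and template text in the example are placeholders.

```python
from pathlib import Path

PLACEHOLDER = "<MODELNAME>"


def fill_license(template: str, model_name: str) -> str:
    """Replace every <MODELNAME> placeholder with the given model name."""
    return template.replace(PLACEHOLDER, model_name)


def write_model_license(template_path: str, model_name: str, out_path: str) -> None:
    """Read the template file, fill in the model name, and write the result."""
    text = Path(template_path).read_text(encoding="utf-8")
    Path(out_path).write_text(fill_license(text, model_name), encoding="utf-8")


# Placeholder template text and model name, for illustration only.
example = fill_license("License for <MODELNAME>. Use of <MODELNAME> ...", "my-org/my-model")
```

With the repo checked out, `write_model_license("assets/model_licenses/tulu_license.txt", "my-org/my-model", "LICENSE_model.txt")` would produce a filled-in copy, where the output filename is again only an example.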
Open Instruct is a project that benefited from many open-source projects and libraries. We would like to particularly thank the following projects:
If you used this repository or our models, please cite our work:
Tulu 1:
@misc{wang2023far,
title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources},
author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
year={2023},
eprint={2306.04751},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Tulu 2:
@misc{ivison2023camels,