
<b><font size="5">project website</font></b>
<sup>
<a href="https://space.bilibili.com/3493095748405551?spm_id_from=333.337.search-card.all.click">
<i><font size="4">HOT</font></i>
</a>
</sup>
<b><font size="5">PKU-Alignment Team</font></b>
<sup>
<a href="https://space.bilibili.com/3493095748405551?spm_id_from=333.337.search-card.all.click">
<i><font size="4">welcome</font></i>
</a>
</sup>
📘Documentation | 🛠️Quick Start | 🚀Algorithms | 👀Evaluation | 🤔Reporting Issues
Our All-Modality Alignment Datasets
Align-Anything aims to align large models of any modality (any-to-any models) with human intentions and values.
Note: We provide a quick start guide to help users get familiar with the code structure and development details.
We are actively working on the following features:
⚡️ More Models: Integrating cutting-edge models like the Qwen3-VL series.
🚀 More Inference Engines: Adding support for high-performance engines like SGLang.
🤖 Advanced VLA Algorithms: Implementing more VLA algorithms, including Safe-VLA.
🧠 Agent RL: Expanding capabilities to support Agent-based Reinforcement Learning.
🛠️ Enhanced RLHF Features: Upgrading our RL training framework with features like asynchronous rollout, vLLM sleep mode, and checkpoint-engine.
Stay tuned for more updates!
[2025.11.11] 🎉🎉🎉 We now support the alignment fine-tuning of Qwen3 and Qwen3-MoE models!
[2025.11.11] 🎉🎉🎉 We integrate the InterMT project (NeurIPS 2025 Spotlight) into the main repository, featuring the first multi-turn interleaved preference alignment dataset with human feedback and InterMT-Bench for evaluating multi-turn multimodal interaction capabilities. Check out InterMT for more details.
[2025.11.11] 🛠️🛠️🛠️ We integrate the eval-anything evaluation framework into the main repository as a dedicated project for large-scale evaluation of any-to-any models. Check out eval-anything for more details.
[2025.04.14] 📜📜📜 We release the tutorial on SFT training for text-image-to-text models. Check out the cookbook_en (for English) and cookbook_zh (for Chinese).
[2025.04.07] 🥳🥳🥳 Align-Anything now serves as the homework platform for the PKU course Large Language Models Basics and Alignment, a course for undergraduate, master's, and PhD students. It supports training and evaluation on both Nvidia GPU and Huawei Ascend NPU. The corresponding tutorials will be released soon!
[2025.03.31] ✅✅✅ We enhance the installation process for both Nvidia GPU and Huawei Ascend NPU. Please refer to the Quick Start for details.
[2025.03.31] 🚀🚀🚀 We support wrapping the actor model with the vLLM engine for sequence generation in text-to-text PPO training, which greatly accelerates the PPO training process. In our results, PPO finishes in 22 minutes with the vLLM engine, while the baseline needs ~150 minutes.
😊 Our implementation is inspired by OpenRLHF, which is a great project for RLHF training.
[2025.03.27] 📜📜📜 We release the tutorial on DPO training for text-to-text models. Check out the cookbook_en (for English) and cookbook_zh (for Chinese).
[2025.03.15] 📜📜📜 We release the tutorial for extending modality from text-to-text to text-image-to-text models. Check out the cookbook_en (for English) and cookbook_zh (for Chinese).
We will release other tutorials in the future. Stay tuned! 😊
[2025.03.15] We now support seamless migration to Slurm clusters! Check out our example here to get started.
[2025.03.14] 🛠️🛠️🛠️ We now support Safe RLHF-V for Text + Image -> Text modality models.
[2025.03.12] 🛠️🛠️🛠️ We now support resuming training for DPO and SFT, see here.
[2025.03.11] 🎉🎉🎉 We support installing the Huawei Ascend dependencies through a pre-built Docker image.
[2025.03.02] 🎉🎉🎉 We have implemented alignment training for Vision-Language-Action Models in embodied intelligence, see VLA Trainer, with more features coming soon!
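[2025.02.28] 🤝🤝🤝 We now support training and inference of align-anything on Huawei Ascend NPU.
The align-anything team is currently working closely with the Huawei Ascend team on joint development of all-modality inference and alignment fine-tuning based on VLLMs-Ascend.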
More News
./scripts and ./projects/janus directories.
Any -> Any modality models Emu3.
Text + Video -> Text modality models.
Text + Audio -> Text modality models.
Text+Image -> Text+Image modality models.
models_pk script in here, which enables comparing the performance of two models across different benchmarks.
Text+Image -> Text+Image modality for the SFT trainer and Chameleon models.
Text -> Image, Text -> Audio, and Text -> Video modalities for the SFT trainer and DPO trainer.
# clone the repository
git clone git@github.com:PKU-Alignment/align-anything.git
cd align-anything
# create virtual env
conda create -n align-anything python==3.11
conda activate align-anything
[Optional] We recommend installing CUDA in the conda environment and setting the environment variable.
# We tested on the H800 computing cluster, and this version of CUDA works well.
# You can adjust this version according to the actual situation of the computing cluster.
conda install nvidia/label/cuda-12.2.0::cuda
export CUDA_HOME=$CONDA_PREFIX
If your CUDA is installed in a different location, such as
/usr/local/cuda/bin/nvcc, you can set the environment variable as follows:
export CUDA_HOME="/usr/local/cuda"
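Before installing, it can help to confirm that CUDA_HOME really points at a toolkit. The following is a minimal sketch and not part of the repository: the helper name check_cuda_home is hypothetical, and the only assumption is that nvcc lives at $CUDA_HOME/bin/nvcc, as in the path mentioned above.

```shell
# Hypothetical helper (not part of align-anything): succeeds only if the
# given directory (default: $CUDA_HOME) contains an executable bin/nvcc.
check_cuda_home() {
    _cuda_home="${1:-${CUDA_HOME:-}}"
    if [ -z "$_cuda_home" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$_cuda_home/bin/nvcc" ]; then
        echo "nvcc not found under $_cuda_home/bin" >&2
        return 1
    fi
    echo "Using CUDA toolkit at $_cuda_home"
}
```

Calling check_cuda_home with no argument checks the exported CUDA_HOME; passing a path, for example check_cuda_home /usr/local/cuda, checks that location instead.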
Finally, install align-anything by:
pip3 install -e .
pip3 install vllm==0.7.2 # to run ppo on vllm engine
To build on Huawei Ascend NPU, simply run:
pip3 install -e .[ascend]
The current test environment for Ascend is:
- Python version: 3.10.6
- CANN version: 8.0.rc3
- Architecture: aarch64
- Hardware: 8x Ascend-SNT9B ARM (192 cores, 1536GB memory)
- Ascend Driver Version: 23.0.7
- AscendHAL Version: 7.35.19
- AICPU Version: 1.0
- TDT Version: 1.0
- Log Version: 1.0
- Profiler Version: 2.0
- DVPP Kernels Version: 1.1
- TSFW Version: 1.0
- Inner Version: V100R001C15SPC012B220
- Compatible Versions: V100R001C30, V100R001C13, V100R001C15
- Compatible Firmware Versions: [7.0.0, 7.1.99]
- Package Version: 23.0.7
[Optional] Install Ascend dependencies using our Docker image
Use the setup_docker.sh script located in the ./scripts directory to pull the Docker image and create a container with all necessary environments set up:
cd scripts
bash setup_docker.sh
This will automatically pull the Docker image and create a Docker container where all the dependencies and configurations for running the framework are already set up.
If you encounter any issues, please refer to the FAQ for solutions.
[Optional] Other Dependencies
pip install -e .[text-to-audio]: Install the text-to-audio dependencies.
pip install -e .[minicpmv]: Install the minicpmv dependencies.
pip install -e .[minicpmo]: Install the minicpmo dependencies.
We provide quick-start scripts in the ./scripts directory. These scripts automatically download the model and dataset, then run training or evaluation.
For example, scripts/llava/llava_dpo.sh is the script for the Text + Image -> Text modality. You can run it with:
cd scripts
bash llava/llava_dpo.sh
Note: The scripts automatically download the model and dataset from Hugging Face. If you cannot reach Hugging Face directly, try the HF Mirror:
export HF_ENDPOINT=https://hf-mirror.com
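If the same scripts run both on machines with direct access and on restricted ones, one option is to apply the mirror only when no endpoint has been configured yet. This is a minimal sketch under that assumption: the function name use_hf_mirror is hypothetical and not part of the repository, and HF_ENDPOINT with the mirror URL is the only part taken from the note above.

```shell
# Hypothetical helper (not part of align-anything): keep an existing
# HF_ENDPOINT if one is set, otherwise fall back to the HF Mirror.
use_hf_mirror() {
    export HF_ENDPOINT="${HF_ENDPOINT:-https://hf-mirror.com}"
    echo "HF_ENDPOINT=$HF_ENDPOINT"
}
```

Calling use_hf_mirror before launching a script such as llava/llava_dpo.sh leaves a previously exported endpoint untouched, so the call is safe to keep in a shared launcher.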
We fully support seamless migration to Slurm. If you plan to run training on a Slurm-managed cluster, we invite you to use
$ claude mcp add align-anything \
-- python -m otcore.mcp_server <graph>