hub / github.com/ymcui/Chinese-LLaMA-Alpaca-2

github.com/ymcui/Chinese-LLaMA-Alpaca-2 @v4.1 sqlite

repository ↗ · DeepWiki ↗ · release v4.1 ↗

326 symbols 1,275 edges 40 files 63 documented · 19%

README

<img src="https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/raw/v4.1/pics/banner.png" width="800"/>









<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-2.svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-2">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-2">
<a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-2/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1710faac5e634acaabfc26b0a778cdde"/></a>

This project is based on the Llama-2, released by Meta, and it is the second generation of the Chinese LLaMA & Alpaca LLM project. We open-source Chinese LLaMA-2 (foundation model) and Alpaca-2 (instruction-following model). These models have been expanded and optimized with Chinese vocabulary beyond the original Llama-2. We used large-scale Chinese data for incremental pre-training, which further improved the fundamental semantic understanding of the Chinese language, resulting in a significant performance improvement compared to the first-generation models. Standard version supports 4K context, and long context version supports 16K and 64K context. The RLHF models are fine-tuned for human preference alignment and have gained significant performance improvements in the representation of correct values compared to the standard version of the model.

Main Contents

🚀 New extended Chinese vocabulary beyond Llama-2, open-sourcing the Chinese LLaMA-2 and Alpaca-2 LLMs.
🚀 Open-sourced the pre-training and instruction finetuning (SFT) scripts for further tuning on user's data
🚀 Quickly deploy and experience the quantized LLMs on CPU/GPU of personal PC
🚀 Support for LLaMA ecosystems like 🤗transformers, llama.cpp, text-generation-webui, LangChain, privateGPT, vLLM etc.

Open-sourced Models

Base model: Chinese-LLaMA-2 (1.3B, 7B, 13B)
Instruction/chat model: Chinese-Alpaca-2 (1.3B, 7B, 13B)
Long context model (16K/64K):
Chinese-LLaMA-2-16K (7B, 13B) 、Chinese-Alpaca-2-16K (7B, 13B)
Chinese-LLaMA-2-64K (7B)、Chinese-Alpaca-2-64K (7B)
RLHF model：Chinese-Alpaca-2-RLHF (1.3B, 7B)

News

[Jan 23, 2024] Add new GGUF models (with imatrix), AWQ models, support YaRN under vLLM. For details, see📚 v4.1 release note

[Dec 29, 2023] Release long context models: Chiense-LLaMA-2-7B-64K and Chinese-Alpaca-2-7B-64K. We also release RLHF-tuned Chinese-Alpaca-2-RLHF (1.3B/7B). For details, see 📚 v4.0 release note

[Sep 01, 2023] Release long context models: Chinese-Alpaca-2-7B-16K and Chinese-Alpaca-2-13B-16K, which can be directly used in downstream tasks, such as privateGPT. For details, see 📚 v3.1 release note

[Aug 25, 2023] Release long context models: Chinese-LLaMA-2-7B-16K and Chinese-LLaMA-2-13B-16K, which support 16K context and can be further extended up to 24K+ using NTK. For details, see 📚 v3.0 release note

[Aug 14, 2023] Release Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B. Add text-generation-webui/LangChain/privateGPT support. Add CFG sampling, etc. For details, see 📚 v2.0 release note

[Aug 02, 2023] Add FlashAttention-2 training support, vLLM-based inference acceleration support, a new system prompt that generates longer response, etc. For details, see 📚 v1.1 release note

[July 31, 2023] Release Chinese-LLaMA-2-7B (base model), trained with 120GB Chinese data. It was further fine-tuned using 5M instruction data, resulting in the Chinese-Alpaca-2-7B (instruction/chat model). For details, see 📚 v1.0 release notes

[July 19, 2023] 🚀Launched the Chinese LLaMA-2 and Alpaca-2 open-source LLM project

Content Guide

Section	Description
💁🏻‍♂️Introduction	Briefly introduces the technical features of the models in this project
⏬Download	Download links for Chinese LLaMA-2 and Alpaca-2
💻Inference and Deployment	Introduces how to quantify models and deploy and experience large models using a personal computer
💯System Performance	Experimental results on several tasks
📝Training and Fine-tuning	Introduces how to perform further training and fine-tuning on Chinese LLaMA-2 and Alpaca-2
❓Frequently Asked Questions	Responses to some common questions

Introduction

This project launches the Chinese LLaMA-2 and Alpaca-2 models based on Llama-2. Compared to the first generation of the project, the main features include:

📖 Optimized Chinese Vocabulary

In the first generation of the project, we expanded Chinese words and characters for the first-generation Chinese LLaMA model (LLaMA: 49953, Alpaca: 49954) to improve the model's encoding and decoding efficiency of Chinese texts.
In this project, we redesigned the new vocabulary (size: 55296) to further improve the coverage of Chinese words and characters. We also unified the LLaMA/Alpaca vocabulary to avoid problems due to mixed use.

⚡ Efficient FlashAttention-2

FlashAttention-2 is an implementation of efficient attention mechanisms, offering faster speed and optimized memory usage compared to its first-generation.
When the context length is longer, using efficient attention technology is essential to prevent explosive growth in memory usage.

🚄 Adaptive Context Extension based on PI and YaRN

In the first generation of the project, we implemented the context extension based on NTK, which can support longer contexts without further training the model.
We release long context models, using PI and NTK methods, supporting 16K context, and can be further extended up to 24K-32K
We further release long context models, using YaRN, supporting 64K context
Based on the above, we further designed a convenient adaptive empirical formula that does not require manually setting corresponding hyperparameters for different context lengths.

🤖 Simplified Bilingual System Prompt

In the first generation of the project, we use Stanford Alpaca template for our Chinese Alpaca models
Through preliminary experiments, we found that the lengthy system prompt by Llama-2-Chat is not as effective as a simple one
We use a very simple system prompt while keeping the Llama-2-Chat template to better adapt to relevant ecosystems

👮 Human Preference Alignment

In the first generation of the project, the Chinese Alpaca models completed pre-training and instruction fine-tuning, and gained basic conversational ability
Through reinforcement learning from human feedback (RLHF) experiments, we find that the ability of the model to convey correct values can be significantly improved
This project introduces the Alpaca-2-RLHF series of models, which are used in the same way as the SFT models

The following figure depicts all open-sourced models for our projects (including the first-gen project).

Download

Model Selection Guide

Below is a basic comparison between the Chinese LLaMA-2 and Alpaca-2 models, as well as recommended use cases. Use Alpaca for ChatGPT-like interaction.

Comparison	Chinese LLaMA-2	Chinese Alpaca-2
Model Type	Base Model	Instruction/Chat Model (like ChatGPT)
Released Sizes	1.3B, 7B, 13B	1.3B, 7B, 13B
Training Method	Causal-LM (CLM)	Instruction fine-tuning
Training Parts	7B, 13B: LoRA + emb/lm-head

1.3B: full params | 7B, 13B: LoRA + emb/lm-head

Long ctx(PI): 16K (24K-32K)

Long ctx(YaRN): 64K | Standard: 4K (12K-18K)

Long ctx(PI): 16K (24K-32K)

[!NOTE] [1] The vocabulary of the first and second generation models in this project are different, do not mix them. The vocabularies of the second generation LLaMA and Alpaca are the same.

[2] Extended context size with NTK method is depicted in brackets.

[3] Alpaca-2 uses the Llama-2-chat series templates (different prompts), not the templates of the first-generation Alpaca, do not mix them.

[4] 1.3B models are not intended for standalone use; instead, use it together with larger models (7B, 13B) through speculative sampling.

Full Model Download

Below are the full models, which can be used directly afterwards, without additional merging steps. Recommended for users with sufficient network bandwidth.

Model Name	Type	Size	Download Link	GGUF
Chinese-LLaMA-2-13B	Base model	24.7 GB	[Baidu] [Google] [🤗HF]	[🤗HF]
Chinese-LLaMA-2-7B	Base model	12.9 GB	[[Baidu]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbI

Core symbols most depended-on inside this repo

from_pretrained

called by 38

scripts/training/peft/peft_model.py

generate

called by 16

scripts/training/peft/peft_model.py

get_prompt

called by 13

scripts/training/peft/peft_model.py

create_error_response

called by 13

scripts/openai_server_demo/openai_api_server_vllm.py

transpose

called by 12

scripts/training/peft/utils/other.py

eval

called by 11

scripts/training/peft/tuners/lora.py

save_pretrained

called by 10

scripts/training/peft/peft_model.py

lower

called by 6

scripts/ceval/evaluator.py

Shape

Method 131

Function 116

Class 73

Route 6

Languages

Python100%

Modules by API surface

scripts/training/peft/tuners/lora.py48 symbols

scripts/training/peft/peft_model.py30 symbols

scripts/openai_server_demo/openai_api_server_vllm.py26 symbols

scripts/inference/gradio_demo.py23 symbols

scripts/training/run_clm_pt_with_peft.py18 symbols

scripts/openai_server_demo/openai_api_protocol_vllm.py18 symbols

scripts/longbench/metrics.py16 symbols

scripts/openai_server_demo/openai_api_server.py12 symbols

scripts/training/run_clm_sft_with_peft.py11 symbols

scripts/openai_server_demo/openai_api_protocol.py11 symbols

scripts/training/peft/utils/config.py10 symbols

scripts/cmmlu/evaluator.py10 symbols

Dependencies from manifests, versioned

bitsandbytes0.41.1 · 1×

peft0.3.0 · 1×

sentencepiece0.1.99 · 1×

torch2.0.1 · 1×

transformers4.35.0 · 1×

For agents

$ claude mcp add Chinese-LLaMA-Alpaca-2 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact