🇨🇳中文 | 🌐English | 📖文档/Docs | ❓提问/Issues | 💬讨论/Discussions | ⚔️竞技场/Arena
<img src="https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/raw/v4.1/pics/banner.png" width="800"/>
<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-2.svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-2">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-2">
<a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-2/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1710faac5e634acaabfc26b0a778cdde"/></a>
This project is based on the Llama-2, released by Meta, and it is the second generation of the Chinese LLaMA & Alpaca LLM project. We open-source Chinese LLaMA-2 (foundation model) and Alpaca-2 (instruction-following model). These models have been expanded and optimized with Chinese vocabulary beyond the original Llama-2. We used large-scale Chinese data for incremental pre-training, which further improved the fundamental semantic understanding of the Chinese language, resulting in a significant performance improvement compared to the first-generation models. Standard version supports 4K context, and long context version supports 16K and 64K context. The RLHF models are fine-tuned for human preference alignment and have gained significant performance improvements in the representation of correct values compared to the standard version of the model.

Chinese LLaMA&Alpaca LLMs| Visual Chinese-LLaMA-Alpaca | Multi-modal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation tool TextBrewer | Model pruning tool TextPruner
[Jan 23, 2024] Add new GGUF models (with imatrix), AWQ models, support YaRN under vLLM. For details, see📚 v4.1 release note
[Dec 29, 2023] Release long context models: Chiense-LLaMA-2-7B-64K and Chinese-Alpaca-2-7B-64K. We also release RLHF-tuned Chinese-Alpaca-2-RLHF (1.3B/7B). For details, see 📚 v4.0 release note
[Sep 01, 2023] Release long context models: Chinese-Alpaca-2-7B-16K and Chinese-Alpaca-2-13B-16K, which can be directly used in downstream tasks, such as privateGPT. For details, see 📚 v3.1 release note
[Aug 25, 2023] Release long context models: Chinese-LLaMA-2-7B-16K and Chinese-LLaMA-2-13B-16K, which support 16K context and can be further extended up to 24K+ using NTK. For details, see 📚 v3.0 release note
[Aug 14, 2023] Release Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B. Add text-generation-webui/LangChain/privateGPT support. Add CFG sampling, etc. For details, see 📚 v2.0 release note
[Aug 02, 2023] Add FlashAttention-2 training support, vLLM-based inference acceleration support, a new system prompt that generates longer response, etc. For details, see 📚 v1.1 release note
[July 31, 2023] Release Chinese-LLaMA-2-7B (base model), trained with 120GB Chinese data. It was further fine-tuned using 5M instruction data, resulting in the Chinese-Alpaca-2-7B (instruction/chat model). For details, see 📚 v1.0 release notes
[July 19, 2023] 🚀Launched the Chinese LLaMA-2 and Alpaca-2 open-source LLM project
| Section | Description |
|---|---|
| 💁🏻♂️Introduction | Briefly introduces the technical features of the models in this project |
| ⏬Download | Download links for Chinese LLaMA-2 and Alpaca-2 |
| 💻Inference and Deployment | Introduces how to quantify models and deploy and experience large models using a personal computer |
| 💯System Performance | Experimental results on several tasks |
| 📝Training and Fine-tuning | Introduces how to perform further training and fine-tuning on Chinese LLaMA-2 and Alpaca-2 |
| ❓Frequently Asked Questions | Responses to some common questions |
This project launches the Chinese LLaMA-2 and Alpaca-2 models based on Llama-2. Compared to the first generation of the project, the main features include:
📖 Optimized Chinese Vocabulary
⚡ Efficient FlashAttention-2
🚄 Adaptive Context Extension based on PI and YaRN
🤖 Simplified Bilingual System Prompt
The following figure depicts all open-sourced models for our projects (including the first-gen project).

Below is a basic comparison between the Chinese LLaMA-2 and Alpaca-2 models, as well as recommended use cases. Use Alpaca for ChatGPT-like interaction.
| Comparison | Chinese LLaMA-2 | Chinese Alpaca-2 |
|---|---|---|
| Model Type | Base Model | Instruction/Chat Model (like ChatGPT) |
| Released Sizes | 1.3B, 7B, 13B | 1.3B, 7B, 13B |
| Training Method | Causal-LM (CLM) | Instruction fine-tuning |
| Training Parts | 7B, 13B: LoRA + emb/lm-head |
1.3B: full params | 7B, 13B: LoRA + emb/lm-head
1.3B: full params | | Trained on | Original Llama-2 (non-chat) | Chinese LLaMA-2 | | Training Corpus | Unlabeled general corpus (120G raw text) | Labeled instruction data (5M samples) | | Vocabulary Size[1] | 55,296 | 55,296 | | Context Size[2] | Standard: 4K (12K-18K)
Long ctx(PI): 16K (24K-32K)
Long ctx(YaRN): 64K | Standard: 4K (12K-18K)
Long ctx(PI): 16K (24K-32K)
Long ctx(YaRN): 64K | | Input Template | Not required | Requires specific templates[3] | | Suitable Scenarios | Text continuation: Given the context, the model generates the following text | Instruction understanding: Q&A, writing, chatting, interaction, etc. | | Unsuitable Scenarios | Instruction understanding, multi-turn chat, etc. | Unrestricted text generation | | Preference Alignment | No | RLHF version (1.3B, 7B) |
[!NOTE] [1] The vocabulary of the first and second generation models in this project are different, do not mix them. The vocabularies of the second generation LLaMA and Alpaca are the same.
[2] Extended context size with NTK method is depicted in brackets.
[3] Alpaca-2 uses the Llama-2-chat series templates (different prompts), not the templates of the first-generation Alpaca, do not mix them.
[4] 1.3B models are not intended for standalone use; instead, use it together with larger models (7B, 13B) through speculative sampling.
Below are the full models, which can be used directly afterwards, without additional merging steps. Recommended for users with sufficient network bandwidth.
| Model Name | Type | Size | Download Link | GGUF |
|---|---|---|---|---|
| Chinese-LLaMA-2-13B | Base model | 24.7 GB | [Baidu] [Google] [🤗HF] | [🤗HF] |
| Chinese-LLaMA-2-7B | Base model | 12.9 GB | [[Baidu]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbI |
$ claude mcp add Chinese-LLaMA-Alpaca-2 \
-- python -m otcore.mcp_server <graph>