🇨🇳Chinese | 🌐English | 📖Documentation | ❓Issues | 💬Discussions | ⚔️Arena
<img src="https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/raw/v3.0/pics/banner.png" width="800"/>
<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-3.svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-3">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-3">
<a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-3/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>
This project is developed based on Meta's newly released next-generation open-source large language model Llama-3 and is the third generation of the Chinese-LLaMA-Alpaca open-source LLM series (1st gen, 2nd gen). This project has open-sourced the Llama-3-Chinese base model and the Chinese Llama-3-Chinese-Instruct instruction-tuned large model. These models use large-scale Chinese data for continual pre-training on the original Llama-3, and are fine-tuned with selected instruction data to further enhance Chinese basic semantic and instruction understanding capabilities, significantly improving performance compared to the second-generation models.
Chinese Mixtral | Chinese LLaMA-2 & Alpaca-2 Large Models | Chinese LLaMA & Alpaca Large Models | Multimodal Chinese LLaMA & Alpaca Large Models | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge Distillation Tool TextBrewer | Model Pruning Tool TextPruner | Distillation and Pruning Integrated GRAIN
[2024/05/30] Release Llama-3-Chinese-8B-Instruct-v3, which has better performance on downstream tasks than v1/v2. For details, see: 📚Version 3.0 Release Log
[2024/05/08] Release Llama-3-Chinese-8B-Instruct-v2, which is directly tuned on Meta-Llama-3-8B-Instruct with 5M instructions. For details, see: 📚Version 2.0 Release Log
[2024/05/07] Add pre-training and SFT scripts. For details, see: 📚Version 1.1 Release Log
[2024/04/30] Released the Llama-3-Chinese-8B base model and Llama-3-Chinese-8B-Instruct instruction model. For details, see: 📚Version 1.0 Release Log
[2024/04/19] 🚀 Officially launched the Chinese-LLaMA-Alpaca-3 project
| Section | Description |
|---|---|
| 💁🏻♂️Model Introduction | Briefly introduces the technical features of the models related to this project |
| ⏬Model Download | Download addresses for the Chinese Llama-3 large models |
| 💻Inference and Deployment | Describes how to quantize the model and deploy it using a personal computer to experience the large model |
| 💯Model Performance | Introduces the effects of the model on some tasks |
| 📝Training and Fine-Tuning | Introduces how to train and fine-tune the Chinese Llama-3 large models |
| ❓Frequently Asked Questions | Replies to some common questions |
This project has launched the Chinese open-source large models Llama-3-Chinese and Llama-3-Chinese-Instruct based on Meta Llama-3. The main features are as follows:
[^1]: Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Here's a comparison of the models in this project and recommended usage scenarios. For chat interactions, please choose the Instruct version.
| Comparison Item | Llama-3-Chinese-8B | Llama-3-Chinese-8B-Instruct |
|---|---|---|
| Model Type | Base Model | Instruction/Chat Model (similar to ChatGPT) |
| Model Size | 8B | 8B |
| Training Type | Causal-LM (CLM) | Instruction Fine-Tuning |
| Training Method | LoRA + Full emb/lm-head | LoRA + Full emb/lm-head |
| Initial Model | Meta-Llama-3-8B | v1: Llama-3-Chinese-8B |
v3: mix of inst/inst-v2/inst-meta | | Training Corpus | Unlabeled general corpus (approx. 120GB) | Labeled instruction data (approx. 5 million entries) | | Vocabulary Size | Original vocabulary (128,256) | Original vocabulary (128,256) | | Supported Context Length | 8K | 8K | | Input Template | Not required | Requires Llama-3-Instruct template | | Applicable Scenarios | Text continuation: Given a context, let the model generate the following text | Instruction understanding: Q&A, writing, chatting, interaction, etc. |
Here is a comparison between different versions of Instruct. Unless there is a clear preference, please prioritize using the Instruct-v3 version.
| Comparison Item | Instruct-v1 | Instruct-v2 | Instruct-v3 |
|---|---|---|---|
| Release Date | 2024/4/30 | 2024/5/8 | 2024/5/30 |
| Base Model | Original Meta-Llama-3-8B | Original Meta-Llama-3-8B-Instruct | (See Training Method) |
| Training Method | First Stage: Pre-training with 120G Chinese Corpus |
Second Stage: Fine-tuning with 5 million instruction data | Direct fine-tuning with 5 million instruction data | Model merging using inst-v1, inst-v2, and inst-meta, followed by fine-tuning with a small amount of instruction data | | Chinese Proficiency | 49.3 / 51.5 | 51.6 / 51.6 | 55.2 / 54.8 👍🏻 | | English Proficiency | 63.21 | 66.68 | 66.81 👍🏻 | | Long Text Capability | 29.6 | 46.4 👍🏻 | 40.5 | | LLM Arena Win Rate / Elo | 49.4% / 1430 | 66.1% / 1559 | 83.6% / 1627 👍🏻 |
[!NOTE] Chinese proficiency results are from C-Eval (valid); English proficiency results are from Open LLM Leaderboard (avg); long text capability results are from LongBench (avg). For detailed performance, please refer to the 💯 Model Performance section.
| Model Name | Full Version | LoRA Version | GGUF Version |
|---|---|---|---|
| Llama-3-Chinese-8B-Instruct-v3 |
(chat model) | [🤗Hugging Face]
[wisemodel] | N/A | [🤗Hugging Face]
[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct-v2
(chat model) | [🤗Hugging Face]
[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct
(chat model) | [🤗Hugging Face]
[[wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chine
$ claude mcp add Chinese-LLaMA-Alpaca-3 \
-- python -m otcore.mcp_server <graph>