hub / github.com/ymcui/Chinese-LLaMA-Alpaca-3

github.com/ymcui/Chinese-LLaMA-Alpaca-3 @v3.0 sqlite

repository ↗ · DeepWiki ↗ · release v3.0 ↗

90 symbols 405 edges 17 files 14 documented · 16%

README

<img src="https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/raw/v3.0/pics/banner.png" width="800"/>









<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-3.svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-3">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-3">
<a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-3/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>

This project is developed based on Meta's newly released next-generation open-source large language model Llama-3 and is the third generation of the Chinese-LLaMA-Alpaca open-source LLM series (1st gen, 2nd gen). This project has open-sourced the Llama-3-Chinese base model and the Chinese Llama-3-Chinese-Instruct instruction-tuned large model. These models use large-scale Chinese data for continual pre-training on the original Llama-3, and are fine-tuned with selected instruction data to further enhance Chinese basic semantic and instruction understanding capabilities, significantly improving performance compared to the second-generation models.

Main Content

🚀 Open-source Llama-3-Chinese base model and Llama-3-Chinese-Instruct instruction model (v1, v2, v3)
🚀 Released pre-training scripts and instruction fine-tuning scripts, allowing users to further train or fine-tune the model as needed
🚀 Released alpaca_zh_51k, stem_zh_instruction, ruozhiba_gpt4 (4o/4T) instruction data
🚀 Provides a tutorial for quickly quantizing and deploying large models locally using a personal computer's CPU/GPU
🚀 Supports 🤗transformers, llama.cpp, text-generation-webui, vLLM, Ollama and other Llama-3 ecosystems

News

[2024/05/30] Release Llama-3-Chinese-8B-Instruct-v3, which has better performance on downstream tasks than v1/v2. For details, see: 📚Version 3.0 Release Log

[2024/05/08] Release Llama-3-Chinese-8B-Instruct-v2, which is directly tuned on Meta-Llama-3-8B-Instruct with 5M instructions. For details, see: 📚Version 2.0 Release Log

[2024/05/07] Add pre-training and SFT scripts. For details, see: 📚Version 1.1 Release Log

[2024/04/30] Released the Llama-3-Chinese-8B base model and Llama-3-Chinese-8B-Instruct instruction model. For details, see: 📚Version 1.0 Release Log

[2024/04/19] 🚀 Officially launched the Chinese-LLaMA-Alpaca-3 project

Content Guide

Section	Description
💁🏻‍♂️Model Introduction	Briefly introduces the technical features of the models related to this project
⏬Model Download	Download addresses for the Chinese Llama-3 large models
💻Inference and Deployment	Describes how to quantize the model and deploy it using a personal computer to experience the large model
💯Model Performance	Introduces the effects of the model on some tasks
📝Training and Fine-Tuning	Introduces how to train and fine-tune the Chinese Llama-3 large models
❓Frequently Asked Questions	Replies to some common questions

Model Introduction

This project has launched the Chinese open-source large models Llama-3-Chinese and Llama-3-Chinese-Instruct based on Meta Llama-3. The main features are as follows:

📖 Uses the Original Llama-3 Vocabulary

Llama-3 has significantly expanded its vocabulary from 32K to 128K and switched to a BPE vocabulary.
Preliminary experiments have shown that the encoding efficiency of the Llama-3 vocabulary is comparable to our expanded vocabulary in Chinese LLaMA-2, with an efficiency of about 95% based on encoding efficiency tests on Wikipedia data.
Based on our experience and experimental conclusions with Chinese Mixtral [^1], we did not expand the vocabulary further.

[^1]: Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral

🚄 Extended Context Length from 4K in the Second Generation to 8K

Llama-3 has increased the native context window length from 4K to 8K, allowing for further processing of longer context information.
Users can also use methods like PI, NTK, and YaRN to extend the model's long context capabilities to support longer text processing.

⚡ Uses Grouped Query Attention Mechanism

Llama-3 adopts the Grouped Query Attention (GQA) mechanism used in the large parameter version of Llama-2, which further enhances the model's efficiency.

🗒 New Instruction Template

Llama-3-Instruct uses a new instruction template, which is not compatible with Llama-2-chat; it should be used strictly following the official instruction template. (See instruction template)

Model Download

Model Selection Guide

Here's a comparison of the models in this project and recommended usage scenarios. For chat interactions, please choose the Instruct version.

Comparison Item	Llama-3-Chinese-8B	Llama-3-Chinese-8B-Instruct
Model Type	Base Model	Instruction/Chat Model (similar to ChatGPT)
Model Size	8B	8B
Training Type	Causal-LM (CLM)	Instruction Fine-Tuning
Training Method	LoRA + Full emb/lm-head	LoRA + Full emb/lm-head
Initial Model	Meta-Llama-3-8B	v1: Llama-3-Chinese-8B

v2: Meta-Llama-3-8B-Instruct

Here is a comparison between different versions of Instruct. Unless there is a clear preference, please prioritize using the Instruct-v3 version.

Comparison Item	Instruct-v1	Instruct-v2	Instruct-v3
Release Date	2024/4/30	2024/5/8	2024/5/30
Base Model	Original Meta-Llama-3-8B	Original Meta-Llama-3-8B-Instruct	(See Training Method)
Training Method	First Stage: Pre-training with 120G Chinese Corpus

Second Stage: Fine-tuning with 5 million instruction data | Direct fine-tuning with 5 million instruction data | Model merging using inst-v1, inst-v2, and inst-meta, followed by fine-tuning with a small amount of instruction data | | Chinese Proficiency | 49.3 / 51.5 | 51.6 / 51.6 | 55.2 / 54.8 👍🏻 | | English Proficiency | 63.21 | 66.68 | 66.81 👍🏻 | | Long Text Capability | 29.6 | 46.4 👍🏻 | 40.5 | | LLM Arena Win Rate / Elo | 49.4% / 1430 | 66.1% / 1559 | 83.6% / 1627 👍🏻 |

[!NOTE] Chinese proficiency results are from C-Eval (valid); English proficiency results are from Open LLM Leaderboard (avg); long text capability results are from LongBench (avg). For detailed performance, please refer to the 💯 Model Performance section.

Download Links

Model Name	Full Version	LoRA Version	GGUF Version
Llama-3-Chinese-8B-Instruct-v3

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | N/A | [🤗Hugging Face]

[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct-v2

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | [🤗Hugging Face]

[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[[wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chine

Core symbols most depended-on inside this repo

generate_prompt

called by 3

scripts/oai_api_demo/openai_api_server.py

generate_prompt

called by 3

scripts/inference/inference_hf.py

fill_llama3_prompt_template

called by 3

scripts/longbench/pred.py

translate_state_dict_key

called by 2

scripts/merge_llama3_with_chinese_lora_low_mem.py

unpermute

called by 2

scripts/merge_llama3_with_chinese_lora_low_mem.py

generate_completion_prompt

called by 2

scripts/oai_api_demo/openai_api_server.py

generate_chat_prompt

called by 2

scripts/oai_api_demo/openai_api_server.py

predict

called by 2

scripts/oai_api_demo/openai_api_server.py

Shape

Function 55

Class 19

Method 13

Route 3

Languages

Python100%

Modules by API surface

scripts/longbench/metrics.py16 symbols

scripts/training/run_clm_pt_with_peft.py12 symbols

scripts/oai_api_demo/openai_api_server.py12 symbols

scripts/oai_api_demo/openai_api_protocol.py11 symbols

scripts/merge_llama3_with_chinese_lora_low_mem.py6 symbols

scripts/cmmlu/llama_evaluator.py6 symbols

scripts/ceval/llama_evaluator.py6 symbols

scripts/training/run_clm_sft_with_peft.py5 symbols

scripts/mmlu/eval.py5 symbols

scripts/longbench/pred.py3 symbols

scripts/longbench/eval.py3 symbols

scripts/training/build_dataset.py2 symbols

Dependencies from manifests, versioned

peft0.7.1 · 1×

torch2.0.1 · 1×

transformers4.40.0 · 1×

For agents

$ claude mcp add Chinese-LLaMA-Alpaca-3 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact