MCPcopy
hub / github.com/ymcui/Chinese-LLaMA-Alpaca-3

github.com/ymcui/Chinese-LLaMA-Alpaca-3 @v3.0 sqlite

repository ↗ · DeepWiki ↗ · release v3.0 ↗
90 symbols 405 edges 17 files 14 documented · 16%
README

🇨🇳Chinese | 🌐English | 📖Documentation | ❓Issues | 💬Discussions | ⚔️Arena

<img src="https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/raw/v3.0/pics/banner.png" width="800"/>









<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-3.svg?color=blue&style=flat-square">
<img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-3">
<img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-3">
<a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-3/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>

This project is developed based on Meta's newly released next-generation open-source large language model Llama-3 and is the third generation of the Chinese-LLaMA-Alpaca open-source LLM series (1st gen, 2nd gen). This project has open-sourced the Llama-3-Chinese base model and the Chinese Llama-3-Chinese-Instruct instruction-tuned large model. These models use large-scale Chinese data for continual pre-training on the original Llama-3, and are fine-tuned with selected instruction data to further enhance Chinese basic semantic and instruction understanding capabilities, significantly improving performance compared to the second-generation models.

Main Content

  • 🚀 Open-source Llama-3-Chinese base model and Llama-3-Chinese-Instruct instruction model (v1, v2, v3)
  • 🚀 Released pre-training scripts and instruction fine-tuning scripts, allowing users to further train or fine-tune the model as needed
  • 🚀 Released alpaca_zh_51k, stem_zh_instruction, ruozhiba_gpt4 (4o/4T) instruction data
  • 🚀 Provides a tutorial for quickly quantizing and deploying large models locally using a personal computer's CPU/GPU
  • 🚀 Supports 🤗transformers, llama.cpp, text-generation-webui, vLLM, Ollama and other Llama-3 ecosystems

Chinese Mixtral | Chinese LLaMA-2 & Alpaca-2 Large Models | Chinese LLaMA & Alpaca Large Models | Multimodal Chinese LLaMA & Alpaca Large Models | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge Distillation Tool TextBrewer | Model Pruning Tool TextPruner | Distillation and Pruning Integrated GRAIN

News

[2024/05/30] Release Llama-3-Chinese-8B-Instruct-v3, which has better performance on downstream tasks than v1/v2. For details, see: 📚Version 3.0 Release Log

[2024/05/08] Release Llama-3-Chinese-8B-Instruct-v2, which is directly tuned on Meta-Llama-3-8B-Instruct with 5M instructions. For details, see: 📚Version 2.0 Release Log

[2024/05/07] Add pre-training and SFT scripts. For details, see: 📚Version 1.1 Release Log

[2024/04/30] Released the Llama-3-Chinese-8B base model and Llama-3-Chinese-8B-Instruct instruction model. For details, see: 📚Version 1.0 Release Log

[2024/04/19] 🚀 Officially launched the Chinese-LLaMA-Alpaca-3 project

Content Guide

Section Description
💁🏻‍♂️Model Introduction Briefly introduces the technical features of the models related to this project
⏬Model Download Download addresses for the Chinese Llama-3 large models
💻Inference and Deployment Describes how to quantize the model and deploy it using a personal computer to experience the large model
💯Model Performance Introduces the effects of the model on some tasks
📝Training and Fine-Tuning Introduces how to train and fine-tune the Chinese Llama-3 large models
❓Frequently Asked Questions Replies to some common questions

Model Introduction

This project has launched the Chinese open-source large models Llama-3-Chinese and Llama-3-Chinese-Instruct based on Meta Llama-3. The main features are as follows:

📖 Uses the Original Llama-3 Vocabulary

  • Llama-3 has significantly expanded its vocabulary from 32K to 128K and switched to a BPE vocabulary.
  • Preliminary experiments have shown that the encoding efficiency of the Llama-3 vocabulary is comparable to our expanded vocabulary in Chinese LLaMA-2, with an efficiency of about 95% based on encoding efficiency tests on Wikipedia data.
  • Based on our experience and experimental conclusions with Chinese Mixtral [^1], we did not expand the vocabulary further.

[^1]: Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral

🚄 Extended Context Length from 4K in the Second Generation to 8K

  • Llama-3 has increased the native context window length from 4K to 8K, allowing for further processing of longer context information.
  • Users can also use methods like PI, NTK, and YaRN to extend the model's long context capabilities to support longer text processing.

⚡ Uses Grouped Query Attention Mechanism

  • Llama-3 adopts the Grouped Query Attention (GQA) mechanism used in the large parameter version of Llama-2, which further enhances the model's efficiency.

🗒 New Instruction Template

  • Llama-3-Instruct uses a new instruction template, which is not compatible with Llama-2-chat; it should be used strictly following the official instruction template. (See instruction template)

Model Download

Model Selection Guide

Here's a comparison of the models in this project and recommended usage scenarios. For chat interactions, please choose the Instruct version.

Comparison Item Llama-3-Chinese-8B Llama-3-Chinese-8B-Instruct
Model Type Base Model Instruction/Chat Model (similar to ChatGPT)
Model Size 8B 8B
Training Type Causal-LM (CLM) Instruction Fine-Tuning
Training Method LoRA + Full emb/lm-head LoRA + Full emb/lm-head
Initial Model Meta-Llama-3-8B v1: Llama-3-Chinese-8B

v2: Meta-Llama-3-8B-Instruct

v3: mix of inst/inst-v2/inst-meta | | Training Corpus | Unlabeled general corpus (approx. 120GB) | Labeled instruction data (approx. 5 million entries) | | Vocabulary Size | Original vocabulary (128,256) | Original vocabulary (128,256) | | Supported Context Length | 8K | 8K | | Input Template | Not required | Requires Llama-3-Instruct template | | Applicable Scenarios | Text continuation: Given a context, let the model generate the following text | Instruction understanding: Q&A, writing, chatting, interaction, etc. |

Here is a comparison between different versions of Instruct. Unless there is a clear preference, please prioritize using the Instruct-v3 version.

Comparison Item Instruct-v1 Instruct-v2 Instruct-v3
Release Date 2024/4/30 2024/5/8 2024/5/30
Base Model Original Meta-Llama-3-8B Original Meta-Llama-3-8B-Instruct (See Training Method)
Training Method First Stage: Pre-training with 120G Chinese Corpus

Second Stage: Fine-tuning with 5 million instruction data | Direct fine-tuning with 5 million instruction data | Model merging using inst-v1, inst-v2, and inst-meta, followed by fine-tuning with a small amount of instruction data | | Chinese Proficiency | 49.3 / 51.5 | 51.6 / 51.6 | 55.2 / 54.8 👍🏻 | | English Proficiency | 63.21 | 66.68 | 66.81 👍🏻 | | Long Text Capability | 29.6 | 46.4 👍🏻 | 40.5 | | LLM Arena Win Rate / Elo | 49.4% / 1430 | 66.1% / 1559 | 83.6% / 1627 👍🏻 |

[!NOTE] Chinese proficiency results are from C-Eval (valid); English proficiency results are from Open LLM Leaderboard (avg); long text capability results are from LongBench (avg). For detailed performance, please refer to the 💯 Model Performance section.

Download Links

Model Name Full Version LoRA Version GGUF Version
Llama-3-Chinese-8B-Instruct-v3

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | N/A | [🤗Hugging Face]

[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct-v2

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | [🤗Hugging Face]

[🤖ModelScope]

[wisemodel] | [🤗Hugging Face]

[🤖ModelScope] | | Llama-3-Chinese-8B-Instruct

(chat model) | [🤗Hugging Face]

[🤖ModelScope]

[[wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chine

Core symbols most depended-on inside this repo

generate_prompt
called by 3
scripts/oai_api_demo/openai_api_server.py
generate_prompt
called by 3
scripts/inference/inference_hf.py
fill_llama3_prompt_template
called by 3
scripts/longbench/pred.py
translate_state_dict_key
called by 2
scripts/merge_llama3_with_chinese_lora_low_mem.py
unpermute
called by 2
scripts/merge_llama3_with_chinese_lora_low_mem.py
generate_completion_prompt
called by 2
scripts/oai_api_demo/openai_api_server.py
generate_chat_prompt
called by 2
scripts/oai_api_demo/openai_api_server.py
predict
called by 2
scripts/oai_api_demo/openai_api_server.py

Shape

Function 55
Class 19
Method 13
Route 3

Languages

Python100%

Modules by API surface

scripts/longbench/metrics.py16 symbols
scripts/training/run_clm_pt_with_peft.py12 symbols
scripts/oai_api_demo/openai_api_server.py12 symbols
scripts/oai_api_demo/openai_api_protocol.py11 symbols
scripts/merge_llama3_with_chinese_lora_low_mem.py6 symbols
scripts/cmmlu/llama_evaluator.py6 symbols
scripts/ceval/llama_evaluator.py6 symbols
scripts/training/run_clm_sft_with_peft.py5 symbols
scripts/mmlu/eval.py5 symbols
scripts/longbench/pred.py3 symbols
scripts/longbench/eval.py3 symbols
scripts/training/build_dataset.py2 symbols

Dependencies from manifests, versioned

peft0.7.1 · 1×
torch2.0.1 · 1×
transformers4.40.0 · 1×

For agents

$ claude mcp add Chinese-LLaMA-Alpaca-3 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact