
llama3-Chinese-chat

HF Demo

1st version of Chinese-llama3

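The first Chinese version of llama3.
This repository is a place to share learning material about llama3 in Chinese. Anyone interested is welcome to join and help build it.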

Training & inference pipeline

[Figures: training flowchart; training and inference]

Learn quickly from diagrams: https://deepwiki.com/CrazyBoyM/llama3-Chinese-chat

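Notices

🔥 New repository: LLM-Chinese. It is tutorial-oriented and uses "adapting a model to Chinese" as a typical model-training scenario to guide readers through secondary fine-tuning of LLMs: https://github.com/CrazyBoyM/LLM-Chinese (includes Chinese gemma2 models in 2b and 9b sizes)

If you have your own fine-tuned version, or find an interesting specialized version online, you are welcome to post it in the issues section so it can be listed here.
If there is a content section you would like to build, you are welcome to fork the repository and submit a PR to become a core author.
(Note: typo-only PRs that change a single character or sentence are no longer accepted; please avoid submitting such PRs repeatedly.)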

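News

  • 2024-07-25 Released the trained weights of the llama3.1 Chinese DPO version.
  • 2024-07-24 Started the llama3.1 Chinese version training plan.
  • 2024-05-17 🎉 The curated collection of llama3 Chinese-adaptation datasets reached 2.9k downloads on modelscope and stayed on the modelscope home page for three consecutive weeks (data download link).
  • 2024-05-17 💪 Added a hand-written API deployment tutorial and command-line invocation (documentation link).
  • 2024-05-13 💪 Added an LMStudio local desktop deployment tutorial (written tutorial, step-by-step video tutorial).
  • 2024-05-04, during the May Day holiday: 🚀 Added a language-preference-reinforced alignment version (DPO applied directly to the English instruct version). It keeps the original flavor of the model's replies (a fondness for playful language and emoji). Links: model download, GGUF quantized version download. The language-preference reinforcement dataset has been open-sourced.
  • 2024-04-21, 2 a.m.: Added and organized documentation for training, inference, web deployment, and more.
  • 2024-04-20, 11 p.m.: Training of the instruct Chinese version completed.
  • 2024-04-20, 7 a.m.: Training of v2 completed.
  • 2024-04-19, 1 p.m.: 🍺 Training of the world's first Chinese llama3 completed. No sleep that night, haha; it was trained overnight on 170k+ high-quality multi-turn Chinese dialogue samples.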

Demo

llama3-base-8b Chinese SFT version

llama3-instruct-8b Chinese DPO version

llama3.1-instruct-8b Chinese DPO version

image

Available llama3 chat models

llama3.1

  • shareAI-DPO Chinese 8B version (RLHF, Chinese)
    • Training data (open source): https://huggingface.co/datasets/shareAI/DPO-zh-en-emoji
    • Training details: DPO (beta 0.5) + LoRA rank 128, alpha 256 + the "lm_head", "input_layernorm", "post_attention_layernorm", and "norm" layers opened up for training.
    • Compute: 8 * A100, 5 minutes. Thanks to the opencsg community for the sponsorship.
    • Model download - OpenCSG: https://opencsg.com/models/shareAI/llama3.1-8b-instruct-dpo-zh
    • Model download - modelscope: https://modelscope.cn/models/shareAI/llama3.1-8b-instruct-dpo-zh
    • Model download - Huggingface: https://huggingface.co/shareAI/llama3.1-8b-instruct-dpo-zh
    • GGUF download (usable with ollama and lmstudio): https://huggingface.co/shareAI/llama3.1-8b-instruct-dpo-zh/blob/main/llama3.1_8b_chinese_chat_q4_k_m-shareAI.gguf
    • GGUF download within China (hf-mirror acceleration site): https://hf-mirror.com/shareAI/llama3.1-8b-instruct-dpo-zh
    • Run directly with ollama: ollama run shareai/llama3.1-dpo-zh
  • openCSG wukong Chinese 405B version (SFT, Chinese)
    • Jointly released by shareAI & openCSG
    • Introduction article: https://mp.weixin.qq.com/s/7_lDZ6Zslq_WUckfuTToyQ
    • Open-source model: https://opencsg.com/models/OpenCSG/CSG-Wukong-Chinese-Llama3.1-405B
  • openbuddy
    • openbuddy-llama3.1-8b (SFT, Chinese): https://modelscope.cn/models/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
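To make the "DPO (beta 0.5)" setting in the llama3.1 training details above more concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not the repository's training code: the function name `dpo_loss` and the example log-probabilities are hypothetical, and only the value beta = 0.5 comes from the text.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Inputs are summed log-probabilities of the chosen and rejected replies
    under the policy being trained and under the frozen reference model.
    """
    # How much more the policy likes each reply than the reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written in a numerically stable form
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the loss equals log(2)
baseline = dpo_loss(-12.0, -12.0, -12.0, -12.0)
# Policy already prefers the chosen reply: the loss drops below log(2)
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
print(baseline, improved)
```

A larger beta penalizes drifting away from the reference model more strongly. Separately, with rank 128 and alpha 256, the usual LoRA scaling factor alpha / rank works out to 2.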

Curated high-quality llama3 chat weights (additions via issues are welcome):

  • shareAI series:
    • Base pretrained + direct Chinese SFT version:
      • Training data: https://modelscope.cn/datasets/baicai003/Llama3-Chinese-dataset/summary
      • V1
        • OpenCSG full-speed download: https://opencsg.com/models/shareAI/llama3-Chinese-chat-8b
        • WiseModel full-speed download: https://wisemodel.cn/models/shareAI/llama3-Chinese-chat-8b
      • V2
        • modelscope: https://modelscope.cn/models/baicai003/Llama3-Chinese_v2/summary
        • LoRA with enhanced mind-map generation: https://modelscope.cn/models/shareAI/llama3-instruct-8b-cn-doc2markmap-lora
    • Instruct + continued Chinese SFT version:
      • modelscope model download: https://modelscope.cn/models/baicai003/llama-3-8b-Instruct-chinese_v2/summary
      • Cloud server image for an online trial (click to use, free for 4 hours): https://www.suanyun.cn/console/share?uuid=b1ba51908f8a4bd1af37148765c293ee
    • Instruct + reinforcement learning Chinese version:
      • llama3 instruct DPO version (trains in about 10 minutes, with minimal performance loss relative to the original multilingual instruct version; in our tests it outperforms most versions trained on large amounts of Chinese data)
        • modelscope download: https://modelscope.cn/models/baicai003/Llama3-Chinese-instruct-DPO-beta0.5/summary
        • Preference learning dataset: DPO-zh-en-emoji

  • Base pretrained + incremental pretraining on massive high-quality Chinese data: in progress
  • 70b Chinese version: planned
  • by zhuangxialie; because the chat template was set incorrectly, these need to be tried through fastchat:
    • Base + Chinese SFT: https://modelscope.cn/models/zhuangxialie/Llama3_Chinese_Sft/files
    • Base + ORPO: https://modelscope.cn/models/zhuangxialie/Llama3-Chinese-ORPO/summary
    • Instruct + DPO: https://www.modelscope.cn/models/zhuangxialie/Llama3-Chinese-DPO/summary
  • llama3 Pro (with added blocks; everyone is encouraged to experiment and explore further with this approach):
    • linjh1118 (the first ORPO preference alignment + 2x expanded blocks): https://github.com/linjh1118/Llama3-Chinese-ORPO
  • llama3 MoE enhanced version:
    • cooper12121-llama3-8x8b-MoE: https://github.com/cooper12121/llama3-8x8b-MoE
  • Long-context versions:
    • Unicom fine-tuned version v2 (Chinese, 28k context): https://huggingface.co/UnicomLLM/Unichat-llama3-Chinese-8B-28K
    • 262k context (English): https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k
    • 262k context (Chinese): planned
    • Infinite-context version: planned; reference: https://medium.com/neoxia/llm-infini-attention-with-linear-complexity-3209b87a77c3
  • Other general Chinese fine-tuned versions:
    • ZTE fine-tuned version (DPO) - 70B: https://www.modelscope.cn/models/ZTEAIM2024/Llama3_70B_instruct_chinese/summary
    • Unicom fine-tuned version (SFT): https://www.modelscope.cn/models/UnicomAI/Unichat-llama3-Chinese/summary
    • Openbuddy fine-tuned version (SFT, reportedly good): https://www.modelscope.cn/models/OpenBuddy/openbuddy-llama3-8b-v21.1-8k/summary
    • zhichen fine-tuned version (ORPO method, probably the first ORPO one): https://github.com/seanzhang-zhichen/llama3-chinese
    • shenzhi-wang fine-tuned version (ORPO method, also described as the first ORPO one): https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat
    • Rookie fine-tuned version (SFT): https://github.com/Rookie1019/Llama-3-8B-Instruct-Chinese
    • hit-sz klc-lab fine-tuned version: https://github.com/zyg18181818/Llama-3-Chinese
  • Series with safety restrictions removed (nsfw):
    • Unholy: https://huggingface.co/Undi95/Llama-3-Unholy-8B
    • neural-chat: https://hf-mirror.com/Locutusque/llama-3-neural-chat-v1-8b
    • dolphin: https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
    • Orion: https://huggingface.co/Orion-zhen/Llama3-70B-Orion-Chinese (restrictions removed + Chinese; keeps the original llama3's fondness for emoji)
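  • v-llama3 multimodal versions (support input and output beyond text):
    • Image question answering:
      • Bunny-Llama-3-8B-V: https://wisemodel.cn/models/BAAI/Bunny-Llama-3-8B-V
      • llava-llama-3-8b: https://huggingface.co/xtuner/llava-llama-3-8b-v1_1
    • Video understanding (supports question answering on videos up to 1 minute long): https://github.com/THUDM/CogVLM2
  • Agent tool-use enhanced versions:
    • ModelScope Chinese Agent version V1 (selects tools for you according to your request; Chinese dialogue): https://modelscope.cn/models/swift/Llama3-Chinese-8B-Instruct-Agent-v1/summary
  • EmoLLM, fine-tuned on mental-health domain data:
    • Online demo: https://st-app-center-006861-9746-jlroxvg.openxlab.space/
    • Or launch it from OpenXLab EmoLLM3.0-Llama3
    • Model download:
      • OpenXLab: https://openxlab.org.cn/models/detail/chg0901/EmoLLM-Llama3-8B-Instruct3.0
      • ModelScope: https://modelscope.cn/models/chg0901/EmoLLM-Llama3-8B-Instruct3.0/summary
  • Enhanced version for novel, web fiction, and story writing: planned
  • Music generation version: planned
  • Catgirl role-play version: planned
  • Adult-content version: planned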

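Note: because training covered only common conversations, the Base + SFT version may give unexpected replies, especially to uncommon questions. This tutorial mainly serves as a curated collection of quality resources (how to fine-tune llama3 for Chinese, how to build Chinese dialogue datasets, role play, agent capability enhancement, context-length extension, web deployment and quantization, and CPU inference on phones and computers); these will be added gradually.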

How to use the models

Cloud service deployment

Simple API

Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/tree/main/deploy/API

vLLM (recommended; compatible with the OpenAI API format)

Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/tree/main/deploy/vLLM
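As a rough illustration of what OpenAI-format compatibility means for a client, the sketch below builds a chat-completions request and sends it with the Python standard library. It assumes a vLLM server is already running at http://localhost:8000 and serving the model under the name shown; the URL, the model name, and the helper names `build_chat_request` and `chat` are assumptions for illustration, not taken from the repository's tutorial.

```python
import json
import urllib.request

# Assumed values: adjust to match how your own server was started
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "shareAI/llama3.1-8b-instruct-dpo-zh"


def build_chat_request(user_text, system_text=None, temperature=0.6, max_tokens=500):
    """Build a request body in the OpenAI chat-completions format."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def chat(user_text, system_text=None):
    """POST the request to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(user_text, system_text)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["choices"][0]["message"]["content"]
```

With the server up, `print(chat("你好"))` would print the model's reply. Because the format matches OpenAI's, existing OpenAI client libraries can usually be pointed at the same base URL instead.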

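Local computer deployment

LMStudio local deployment (with a UI)

Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/blob/main/deploy/LMStudio/README.md
Video tutorial: https://www.bilibili.com/video/BV1nt421g79T

ollama command-line tool (recommended; simple and easy to use)

First, download and install ollama from the official site: https://ollama.com/
Then open a terminal and run the following command to start chatting with the AI: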

ollama run shareai/llama3.1-dpo-zh
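Besides the interactive command above, a locally running ollama also exposes an HTTP API, which is convenient for scripting. The sketch below keeps a multi-turn history and calls ollama's chat endpoint. It assumes ollama is listening on its default local address and that the model above has already been pulled; the helper names `build_payload` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed: ollama's default local address
MODEL_TAG = "shareai/llama3.1-dpo-zh"


def build_payload(history, user_text):
    """Return a chat request body; the caller's history list is not modified."""
    messages = list(history) + [{"role": "user", "content": user_text}]
    return {"model": MODEL_TAG, "messages": messages, "stream": False}


def ask(history, user_text):
    """Send one turn and return (updated_history, reply_text)."""
    payload = build_payload(history, user_text)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)["message"]["content"]
    updated = payload["messages"] + [{"role": "assistant", "content": reply}]
    return updated, reply
```

A conversation is then a loop of `history, reply = ask(history, text)` starting from `history = []`. Setting `"stream": False` asks for a single JSON response rather than a stream of chunks.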

image

Streamlit web inference (suited to debugging and testing a model after training)

image

pip install -U streamlit transformers==4.40.1

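First install streamlit with the command above, then start the web page with one of the commands below. Replace '/path/to/model' with the path to your downloaded weights.
V1 version: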

streamlit run deploy/web_streamlit_for_v1.py /path/to/model --theme.base="dark"

Instruct version (supports a custom system prompt):

streamlit run deploy/web_streamlit_for_instruct.py /path/to/model --theme.base="dark"

Instruct DPO version (supports a custom system prompt; tends to reply in a playful style with emoji):

streamlit run deploy/web_streamlit_for_instruct_v2.py /path/to/model --theme.base="dark"

Python code inference

By default, you can run the following code directly to try llama3 Chinese chat. Change model_name_or_path to the path of your downloaded model.

```
from transformers import AutoTokenizer, AutoConfig, AddedToken, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from dataclasses import dataclass
from typing import Dict
import torch
import copy


# Define the chat template
@dataclass
class Template:
    template_name: str
    system_format: str
    user_format: str
    assistant_format: str
    system: str
    stop_word: str


template_dict: Dict[str, Template] = dict()


def register_template(template_name, system_format, user_format, assistant_format, system, stop_word=None):
    template_dict[template_name] = Template(
        template_name=template_name,
        system_format=system_format,
        user_format=user_format,
        assistant_format=assistant_format,
        system=system,
        stop_word=stop_word,
    )


# The system prompt below is the one used during training; feel free to change it at inference time
register_template(
    template_name='llama3',
    system_format='<|begin_of_text|><<SYS>>\n{content}\n<</SYS>>\n\n',
    user_format='<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>',
    assistant_format='<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|end_of_text|>\n',
    system="You are a helpful, excellent and smart assistant. "
           "Please respond to the user using the language they input, ensuring the language is elegant and fluent."
           "If you don't know the answer to a question, please don't share false information.",
    stop_word='<|end_of_text|>'
)


# Load the model
def load_model(model_name_or_path, load_in_4bit=False, adapter_name_or_path=None):
    if load_in_4bit:
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
        )
    else:
        quantization_config = None

    # Load the base model. 4-bit loading is requested only through quantization_config:
    # transformers rejects load_in_4bit as a separate keyword when a config is also passed.
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto',
        quantization_config=quantization_config
    )

    # Load the adapter, if one is given
    if adapter_name_or_path is not None:
        model = PeftModel.from_pretrained(model, adapter_name_or_path)

    return model


# Load the tokenizer
def load_tokenizer(model_name_or_path):
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        use_fast=False
    )

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    return tokenizer


# Build the prompt
def build_prompt(tokenizer, template, query, history, system=None):
    template_name = template.template_name
    system_format = template.system_format
    user_format = template.user_format
    assistant_format = template.assistant_format
    system = system if system is not None else template.system

    # Note: this appends to the list passed in, so callers should pass a copy
    history.append({"role": 'user', 'message': query})
    input_ids = []

    # Add the system message
    if system_format is not None:
        if system is not None:
            system_text = system_format.format(content=system)
            input_ids = tokenizer.encode(system_text, add_special_tokens=False)
    # Concatenate the conversation history
    for item in history:
        role, message = item['role'], item['message']
        if role == 'user':
            message = user_format.format(content=message, stop_token=tokenizer.eos_token)
        else:
            message = assistant_format.format(content=message, stop_token=tokenizer.eos_token)
        tokens = tokenizer.encode(message, add_special_tokens=False)
        input_ids += tokens
    input_ids = torch.tensor([input_ids], dtype=torch.long)

    return input_ids


def main():
    model_name_or_path = 'shareAI/llama3-Chinese-chat-8b'  # model name or path; change this
    template_name = 'llama3'
    adapter_name_or_path = None

    template = template_dict[template_name]
    # 4-bit inference saves a lot of GPU memory, but quality may drop
    load_in_4bit = False

    # Generation hyperparameters; adjust them for better results
    max_new_tokens = 500  # maximum number of tokens generated per reply
    top_p = 0.9
    temperature = 0.6  # higher is more creative, lower is more conservative
    repetition_penalty = 1.1  # higher values discourage repeated text

    # Load the model
    print(f'Loading model from: {model_name_or_path}')
    print(f'adapter_name_or_path: {adapter_name_or_path}')
    model = load_model(
        model_name_or_path,
        load_in_4bit=load_in_4bit,
        adapter_name_or_path=adapter_name_or_path
    ).eval()
    tokenizer = load_tokenizer(model_name_or_path if adapter_name_or_path is None else adapter_name_or_path)
    if template.stop_word is None:
        template.stop_word = tokenizer.eos_token
    # Encode without extra special tokens so the stop word maps to exactly one id
    stop_token_id = tokenizer.encode(template.stop_word, add_special_tokens=False)
    assert len(stop_token_id) == 1
    stop_token_id = stop_token_id[0]

    history = []

    query = input('# User:')
    while True:
        query = query.strip()
        # NOTE: the source listing breaks off at this assignment. The rest of the loop is a
        # reconstruction from the names defined above (an assumption, not the author's verbatim code).
        input_ids = build_prompt(tokenizer, template, query, copy.deepcopy(history)).to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=input_ids,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                top_p=top_p,
                temperature=temperature,
                repetition_penalty=repetition_penalty,
                eos_token_id=stop_token_id,
                pad_token_id=tokenizer.pad_token_id,
            )
        # Keep only the newly generated tokens and decode them
        new_tokens = outputs[0][input_ids.shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        print(f'# Assistant: {response}')

        # Record this turn, then read the next user message
        history.append({"role": 'user', 'message': query})
        history.append({"role": 'assistant', 'message': response})
        query = input('# User:')


if __name__ == '__main__':
    main()
```

Core symbols most depended-on inside this repo

  • convert_jsonl (tools/convert_firefly_data_to_sharegpt.py): called by 1
  • convert_entry (tools/convert_raw_data_for_firefly.py): called by 1
  • convert_jsonl (tools/convert_raw_data_for_firefly.py): called by 1
  • sample_jsonl (tools/sample_data.py): called by 1
  • count_jsonl (tools/count_data.py): called by 1
  • replace_keywords_and_remove_lines (tools/change_info.py): called by 1
  • init_embeddings_average (tools/expand_embedding_and_lmhead.py): called by 1
  • draw (tools/expand_embedding_and_lmhead.py): called by 1

Shape

Function 41
Class 6
Route 1

Languages

Python 100%

Modules by API surface

deploy/web_streamlit_for_v1.py: 7 symbols
deploy/web_streamlit_for_instruct_v2.py: 7 symbols
deploy/web_streamlit_for_instruct.py: 7 symbols
deploy/python/chat_demo.py: 6 symbols
deploy/streamlit/web_llama3_chat.py: 5 symbols
deploy/streamlit/web_gemma2_chat.py: 5 symbols
tools/expand_embedding_and_lmhead.py: 3 symbols
tools/convert_raw_data_for_firefly.py: 2 symbols
deploy/API/easy_server_demo.py: 2 symbols
tools/sample_data.py: 1 symbol
tools/count_data.py: 1 symbol
tools/convert_firefly_data_to_sharegpt.py: 1 symbol

For agents

$ claude mcp add llama3-Chinese-chat \
  -- python -m otcore.mcp_server <graph>
