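The first Chinese version of llama3
This repository collects learning resources on Chinese-language llama3. Anyone interested is welcome to join and help build it.
Learn quickly through diagrams: https://deepwiki.com/CrazyBoyM/llama3-Chinese-chat
🔥 New: the LLM-Chinese repository, which you are welcome to follow. It is tutorial-oriented: it takes "making a model Chinese" as a typical model-training problem and guides readers through getting started with secondary fine-tuning of LLMs: https://github.com/CrazyBoyM/LLM-Chinese (includes Chinese versions of gemma2 in 2b and 9b sizes)
If you have your own fine-tuned version, or find an interesting specialized version online, feel free to mention it in the issues so it can be added.
If there is a content section you would like to build, feel free to fork the repo and submit a PR to become a core author.
(Note: PRs that only fix a typo in a single character or sentence are no longer accepted; please avoid submitting such PRs frequently.)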
llama3.1
- shareAI-DPO Chinese 8B version (Chinese RLHF)
  - Training data (open source): https://huggingface.co/datasets/shareAI/DPO-zh-en-emoji
  - Training details: DPO (beta 0.5) + LoRA (rank 128, alpha 256), with the "lm_head", "input_layernorm", "post_attention_layernorm" and "norm" layers also unfrozen for training.
  - Compute: 8 × A100, 5 minutes. Thanks to the OpenCSG community for its generous sponsorship.
  - Model download (OpenCSG): https://opencsg.com/models/shareAI/llama3.1-8b-instruct-dpo-zh
  - Model download (modelscope): https://modelscope.cn/models/shareAI/llama3.1-8b-instruct-dpo-zh
  - Model download (Huggingface): https://huggingface.co/shareAI/llama3.1-8b-instruct-dpo-zh
  - GGUF version download (works with ollama and lmstudio): https://huggingface.co/shareAI/llama3.1-8b-instruct-dpo-zh/blob/main/llama3.1_8b_chinese_chat_q4_k_m-shareAI.gguf
  - GGUF download for users in mainland China (hf-mirror, an accelerated mirror site): https://hf-mirror.com/shareAI/llama3.1-8b-instruct-dpo-zh
  - Run directly with ollama: ollama run shareai/llama3.1-dpo-zh
- openCSG wukong Chinese 405B version (Chinese SFT)
  - Jointly released by shareAI & openCSG
  - Introductory article: https://mp.weixin.qq.com/s/7_lDZ6Zslq_WUckfuTToyQ
  - Open-source model: https://opencsg.com/models/OpenCSG/CSG-Wukong-Chinese-Llama3.1-405B
- openbuddy
  - openbuddy-llama3.1-8b (Chinese SFT): https://modelscope.cn/models/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
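To make the `beta 0.5` setting above concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. Beta scales the margin between how much the policy (the model being trained) raises the log-probability of the chosen response and of the rejected response, each measured relative to a frozen reference model. This only illustrates the objective; it is not the training code used for this model, the function and argument names are hypothetical, and each log-probability is assumed to be summed over a full response.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) == log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, rewrite as -x + log(1 + exp(x)) to avoid overflow in exp(-x)
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.5) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration; inputs are summed log-probabilities
    of the chosen / rejected responses under the policy and reference models.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # beta controls how strongly the gap between the two ratios is rewarded
    margin = beta * (chosen_ratio - rejected_ratio)
    return neg_log_sigmoid(margin)
```

When the policy still matches the reference, the margin is 0 and the loss is log 2; it falls toward 0 as the policy favors the chosen response more than the reference does, and a larger beta makes the loss react more sharply to the same log-probability gap.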
Curated high-quality llama3 chat weights (additions via issues are welcome):
- shareAI series:
  - Base pretrained model + direct Chinese SFT:
    - Training data: https://modelscope.cn/datasets/baicai003/Llama3-Chinese-dataset/summary
    - V1:
      - OpenCSG full-speed download: https://opencsg.com/models/shareAI/llama3-Chinese-chat-8b
      - WiseModel full-speed download: https://wisemodel.cn/models/shareAI/llama3-Chinese-chat-8b
    - V2:
      - modelscope: https://modelscope.cn/models/baicai003/Llama3-Chinese_v2/summary
      - LoRA for enhanced mind-map generation: https://modelscope.cn/models/shareAI/llama3-instruct-8b-cn-doc2markmap-lora
  - Instruct + continued Chinese SFT:
    - modelscope model download: https://modelscope.cn/models/baicai003/llama-3-8b-Instruct-chinese_v2/summary
    - Try it online with a cloud server image (one click, free for 4 hours): https://www.suanyun.cn/console/share?uuid=b1ba51908f8a4bd1af37148765c293ee
  - Instruct + reinforcement learning (Chinese):
    - llama3 instruct DPO version (trains in about 10 minutes and minimizes performance loss relative to the original multilingual instruct model; in our tests it outperforms most versions trained on large amounts of Chinese data)
      - modelscope download: https://modelscope.cn/models/baicai003/Llama3-Chinese-instruct-DPO-beta0.5/summary
      - Preference-learning dataset: DPO-zh-en-emoji
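Model download links
Enhanced version for writing novels, web fiction and stories: planned
Note: because training covered only common conversations, the Base + SFT version may give unexpected replies (especially to uncommon questions). This tutorial is mainly a curated collection of quality resources (covering how to fine-tune llama3 for Chinese, how to build Chinese dialogue datasets, enhancing role-play and agent abilities, extending context length, web deployment and quantization, CPU inference on phones and computers, and more), which will be added gradually.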
Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/tree/main/deploy/API
Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/tree/main/deploy/vLLM
Documentation: https://github.com/CrazyBoyM/llama3-Chinese-chat/blob/main/deploy/LMStudio/README.md
Video tutorial: https://www.bilibili.com/video/BV1nt421g79T
First, download and install ollama from the official site: https://ollama.com/
Then open a terminal and run the following command to start chatting with the model:
ollama run shareai/llama3.1-dpo-zh
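Besides the interactive terminal, a running ollama instance also serves a local HTTP API, which is handy for calling the model from your own scripts. The sketch below uses only the Python standard library to send a non-streaming request to ollama's `/api/chat` endpoint. It assumes ollama is listening on its default address `http://localhost:11434` and that the model has already been pulled with the command above; `build_ollama_request` and `ollama_chat` are hypothetical helper names.

```python
import json
import urllib.request

# ollama's default local chat endpoint (assumes a default installation)
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_ollama_request(model, messages, url=OLLAMA_URL):
    """Build a non-streaming POST request for ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": messages,  # list of {"role": ..., "content": ...} dicts
        "stream": False,       # return one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_chat(model, messages, url=OLLAMA_URL, timeout=300):
    """Send the request and return the assistant's reply text."""
    request = build_ollama_request(model, messages, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

For example, `ollama_chat("shareai/llama3.1-dpo-zh", [{"role": "user", "content": "Hello"}])` returns the reply text; append each reply as `{"role": "assistant", "content": ...}` to the message list to continue a multi-turn conversation.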
pip install -U streamlit transformers==4.40.1
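Install streamlit with the command above, then launch the web UI with one of the commands below. Replace '/path/to/model' with the path where you downloaded the weights.
V1 version:
streamlit run deploy/web_streamlit_for_v1.py /path/to/model --theme.base="dark"
Instruct version (supports a custom system prompt):
streamlit run deploy/web_streamlit_for_instruct.py /path/to/model --theme.base="dark"
Instruct DPO version (supports a custom system prompt; likes to reply in a playful style with emoji):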
streamlit run deploy/web_streamlit_for_instruct_v2.py /path/to/model --theme.base="dark"
By default, you can run the code below as-is to try Chinese chat with llama3. Change model_name_or_path to the path of the model you downloaded.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from dataclasses import dataclass
from typing import Dict, Optional
import torch
import copy


@dataclass
class Template:
    template_name: str
    system_format: str
    user_format: str
    assistant_format: str
    system: str
    stop_word: Optional[str]


template_dict: Dict[str, Template] = dict()


def register_template(template_name, system_format, user_format, assistant_format, system, stop_word=None):
    template_dict[template_name] = Template(
        template_name=template_name,
        system_format=system_format,
        user_format=user_format,
        assistant_format=assistant_format,
        system=system,
        stop_word=stop_word,
    )


register_template(
    template_name='llama3',
    system_format='<|begin_of_text|><<SYS>>\n{content}\n<</SYS>>\n\n',
    user_format='<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>',
    assistant_format='<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|end_of_text|>\n',
    system="You are a helpful, excellent and smart assistant. "
           "Please respond to the user using the language they input, ensuring the language is elegant and fluent."
           "If you don't know the answer to a question, please don't share false information.",
    stop_word='<|end_of_text|>'
)


def load_model(model_name_or_path, load_in_4bit=False, adapter_name_or_path=None):
    if load_in_4bit:
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
        )
    else:
        quantization_config = None

    # Load the base model. 4-bit loading is requested only via quantization_config,
    # because transformers raises an error if load_in_4bit=True is also passed alongside it.
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto',
        quantization_config=quantization_config
    )

    # Load the adapter (e.g. LoRA weights), if any
    if adapter_name_or_path is not None:
        model = PeftModel.from_pretrained(model, adapter_name_or_path)

    return model


def load_tokenizer(model_name_or_path):
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        use_fast=False
    )
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


def build_prompt(tokenizer, template, query, history, system=None):
    system_format = template.system_format
    user_format = template.user_format
    assistant_format = template.assistant_format
    system = system if system is not None else template.system

    # Note: this appends to the list passed in, so callers should pass a copy
    history.append({"role": 'user', 'message': query})
    input_ids = []

    # Add the system message
    if system_format is not None:
        if system is not None:
            system_text = system_format.format(content=system)
            input_ids = tokenizer.encode(system_text, add_special_tokens=False)
    # Append the conversation history
    for item in history:
        role, message = item['role'], item['message']
        if role == 'user':
            message = user_format.format(content=message, stop_token=tokenizer.eos_token)
        else:
            message = assistant_format.format(content=message, stop_token=tokenizer.eos_token)
        tokens = tokenizer.encode(message, add_special_tokens=False)
        input_ids += tokens
    # Open the assistant turn (the part of assistant_format before {content})
    # so the model generates only the reply text
    input_ids += tokenizer.encode(assistant_format.split('{content}')[0], add_special_tokens=False)
    input_ids = torch.tensor([input_ids], dtype=torch.long)

    return input_ids


def main():
    model_name_or_path = 'shareAI/llama3-Chinese-chat-8b'  # model name or path; change this
    template_name = 'llama3'
    adapter_name_or_path = None

    template = template_dict[template_name]
    # 4-bit inference saves a lot of GPU memory, but quality may drop
    load_in_4bit = False
    # Generation hyperparameters; adjust them for better results
    max_new_tokens = 500  # maximum length of each reply, in tokens
    top_p = 0.9
    temperature = 0.6  # higher is more creative, lower is more conservative
    repetition_penalty = 1.1  # higher values discourage repeated text

    # Load the model
    print(f'Loading model from: {model_name_or_path}')
    print(f'adapter_name_or_path: {adapter_name_or_path}')
    model = load_model(
        model_name_or_path,
        load_in_4bit=load_in_4bit,
        adapter_name_or_path=adapter_name_or_path
    ).eval()
    tokenizer = load_tokenizer(model_name_or_path if adapter_name_or_path is None else adapter_name_or_path)
    if template.stop_word is None:
        template.stop_word = tokenizer.eos_token
    # Encode without special tokens so no BOS token is prepended; the stop word must be a single token
    stop_token_id = tokenizer.encode(template.stop_word, add_special_tokens=False)
    assert len(stop_token_id) == 1
    stop_token_id = stop_token_id[0]

    history = []

    query = input('# User:')
    while True:
        query = query.strip()
        input_ids = build_prompt(tokenizer, template, query, copy.deepcopy(history)).to(model.device)
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=top_p,
            temperature=temperature,
            repetition_penalty=repetition_penalty,
            eos_token_id=stop_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
        # Decode only the newly generated tokens
        new_tokens = outputs[0][input_ids.shape[1]:]
        response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

        # Store this turn so later prompts include it
        history.append({"role": 'user', 'message': query})
        history.append({"role": 'assistant', 'message': response})

        print(f'# Llama3-Chinese:{response}')
        query = input('# User:')


if __name__ == '__main__':
    main()
```
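To see exactly what text the chat script feeds the model, the sketch below renders the `llama3` template as a plain string, without loading a tokenizer. It copies the script's format strings, assuming the system block is delimited by `<<SYS>>` and `<</SYS>>`, and optionally ends with an open assistant header so the model's next tokens are the reply. `render_prompt` is a hypothetical helper, not part of the repository.

```python
# Format strings of the 'llama3' template (system markers assumed to be <<SYS>> / <</SYS>>)
SYSTEM_FORMAT = '<|begin_of_text|><<SYS>>\n{content}\n<</SYS>>\n\n'
USER_FORMAT = '<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>'
ASSISTANT_FORMAT = '<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|end_of_text|>\n'


def render_prompt(system, history, add_generation_prompt=True):
    """Render a conversation with the template above as plain text.

    history is a list of {"role": "user" | "assistant", "message": str} dicts,
    the same shape the chat script uses.
    """
    parts = [SYSTEM_FORMAT.format(content=system)]
    for turn in history:
        fmt = USER_FORMAT if turn["role"] == "user" else ASSISTANT_FORMAT
        parts.append(fmt.format(content=turn["message"]))
    if add_generation_prompt:
        # Everything in ASSISTANT_FORMAT before {content}: opens the assistant turn
        parts.append(ASSISTANT_FORMAT.split("{content}")[0])
    return "".join(parts)
```

This format mixes Llama 2 style `<<SYS>>` markers with Llama 3 header tokens and closes assistant turns with `<|end_of_text|>` rather than `<|eot_id|>`, so it differs from the official Llama 3 chat template; it is meant for checkpoints trained on this format (the script's default is the V1 model).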