hub / github.com/zai-org/ChatGLM-6B

github.com/zai-org/ChatGLM-6B @main sqlite

114 symbols 538 edges 13 files 48 documented · 42%

README

ChatGLM-6B

🌐 Blog • 🤗 HF Repo • 🐦 Twitter • 📄 Report

👋 Join our  <a href="https://discord.gg/fK2dz4bg" target="_blank">Discord</a> and <a href="https://github.com/zai-org/ChatGLM-6B/raw/main/resources/WECHAT.md" target="_blank">WeChat</a>

📍Experience and use a larger-scale GLM business model on the Zhipu AI Open Platform

GLM-4 Open Source Model and API

We have released the latest GLM-4 model, which has made new breakthroughs in multiple indicators. You can directly experience our latest model in the following two channels.

GLM-4 open source model We have open sourced the GLM-4-9B series models, which have significantly improved the performance of various indicators. Welcome to try.
Zhipu Qingyan Experience the latest version of GLM-4, including GLMs, All tools and other functions.
API platform The new generation of API platform has been launched. You can directly experience new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, CogView-3 on the API platform. Among them, the two models GLM-4 and GLM-3-Turbo support new functions such as System Prompt, Function Call, Retrieval, and Web_Search. You are welcome to experience them.
GLM-4 API open source tutorial GLM-4 API tutorial and basic applications, welcome to try. API-related questions can be asked in this open source tutorial, or use GLM-4 API AI Assistant to get help with common problems.

Introduction

ChatGLM-6B is an open bilingual language model based on General Language Model (GLM) framework, with 6.2 billion parameters. With the quantization technique, users can deploy locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level). Welcome to use the larger ChatGLM model on chatglm.cn

ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dialogue. The model is trained for about 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrap, and reinforcement learning wit human feedback. With only about 6.2 billion parameters, the model is able to generate answers that are in line with human preference.

In order to facilitate downstream developers to customize the model for their own application scenarios, we also implements an parameter-efficient tuning method based on P-Tuning v2 (Guidelines). Tuning requires at least 7GB of GPU memory at INT4 quantization level.

ChatGLM-6B weights are completely open for academic research, and free commercial use is also allowed after completing the questionnaire.

Try the online demo on Huggingface Spaces.

Update

[2023/07/25] Release CodeGeeX2, which is based on ChatGLM2-6B and trained on more code data. It has the following features:

More Powerful Coding Capabilities: CodeGeeX2-6B has been further pre-trained on 600B code tokens, which has been comprehensively improved in coding capability compared to the first-generation. On the HumanEval-X benchmark, all six languages have been significantly improved (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321\%), and in Python it reached 35.9% of Pass@1 one-time pass rate, surpassing the larger StarCoder-15B.
More Useful Features: Inheriting the ChatGLM2-6B model features, CodeGeeX2-6B better supports both Chinese and English prompts, maximum 8192 sequence length, and the inference speed is significantly improved compared to the first-generation. After quantization, it only needs 6GB of GPU memory for inference, thus supports lightweight local deployment.
Comprehensive AI Coding Assistant: The backend of CodeGeeX plugin (VS Code, Jetbrains) is upgraded, supporting 100+ programming languages, and adding practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can be used to solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, which helps increasing the efficiency of developpers.

[2023/06/25] Release ChatGLM2-6B, the second-generation version of ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

Stronger Performance: Based on the development experience of the first-generation ChatGLM model, we have fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM, and has undergone pre-training with 1.4T bilingual tokens and human preference alignment training. The evaluation results show that, compared to the first-generation model, ChatGLM2-6B has achieved substantial improvements in performance on datasets like MMLU (+23%), CEval (+33%), GSM8K (+571%), BBH (+60%), showing strong competitiveness among models of the same size.
Longer Context: Based on FlashAttention technique, we have extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with a context length of 8K during the dialogue alignment, allowing for more rounds of dialogue. However, the current version of ChatGLM2-6B has limited understanding of single-round ultra-long documents, which we will focus on optimizing in future iterations.
More Efficient Inference: Based on Multi-Query Attention technique, ChatGLM2-6B has more efficient inference speed and lower GPU memory usage: under the official implementation, the inference speed has increased by 42% compared to the first generation; under INT4 quantization, the dialogue length supported by 6G GPU memory has increased from 1K to 8K.

Fore more information, please refer to ChatGLM2-6B.

[2023/05/17] Release VisualGLM-6B, a multimodal conversational language model supporting image understanding.

You can run the command line and web demo through cli_demo_vision.py and web_demo_vision.py in the repository. Note that VisualGLM-6B requires additional installation of SwissArmyTransformer and torchvision. For more information, please refer to VisualGLM-6B.

[2023/05/15] Update the checkpoint of v1.1 version, add English instruction data for training to balance the proportion of Chinese and English data, which solves the phenomenon of Chinese words mixed in English answers .

The following is a comparison of English questions before and after the update

Question: Describe a time when you had to make a difficult decision.
v1.0:
v1.1:
Question: Describe the function of a computer motherboard
v1.0:
v1.1:
Question: Develop a plan to reduce electricity usage in a home.
v1.0:
v1.1:
Question：未来的NFT，可能真实定义一种现实的资产，它会是一处房产，一辆汽车，一片土地等等，这样的数字凭证可能比真实的东西更有价值，你可以随时交易和使用，在虚拟和现实中无缝的让拥有的资产继续创造价值，未来会是万物归我所用，但不归我所有的时代。翻译成专业的英语
v1.0:
v1.1:

For more update info, please refer to UPDATE.md.

Projects

Open source projects that accelerate ChatGLM: * lyraChatGLM: Inference acceleration for ChatGLM-6B, up to 9000+ tokens/s inference speed. * ChatGLM-MNN: An MNN-based implementation of ChatGLM-6B C++ inference, which supports automatic allocation of computing tasks to GPU and CPU according to the size of GPU memory * JittorLLMs: Running ChatGLM-6B in FP16 with a minimum of 3G GPU memory or no GPU at all, with Linux, windows, and Mac support * InferLLM: Lightweight C++ inference, which can realize real-time chat on local x86 and Arm processors, and can also run in real time on mobile phones. It only requires 4G of running memory.

Open source projects using ChatGLM-6B: * langchain-ChatGLM: ChatGLM application based on langchain, realizing Q&A based on extensible knowledge base * Wenda: Large-scale language model call platform, based on ChatGLM-6B to achieve ChatPDF-like functions * chatgpt_academic: An academic writing and programming toolbox that supports ChatGLM-6B. It has the characteristics of modularization and multi-thread calling LLM, and can call multiple LLMs in parallel. * glm-bot: Connect ChatGLM to Koishi to call ChatGLM on major chat platforms

Example projects supporting online training of ChatGLM-6B and related applications: * ChatGLM-6B deployment and fine-tuning tutorial * ChatGLM-6B combined with langchain to implement local knowledge base QA Bot

Third-party evaluation: * Measuring Massive Multitask Chinese Understanding

For more open source projects, see PROJECT.md.

Getting Started

Hardware Requirements

Quantization Level	GPU Memory
FP16（no quantization）	13 GB
INT8	10 GB
INT4	6 GB

Environment Setup

Install the requirements with pip: pip install -r requirements.txt. transformers library version is recommended to be 4.27.1, but theoretically any version no lower than 4.23.1 is acceptable.

In addition, if you need to run the quantified model on the CPU, you also need to install gcc and openmp. Most Linux distributions are installed by default. For Windows, you can check openmp when installing TDM-GCC. On Windows testing environment, the gcc version is TDM-GCC 10.3.0, and on Linux is gcc 11.3.0.

Usage

Generate dialogue with the following code

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:

1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。

如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。

The implementation of the model is still in development. If you want to fix the used model implementation to ensure compatibility, you can add the revision="v1.1.0" parameter in the from_pretrained call. v1.1.0 is the latest version number. For a complete list of versions, see Change Log.

Load the model locally

The above code will automatically download the model implementation and checkpoints by transformers. The full model implementation can be found at Hugging Face Hub. If your network environment is poor, downloading model parameters may take a long time or even fail. At this point, you can download the model to the local first, and then load it from the local.

To download models from Hugging Face Hub, you need to install Git LFS , then run

git clone https://huggingface.co/THUDM/chatglm-6b

After downloading the model locally, replace THUDM/chatglm-6b in the above code with the path of your local chatglm-6b folder to load the model lo

Core symbols most depended-on inside this repo

is_world_process_zero

_issue_warnings_after_load

Shape

Method 73

Function 36

Class 4

Route 1

Languages

Python100%

Modules by API surface

ptuning/trainer.py69 symbols

web_demo_vision.py6 symbols

ptuning/web_demo.py6 symbols

ptuning/main.py6 symbols

web_demo.py5 symbols

ptuning/trainer_seq2seq.py5 symbols

ptuning/arguments.py3 symbols

cli_demo_vision.py3 symbols

cli_demo.py3 symbols

api.py3 symbols

web_demo2.py2 symbols

utils.py2 symbols

Dependencies from manifests, versioned

torch1.10 · 1×

transformers4.27.1 · 1×

For agents

$ claude mcp add ChatGLM-6B \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact