hub / github.com/zai-org/ChatGLM3

github.com/zai-org/ChatGLM3 @main sqlite

209 symbols 799 edges 34 files 18 documented · 9%

README

ChatGLM3

📄 Report • 🤗 HF Repo • 🤖 ModelScope • 🟣 WiseModel • 📔 Document • 🧰 OpenXLab • 🐦 Twitter

👋 Join our  <a href="https://discord.gg/fK2dz4bg" target="_blank">Discord</a> and <a href="https://github.com/zai-org/ChatGLM3/raw/main/resources/WECHAT.md" target="_blank">WeChat</a>

📍Experience the larger-scale ChatGLM model at chatglm.cn

📔 About ChatGLM3-6BFor more detailed usage information, please refer to:

GLM-4 Open Source Model and API

We have released the latest GLM-4 model, which has made new breakthroughs in multiple indicators. You can directly experience our latest model in the following two channels.

GLM-4 open source model We have open sourced the GLM-4-9B series models, which have significantly improved the performance of various indicators. Welcome to try.
Zhipu Qingyan Experience the latest version of GLM-4, including GLMs, All tools and other functions.
API platform The new generation of API platform has been launched. You can directly experience new models such as GLM-4-0520, GLM-4-air, GLM-4-airx, GLM-4-flash, GLM-4, GLM-3-Turbo, CharacterGLM-3, CogView-3 on the API platform. Among them, the two models GLM-4 and GLM-3-Turbo support new functions such as System Prompt, Function Call, Retrieval, and Web_Search. You are welcome to experience them.
GLM4 API open source tutorial GLM-4 API tutorial and basic applications, welcome to try. API-related questions can be asked in this open source tutorial, or use GLM-4 API AI Assistant to get help with common problems.

ChatGLM3 Introduction

ChatGLM3 is a generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series, maintaining many excellent features of the first two generations such as smooth dialogue and low deployment threshold, while introducing the following features:

Stronger Base Model: The base model of ChatGLM3-6B, ChatGLM3-6B-Base, adopts a more diverse training dataset, more sufficient training steps, and a more reasonable training strategy. Evaluations on datasets from various perspectives such as semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base has the strongest performance among base models below 10B.
More Complete Function Support: ChatGLM3-6B adopts a newly designed Prompt format, supporting multi-turn dialogues as usual. It also natively supports tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks in complex scenarios.
More Comprehensive Open-source Series: In addition to the dialogue model ChatGLM3-6B, the basic model ChatGLM3-6B-Base, the long-text dialogue model ChatGLM3-6B-32K and further strengthens the ability to understand long texts ChatGLM3-6B-128K have also been open-sourced. All these weights are fully open for academic research, and free commercial use is also allowed after registration via a questionnaire.

The ChatGLM3 open-source model aims to promote the development of large-model technology together with the open-source community. Developers and everyone are earnestly requested to comply with the open-source protocol, and not to use the open-source models, codes, and derivatives for any purposes that might harm the nation and society, and for any services that have not been evaluated and filed for safety. Currently, no applications, including web, Android, Apple iOS, and Windows App, have been developed based on the ChatGLM3 open-source model by our project team.

Although every effort has been made to ensure the compliance and accuracy of the data at various stages of model training, due to the smaller scale of the ChatGLM3-6B model and the influence of probabilistic randomness factors, the accuracy of output content cannot be guaranteed. The model output is also easily misled by user input. This project does not assume risks and liabilities caused by data security, public opinion risks, or any misleading, abuse, dissemination, and improper use of open-source models and codes.

Model List

Model	Seq Length	Download
ChatGLM3-6B	8k	HuggingFace \| ModelScope
ChatGLM3-6B-Base	8k	HuggingFace \| ModelScope
ChatGLM3-6B-32K	32k	HuggingFace \| ModelScope
ChatGLM3-6B-128K	128k	HuggingFace ｜ ModelScope

Projects

The following excellent open source repositories have in-depth support for the ChatGLM3-6B model, and everyone is welcome to expand their learning.

Inference acceleration:

chatglm.cpp: Real-time inference on your laptop accelerated by quantization, similar to llama.cpp.
ChatGLM3-TPU: Using the TPU accelerated inference solution, it runs about 7.5 token/s in real time on the end-side chip BM1684X (16T@FP16, 16G DDR).
TensorRT-LLM: A high-performance GPU-accelerated inference solution developed by NVIDIA, you can refer to these steps to deploy ChatGLM3.
OpenVINO: A high-performance CPU and GPU accelerated inference solution developed by Intel, you can refer to this step to deploy the ChatGLM3-6B model

Efficient fine-tuning:

LLaMA-Factory: An excellent, easy-to-use and efficient fine-tuning framework.

Application framework:

LangChain-Chatchat: Based on large language models such as ChatGLM and application frameworks such as Langchain, open source and offline deployable retrieval enhancement generation (RAG) large Model knowledge base project.
BISHENG: open-source platform for developing LLM applications. It empowers and accelerates the development of LLM applications and helps users to enter the next generation of application development mode with the best experience.
RAGFlow: An open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

Evaluation Results

Typical Tasks

We selected 8 typical Chinese-English datasets and conducted performance tests on the ChatGLM3-6B (base) version.

Model	GSM8K	MATH	BBH	MMLU	C-Eval	CMMLU	MBPP	AGIEval
ChatGLM2-6B-Base	32.4	6.5	33.7	47.9	51.7	50.0	-	-
Best Baseline	52.1	13.1	45.0	60.1	63.5	62.2	47.5	45.8
ChatGLM3-6B-Base	72.3	25.7	66.1	61.4	69.0	67.5	52.4	53.7

"Best Baseline" refers to the pre-trained models that perform best on the corresponding datasets with model parameters below 10B, excluding models that are trained specifically for a single task and do not maintain general capabilities.

In the tests of ChatGLM3-6B-Base, BBH used a 3-shot test, GSM8K and MATH that require inference used a 0-shot CoT test, MBPP used a 0-shot generation followed by running test cases to calculate Pass@1, and other multiple-choice type datasets all used a 0-shot test.

We have conducted manual evaluation tests on ChatGLM3-6B-32K in multiple long-text application scenarios. Compared with the second-generation model, its effect has improved by more than 50% on average. In applications such as paper reading, document summarization, and financial report analysis, this improvement is particularly significant. In addition, we also tested the model on the LongBench evaluation set, and the specific results are shown in the table below.

Model	Average	Summary	Single-Doc QA	Multi-Doc QA	Code	Few-shot	Synthetic
ChatGLM2-6B-32K	41.5	24.8	37.6	34.7	52.8	51.3	47.7
ChatGLM3-6B-32K	50.2	26.6	45.8	46.1	56.2	61.2	65

How to Use

Environment Installation

First, you need to download this repository:

git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3

Then use pip to install the dependencies:

pip install -r requirements.txt

In order to ensure that the version of torch is correct, please strictly follow the instructions of official documentation for installation.

Integrated Demo

We provide an integrated demo that incorporates the following three functionalities. Please refer to Integrated Demo for how to run it.

Chat: Dialogue mode, where you can interact with the model.
Tool: Tool mode, where in addition to dialogue, the model can also perform other operations using tools.
Code Interpreter: Code interpreter mode, where the model can execute code in a Jupyter environment and obtain results to complete complex tasks.

Usage

The ChatGLM model can be called to start a conversation using the following code:

>> from transformers import AutoTokenizer, AutoModel
>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
>> model = model.eval()
>> response, history = model.chat(tokenizer, "你好", history=[])
>> print(response)

Hello 👋! I am the AI assistant ChatGLM3-6B, nice to meet you, feel free to ask me any questions.

>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=history)
>> print(response)

Having trouble sleeping at night may make you feel anxious or uncomfortable, but here are some ways to help you fall asleep:

1. Develop a regular sleep schedule: Keeping a regular sleep schedule can help you develop healthy sleep habits and make it easier for you to fall asleep. Try to go to bed at the same time every day and get up at the same time.

2. Create a comfortable sleeping environment: Make sure the sleeping environment is comfortable, quiet, dark and at a suitable temperature. You can use comfortable bedding and keep the room ventilated.

3. Relax your body and mind: Doing some relaxing activities before bed, such as taking a hot bath, listening to some soft music, reading some interesting books, etc., can help relieve tension and anxiety and make it easier for you to fall asleep.

4. Avoid drinking caffeinated beverages: Caffeine is a stimulant that can affect your sleep quality. Try to avoid drinking caffeinated beverages such as coffee, tea and cola before bed.

5. Avoid doing things that are not related to sleep in bed: Doing things that are not related to sleep in bed, such as watching movies, playing games or working, etc., may interfere with your sleep.
6. Try breathing techniques: Deep breathing is a relaxation technique that can help you relieve tension and anxiety, making it easier for you to fall asleep. Try breathing in slowly, holding it for a few seconds, and then exhaling slowly.

If these methods don't help you fall asleep, you may consider consulting a doctor or sleep specialist for further advice.

Load Model Locally

The above code will automatically download the model

Core symbols most depended-on inside this repo

postprocess_text

called by 14

composite_demo/conversation.py

append_conversation

called by 6

composite_demo/demo_tool.py

append_conversation

called by 6

composite_demo/demo_ci.py

show

called by 6

composite_demo/conversation.py

get_client

called by 3

composite_demo/client.py

generate_stream

called by 3

composite_demo/client.py

process_response

called by 3

openai_api_demo/utils.py

generate_stream_chatglm3

called by 3

openai_api_demo/utils.py

Shape

Function 104

Class 49

Method 48

Route 8

Languages

Python100%

Modules by API surface

openai_api_demo/api_server.py27 symbols

finetune_demo/finetune_hf.py27 symbols

Intel_device_demo/ipex_llm_cpu_demo/api_server.py27 symbols

composite_demo/demo_ci.py18 symbols

composite_demo/conversation.py9 symbols

composite_demo/client.py9 symbols

openai_api_demo/utils.py8 symbols

langchain_demo/ChatGLM3.py8 symbols

Intel_device_demo/ipex_llm_cpu_demo/utils.py8 symbols

basic_demo/web_demo_gradio.py7 symbols

composite_demo/tool_registry.py6 symbols

tools_using_demo/tool_register.py5 symbols

Dependencies from manifests, versioned

accelerate0.29.2 · 1×

arxiv2.1.0 · 1×

cpm_kernels1.0.11 · 1×

datasets2.18.0 · 1×

deepspeed0.16.2 · 1×

fastapi0.110.0 · 1×

gradio4.26.0 · 1×

huggingface_hub0.19.4 · 1×

ipykernel6.26.0 · 1×

ipython8.18.1 · 1×

jieba0.42.1 · 1×

jupyter1.0.0 · 1×

For agents

$ claude mcp add ChatGLM3 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact