github.com/zai-org/CodeGeeX2 @main sqlite

59 symbols 218 edges 9 files 15 documented · 25%

README

🏠 <a href="https://codegeex.cn" target="_blank">Homepage</a>｜🛠 Extensions <a href="https://marketplace.visualstudio.com/items?itemName=aminer.codegeex" target="_blank">VS Code</a>, <a href="https://plugins.jetbrains.com/plugin/20587-codegeex" target="_blank">Jetbrains</a>｜🤗 <a href="https://huggingface.co/THUDM/codegeex2-6b" target="_blank">HF Repo</a>｜📄 <a href="https://arxiv.org/abs/2303.17568" target="_blank">Paper</a>

👋 Join our <a href="https://discord.gg/8gjHdkmAN6" target="_blank">Discord</a>, <a href="https://join.slack.com/t/codegeexworkspace/shared_invite/zt-1s118ffrp-mpKKhQD0tKBmzNZVCyEZLw" target="_blank">Slack</a>, <a href="https://t.me/+IipIayJ32B1jOTg1" target="_blank">Telegram</a>, <a href="https://github.com/zai-org/CodeGeeX2/raw/main/resources/wechat.md" target="_blank">WeChat</a>

Read in Chinese

Read in Japanese

Read in French

CodeGeeX2: A More Powerful Multilingual Code Generation Model

CodeGeeX2 is the second generation of the multilingual code generation model CodeGeeX (KDD’23). It is built on the ChatGLM2 architecture and trained on more code data. Thanks to ChatGLM2, CodeGeeX2 improves substantially in coding capability (+107% over CodeGeeX; with only 6B parameters, it surpasses the larger StarCoder-15B on some tasks). It has the following features:

* More Powerful Coding Capabilities: Based on the ChatGLM2-6B model, CodeGeeX2-6B has been further pre-trained on 600B code tokens, which substantially improves its coding capability over the first generation. On the HumanEval-X benchmark, all six languages improve significantly (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%), and in Python it reaches a Pass@1 of 35.9%, surpassing the larger StarCoder-15B.
* More Useful Features: Inheriting the features of ChatGLM2-6B, CodeGeeX2-6B better supports both Chinese and English prompts and a maximum sequence length of 8192, and its inference is significantly faster than the first generation's. After quantization, it needs only 6GB of GPU memory for inference, which enables lightweight local deployment.
* Comprehensive AI Coding Assistant: The backend of the CodeGeeX plugin (VS Code, Jetbrains) has been upgraded. It supports 100+ programming languages and adds practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, which helps developers work more efficiently.
* Open License: CodeGeeX2-6B weights are fully open for academic research. To apply for commercial use, please fill in the registration form.

AI Coding Assistant

We have developed the CodeGeeX plugin, which supports IDEs such as VS Code, IntelliJ IDEA, PyCharm, GoLand, WebStorm, and Android Studio. The plugin allows you to experience the CodeGeeX2 model's capabilities in code generation and completion, annotation, code translation, and "Ask CodeGeeX" interactive programming, which can help improve your development efficiency. Please download the CodeGeeX plugin in your IDE to get a more comprehensive AI coding experience. You can find more details on our homepage.

Get Started

Use transformers to quickly launch CodeGeeX2-6B:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# Remember to add a language tag for better performance
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
# top_k=1 keeps only the most likely token at each step
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

>>> print(response)
# language: Python
# write a bubble sort function


def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list


print(bubble_sort([5, 2, 1, 8, 4]))
```
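
The prompt in the example above starts with a language tag line followed by a comment describing the task. A minimal sketch of a prompt builder is shown below. The helper name `build_prompt` and the `COMMENT_PREFIX` table are illustrative and not part of the repository, and the tag format for languages other than Python is an assumption (the tag written in that language's own line-comment syntax). Check the project's full tag list for the exact strings.

```python
# Hypothetical helper: not part of the CodeGeeX2 repository.
# Assumption: the tag is written as a line comment in the target language.
COMMENT_PREFIX = {
    "Python": "#",
    "C++": "//",
    "Java": "//",
    "JavaScript": "//",
    "Go": "//",
    "Rust": "//",
}


def build_prompt(language: str, task: str) -> str:
    """Return a prompt made of a language tag line and a task comment line."""
    prefix = COMMENT_PREFIX[language]  # raises KeyError for unlisted languages
    return f"{prefix} language: {language}\n{prefix} {task}\n"


# Reproduces the prompt string used in the Get Started example.
print(build_prompt("Python", "write a bubble sort function"), end="")
```

The returned string can be passed to `tokenizer.encode` in place of the hand-written `prompt`.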

Launch Gradio DEMO:

python ./demo/run_demo.py

❗️Attention:

* CodeGeeX2 is a base model and is not instruction-tuned for chatting. It can handle tasks such as code completion, translation, and explanation. To try the instruction-tuned version, use the CodeGeeX plugins (VS Code, Jetbrains).
* The programming language can be controlled by adding a language tag, e.g., `# language: Python`. Follow the format exactly to ensure performance; the full list can be found here. Write comments in the comment syntax of the selected programming language for better results.
* If the GPU doesn't support the bfloat16 format, the output will be incorrect. In that case, convert the model to float16:

```python
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
```

* To load the model across multiple GPUs, replace the following code:

```python
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
```

with:

```python
from transformers import AutoTokenizer

# gpus.py is located in the demo folder
from gpus import load_model_on_gpus


def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    # Spread the model weights over two GPUs
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()
```
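
For the bfloat16 note above, the float16 conversion can be made conditional on what the GPU reports. The following is a sketch under stated assumptions: the helper names `gpu_supports_bf16` and `pick_dtype_name` are hypothetical, and the only library calls used are PyTorch's `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`. The import is guarded so the snippet also runs on a machine without PyTorch.

```python
def gpu_supports_bf16() -> bool:
    """Return True only if PyTorch sees a CUDA GPU that supports bfloat16."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query; treat as unsupported.
        return False
    return bool(torch.cuda.is_available() and torch.cuda.is_bf16_supported())


def pick_dtype_name(bf16_supported: bool) -> str:
    """Choose the precision to load with, following the note above."""
    return "bfloat16" if bf16_supported else "float16"


dtype_name = pick_dtype_name(gpu_supports_bf16())
print(f"load the model in {dtype_name}")
```

When the result is `float16`, load the model with the `.half().cuda()` call shown in the note.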

Evaluation

CodeGeeX2 is a base model for multilingual code generation, which has been significantly improved in its coding ability compared to the previous generation. The following are the evaluation results on the HumanEval, HumanEval-X, and DS1000 benchmarks (the evaluation metric Pass@k is the same as in the paper):

HumanEval (Pass@1,10,100)

| Model | Pass@1 | Pass@10 | Pass@100 |
| --- | --- | --- | --- |
| CodeGen-16B-multi | 19.2 | 34.6 | 55.2 |
| CodeGeeX-13B | 22.9 | 39.6 | 60.9 |
| Codex-12B | 28.8 | 46.8 | 72.3 |
| CodeT5Plus-16B-mono | 30.9 | 51.6 | 76.7 |
| Code-Cushman-001 | 33.5 | 54.3 | 77.4 |
| LLaMA-65B | 23.7 | - | 79.3 |
| LLaMA2-70B | 29.9 | - | - |
| CodeGen2.5-7B-mono | 33.4 | 58.4 | 82.7 |
| StarCoder-15B | 33.2 | 61.0 | 84.7 |
| CodeGeeX2-6B | 35.9 | 62.6 | 88.3 |

> `n=20, t=0.2, top_p=0.95` for Pass@1; `n=200, t=0.8, top_p=0.95` for Pass@10 and Pass@100.

HumanEval-X (Pass@1)

| Model | Python | C++ | Java | JavaScript | Go | Rust | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CodeGen-16B-multi | 19.2 | 18.1 | 15.0 | 18.4 | 13.0 | 1.8 | 14.2 |
| CodeGeeX-13B | 22.9 | 17.1 | 20.0 | 17.6 | 14.4 | 4.3 | 16.0 |
| Replit-code-v1-3B | 22.0 | 20.1 | 20.1 | 20.1 | 12.2 | 8.6 | 17.2 |
| CodeGen2.5-7B-multi | 30.6 | 24.3 | 29.0 | 27.5 | 18.9 | 20.1 | 25.1 |
| StarCoder-15B | 35.5 | 28.2 | 31.5 | 33.2 | 21.3 | 17.8 | 27.9 |
| CodeGeeX2-6B | 35.9 | 29.3 | 30.8 | 32.2 | 22.5 | 18.1 | 28.1 |

> `n=20, t=0.2, top_p=0.95` for Pass@1.

The above results can be reproduced by running scripts/run_humanevalx.sh. Refer to HumanEval-X environment for the experiment setups.
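
The per-language improvements quoted in the feature list can be recomputed from the CodeGeeX-13B and CodeGeeX2-6B rows of the HumanEval-X table. The sketch below does that arithmetic with the table's values. It also shows that the headline +107% figure is consistent with the mean of the six per-language relative gains, although the source does not state how that figure was derived.

```python
LANGS = ["Python", "C++", "Java", "JavaScript", "Go", "Rust"]
# Pass@1 values copied from the HumanEval-X table.
CODEGEEX_13B = [22.9, 17.1, 20.0, 17.6, 14.4, 4.3]
CODEGEEX2_6B = [35.9, 29.3, 30.8, 32.2, 22.5, 18.1]


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


gains = {
    lang: relative_gain(old, new)
    for lang, old, new in zip(LANGS, CODEGEEX_13B, CODEGEEX2_6B)
}
overall = sum(CODEGEEX2_6B) / len(CODEGEEX2_6B)  # unweighted mean over languages
mean_gain = sum(gains.values()) / len(gains)

for lang, gain in gains.items():
    print(f"{lang}: +{gain:.0f}%")
print(f"Overall Pass@1: {overall:.1f}, mean relative gain: +{mean_gain:.0f}%")
```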

DS1000 (Pass@1)

| Model | Matplotlib | Numpy | Pandas | Pytorch | SciPy | Scikit-learn | TensorFlow | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # Samples | 155 | 220 | 291 | 68 | 106 | 115 | 45 | 1000 |
| CodeGen-16B-Mono | 31.7 | 10.9 | 3.4 | 7.0 | 9.0 | 10.8 | 15.2 | 11.7 |
| code-cushman-001 | 40.7 | 21.8 | 7.9 | 12.4 | 11.3 | 18.0 | 12.2 | 18.1 |
| Codex-001 | 41.8 | 26.6 | 9.4 | 9.7 | 15.0 | 18.5 | 17.2 | 20.2 |
| CodeGeeX2-6B | 40.5 | 25.5 | 14.5 | 17.3 | 19.3 | 24.0 | 23.0 | 23.1 |
| StarCoder-15B | 51.7 | 29.7 | 11.4 | 21.4 | 20.2 | 29.5 | 24.5 | 26.0 |
| Codex-002 | 57.0 | 43.1 | 26.5 | 41.8 | 31.8 | 44.8 | 39.3 | 39.2 |

> `n=40, t=0.2, top_p=0.5` for Pass@1.

The above results can be reproduced by the code in DS1000 repo.
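
The Overall column of the DS1000 table is not a plain average of the seven library scores. Assuming it is the mean weighted by the number of samples per library (an assumption that reproduces the two rows checked here), it can be recomputed as follows, using values copied from the table.

```python
# Samples per library, in table order: Matplotlib, Numpy, Pandas, Pytorch,
# SciPy, Scikit-learn, TensorFlow.
SAMPLES = [155, 220, 291, 68, 106, 115, 45]

SCORES = {
    "CodeGeeX2-6B": [40.5, 25.5, 14.5, 17.3, 19.3, 24.0, 23.0],
    "StarCoder-15B": [51.7, 29.7, 11.4, 21.4, 20.2, 29.5, 24.5],
}


def weighted_overall(scores: list[float], weights: list[int]) -> float:
    """Sample-weighted mean of per-library Pass@1 scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


for model, scores in SCORES.items():
    print(model, round(weighted_overall(scores, SAMPLES), 1))
```

Because Pandas has the most samples (291 of 1000), a model's Pandas score has the largest influence on its overall result.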

Inference

CodeGeeX2 is easier to deploy than the previous generation. Thanks to Multi-Query Attention and Flash Attention, inference is faster, and only 6GB of GPU memory is required after INT4 quantization.

Quantization

| Model | FP16/BF16 | INT8 | INT4 |
| --- | --- | --- | --- |
| CodeGeeX-13B | 26.9 GB | 14.7 GB | - |
| CodeGeeX2-6B | 13.1 GB | 8.2 GB | 5.5 GB |

> Based on PyTorch 2.0, using `torch.nn.functional.scaled_dot_product_attention` for an efficient attention mechanism.

Acceleration

| Model | Inference speed (token/s) |
| --- | --- |
| CodeGeeX-13B | 32 |
| CodeGeeX2-6B | 94 |

> `batch_size=1, max_length=2048`, both using acceleration framework, in `GeForce RTX-3090`.
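
To put the throughput numbers in perspective, the short calculation below derives the speedup and the time to generate a full 2048-token sequence from the table's values. It assumes throughput stays constant over the whole sequence, which is a simplification.

```python
# Tokens per second, copied from the acceleration table.
TOKENS_PER_SECOND = {"CodeGeeX-13B": 32, "CodeGeeX2-6B": 94}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant rate."""
    return num_tokens / tokens_per_second


speedup = TOKENS_PER_SECOND["CodeGeeX2-6B"] / TOKENS_PER_SECOND["CodeGeeX-13B"]
print(f"speedup: {speedup:.2f}x")
for model, rate in TOKENS_PER_SECOND.items():
    print(f"{model}: {generation_seconds(2048, rate):.1f} s for 2048 tokens")
```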

License

The code in this repository is open source under the Apache-2.0 license. The model weights are licensed under the Model License. CodeGeeX2-6B weights are open for academic research. To apply for commercial use, please fill in the registration form.

Citation

If you find our work helpful, please feel free to cite the following paper:

@inproceedings{zheng2023codegeex,
  title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X},
  author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},
  booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={5673--5684},
  year={2023}
}

Core symbols most depended-on inside this repo

evaluation/execution.py

auto_configure_device_map

Shape

Function 40

Method 13

Class 5

Route 1

Languages

Python 100%

Modules by API surface

evaluation/utils.py 19 symbols

evaluation/execution.py 16 symbols

evaluation/generation.py 8 symbols

demo/run_demo.py 5 symbols

evaluation/evaluation.py 4 symbols

demo/fastapicpu.py 4 symbols

demo/gpus.py 2 symbols

evaluation/inspect_jsonl.py 1 symbol

Dependencies from manifests, versioned

github.com/davecgh/go-spew v1.1.1 · 1×

github.com/go-openapi/inflect v0.19.0 · 1×

github.com/pmezard/go-difflib v1.0.0 · 1×

github.com/stretchr/testify v1.8.0 · 1×

gopkg.in/yaml.v3 v3.0.1 · 1×

torch 2.0 · 1×

transformers 4.30.2 · 1×

For agents

$ claude mcp add CodeGeeX2 \
  -- python -m otcore.mcp_server <graph>
