<a href="https://github.com/IDEA-CCNL/Fengshenbang-LM/raw/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
- Fengshenbang 1.0: the Fengshenbang 1.0 bilingual general paper, which aims to be the foundation of Chinese cognitive intelligence.
- BioBART: A generative language model for the biomedical domain, provided by Tsinghua University together with IDEA Institute. (BioNLP 2022)
- UniMC: A unified model for zero-shot scenarios based on labeled datasets. (EMNLP 2022)
- FMIT: A single-tower multimodal named entity recognition model based on relative position encoding. (COLING 2022)
- UniEX: A Natural Language Understanding Model for Unified Extraction Tasks. (ACL 2023)
- Solving Math Word Problems via Cooperative Reasoning induced Language Models. (ACL 2023)
- MVP-Tuning: Multi-View Knowledge Retrieval with Prompt Tuning for Commonsense Reasoning. (ACL 2023)
| Series | Demand | Task | Parameter Scale | Extra |
|---|---|---|---|---|
| Ziya | General | AGI | >7B | Ziya has the capabilities of translation, programming, text classification, information extraction, summarization, copy generation, common sense question and answer, and mathematical calculation. |
| Erlangshen | General | NLU | 97M-3.9B | Erlangshen was designed to solve NLU tasks; The largest BERT when publicly released; SOTA on FewCLUE and ZeroCLUE in 2021. |
| Wenzhong | General | NLG | 1B-3.5B | Wenzhong focuses on NLG tasks; Provides several generative models with different scales, such as GPT2, etc. |
| Randeng | General | NLT | 770M-5B | Randeng handles natural language transformation (NLT) type tasks that convert from source text to target text, such as machine translation, text summarization, etc. |
| Taiyi | Special | MultiModal | 87M-1B | Taiyi is applied to cross-modality scenarios, including text-to-image generation, protein structure prediction, speech-text representation, etc. |
| Yuyuan | Special | Domain | 0.1B-3.5B | Yuyuan is applied to specific domains such as healthcare, finance, law, programming, etc.; it includes the largest open-source GPT2 medical model. |
| -TBD- | Special | Exploration | -Unknown- | This series aims to develop experimental NLP models together with various technology companies and universities. Currently available: Zhouwenwang. |
Fengshenbang Model training and fine-tuning code script
Remarkable advances in Artificial Intelligence (AI) have produced great models; in particular, foundation models based on pre-training have become an emerging paradigm. In contrast to traditional AI models that must be trained on vast datasets for one or a few scenarios, foundation models can be adapted to a wide range of downstream tasks, thereby limiting the amount of resources needed to get an AI venture off the ground. Moreover, we observe that these models grow rapidly within a short period, by around 10 times each year. For instance, BERT has 100 million parameters and GPT-3 has over 100 billion parameters. Many of the forefront challenges in AI, especially generalization ability, are becoming achievable due to this inspiring trend.
Foundation models, most notably language models, are dominated by the English-language community. Chinese, the world's largest language by number of native speakers, has no systematic research resources to support it, which causes progress in the Chinese language domain to lag behind others.
And the world needs an answer for this.
On November 22nd, 2021, Harry Shum, the Founder and Chairman of IDEA (International Digital Economy Academy), officially announced the launch of the "Fengshenbang" open-source project: a Chinese-language-driven foundation ecosystem that incorporates pre-trained models, task-specific fine-tuned applications, benchmarks, and datasets.

"Fengshenbang Model" will open-source a series of NLP-related pre-trained models covering all aspects. The wide range of research tasks in the NLP community can be divided into two categories: general demands and special demands. General demands cover common NLP tasks, which are classified into Natural Language Understanding (NLU), Natural Language Generation (NLG), and Natural Language Transformation (NLT). Owing to its fast development, the NLP community also brings special demands to the wider AI community, which are grouped into MultiModal (MM), Domains, and Exploration. We consider all of these demands and provide models that are fine-tuned for downstream tasks, making our base models easy to use for users with limited computing resources. Moreover, we guarantee that we will optimize the models continuously with new datasets and the latest algorithms. We aim to build universal infrastructure for Chinese cognitive intelligence and prevent duplicative construction, thereby saving computing resources for the community.

We also call on businesses, universities, and institutions to join the project and build the system of large-scale open-source models collaboratively. We envision that, in the near future, the first choice when in need of a new pre-trained model will be to select the one closest to the desired scale, architecture, and domain from the series, and then train it further. After obtaining a newly trained model, we shall add it back to the series of open-source models for future use. In this way we build the open-source system iteratively and collaboratively, while individuals can obtain the models they need using minimal computing resources.
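The selection step described above can be sketched in a few lines. This is only an illustration, not part of the Fengshenbang codebase: the `Series` class, the `pick_series` helper, and the distance measure are hypothetical, and the parameter ranges are copied from the series table in this README (the exploration series is omitted because its scale is unknown).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Series:
    """One row of the series table (hypothetical helper type)."""
    name: str
    demand: str        # "General" or "Special"
    task: str          # task label from the table
    min_params: float  # lower bound of the listed parameter scale
    max_params: float  # upper bound; inf when only a lower bound is listed


# Values transcribed from the series table; ">7B" is stored as [7e9, inf).
SERIES: List[Series] = [
    Series("Ziya", "General", "AGI", 7e9, float("inf")),
    Series("Erlangshen", "General", "NLU", 97e6, 3.9e9),
    Series("Wenzhong", "General", "NLG", 1e9, 3.5e9),
    Series("Randeng", "General", "NLT", 770e6, 5e9),
    Series("Taiyi", "Special", "MultiModal", 87e6, 1e9),
    Series("Yuyuan", "Special", "Domain", 0.1e9, 3.5e9),
]


def scale_gap(series: Series, target_params: float) -> float:
    """Distance from the target size to the series' listed range (0 if inside)."""
    if target_params < series.min_params:
        return series.min_params - target_params
    if target_params > series.max_params:
        return target_params - series.max_params
    return 0.0


def pick_series(task: str, target_params: float) -> Optional[Tuple[Series, float]]:
    """Return the series matching the task that is closest in scale, plus the gap."""
    candidates = [s for s in SERIES if s.task == task]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: scale_gap(s, target_params))
    return best, scale_gap(best, target_params)
```

A gap of zero means the requested size falls inside the range listed for that series; a positive gap indicates how far the nearest listed scale is from the request, which is where further training or distillation would come in.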
For a better open-source experience, all models of the Fengshenbang series are synchronized to the Huggingface community and can be used within a few lines of code. You are welcome to download and use our models from the IDEA-CCNL repo on HuggingFace.
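As a minimal sketch of what "a few lines of code" can look like, the snippet below loads Erlangshen-MegatronBert-1.3B with the `transformers` library. It assumes that the Hub repository ID is the organization name `IDEA-CCNL` followed by the model name, and that the checkpoint loads with `BertTokenizer` and `MegatronBertModel`; `repo_id` and `load_erlangshen` are hypothetical helper names, so check the model card before relying on them.

```python
ORG = "IDEA-CCNL"  # Hugging Face organization named in this README


def repo_id(model_name: str) -> str:
    """Build the assumed Hub repository ID: '<organization>/<model name>'."""
    return f"{ORG}/{model_name}"


def load_erlangshen(model_name: str = "Erlangshen-MegatronBert-1.3B"):
    """Download the checkpoint and return (tokenizer, model).

    Requires `transformers` and `torch`; the first call downloads the weights.
    """
    # Imported lazily so this file can be imported without the heavy dependencies.
    from transformers import BertTokenizer, MegatronBertModel

    name = repo_id(model_name)
    tokenizer = BertTokenizer.from_pretrained(name)
    model = MegatronBertModel.from_pretrained(name)
    return tokenizer, model


def encode(text: str, tokenizer, model):
    """Run one sentence through the encoder and return its hidden states."""
    inputs = tokenizer(text, return_tensors="pt")
    return model(**inputs).last_hidden_state
```

Calling `tokenizer, model = load_erlangshen()` followed by `encode("今天天气真好", tokenizer, model)` should return one hidden-state vector per input token; other models in the series may need different model classes.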
The general large-scale model "Ziya" series has the capabilities of translation, programming, text classification, information extraction, summarization, copy generation, common sense question and answer, and mathematical calculation. At present, Ziya's general-purpose large model (v1/v1.1) has completed a three-stage training process of large-scale pre-training, multi-task supervised fine-tuning, and human feedback learning.

The Ziya series includes the following models:

- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1
Refer to Ziya-LLaMA-13B-v1
Refer to ziya_finetune
Refer to ziya_inference
This series focuses on using bidirectional encoder language models to solve multiple natural language understanding tasks. Erlangshen-MegatronBert-1.3B is the largest open-source Chinese model with the BERT structure. It contains 1.3 billion parameters and was trained on 280G of data with 32 A100 GPUs for 14 days. It ranked first on the Chinese natural language understanding benchmark FewCLUE on Nov 10th, 2021. Among the FewCLUE tasks, Erlangshen-1.3B beat human performance on CHID (Chinese idiom cloze test) and TNEWS (news classification), and achieved SOTA on CHID, CSLDCP (academic literature classification), and OCNLI (natural language inference), setting new records for few-shot learning. We will continue to optimize the Erlangshen series with respect to model scale, knowledge fusion, auxiliary supervision tasks, etc.

Erlangshen-MRC ranked first on the Chinese language understanding benchmark ZeroCLUE on Jan 24th, 2022. Among the ZeroCLUE tasks, it achieved SOTA on CSLDCP (discipline literature classification), TNEWS (news classification), IFLYTEK (application description classification), CSL (abstract keyword recognition), and CLUEWSC (reference resolution).

[Huggingface Erlangshen-MegatronBert-1.3B](https://huggingfa