
github.com/IDEA-CCNL/Fengshenbang-LM @main sqlite

3,121 symbols 8,923 edges 259 files 675 documented · 22%

README

<a href="https://github.com/IDEA-CCNL/Fengshenbang-LM/raw/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>

Installation | Model Downloads | Code Example | Docs | Fengshenbang Website | API

Fengshenbang Achievements

Fengshenbang 1.0: The Fengshenbang 1.0 bilingual overview paper, which aims to be the foundation of Chinese cognitive intelligence.

BioBART: A generative language model for the biomedical domain, provided by Tsinghua University together with the IDEA Institute. (BioNLP 2022)

UniMC: A unified model for zero-shot scenarios based on labeled datasets. (EMNLP 2022)

FMIT: A single-tower multimodal named entity recognition model based on relative position encoding. (COLING 2022)

UniEX: A natural language understanding model for unified extraction tasks. (ACL 2023)

Solving Math Word Problems via Cooperative Reasoning induced Language Models. (ACL 2023)

MVP-Tuning: Multi-View Knowledge Retrieval with Prompt Tuning for Commonsense Reasoning. (ACL 2023)

Fengshenbang Big Event

Navigation

Fengshenbang Achievements
Fengshenbang Big Event
Navigation
Model Information
Fengshenbang-LM
Fengshenbang Model
Ziya
Erlangshen
Taiyi
Fengshen Framework
Installation
Pipelines
Fengshen Benchmark
Fengshenbang Series Articles
Citation
Contact
License

Model Information

Series	Demand	Task	Parameter Scale	Extra
Ziya	General	AGI	>7B	Ziya has the capabilities of translation, programming, text classification, information extraction, summarization, copy generation, commonsense question answering, and mathematical calculation.
Erlangshen	General	NLU	97M-3.9B	Erlangshen is designed to solve NLU tasks; the largest open-source Chinese BERT-structured model when publicly released; SOTA on FewCLUE and ZeroCLUE in 2021.
Wenzhong	General	NLG	1B-3.5B	Wenzhong focuses on NLG tasks; provides generative models at several scales, such as GPT2.
Randeng	General	NLT	770M-5B	Randeng handles natural language transformation (NLT) tasks that convert source text to target text, such as machine translation and text summarization.
Taiyi	Special	MultiModal	87M-1B	Taiyi is applied to cross-modality scenarios, including text-image generation, protein structure prediction, and speech-text representation.
Yuyuan	Special	Domain	0.1B-3.5B	Yuyuan is applied to specific domains such as healthcare, finance, law, and programming; the largest open-source GPT2 medical model.
-TBD-	Special	Exploration	-Unknown-	This series aims to develop experimental NLP models together with technology companies and universities. It currently includes: Zhouwenwang.
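
The table can also be read as a small lookup structure. The following is a minimal sketch, not part of the repository: the `Series` type and the `series_for_task` helper are hypothetical names introduced here, and the rows are copied from the table above (the exploratory `-TBD-` row is left out because its scale is unknown).

```python
from typing import NamedTuple, Optional


class Series(NamedTuple):
    """One row of the Model Information table (hypothetical helper type)."""
    name: str
    demand: str
    task: str
    scale: str


# Rows copied from the table above; nothing here is read from the repository.
CATALOG = [
    Series("Ziya", "General", "AGI", ">7B"),
    Series("Erlangshen", "General", "NLU", "97M-3.9B"),
    Series("Wenzhong", "General", "NLG", "1B-3.5B"),
    Series("Randeng", "General", "NLT", "770M-5B"),
    Series("Taiyi", "Special", "MultiModal", "87M-1B"),
    Series("Yuyuan", "Special", "Domain", "0.1B-3.5B"),
]


def series_for_task(task: str) -> Optional[Series]:
    """Return the series whose Task column matches `task` (case-insensitive)."""
    wanted = task.strip().lower()
    for series in CATALOG:
        if series.task.lower() == wanted:
            return series
    return None
```

For example, `series_for_task("NLU")` returns the Erlangshen row, and an unlisted task returns `None`.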

Download URL of Fengshenbang

Fengshenbang model training and fine-tuning scripts

Handbook of Fengshenbang

Fengshenbang-LM

Remarkable advances in Artificial Intelligence (AI) have produced powerful models; in particular, pre-trained foundation models have become an emerging paradigm. In contrast to traditional AI models that must be trained on vast datasets for one or a few scenarios, foundation models can be adapted to a wide range of downstream tasks, which reduces the resources needed to get an AI venture off the ground. Moreover, we observe that these models grow rapidly, by around 10 times each year. For instance, BERT has about 100 million parameters, whereas GPT-3 has over 100 billion. Many of the forefront challenges in AI, especially generalization ability, are becoming achievable due to this inspiring trend.

Foundation models, most notably language models, are dominated by the English-language community. Chinese, the world's largest language by number of native speakers, has no comparable systematic research resources to support it, so progress in the Chinese-language domain lags behind.

And the world needs an answer for this.

On November 22nd, 2021, Harry Shum, the Founder and Chairman of IDEA (International Digital Economy Academy), officially announced the launch of the "Fengshenbang" open-source project: a Chinese-language-driven foundation ecosystem that incorporates pre-trained models, task-specific fine-tuned applications, benchmarks, and datasets.

Fengshenbang Model

"Fengshenbang Model" will open-source a series of NLP-related pre-trained models covering all aspects. The NLP community has a wide range of research tasks, which can be divided into two categories: general demands and special demands. General demands cover the common NLP tasks, which are classified into Natural Language Understanding (NLU), Natural Language Generation (NLG), and Natural Language Transformation (NLT). Owing to its fast development, the NLP community also brings special demands to the wider AI community, which are grouped into MultiModal (MM), Domains, and Exploration. We consider all of these demands and provide models that are fine-tuned for downstream tasks, making our base models easy to use for users with limited computing resources. Moreover, we guarantee that we will optimize the models continuously with new datasets and the latest algorithms. We aim to build universal infrastructure for Chinese cognitive intelligence and prevent duplicative construction, thereby saving computing resources for the community.


We also call for businesses, universities and institutions to join the project and build the system of large-scale open-source models collaboratively. We envision that, in the near future, anyone who needs a new pre-trained model will first select the model in the series closest to the desired scale, architecture and domain, and then train it further. Once a new model has been trained, it is added back to the series of open-source models for future use. In this way the open-source system grows iteratively and collaboratively, while individuals can obtain the models they need with minimal computing resources.

For a better open-source experience, all models of the Fengshenbang series are synchronized with the Hugging Face community and can be obtained within a few lines of code. You are welcome to download and use our models from the IDEA-CCNL organization on Hugging Face.
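
As an illustration of "a few lines of code", here is a minimal sketch using the `transformers` library. The helper names `hub_repo_id` and `load_model` are hypothetical and do not come from this repository. It is also an assumption that the `Auto*` classes resolve the right architecture for a given checkpoint; individual model cards may recommend specific classes instead.

```python
HUB_ORG = "IDEA-CCNL"  # Hugging Face organization named in this README


def hub_repo_id(model_name: str, org: str = HUB_ORG) -> str:
    """Return the '<org>/<model>' identifier that from_pretrained accepts."""
    name = model_name.strip()
    if not name or "/" in name:
        raise ValueError(f"expected a bare model name, got {model_name!r}")
    return f"{org}/{name}"


def load_model(model_name: str):
    """Fetch (or read from the local cache) the tokenizer and weights."""
    # Imported lazily so hub_repo_id works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo_id = hub_repo_id(model_name)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_model("Erlangshen-MegatronBert-1.3B")` would download that checkpoint on first use, so it needs network access and enough disk space for the weights.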

Ziya

The general large-scale model "Ziya" series has the capabilities of translation, programming, text classification, information extraction, summarization, copy generation, common sense question and answer, and mathematical calculation. At present, Ziya's general-purpose large model (v1/v1.1) has completed a three-stage training process of large-scale pre-training, multi-task supervised fine-tuning, and human feedback learning. Ziya series models include the following models:

- Ziya-LLaMA-13B-v1.1
- Ziya-LLaMA-13B-v1
- Ziya-LLaMA-7B-Reward
- Ziya-LLaMA-13B-Pretrain-v1
- Ziya-BLIP2-14B-Visual-v1

Example of Usage

Refer to Ziya-LLaMA-13B-v1

Online Demo

Finetune Example

Refer to ziya_finetune

Inference & Quantization Example

Refer to ziya_inference

Erlangshen

This series focuses on using bidirectional encoder language models to solve natural language understanding tasks. Erlangshen-MegatronBert-1.3B is the largest Chinese open-source model with the BERT structure. It contains 1.3 billion parameters and was trained on 280 GB of data using 32 A100 GPUs for 14 days. It reached the top of the Chinese natural language understanding benchmark FewCLUE on Nov 10th, 2021. Among the FewCLUE tasks, Erlangshen-1.3B beat human performance on CHID (Chinese idiom cloze test) and TNEWS (news classification), and achieved SOTA on CHID, CSLDCP (academic literature classification) and OCNLI (natural language inference), setting new records for few-shot learning. We will continue to optimize the Erlangshen series with respect to model scale, knowledge fusion, auxiliary supervision tasks, etc.

Erlangshen-MRC topped the Chinese language understanding benchmark ZeroCLUE on Jan 24th, 2022. Among the ZeroCLUE tasks, it achieved SOTA on CSLDCP (discipline literature classification), TNEWS (news classification), IFLYTEK (application description classification), CSL (abstract keyword recognition), and CLUEWSC (coreference resolution).

Download the Models

[Huggingface Erlangshen-MegatronBert-1.3B](https://huggingfa

Core symbols most depended-on inside this repo

size

called by 345

fengshen/data/hubert/hubert_dataset.py

log

called by 188

fengshen/examples/disco_project/guided_diffusion/guided_diffusion/logger.py

from_pretrained

called by 176

fengshen/models/zen1/ngram_utils.py

items

called by 121

fengshen/models/auto/auto_factory.py

keys

called by 108

fengshen/models/auto/auto_factory.py

write

called by 78

fengshen/data/megatron_dataloader/indexed_dataset.py

get

called by 64

fengshen/models/auto/auto_factory.py

exists

called by 47

fengshen/data/megatron_dataloader/indexed_dataset.py

Shape

Method 2,009

Class 564

Function 538

Route 10

Languages

Python 100%

Modules by API surface

fengshen/models/deberta_v2/modeling_deberta_v2.py · 99 symbols

fengshen/models/roformer/modeling_roformer.py · 95 symbols

fengshen/models/longformer/modeling_longformer.py · 91 symbols

fengshen/models/megatron_t5/modeling_megatron_t5.py · 89 symbols

fengshen/models/zen2/modeling.py · 87 symbols

fengshen/models/uniex/modeling_uniex.py · 77 symbols

fengshen/models/zen1/modeling.py · 76 symbols

fengshen/data/megatron_dataloader/indexed_dataset.py · 71 symbols

fengshen/models/deltalm/modeling_deltalm.py · 68 symbols

fengshen/models/albert/modeling_albert.py · 60 symbols

fengshen/examples/disco_project/guided_diffusion/guided_diffusion/logger.py · 59 symbols

fengshen/examples/zen2_finetune/fengshen_token_level_ft_task.py · 51 symbols

Dependencies from manifests, versioned

accelerate 0.19.0 · 1×

deepspeed 0.5.10 · 1×

diffusers 0.16.1 · 1×

pytorch-lightning 1.8.1 · 1×

transformers 4.28.0 · 1×
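
To compare a local environment against the versions listed above, a minimal sketch follows. Treating the listed versions as exact pins is an assumption, since the manifests may allow ranges. The `PINNED` mapping and both helper functions are hypothetical names, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata

# Versions copied from the dependency list above (assumed to be exact pins).
PINNED = {
    "accelerate": "0.19.0",
    "deepspeed": "0.5.10",
    "diffusers": "0.16.1",
    "pytorch-lightning": "1.8.1",
    "transformers": "4.28.0",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose version differs from its pin to (pinned, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

`find_mismatches()` returns an empty dict when every listed package is installed at exactly the listed version. The `lookup` parameter exists so the check can be exercised without touching the real environment.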

For agents

$ claude mcp add Fengshenbang-LM \
  -- python -m otcore.mcp_server <graph>
