github.com/RUCAIBox/LLMSurvey @main sqlite

1,240 symbols · 3,455 edges · 143 files · 104 documented (8%)
README

LLMSurvey

A collection of papers and resources related to Large Language Models.

The papers are organized according to our survey "A Survey of Large Language Models". Paper page

Please let us know by e-mail if you find a mistake or have any suggestions: batmanfly@gmail.com

(We suggest also cc'ing francis_kun_zhou@163.com in case of a delivery failure.)

If you find our survey useful for your research, please cite the following paper:

@article{LLMSurvey,
    title={A Survey of Large Language Models},
    author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
    year={2023},
    journal={arXiv preprint arXiv:2303.18223},
    url={http://arxiv.org/abs/2303.18223}
}

🚀(New) We have released the Chinese book of our survey!

The Chinese book focuses on providing explanations for beginners in the field of LLMs, aiming to present a comprehensive framework and roadmap for LLMs. This book is suitable for senior undergraduate students and junior graduate students with a foundation in deep learning and can serve as an introductory technical book. You can download the Chinese book at https://llmbook-zh.github.io/.

Here is our Chinese book sales page.

chinese_version

🚀(New) Content on long CoT reasoning

In our latest version, we add new content on the recently popular reasoning paradigm of allocating more time to thinking before responding to a problem. We focus on long CoT reasoning, the mainstream approach taken by recent LLMs such as DeepSeek-R1 and OpenAI's o-series models. We first discuss the reasoning patterns and advantages of the long CoT paradigm. We then present approaches for constructing long CoT data, including data distillation, search-based data synthesis, and multi-agent collaboration. Moreover, we introduce two commonly used training methods: long CoT instruction tuning and scaling reinforcement learning training. Finally, we provide an in-depth discussion of recent test-time scaling efforts for LLMs.

Cover

Trends in the number of LLM-related papers on arXiv

Here are the trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018) and “large language model” (since October 2019), respectively.

arxiv_llms

The statistics are computed by exact-match querying of the keyphrases in titles or abstracts, grouped by month. We set different x-axis ranges for the two keyphrases because “language models” have been explored since an earlier time. We label the points corresponding to important landmarks in the research progress of LLMs. A sharp increase occurs after the release of ChatGPT: the average number of arXiv papers published per day that contain “large language model” in the title or abstract rises from 0.40 to 8.58.
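The counting procedure described above can be sketched roughly as follows. This is only an illustration, not the authors' script: the record fields (`title`, `abstract`, `month`), the function names, and the choice of case-insensitive matching are all assumptions made for the example.

```python
from collections import Counter
from itertools import accumulate


def monthly_keyphrase_counts(papers, keyphrase):
    """Count papers per month whose title or abstract contains the keyphrase.

    `papers` is a list of dicts with hypothetical keys "title", "abstract",
    and "month" (a "YYYY-MM" string). Matching is an exact substring match,
    made case-insensitive here as an assumption.
    """
    phrase = keyphrase.lower()
    counts = Counter()
    for paper in papers:
        in_title = phrase in paper["title"].lower()
        in_abstract = phrase in paper["abstract"].lower()
        if in_title or in_abstract:
            counts[paper["month"]] += 1
    # "YYYY-MM" strings sort chronologically when sorted lexicographically.
    return dict(sorted(counts.items()))


def cumulative_counts(monthly_counts):
    """Turn per-month counts (already in month order) into running totals."""
    totals = accumulate(monthly_counts.values())
    return dict(zip(monthly_counts.keys(), totals))
```

Note that a substring match for “language model” also matches “large language model” and the plural “language models”, which is consistent with the first curve covering a broader set of papers than the second.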

Technical Evolution of GPT-series Models

A brief illustration of the technical evolution of GPT-series models. We plot this figure mainly based on papers, blog articles, and official APIs from OpenAI. Solid lines denote explicit evidence (e.g., an official statement that a new model was developed from a base model) of the evolution path between two models, while dashed lines denote a relatively weaker evolution relation.

gpt-series

Evolutionary Graph of LLaMA Family

An evolutionary graph of the research work conducted on LLaMA. Because of the huge number of LLaMA variants, we cannot include all of them in this figure, even though many are excellent work.

LLaMA_family

To support incremental updates, we share the source file of this figure and welcome readers to add desired models by submitting pull requests on our GitHub page. If you're interested, please submit a request.

Prompts

We collect some useful tips for designing prompts, gathered from online notes and our authors' experience, and show the related ingredients and principles (introduced in Section 8.1).

prompt examples

Please click here to view more detailed information.

Everyone is welcome to contribute more relevant tips in the form of issues. After selection, we will regularly update them on GitHub and indicate the source.

Experiments

Instruction Tuning Experiments

We will explore the effect of different types of instructions in fine-tuning LLMs (i.e., 7B LLaMA), as well as examine the usefulness of several instruction improvement strategies.

instruction_tuning_table

Please click here to view more detailed information.

Ability Evaluation Experiments

We conduct a fine-grained evaluation on the abilities discussed in Section 7.1 and Section 7.2. For each kind of ability, we select representative tasks and datasets for conducting evaluation experiments to examine the corresponding performance of LLMs.

ability_main

Please click here to view more detailed information.

We also welcome support in computing power for conducting more comprehensive experiments.

Table of Contents

Timeline of LLMs

LLMs_timeline

List of LLMs

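| Category | Model | Release Time | Size (B) | Link |
| --- | --- | --- | --- | --- |
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |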
Pyt

Core symbols most depended-on inside this repo

load · called by 74 · Experiments/InstructTuning/mmlu/modeling.py
write · called by 68 · Experiments/ToolManipulation/HotPotQA/wrappers.py
tree_to_variable_index · called by 50 · Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/utils.py
commaSep1 · called by 18 · Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/grammar.js
close · called by 17 · Experiments/ToolManipulation/HotPotQA/wrappers.py
normalize_answer · called by 8 · Experiments/ToolManipulation/HotPotQA/wrappers.py
process_single_data · called by 6 · Experiments/SymbolicReasoning/data_process.py
ngrams · called by 6 · Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/utils.py

Shape

Method 696
Function 392
Class 151
Route 1

Languages

Python 100%
TypeScript 1%

Modules by API surface

Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/examples/python3.8_grammar.py · 155 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/examples/python2-grammar.py · 101 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/examples/python2-grammar-crlf.py · 101 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/examples/python3-grammar.py · 98 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/parser/tree-sitter-python/examples/python3-grammar-crlf.py · 98 symbols
Experiments/ToolManipulation/HotPotQA/wrappers.py · 32 symbols
Experiments/InstructTuning/mmlu/modeling.py · 29 symbols
Experiments/SymbolicReasoning/data_process.py · 26 symbols
Experiments/MathematicalReasoning/data_process.py · 26 symbols
Experiments/LanguageGeneration/HumanEval/model.py · 22 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/weighted_ngram_match.py · 16 symbols
Experiments/ToolManipulation/Gorilla/eval/eval-scripts/codebleu/bleu.py · 15 symbols

Dependencies from manifests, versioned

nan 2.15.0 · 1×
tree-sitter-cli 0.20.1 · 1×
Flask 2.3.2 · 1×
Flask-Slack 0.1.5 · 1×
GitPython 3.1.31 · 1×
Levenshtein 0.21.0 · 1×
Pillow 9.4.0 · 1×
PyYAML 6.0 · 1×
Werkzeug 2.3.4 · 1×
accelerate 0.17.1 · 1×
aiohttp 3.8.4 · 1×
aiosignal 1.3.1 · 1×

For agents

$ claude mcp add LLMSurvey \
  -- python -m otcore.mcp_server <graph>
