A collection of papers and resources related to Large Language Models.
The papers are organized following our survey "A Survey of Large Language Models".
Please let us know by e-mail if you find any mistakes or have any suggestions: batmanfly@gmail.com
(We suggest also cc'ing francis_kun_zhou@163.com in case of any delivery failure.)
If you find our survey useful for your research, please cite the following paper:
@article{LLMSurvey,
title={A Survey of Large Language Models},
author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
year={2023},
journal={arXiv preprint arXiv:2303.18223},
url={http://arxiv.org/abs/2303.18223}
}
The Chinese book focuses on providing explanations for beginners in the field of LLMs, aiming to present a comprehensive framework and roadmap for LLMs. This book is suitable for senior undergraduate students and junior graduate students with a foundation in deep learning and can serve as an introductory technical book. You can download the Chinese book at https://llmbook-zh.github.io/.
Here is our Chinese book sales page.

In our latest version, we add new content on the recently popular reasoning paradigm, in which a model allocates more time to thinking before responding to a problem. We focus on long CoT reasoning, which is the mainstream approach taken by recent LLMs such as DeepSeek-R1 and OpenAI's o-series models. We first discuss the reasoning patterns and advantages of the long CoT paradigm. Then we present the construction approaches for long CoT data, including data distillation, search-based data synthesis, and multi-agent collaboration. Moreover, we introduce the two commonly used training methods: long CoT instruction tuning and scaling reinforcement learning training. Finally, we conduct an in-depth discussion of recent test-time scaling efforts for LLMs.

Here are the trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018) and “large language model” (since October 2019), respectively.

The statistics are calculated using exact match by querying the keyphrases in the title or abstract, month by month. We set different x-axis ranges for the two keyphrases, because “language models” were explored earlier. We label the points corresponding to important landmarks in the research progress of LLMs. A sharp increase occurs after the release of ChatGPT: the average number of published arXiv papers that contain “large language model” in the title or abstract goes from 0.40 per day to 8.58 per day.
A brief illustration of the technical evolution of GPT-series models. We plot this figure mainly based on the papers, blog articles and official APIs from OpenAI. Here, solid lines denote that there is explicit evidence (e.g., an official statement that a new model is developed based on a base model) for the evolution path between two models, while dashed lines denote a relatively weaker evolution relation.
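The counting procedure described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the figure: the record layout (`title`, `abstract`, and a `month` string in `YYYY-MM` form) and the function name `cumulative_monthly_counts` are assumptions made for this example, and matching is done case-insensitively.

```python
from collections import Counter
from itertools import accumulate


def cumulative_monthly_counts(papers, keyphrase):
    """Return [(month, cumulative_count), ...] for papers whose title or
    abstract contains `keyphrase` as an exact substring (case-insensitive).

    `papers` is an iterable of dicts with the hypothetical keys
    'title', 'abstract', and 'month' ('YYYY-MM').
    """
    phrase = keyphrase.lower()
    monthly = Counter()
    for paper in papers:
        # Check title and abstract separately so a match cannot span both.
        if phrase in paper["title"].lower() or phrase in paper["abstract"].lower():
            monthly[paper["month"]] += 1
    months = sorted(monthly)  # 'YYYY-MM' strings sort chronologically
    totals = accumulate(monthly[m] for m in months)
    return list(zip(months, totals))


# Toy records, invented purely for illustration.
sample = [
    {"title": "A Large Language Model study", "abstract": "Details.", "month": "2023-01"},
    {"title": "Vision", "abstract": "We use a language model.", "month": "2022-12"},
    {"title": "Graphs", "abstract": "No match here.", "month": "2023-01"},
]
print(cumulative_monthly_counts(sample, "language model"))
print(cumulative_monthly_counts(sample, "large language model"))
```

Months with no matching papers are simply omitted from the result in this sketch; a plotting script would probably want to fill them in so the cumulative curve stays flat over those months.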

An evolutionary graph of the research work conducted on LLaMA. Because of the huge number of LLaMA variants, we cannot include all of them in this figure, and much excellent work is left out.

To support incremental updates, we share the source file of this figure and welcome readers to add the models they want by submitting pull requests on our GitHub page. If you're interested, please request the source file by application.
We collect some useful tips for designing prompts, drawn from online notes and the experience of our authors, and we also show the related ingredients and principles (introduced in Section 8.1).

Please click here to view more detailed information.
We welcome everyone to contribute more relevant tips in the form of issues. After selection, we will regularly update them on GitHub and indicate the source.
We will explore the effect of different types of instructions in fine-tuning LLMs (i.e., 7B LLaMA), as well as examine the usefulness of several instruction improvement strategies.

Please click here to view more detailed information.
We conduct a fine-grained evaluation on the abilities discussed in Section 7.1 and Section 7.2. For each kind of ability, we select representative tasks and datasets for conducting evaluation experiments to examine the corresponding performance of LLMs.

Please click here to view more detailed information.
We also welcome support in the form of computing power for conducting more comprehensive experiments.

| Category | Model | Release Time | Size (B) | Link |
|---|---|---|---|---|
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |
| Pyt |