github.com/kimiyoung/transformer-xl @main sqlite

361 symbols · 885 edges · 19 files · 105 documented (29%)
README

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

  • The source code is in the tf/ folder and supports (1) single-node multi-GPU training and (2) multi-host TPU training.
  • Besides the source code, we also provide pretrained TensorFlow models that achieve the state-of-the-art (SoTA) performance reported in the paper.
  • Please refer to tf/README.md for details.

PyTorch

  • The source code is in the pytorch/ folder and supports single-node multi-GPU training via nn.DataParallel.
  • Please refer to pytorch/README.md for details.

Results

Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 barrier on character-level language modeling (0.99 on enwiki8). A summary follows.

Method         | enwiki8 (bpc) | text8 (bpc) | One Billion Word (ppl) | WT-103 (ppl) | PTB (ppl, w/o finetuning)
Previous Best  | 1.06          | 1.13        | 23.7                   | 20.5         | 55.5
Transformer-XL | 0.99          | 1.08        | 21.8                   | 18.3         | 54.5
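
The table above mixes two metrics: the character-level benchmarks (enwiki8, text8) are reported in bits per character, while the word-level benchmarks are reported in perplexity. The following is a minimal sketch of how the two relate, assuming a model reports its average cross-entropy loss in nats. The function names are illustrative and are not taken from this repository.

```python
import math


def nats_to_bpc(loss_nats: float) -> float:
    """Convert an average per-character loss in nats to bits per character."""
    return loss_nats / math.log(2)


def nats_to_perplexity(loss_nats: float) -> float:
    """Convert an average per-token loss in nats to perplexity."""
    return math.exp(loss_nats)


def relative_reduction(previous: float, new: float) -> float:
    """Fraction by which `new` is lower than `previous`."""
    return (previous - new) / previous


# (previous best, Transformer-XL) pairs copied from the summary table.
RESULTS = {
    "enwiki8": (1.06, 0.99),
    "text8": (1.13, 1.08),
    "One Billion Word": (23.7, 21.8),
    "WT-103": (20.5, 18.3),
    "PTB": (55.5, 54.5),
}

if __name__ == "__main__":
    for name, (prev, new) in RESULTS.items():
        print(f"{name}: {relative_reduction(prev, new):.1%} lower than previous best")
```

Under this conversion, a per-character loss of ln 2 nats corresponds to exactly 1.0 bits per character, which is the barrier referred to above.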

Acknowledgement

A large portion of the getdata.sh script comes from the awd-lstm repo. Happy Language Modeling :)

Core symbols (most depended-on inside this repo)

  • join: called by 59 (tf/tpu_estimator.py)
  • get: called by 25 (tf/tpu_estimator.py)
  • logging: called by 22 (pytorch/utils/exp_utils.py)
  • capture: called by 12 (tf/tpu_estimator.py)
  • _create_or_get_iterations_per_loop: called by 10 (tf/tpu_estimator.py)
  • encode_file: called by 10 (tf/vocabulary.py)
  • encode_file: called by 9 (pytorch/utils/vocabulary.py)
  • features_and_labels: called by 7 (tf/tpu_estimator.py)

Shape

Method 216
Function 100
Class 45

Languages

Python 100%

Modules by API surface

tf/tpu_estimator.py: 178 symbols
pytorch/mem_transformer.py: 42 symbols
tf/data_utils.py: 19 symbols
pytorch/data_utils.py: 19 symbols
tf/vocabulary.py: 18 symbols
pytorch/utils/vocabulary.py: 18 symbols
tf/model.py: 14 symbols
pytorch/utils/data_parallel.py: 8 symbols
pytorch/train.py: 8 symbols
tf/train_gpu.py: 6 symbols
tf/train.py: 6 symbols
tf/gpu_utils.py: 6 symbols

For agents

$ claude mcp add transformer-xl \
  -- python -m otcore.mcp_server <graph>
