
github.com/jiesutd/LatticeLSTM @main sqlite

118 symbols 310 edges 15 files 24 documented · 20%
README

Chinese NER Using Lattice LSTM

Lattice LSTM for Chinese NER: a character-based LSTM with lattice embeddings as input.
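To make the term "lattice" concrete: alongside the character sequence, the model also sees lexicon words that match substrings of the sentence. The following is a minimal standalone sketch (Python 3) of that matching step only. It uses a brute-force scan, and the function name `lattice_words`, the `max_len` cutoff, and the tiny lexicon are all hypothetical. The repository's own matching lives in utils/trie.py and utils/gazetteer.py and may work differently.

```python
from typing import List, Set, Tuple


def lattice_words(chars: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """List (start, end, word) for every multi-character lexicon word in chars.

    `end` is exclusive. `max_len` is an illustrative cap on word length.
    """
    matches = []
    for start in range(len(chars)):
        # Only spans of length >= 2; single characters are already in the sequence.
        for end in range(start + 2, min(len(chars), start + max_len) + 1):
            word = chars[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


# Hypothetical lexicon, for illustration only.
demo_lexicon = {"美国", "华莱", "华莱士"}
demo_matches = lattice_words("美国的华莱士", demo_lexicon)
```

Overlapping matches such as 华莱 and 华莱士 are both kept; resolving that ambiguity is left to the model rather than to a segmenter.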

Models and results are described in our ACL 2018 paper, Chinese NER Using Lattice LSTM. The model achieves a 93.18% F1 score on the MSRA dataset, which is the state-of-the-art result on the Chinese NER task.

Details will be updated soon.

Requirements:

Python: 2.7   
PyTorch: 0.3.0

(For PyTorch 0.3.1, please refer to issue #8 for a slight modification.)

Input format:

CoNLL format (BIOES tag scheme preferred), with one character and its label per line. Sentences are separated by a blank line.

美   B-LOC
国   E-LOC
的   O
华   B-PER
莱   I-PER
士   E-PER

我   O
跟   O
他   O
谈   O
笑   O
风   O
生   O
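The format above can be read with a few lines of code. This is a standalone sketch (Python 3), not the repository's loader: the names `read_conll` and `bioes_spans` are hypothetical, and it assumes the character and its tag are separated by whitespace, as in the sample.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'char tag' lines into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for raw in lines:
        parts = raw.split()
        if not parts:
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def bioes_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs from a BIOES-tagged sentence."""
    spans, buffer, label = [], [], None
    for char, tag in sentence:
        prefix, _, kind = tag.partition("-")
        if prefix == "S":  # single-character entity
            spans.append((char, kind))
            buffer, label = [], None
        elif prefix == "B":  # entity begins
            buffer, label = [char], kind
        elif prefix in ("I", "E") and buffer and kind == label:
            buffer.append(char)
            if prefix == "E":  # entity ends
                spans.append(("".join(buffer), kind))
                buffer, label = [], None
        else:  # "O" or an inconsistent tag: drop any partial span
            buffer, label = [], None
    return spans


SAMPLE = """\
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER

我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
"""

sample_sentences = read_conll(SAMPLE.splitlines())
```

Calling `bioes_spans(sample_sentences[0])` on the first sentence yields the two entities 美国 (LOC) and 华莱士 (PER); the second sentence contains no entities.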

Pretrained Embeddings:

The pretrained character and word embeddings are the same as those used in the baseline of RichWordSegmentor.

Character embeddings (gigaword_chn.all.a2b.uni.ite50.vec): Google Drive or Baidu Pan

Word (lattice) embeddings (ctb.50d.vec): Google Drive or Baidu Pan
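For inspecting the downloaded files, a small reader can help. The sketch below (Python 3) assumes the .vec files are plain text with one `token v1 v2 ...` row per line, optionally preceded by a short header; that layout is an assumption, not something this README states, and `load_vec` is a hypothetical helper rather than the repository's loader.

```python
from typing import Dict, Iterable, List


def load_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse whitespace-separated 'token v1 v2 ...' rows into a dict.

    Rows with fewer than three fields (blank lines, a 'count dim' header)
    are skipped. All kept rows must share one dimensionality.
    """
    table: Dict[str, List[float]] = {}
    dim = None
    for raw in lines:
        parts = raw.split()
        if len(parts) < 3:
            continue
        values = [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)  # dimensionality taken from the first data row
        elif len(values) != dim:
            raise ValueError("inconsistent vector length for token %r" % parts[0])
        table[parts[0]] = values
    return table


# Tiny in-memory example with made-up numbers.
demo_table = load_vec(["2 3", "美 0.1 0.2 0.3", "国 0.4 0.5 0.6"])
```

With a real file, the same function can be fed an open handle, for example `load_vec(open("data/ctb.50d.vec", encoding="utf-8"))`, provided the file matches the assumed layout.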

How to run the code?

  1. Download the character embeddings and word embeddings and put them in the data folder.
  2. Modify run_main.py or run_demo.py by adding the paths to your train/dev/test files.
  3. Run sh run_main.py or sh run_demo.py.
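Before step 3, it can be worth confirming that step 1 is complete. The following standalone check (Python 3) uses the two embedding file names given above and the data folder from step 1; the helper name `missing_embeddings` is hypothetical and is not part of the repository.

```python
import os
from typing import List

# File names as listed in the "Pretrained Embeddings" section.
EMBEDDING_FILES = (
    "gigaword_chn.all.a2b.uni.ite50.vec",  # character embeddings
    "ctb.50d.vec",                         # word (lattice) embeddings
)


def missing_embeddings(data_dir: str = "data") -> List[str]:
    """Return the embedding file names that are not present in data_dir."""
    return [
        name
        for name in EMBEDDING_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
```

An empty list from `missing_embeddings()` means both files are in place; otherwise the returned names indicate which download is still needed.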

Resume NER data

Crawled from Sina Finance, the dataset contains the resumes of senior executives from companies listed on the Chinese stock market. Details can be found in our paper.

Cite:

Please cite our ACL 2018 paper:

@inproceedings{zhang2018chinese,
 title={Chinese NER Using Lattice LSTM},
 author={Yue Zhang and Jie Yang},
 booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
 year={2018}
}

Core symbols (most depended-on inside this repo)

size (utils/alphabet.py): called by 54
get_index (utils/alphabet.py): called by 17
add (utils/alphabet.py): called by 8
close (utils/alphabet.py): called by 6
normalize_word (utils/functions.py): called by 6
generate_instance_with_gaz (utils/data.py): called by 6
read_seg_instance (utils/functions.py): called by 4
read_instance_with_gaz (utils/functions.py): called by 4

Shape

Method 76
Function 29
Class 13

Languages

Python: 100%

Modules by API surface

model/latticelstm.py: 16 symbols
utils/alphabet.py: 15 symbols
utils/data.py: 14 symbols
main.py: 10 symbols
utils/trie.py: 8 symbols
utils/metric.py: 8 symbols
utils/functions.py: 8 symbols
model/crf.py: 8 symbols
utils/gazetteer.py: 7 symbols
model/bilstm.py: 7 symbols
model/charcnn.py: 6 symbols
model/charbilstm.py: 6 symbols

For agents

$ claude mcp add LatticeLSTM \
  -- python -m otcore.mcp_server <graph>
