hub / github.com/NVIDIA/TensorRT-LLM / load_tokenizer

Function load_tokenizer

examples/utils.py:199–212
(tokenizer_dir: Optional[str] = None,
                   vocab_file: Optional[str] = None,
                   model_name: str = 'GPTForCausalLM',
                   model_version: Optional[str] = None,
                   tokenizer_type: Optional[str] = None)

Source from the content-addressed store, hash-verified

def load_tokenizer(tokenizer_dir: Optional[str] = None,
                   vocab_file: Optional[str] = None,
                   model_name: str = 'GPTForCausalLM',
                   model_version: Optional[str] = None,
                   tokenizer_type: Optional[str] = None):
    func = partial(_load_tokenizer, tokenizer_dir, vocab_file, model_name,
                   model_version, tokenizer_type)
    if mpi_world_size() > 1:
        # Under MPI env, load tokenizer will result in multiple processes to download the same file to the same folder.
        # This will result some random bug. Force loading on rank0 to warmup the tokenizer to avoid this issue.
        if mpi_rank() == 0:
            func()
        mpi_barrier()
    return func()
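
The rank-0 warm-up pattern in the listing above can be tried without an MPI installation. The following is a minimal sketch, not the repository's code: `warm_then_load`, `fake_loader`, and the injected `world_size`, `rank`, and `barrier` arguments are hypothetical stand-ins for `_load_tokenizer`, `mpi_world_size()`, `mpi_rank()`, and `mpi_barrier()`.

```python
from functools import partial
from typing import Callable, List, TypeVar

T = TypeVar("T")


def warm_then_load(loader: Callable[[], T],
                   world_size: int,
                   rank: int,
                   barrier: Callable[[], None]) -> T:
    """Run `loader` on rank 0 first, then on every rank.

    Hypothetical helper: the MPI queries are passed in as plain values and
    callables so the control flow can be exercised in a single process.
    """
    if world_size > 1:
        if rank == 0:
            loader()  # warm-up call; the result is discarded
        barrier()     # every rank waits here until rank 0 has finished
    return loader()


# Single-process demonstration with stand-ins (all names are illustrative).
calls: List[str] = []
barrier_hits: List[int] = []


def fake_loader(name: str) -> str:
    calls.append(name)  # record each invocation
    return f"tokenizer:{name}"


result = warm_then_load(partial(fake_loader, "gpt2"),
                        world_size=2,
                        rank=0,
                        barrier=lambda: barrier_hits.append(1))
```

In this sketch rank 0 invokes the loader twice: once to warm up and once for the returned value. The pattern assumes the second call is cheap because the first has already populated whatever on-disk cache the loader uses; the other ranks invoke it once, after the barrier.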

Callers 4

main · Function · 0.90
main · Function · 0.90
main · Function · 0.90
main · Function · 0.90

Calls 4

mpi_world_size · Function · 0.90
mpi_rank · Function · 0.90
mpi_barrier · Function · 0.90
func · Function · 0.50

Tested by

no test coverage detected