hub / github.com/zai-org/CogView / prepare_tokenizer

Function prepare_tokenizer

generate_samples.py:274–292
(args)


def prepare_tokenizer(args):
    # Build the tokenizer from the parsed command-line arguments.
    tokenizer = get_tokenizer(args)

    num_tokens = tokenizer.num_tokens
    before = num_tokens
    after = before
    # The padded vocabulary size must be a multiple of
    # make_vocab_size_divisible_by times the model-parallel world size.
    multiple = args.make_vocab_size_divisible_by * \
        mpu.get_model_parallel_world_size()
    # Round up to the next multiple, one dummy token at a time.
    while (after % multiple) != 0:
        after += 1
    print_rank_0('> padded vocab (size: {}) with {} dummy '
                 'tokens (new size: {})'.format(
                     before, after - before, after))

    # Store the padded size on args for later use.
    args.vocab_size = after
    print("prepare tokenizer done", flush=True)

    return tokenizer
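
The padding step in prepare_tokenizer rounds the vocabulary size up to the next multiple of make_vocab_size_divisible_by times the model-parallel world size. The sketch below isolates that arithmetic so it can be checked without the tokenizer or mpu. The helper names pad_vocab_size and pad_vocab_size_loop and the sample numbers are hypothetical and are not part of the CogView source. The closed-form version is an assumed equivalent of the loop, not the repository's implementation.

```python
def pad_vocab_size_loop(num_tokens: int, divisible_by: int, world_size: int) -> int:
    """Mirror of the loop in prepare_tokenizer (hypothetical standalone helper)."""
    multiple = divisible_by * world_size
    after = num_tokens
    # Add one dummy token at a time until the size divides evenly.
    while (after % multiple) != 0:
        after += 1
    return after


def pad_vocab_size(num_tokens: int, divisible_by: int, world_size: int) -> int:
    """Closed-form equivalent: round up to the nearest multiple (hypothetical helper)."""
    multiple = divisible_by * world_size
    if multiple <= 0:
        raise ValueError("divisible_by * world_size must be positive")
    # Ceiling division using integer arithmetic only, then scale back up.
    return -(-num_tokens // multiple) * multiple


if __name__ == "__main__":
    # Illustrative values only: 1000 tokens, divisible by 128, 2 model-parallel ranks.
    before = 1000
    after = pad_vocab_size(before, 128, 2)
    print("padded vocab (size: {}) with {} dummy tokens (new size: {})".format(
        before, after - before, after))
```

With these sample values the multiple is 256, so 1000 tokens are padded with 24 dummy tokens to 1024. A size that is already a multiple of 256 is returned unchanged by both helpers.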

Callers 1

main (Function) · 0.85

Calls 2

get_tokenizer (Function) · 0.90
print_rank_0 (Function) · 0.90

Tested by

no test coverage detected