Function get_model

Megatron-LM/evaluate_gpt2.py:46–75

Build the model.

get_model(args)
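
The single argument is a parsed-arguments object. As a minimal sketch, the attributes that the function body reads can be collected in an `argparse.Namespace`. The values below are illustrative placeholders, not the repository's defaults, and `gpt2_kwargs` is a hypothetical helper that only mirrors the keyword mapping from the source without constructing a model.

```python
from argparse import Namespace

# Illustrative values only; these are not the repository's defaults.
args = Namespace(
    num_layers=24,
    vocab_size=50304,
    hidden_size=1024,
    num_attention_heads=16,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    max_position_embeddings=1024,
    checkpoint_activations=False,
    checkpoint_num_layers=1,
    cloze_eval=False,
    fp16=True,  # read by get_model itself, not passed to GPT2Model
)


def gpt2_kwargs(args):
    """Hypothetical helper: mirror the keywords get_model passes to GPT2Model."""
    return dict(
        num_layers=args.num_layers,
        vocab_size=args.vocab_size,
        hidden_size=args.hidden_size,
        num_attention_heads=args.num_attention_heads,
        # hidden_dropout feeds both the embedding and the output dropout.
        embedding_dropout_prob=args.hidden_dropout,
        attention_dropout_prob=args.attention_dropout,
        output_dropout_prob=args.hidden_dropout,
        max_sequence_length=args.max_position_embeddings,
        checkpoint_activations=args.checkpoint_activations,
        checkpoint_num_layers=args.checkpoint_num_layers,
        # Parallel output is switched off when cloze_eval is set.
        parallel_output=not args.cloze_eval,
    )


kwargs = gpt2_kwargs(args)
```

Two details are easy to miss in the mapping: `args.hidden_dropout` is used twice, and `parallel_output` is the negation of `args.cloze_eval`.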

Source from the content-addressed store, hash-verified

from detokenizer import *

def get_model(args):
    """Build the model."""

    print_rank_0('building GPT2 model ...')
    model = GPT2Model(num_layers=args.num_layers,
                      vocab_size=args.vocab_size,
                      hidden_size=args.hidden_size,
                      num_attention_heads=args.num_attention_heads,
                      embedding_dropout_prob=args.hidden_dropout,
                      attention_dropout_prob=args.attention_dropout,
                      output_dropout_prob=args.hidden_dropout,
                      max_sequence_length=args.max_position_embeddings,
                      checkpoint_activations=args.checkpoint_activations,
                      checkpoint_num_layers=args.checkpoint_num_layers,
                      parallel_output=not args.cloze_eval)

    print_rank_0(' > number of parameters: {}'.format(
        sum([p.nelement() for p in model.parameters()])))

    # GPU allocation.
    model.cuda(torch.cuda.current_device())

    # Fp16 conversion.
    if args.fp16:
        model = FP16_Module(model)

    # Wrap model for distributed training.
    model = DDP(model)

    return model
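
The last steps of get_model nest the model inside up to two wrappers: FP16_Module when args.fp16 is set, then DDP in every case. The nesting order can be sketched with a stand-in `Wrapper` class in place of the real FP16_Module and DDP. The sketch assumes each wrapper exposes the wrapped object as `.module`, as `torch.nn.parallel.DistributedDataParallel` does; whether the DDP class imported by evaluate_gpt2.py behaves the same way is not shown in the source above. `wrap` and `unwrap` are hypothetical helpers.

```python
class Wrapper:
    """Stand-in for FP16_Module / DDP: keeps the wrapped object as .module."""

    def __init__(self, module, name):
        self.module = module
        self.name = name


def wrap(model, fp16):
    """Apply the wrappers in the same order as get_model (GPU placement omitted)."""
    if fp16:
        model = Wrapper(model, "FP16_Module")
    return Wrapper(model, "DDP")


def unwrap(model):
    """Hypothetical helper: peel wrappers until the bare model is reached."""
    names = []
    while isinstance(model, Wrapper):
        names.append(model.name)
        model = model.module
    return model, names


base = object()  # placeholder for a GPT2Model instance
inner_fp16, order_fp16 = unwrap(wrap(base, fp16=True))
inner_fp32, order_fp32 = unwrap(wrap(base, fp16=False))
```

Under that assumption, code that needs the bare model must step through one `.module` level without fp16 and two levels with it.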

Callers 1

setup_model · Function · 0.70

Calls 3

print_rank_0 · Function · 0.90
GPT2Model · Class · 0.90
FP16_Module · Class · 0.90

Tested by

no test coverage detected