MCPcopy
hub / github.com/InternLM/InternLM / initialize_model

Function initialize_model

internlm/train/training_internlm.py:46–83  ·  view source on GitHub ↗

Initialize model. Returns: The neural network model to be trained or evaluated.

()

Source from the content-addressed store, hash-verified

44
45
46def initialize_model():
47 """
48 Initialize model.
49
50 Returns: The neural network model to be trained or evaluated.
51 """
52
53 model = MODEL_INITIALIZER.get_module(module_name=gpc.config.model_type)(**(gpc.config.model))
54 if isinstance(model, nn.ModuleList):
55 model = nn.ModuleList(
56 [
57 NaiveAMPModel(
58 model=_m,
59 output_to_fp32=False, # manually controlled by interleaved pipleline scheduler
60 dtype=gpc.config.model.get("dtype", torch.half),
61 sync_buffer=False,
62 )
63 for _m in model
64 ]
65 )
66 else:
67 model = NaiveAMPModel(
68 model=model,
69 output_to_fp32=is_no_pp_or_last_stage(),
70 dtype=gpc.config.model.get("dtype", torch.half),
71 sync_buffer=False,
72 )
73
74 # This sync is very important, cause the model weights kept in optimizer are copied
75 # from the origin parameters in the memory, so we should make sure the dp sync
76 # does not influence the model weights in optimizer be different with the origin parameters.
77 sync_model_param(model, parallel_mode=ParallelMode.DATA)
78
79 # This function is needed to make sure parameters that are not splitted by tensor parallelism are
80 # the same across tensor parallelism.
81 sync_model_param_within_tp(model)
82
83 return model
84
85
86def initialize_optimizer(model: Union[nn.Module, nn.ModuleList]):

Callers 1

mainFunction · 0.90

Calls 5

NaiveAMPModelClass · 0.90
is_no_pp_or_last_stageFunction · 0.90
sync_model_paramFunction · 0.90
get_moduleMethod · 0.80

Tested by

no test coverage detected