hub / github.com/Turing-Project/WriteGPT / train

Method train

LanguageNetwork/BERT/models/trainer.py:105–168 · view source on GitHub ↗

The main training loops. by iterating over training data (i.e. `train_iter_fct`) and running validation (i.e. iterating over `valid_iter_fct` Args: train_iter_fct(function): a function that returns the train iterator. e.g. something like

(self, train_iter_fct, train_steps, valid_iter_fct=None, valid_steps=-1)

Source from the content-addressed store, hash-verified

103	self.model.train()
104
105	def train(self, train_iter_fct, train_steps, valid_iter_fct=None, valid_steps=-1):
106	"""
107	The main training loops.
108	by iterating over training data (i.e. `train_iter_fct`)
109	and running validation (i.e. iterating over `valid_iter_fct`
110
111	Args:
112	train_iter_fct(function): a function that returns the train
113	iterator. e.g. something like
114	train_iter_fct = lambda: generator(args, *kwargs)
115	valid_iter_fct(function): same as train_iter_fct, for valid data
116	train_steps(int):
117	valid_steps(int):
118	save_checkpoint_steps(int):
119
120	Return:
121	None
122	"""
123	logger.info('Start training...')
124
125	# step = self.optim._step + 1
126	step = self.optimizer._step + 1
127	true_batchs = []
128	accum = 0
129	normalization = 0
130	train_iter = train_iter_fct()
131
132	total_stats = Statistics()
133	report_stats = Statistics()
134	self._start_report_manager(start_time=total_stats.start_time)
135
136	while step <= train_steps:
137
138	reduce_counter = 0
139	for i, batch in enumerate(train_iter):
140	# print(batch.src)
141	# print(len(batch))
142	if self.n_gpu == 0 or (i % self.n_gpu == self.gpu_rank):
143
144	true_batchs.append(batch)
145	normalization += batch.batch_size
146	accum += 1
147	if accum == self.grad_accum_count:
148	reduce_counter += 1
149	if self.n_gpu > 1:
150	normalization = sum(distributed.all_gather_list(normalization))
151
152	self._gradient_accumulation(true_batchs, normalization, total_stats, report_stats)
153
154	report_stats = self._maybe_report_training(step, train_steps, self.optimizer.learning_rate,
155	report_stats)
156	true_batchs = []
157	accum = 0
158	normalization = 0
159	if step % self.save_checkpoint_steps == 0 and self.gpu_rank == 0:
160	self._save(step)
161
162	step += 1

Callers 1

__init__Method · 0.45

Calls 5

_start_report_managerMethod · 0.95

_gradient_accumulationMethod · 0.95

_maybe_report_trainingMethod · 0.95

_saveMethod · 0.95

StatisticsClass · 0.90

Tested by

no test coverage detected