Method get_action

examples/reinforcement_learning/tutorial_C51.py:214–222

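Signature: get_action(self, obv)

Returns an action index for the observation obv. During training (args.train), with probability epsilon(self.niter) it returns a uniformly random action among the out_dim actions. Otherwise it does the following: it adds a batch dimension to the observation and scales it by ob_scale, exponentiates the output of _qvalues_func to get per-action probabilities over the return atoms, weights them by the atom values in vrange to obtain an expected return for each action, and returns the action with the largest expected return. Outside training the method is always greedy.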
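The greedy branch can be written as a short formula. This assumes that vrange holds the atom values $z_1, \dots, z_N$ and that _qvalues_func returns log-probabilities, which the np.exp call suggests. Under those assumptions $p_i(s, a)$ is the exponentiated network output for atom $i$:

```latex
Q(s, a) = \sum_{i=1}^{N} z_i \, p_i(s, a), \qquad
a^{*} = \arg\max_{a} Q(s, a)
```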


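```python
        self.optimizer = tf.optimizers.Adam(learning_rate=lr)  # end of __init__

    def get_action(self, obv):
        # Epsilon-greedy exploration, applied only while training.
        eps = epsilon(self.niter)
        if args.train and random.random() < eps:
            # Uniformly random action index in [0, out_dim).
            return int(random.random() * out_dim)
        else:
            # Add a batch dimension and scale the observation.
            obv = np.expand_dims(obv, 0).astype('float32') * ob_scale
            # The network output is exponentiated to get per-atom probabilities.
            qdist = np.exp(self._qvalues_func(obv).numpy())
            # Expected return per action: sum over atoms of p(z) * z.
            qvalues = (qdist * vrange).sum(-1)
            # Highest expected return for the single (batch index 0) observation.
            return qvalues.argmax(1)[0]

    @tf.function
    def _qvalues_func(self, obv):
        ...
```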
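A minimal, framework-free sketch of the same selection logic is shown below. It assumes that the network output for a single observation is an array of log-probabilities with shape (num_actions, num_atoms). The names select_action, log_probs, support and rng are hypothetical and used only for illustration; they are not part of the tutorial.

```python
import random

import numpy as np


def select_action(log_probs, support, eps, explore=True, rng=random):
    """Epsilon-greedy action selection over categorical (C51-style) return distributions.

    Hypothetical helper, not from tutorial_C51.py.
    log_probs: (num_actions, num_atoms) log-probabilities, assumed to be the
               network output for one observation without the batch dimension.
    support:   (num_atoms,) atom values, playing the role of vrange.
    """
    num_actions = log_probs.shape[0]
    if explore and rng.random() < eps:
        # Exploration branch: uniformly random action index.
        return int(rng.random() * num_actions)
    probs = np.exp(log_probs)            # log-probabilities -> probabilities
    qvalues = (probs * support).sum(-1)  # expected return per action
    return int(qvalues.argmax())         # greedy action


# Usage: actions 0 and 1 are uniform over a symmetric support (expected return ~0);
# action 2 puts extra mass on the highest atom, so it is chosen greedily.
support = np.linspace(-10.0, 10.0, 51)
logits = np.zeros((3, 51))
logits[2, -1] = 5.0
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
print(select_action(log_probs, support, eps=0.0))  # → 2
```

As in get_action, the expected value over the atoms is computed before the argmax, so only the mean of each return distribution affects the choice. Passing a seeded random.Random instance as rng makes the exploration branch reproducible.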

Related: tutorial_C51.py (file), _qvalues_func (method), sum (method)

No test coverage detected.