hub / github.com/ddbourgin/numpy-ml / __init__

Method __init__

numpy_ml/rl_models/agents.py:392–420

A Monte-Carlo learning agent trained using either first-visit Monte Carlo updates (on-policy) or incremental weighted importance sampling (off-policy).

(self, env, off_policy=False, temporal_discount=0.9, epsilon=0.1)

Source from the content-addressed store, hash-verified

class MonteCarloAgent(AgentBase):
    def __init__(self, env, off_policy=False, temporal_discount=0.9, epsilon=0.1):
        """
        A Monte-Carlo learning agent trained using either first-visit Monte
        Carlo updates (on-policy) or incremental weighted importance sampling
        (off-policy).

        Parameters
        ----------
        env : :class:`gym.wrappers` or :class:`gym.envs` instance
            The environment to run the agent on.
        off_policy : bool
            Whether to use a behavior policy separate from the target policy
            during training. If False, use the same epsilon-soft policy for
            both behavior and target policies. Default is False.
        temporal_discount : float between [0, 1]
            The discount factor used for downweighting future rewards. Smaller
            values result in greater discounting of future rewards. Default is
            0.9.
        epsilon : float between [0, 1]
            The epsilon value in the epsilon-soft policy. Larger values
            encourage greater exploration during training. Default is 0.1.
        """
        super().__init__(env)

        self.epsilon = epsilon
        self.off_policy = off_policy
        self.temporal_discount = temporal_discount

        self._init_params()

    def _init_params(self):
        E = self.env_info
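
The docstring describes epsilon and temporal_discount only in words, so a small standalone sketch may help show their effect. It does not reproduce the agent's own code: the helper names epsilon_soft_probs and discounted_return are hypothetical, and the formulas are the textbook definitions of an epsilon-soft policy and a discounted return, assumed here to match what the agent does internally.

```python
def epsilon_soft_probs(q_values, epsilon=0.1):
    """Action probabilities under an epsilon-soft policy (hypothetical helper)."""
    n_actions = len(q_values)
    # Index of the action with the highest estimated value (first one on ties).
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    # Every action gets a baseline probability of epsilon / n_actions ...
    probs = [epsilon / n_actions] * n_actions
    # ... and the greedy action gets the remaining 1 - epsilon on top of that.
    probs[greedy] += 1.0 - epsilon
    return probs


def discounted_return(rewards, temporal_discount=0.9):
    """Sum of rewards, with the reward at step t weighted by temporal_discount ** t."""
    G = 0.0
    # Work backwards so each earlier step applies one more discount factor.
    for r in reversed(rewards):
        G = r + temporal_discount * G
    return G


# Illustrative inputs only; these values are not taken from the library.
probs = epsilon_soft_probs([0.0, 1.0, 0.5, 0.2], epsilon=0.1)
ret = discounted_return([1.0, 1.0, 1.0], temporal_discount=0.9)
print(probs)
print(ret)
```

With epsilon=0.1 and four actions, each non-greedy action keeps a probability of 0.025, so raising epsilon spreads more mass onto exploration. With temporal_discount=0.9, three unit rewards are weighted 1, 0.9, and 0.81, so lowering the discount shrinks the contribution of later rewards.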

Callers

nothing calls this directly

Calls 2

_init_params · Method · 0.95

__init__ · Method · 0.45

Tested by

no test coverage detected
