Repository: github.com/ddbourgin/numpy-ml

Method __init__

numpy_ml/rl_models/agents.py, lines 392–420

A Monte-Carlo learning agent trained using either first-visit Monte Carlo updates (on-policy) or incremental weighted importance sampling (off-policy).
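To make the on-policy rule concrete, here is a minimal first-visit Monte Carlo sketch for a tabular action-value estimate. It is not the numpy-ml implementation: `discounted_returns` and `first_visit_mc_update` are hypothetical helpers, episodes are assumed to be lists of `(state, action, reward)` tuples, and `gamma` plays the role of `temporal_discount`. Returns are computed backwards with G_t = r_{t+1} + gamma * G_{t+1}, and only the first occurrence of each state-action pair in an episode contributes to its running average.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Return G_t = r_{t+1} + gamma * G_{t+1} for every step t of one episode."""
    G = 0.0
    returns = [0.0] * len(rewards)
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns


def first_visit_mc_update(Q, counts, episode, gamma):
    """Hypothetical helper: average first-visit returns of each (state, action) into Q.

    `episode` is a list of (state, action, reward) tuples, where `reward`
    is the reward received after taking `action` in `state`.
    """
    rewards = [r for _, _, r in episode]
    returns = discounted_returns(rewards, gamma)
    seen = set()
    for t, (s, a, _) in enumerate(episode):
        if (s, a) in seen:
            continue  # only the first visit to (s, a) in this episode counts
        seen.add((s, a))
        counts[(s, a)] += 1
        # Incremental sample mean: Q <- Q + (G - Q) / N
        Q[(s, a)] += (returns[t] - Q[(s, a)]) / counts[(s, a)]
    return Q


Q = defaultdict(float)
counts = defaultdict(int)
episode = [(0, 1, 0.0), (1, 0, 0.0), (0, 1, 1.0)]
first_visit_mc_update(Q, counts, episode, gamma=0.9)
```

With this three-step episode, `Q[(0, 1)]` ends up at about 0.81 and `Q[(1, 0)]` at about 0.9; the second visit to `(0, 1)` is skipped.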

(self, env, off_policy=False, temporal_discount=0.9, epsilon=0.1)
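The `epsilon` parameter sets how "soft" the policy is. A common epsilon-soft construction, assumed here (numpy-ml's exact code may differ), gives every action probability epsilon / |A| and adds the remaining 1 - epsilon to the greedy action. The helpers `epsilon_soft_probs` and `sample_action` are hypothetical and use only NumPy.

```python
import numpy as np


def epsilon_soft_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy (hence epsilon-soft) policy.

    Every action gets epsilon / n_actions; the greedy action additionally
    receives the remaining 1 - epsilon.
    """
    q_values = np.asarray(q_values, dtype=float)
    n_actions = q_values.size
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng):
    """Draw one action index from the epsilon-soft distribution."""
    probs = epsilon_soft_probs(q_values, epsilon)
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
probs = epsilon_soft_probs([0.2, 0.5, 0.1], epsilon=0.1)  # about [0.033, 0.933, 0.033]
action = sample_action([0.2, 0.5, 0.1], epsilon=0.1, rng=rng)
```

Raising `epsilon` toward 1 flattens the distribution toward uniform, which is why the docstring says larger values encourage exploration.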


```python
class MonteCarloAgent(AgentBase):
    def __init__(self, env, off_policy=False, temporal_discount=0.9, epsilon=0.1):
        """
        A Monte-Carlo learning agent trained using either first-visit Monte
        Carlo updates (on-policy) or incremental weighted importance sampling
        (off-policy).

        Parameters
        ----------
        env : :class:`gym.wrappers` or :class:`gym.envs` instance
            The environment to run the agent on.
        off_policy : bool
            Whether to use a behavior policy separate from the target policy
            during training. If False, use the same epsilon-soft policy for
            both behavior and target policies. Default is False.
        temporal_discount : float between [0, 1]
            The discount factor used for downweighting future rewards. Smaller
            values result in greater discounting of future rewards. Default is
            0.9.
        epsilon : float between [0, 1]
            The epsilon value in the epsilon-soft policy. Larger values
            encourage greater exploration during training. Default is 0.1.
        """
        super().__init__(env)

        self.epsilon = epsilon
        self.off_policy = off_policy
        self.temporal_discount = temporal_discount

        self._init_params()

    def _init_params(self):
        E = self.env_info
        # ... (remainder of _init_params not shown in this excerpt)
```
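When `off_policy=True`, the docstring says training uses incremental weighted importance sampling. Below is a minimal sketch of that update following the standard off-policy Monte Carlo control loop. It assumes a greedy target policy, a known behavior probability `behavior_prob(state, action)`, and tabular `Q`/`C` arrays; `weighted_is_update` is a hypothetical name, not part of numpy-ml. Each return is weighted by the cumulative importance ratio W, and the step size W / C(s, a) keeps Q(s, a) equal to a weighted average of the observed returns.

```python
from collections import defaultdict

import numpy as np


def weighted_is_update(Q, C, episode, behavior_prob, gamma):
    """Hypothetical off-policy MC control update (weighted importance sampling).

    Q, C          : mappings state -> np.ndarray of shape (n_actions,) holding
                    action values and cumulative importance weights.
    episode       : list of (state, action, reward) tuples generated by the
                    behavior policy.
    behavior_prob : callable (state, action) -> b(action | state).
    The target policy is assumed greedy with respect to Q.
    """
    G, W = 0.0, 1.0
    for s, a, r in reversed(episode):
        G = gamma * G + r
        C[s][a] += W
        # Incremental weighted average: Q <- Q + (W / C) * (G - Q)
        Q[s][a] += (W / C[s][a]) * (G - Q[s][a])
        if a != int(np.argmax(Q[s])):
            break  # greedy target gives this action probability 0, so W would be 0
        W /= behavior_prob(s, a)  # pi(a|s) = 1 for the greedy action
    return Q, C


n_actions = 2
Q = defaultdict(lambda: np.zeros(n_actions))
C = defaultdict(lambda: np.zeros(n_actions))
episode = [(0, 1, 0.0), (1, 1, 1.0)]
# Uniform random behavior policy: b(a|s) = 1 / n_actions
weighted_is_update(Q, C, episode, lambda s, a: 1.0 / n_actions, gamma=0.9)
```

Processing the episode backwards lets W accumulate the ratio pi/b over the later steps only, and the loop stops as soon as the behavior action disagrees with the greedy target, because every earlier step would then receive zero weight.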

Callers

nothing calls this directly

Calls (2)

_init_params · Method · 0.95
__init__ · Method · 0.45

Tested by

no test coverage detected