github.com/ddbourgin/numpy-ml

Method __init__

numpy_ml/rl_models/agents.py:102–162


(self, env, n_samples_per_episode=500, retain_prcnt=0.2)


class CrossEntropyAgent(AgentBase):
    def __init__(self, env, n_samples_per_episode=500, retain_prcnt=0.2):
        r"""
        A cross-entropy method agent.

        Notes
        -----
        The cross-entropy method [1]_ [2]_ agent only operates on ``envs`` with
        discrete action spaces.

        On each episode the agent generates `n_samples_per_episode` samples of
        the parameters (:math:`\theta`) for its behavior policy. The `i`'th
        sample at timestep `t` is:

        .. math::

            \theta_i &= \{\mathbf{W}_i^{(t)}, \mathbf{b}_i^{(t)} \} \\
            \theta_i &\sim \mathcal{N}(\mu^{(t)}, \Sigma^{(t)})

        Weights (:math:`\mathbf{W}_i`) and bias (:math:`\mathbf{b}_i`) are the
        parameters of the softmax policy:

        .. math::

            \mathbf{z}_i &= \text{obs} \cdot \mathbf{W}_i + \mathbf{b}_i \\
            p(a_i^{(t + 1)}) &= \frac{e^{\mathbf{z}_i}}{\sum_j e^{z_{ij}}} \\
            a^{(t + 1)} &= \arg \max_j p(a_j^{(t+1)})

        At the end of each episode, the agent takes the top `retain_prcnt`
        fraction of highest-scoring :math:`\theta` samples and uses them to
        compute the mean and variance of the distribution of :math:`\theta`
        for the next episode:

        .. math::

            \mu^{(t+1)} &= \text{avg}(\texttt{best_thetas}^{(t)}) \\
            \Sigma^{(t+1)} &= \text{var}(\texttt{best_thetas}^{(t)})

        References
        ----------
        .. [1] Mannor, S., Rubinstein, R., & Gat, Y. (2003). The cross entropy
           method for fast policy search. In *Proceedings of the 20th Annual
           ICML, 20*.
        .. [2] Rubinstein, R. (1997). Optimization of computer simulation
           models with rare events. *European Journal of Operational Research,
           99*, 89–112.

        Parameters
        ----------
        env : :meth:`gym.wrappers` or :meth:`gym.envs` instance
            The environment to run the agent on.
        n_samples_per_episode : int
            The number of theta samples to evaluate on each episode. Default is 500.
        retain_prcnt : float
            The fraction of `n_samples_per_episode` to use when calculating
            the parameter update at the end of the episode. Default is 0.2.
        """
        super().__init__(env)
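The sampling step and the softmax policy described in the docstring can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the numpy-ml implementation: it assumes θ is stored as one flat vector with the weights first and the bias last, that Σ is diagonal (kept as a vector of variances), and that an observation is a 1-D array. `sample_theta`, `unpack_theta`, and `select_action` are hypothetical helper names.

```python
import numpy as np


def sample_theta(mu, sigma2, rng):
    """Draw one flat parameter vector theta ~ N(mu, diag(sigma2))."""
    return rng.normal(loc=mu, scale=np.sqrt(sigma2))


def unpack_theta(theta, obs_dim, n_actions):
    """Split flat theta (assumed layout: weights, then bias) into W and b."""
    n_w = obs_dim * n_actions
    W = theta[:n_w].reshape(obs_dim, n_actions)  # (obs_dim, n_actions)
    b = theta[n_w:n_w + n_actions]               # (n_actions,)
    return W, b


def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def select_action(obs, W, b):
    """Greedy action under the linear-softmax policy: argmax_j softmax(obs @ W + b)_j."""
    z = obs @ W + b
    p = softmax(z)
    return int(np.argmax(p)), p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, n_actions = 4, 2
    n_params = obs_dim * n_actions + n_actions
    theta = sample_theta(np.zeros(n_params), np.ones(n_params), rng)
    W, b = unpack_theta(theta, obs_dim, n_actions)
    action, probs = select_action(np.ones(obs_dim), W, b)
    print(action, probs)
```

Because softmax is monotonic, `np.argmax(p)` picks the same action as `np.argmax(z)`; the probabilities are computed only to mirror the docstring formula.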

Callers

nothing calls this directly

Calls (2)

_init_params (method) · 0.95
__init__ (method) · 0.45

Tested by

no test coverage detected
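The end-of-episode update from the docstring (keep the top `retain_prcnt` fraction of samples, then refit the mean and variance) can be sketched as a standalone loop. In this sketch a toy quadratic objective stands in for the episode reward a gym environment would return, so it runs without gym. `cem_update`, `cross_entropy_search`, `score_fn`, and `n_iters` are hypothetical names; only `n_samples_per_episode` and `retain_prcnt` mirror the constructor's parameters. It also assumes a diagonal Σ, i.e. per-parameter means and variances of the retained samples.

```python
import numpy as np


def cem_update(thetas, rewards, retain_prcnt):
    """Fit a diagonal Gaussian to the highest-reward samples.

    thetas  : (n_samples, n_params) array of sampled parameter vectors
    rewards : (n_samples,) array with the score of each sample
    Returns the per-parameter mean and variance of the retained samples.
    """
    # Keep at least one sample so the mean and variance are always defined
    n_retain = max(1, int(retain_prcnt * len(rewards)))
    # Indices of the n_retain highest-reward samples
    best_idx = np.argsort(rewards)[::-1][:n_retain]
    best_thetas = thetas[best_idx]
    return best_thetas.mean(axis=0), best_thetas.var(axis=0)


def cross_entropy_search(score_fn, n_params, n_iters=20,
                         n_samples_per_episode=500, retain_prcnt=0.2, seed=0):
    """Repeatedly sample, score, and refit mu and the diagonal of Sigma."""
    rng = np.random.default_rng(seed)
    mu, sigma2 = np.zeros(n_params), np.ones(n_params)
    for _ in range(n_iters):
        # theta_i ~ N(mu, diag(sigma2)), one row per sample
        thetas = rng.normal(mu, np.sqrt(sigma2),
                            size=(n_samples_per_episode, n_params))
        rewards = np.array([score_fn(theta) for theta in thetas])
        mu, sigma2 = cem_update(thetas, rewards, retain_prcnt)
    return mu, sigma2


if __name__ == "__main__":
    # Toy objective (an assumption): reward is highest at `target`
    target = np.array([0.5, -0.5])
    mu, sigma2 = cross_entropy_search(
        lambda th: -np.sum((th - target) ** 2), n_params=2)
    print(mu, sigma2)
```

Because the variance is refit only from the retained samples, it shrinks quickly from one iteration to the next; if the optimum lies far from the initial mean, the search can stall before reaching it.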