hub / github.com/ddbourgin/numpy-ml / __init__

Method __init__

numpy_ml/rl_models/agents.py:102–162

(self, env, n_samples_per_episode=500, retain_prcnt=0.2)

Source from the content-addressed store, hash-verified

class CrossEntropyAgent(AgentBase):
    def __init__(self, env, n_samples_per_episode=500, retain_prcnt=0.2):
        r"""
        A cross-entropy method agent.

        Notes
        -----
        The cross-entropy method [1]_ [2]_ agent only operates on ``envs`` with
        discrete action spaces.

        On each episode the agent generates `n_samples_per_episode` samples of
        the parameters (:math:`\theta`) for its behavior policy. The `i`'th
        sample at timestep `t` is:

        .. math::

            \theta_i &= \{\mathbf{W}_i^{(t)}, \mathbf{b}_i^{(t)} \} \\
            \theta_i &\sim \mathcal{N}(\mu^{(t)}, \Sigma^{(t)})

        Weights (:math:`\mathbf{W}_i`) and bias (:math:`\mathbf{b}_i`) are the
        parameters of the softmax policy:

        .. math::

            \mathbf{z}_i &= \text{obs} \cdot \mathbf{W}_i + \mathbf{b}_i \\
            p(a_i^{(t + 1)}) &= \frac{e^{\mathbf{z}_i}}{\sum_j e^{z_{ij}}} \\
            a^{(t + 1)} &= \arg \max_j p(a_j^{(t+1)})

        At the end of each episode, the agent takes the top `retain_prcnt`
        highest scoring :math:`\theta` samples and combines them to generate
        the mean and variance of the distribution of :math:`\theta` for the
        next episode:

        .. math::

            \mu^{(t+1)} &= \text{avg}(\texttt{best_thetas}^{(t)}) \\
            \Sigma^{(t+1)} &= \text{var}(\texttt{best_thetas}^{(t)})

        References
        ----------
        .. [1] Mannor, S., Rubinstein, R., & Gat, Y. (2003). The cross entropy
           method for fast policy search. In *Proceedings of the 20th Annual
           ICML, 20*.
        .. [2] Rubinstein, R. (1997). Optimization of computer simulation
           models with rare events, *European Journal of Operational Research,
           99*, 89–112.

        Parameters
        ----------
        env : :meth:`gym.wrappers` or :meth:`gym.envs` instance
            The environment to run the agent on.
        n_samples_per_episode : int
            The number of theta samples to evaluate on each episode. Default is 500.
        retain_prcnt : float
            The fraction of `n_samples_per_episode` to use when calculating
            the parameter update at the end of the episode. Default is 0.2.
        """
        super().__init__(env)
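The docstring describes three steps: sampling parameters, acting greedily under a softmax policy, and refitting the sampling distribution to the best samples. They can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation. The helper names (`softmax`, `sample_thetas`, `act`, `update_distribution`), the flat layout of `theta`, and the stand-in score are all hypothetical. A diagonal covariance is assumed because the docstring defines the update with `var`.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sample_thetas(mu, sigma2, n_samples, rng):
    """Draw n_samples parameter vectors from N(mu, diag(sigma2))."""
    return rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(n_samples, mu.size))


def act(theta, obs, obs_dim, n_actions):
    """Greedy action of the softmax policy defined by a flat theta.

    Assumed layout (hypothetical): the first obs_dim * n_actions entries
    are W, the remaining n_actions entries are b.
    """
    W = theta[: obs_dim * n_actions].reshape(obs_dim, n_actions)
    b = theta[obs_dim * n_actions:]
    probs = softmax(obs @ W + b)
    return int(np.argmax(probs))


def update_distribution(thetas, scores, retain_prcnt):
    """Refit the mean and per-dimension variance to the top-scoring samples."""
    n_retain = max(1, int(retain_prcnt * len(thetas)))
    best_idx = np.argsort(scores)[::-1][:n_retain]  # highest scores first
    best_thetas = thetas[best_idx]
    return best_thetas.mean(axis=0), best_thetas.var(axis=0)


rng = np.random.default_rng(0)
obs_dim, n_actions = 4, 2
theta_dim = obs_dim * n_actions + n_actions
mu, sigma2 = np.zeros(theta_dim), np.ones(theta_dim)

thetas = sample_thetas(mu, sigma2, n_samples=500, rng=rng)
# Stand-in score: a real agent would use the episode return of each theta.
scores = -np.sum((thetas - 1.0) ** 2, axis=1)
mu, sigma2 = update_distribution(thetas, scores, retain_prcnt=0.2)
action = act(mu, np.ones(obs_dim), obs_dim, n_actions)
```

With `n_samples=500` and `retain_prcnt=0.2`, the update keeps the 100 best samples, so `retain_prcnt` acts as a fraction rather than a percentage in this sketch.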

Callers

nothing calls this directly

Calls 2

_init_params · Method · 0.95

__init__ · Method · 0.45

Tested by

no test coverage detected
