hub / github.com/google-deepmind/acme / run_episode

Method run_episode

acme/environment_loop.py:76–142

Runs one episode. Each episode is a loop that first interacts with the environment to get an observation and then passes that observation to the agent to retrieve an action. Returns an instance of `loggers.LoggingData`.
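The loop described above can be sketched without Acme or dm_env installed. This is a minimal illustration, not the library's implementation: `ToyTimeStep`, `CountdownEnv`, `EchoActor`, `run_toy_episode`, and the result keys are all hypothetical stand-ins that only mirror the method names visible in the listing (`reset`, `step`, `observe_first`, `select_action`, `observe`, `update`, `last`). Observers, counters, and timing are left out.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ToyTimeStep:
    """Hypothetical stand-in for a dm_env-style timestep."""
    observation: int
    reward: Optional[float]
    is_last: bool

    def last(self) -> bool:
        return self.is_last


class CountdownEnv:
    """Toy environment (assumption): the episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self._horizon = horizon
        self._t = 0

    def reset(self) -> ToyTimeStep:
        self._t = 0
        return ToyTimeStep(observation=0, reward=None, is_last=False)

    def step(self, action: int) -> ToyTimeStep:
        self._t += 1
        return ToyTimeStep(observation=self._t, reward=1.0,
                           is_last=self._t >= self._horizon)


class EchoActor:
    """Toy actor (assumption): echoes the observation back as the action."""

    def __init__(self) -> None:
        self.num_updates = 0

    def select_action(self, observation: int) -> int:
        return observation

    def observe_first(self, timestep: ToyTimeStep) -> None:
        pass

    def observe(self, action: int, next_timestep: ToyTimeStep) -> None:
        pass

    def update(self) -> None:
        self.num_updates += 1


def run_toy_episode(environment: CountdownEnv, actor: EchoActor) -> dict:
    """Runs one episode: observe, act, step, until the last timestep."""
    episode_steps = 0
    episode_return = 0.0

    timestep = environment.reset()
    actor.observe_first(timestep)

    while not timestep.last():
        action = actor.select_action(timestep.observation)
        timestep = environment.step(action)
        actor.observe(action, next_timestep=timestep)
        actor.update()
        episode_steps += 1
        episode_return += timestep.reward

    # Key names here are illustrative, not taken from the listing.
    return {"episode_length": episode_steps, "episode_return": episode_return}


actor = EchoActor()
result = run_toy_episode(CountdownEnv(horizon=3), actor)
print(result)
```

With a horizon of 3 the toy environment ends the episode on its third step, so the loop body runs three times and the actor is updated once per step.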

(self)

Source from the content-addressed store, hash-verified

    self._observers = observers

  def run_episode(self) -> loggers.LoggingData:
    """Run one episode.

    Each episode is a loop which interacts first with the environment to get an
    observation and then give that observation to the agent in order to retrieve
    an action.

    Returns:
      An instance of `loggers.LoggingData`.
    """
    # Reset any counts and start the environment.
    start_time = time.time()
    episode_steps = 0

    # For evaluation, this keeps track of the total undiscounted reward
    # accumulated during the episode.
    episode_return = tree.map_structure(_generate_zeros_from_spec,
                                        self._environment.reward_spec())
    timestep = self._environment.reset()
    # Make the first observation.
    self._actor.observe_first(timestep)
    for observer in self._observers:
      # Initialize the observer with the current state of the env after reset
      # and the initial timestep.
      observer.observe_first(self._environment, timestep)

    # Run an episode.
    while not timestep.last():
      # Generate an action from the agent's policy and step the environment.
      action = self._actor.select_action(timestep.observation)
      timestep = self._environment.step(action)

      # Have the agent observe the timestep and let the actor update itself.
      self._actor.observe(action, next_timestep=timestep)
      for observer in self._observers:
        # One environment step was completed. Observe the current state of the
        # environment, the current timestep and the action.
        observer.observe(self._environment, timestep, action)
      if self._should_update:
        self._actor.update()

      # Book-keeping.
      episode_steps += 1

      # Equivalent to: episode_return += timestep.reward
      # We capture the return value because if timestep.reward is a JAX
      # DeviceArray, episode_return will not be mutated in-place. (In all other
      # cases, the returned episode_return will be the same object as the
      # argument episode_return.)
      episode_return = tree.map_structure(operator.iadd,
                                          episode_return,
                                          timestep.reward)

    # Record counts.
    counts = self._counter.increment(episodes=1, steps=episode_steps)

    # Collect the results and combine with counts.
    steps_per_second = episode_steps / (time.time() - start_time)
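The comment in the listing about capturing the return value of `operator.iadd` can be illustrated on a single leaf. The sketch below uses only the standard library: `MutableTotal` is a hypothetical stand-in for an array type whose `+=` mutates in place, and a plain `float` stands in for an immutable value such as a JAX array. The listing applies the same operation across a nested reward structure with `tree.map_structure`, which is not reproduced here.

```python
import operator


class MutableTotal:
    """Hypothetical stand-in for a mutable array type."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __iadd__(self, other: float) -> "MutableTotal":
        self.value += other  # Mutate in place ...
        return self          # ... and hand back the same object.


def accumulate(total, rewards):
    """Adds each reward to total, always rebinding to iadd's return value."""
    for reward in rewards:
        total = operator.iadd(total, reward)
    return total


rewards = [1.0, 0.5, 2.0]

# Mutable case: iadd returns the very object that was passed in.
mutable_start = MutableTotal(0.0)
mutable_end = accumulate(mutable_start, rewards)

# Immutable case: float has no __iadd__, so iadd falls back to `+` and
# returns a new object; the starting value is left unchanged.
immutable_start = 0.0
immutable_end = accumulate(immutable_start, rewards)

print(mutable_end is mutable_start, mutable_end.value)
print(immutable_start, immutable_end)
```

Rebinding the name to the returned value is what makes one code path correct for both cases: dropping the return value would silently lose the sum whenever the accumulator is immutable.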

Callers (2)

run · Method · 0.95

test_one_episode · Method · 0.45

Calls (9)

increment · Method · 0.80

reward_spec · Method · 0.45

reset · Method · 0.45

observe_first · Method · 0.45

select_action · Method · 0.45

step · Method · 0.45

observe · Method · 0.45

update · Method · 0.45

get_metrics · Method · 0.45

Tested by (1)

test_one_episode · Method · 0.36