hub / github.com/rlcode/reinforcement-learning / compute_gae

Function compute_gae

3-atari/2-ppo.py:60–70
(rewards, values, dones, last_value)

Source from the content-addressed store, hash-verified

def compute_gae(rewards, values, dones, last_value):
    advantages = np.zeros_like(rewards, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_v = last_value if t == len(rewards) - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        delta = rewards[t] + GAMMA * next_v * next_nonterminal - values[t]
        gae = delta + GAMMA * GAE_LAMBDA * next_nonterminal * gae
        advantages[t] = gae
    returns = advantages + values
    return advantages, returns


if __name__ == "__main__":
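As a minimal sketch of how this function behaves, the block below copies it unchanged and runs it on a four-step rollout that contains one episode boundary. The values of GAMMA and GAE_LAMBDA (0.99 and 0.95) and all input arrays are assumptions made for illustration; the real constants are defined elsewhere in 2-ppo.py and are not shown in this excerpt.

```python
import numpy as np

# Assumed values for illustration only; the actual constants live elsewhere in 2-ppo.py.
GAMMA = 0.99
GAE_LAMBDA = 0.95


def compute_gae(rewards, values, dones, last_value):
    advantages = np.zeros_like(rewards, dtype=np.float32)
    gae = 0.0
    # Walk the rollout backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(len(rewards))):
        # Bootstrap from last_value at the final step, otherwise from the next estimate.
        next_v = last_value if t == len(rewards) - 1 else values[t + 1]
        # 0.0 when step t ended an episode, which masks the bootstrap and the running sum.
        next_nonterminal = 1.0 - dones[t]
        # One-step TD error.
        delta = rewards[t] + GAMMA * next_v * next_nonterminal - values[t]
        gae = delta + GAMMA * GAE_LAMBDA * next_nonterminal * gae
        advantages[t] = gae
    returns = advantages + values
    return advantages, returns


# Hypothetical rollout: step 1 ends an episode, steps 2 and 3 belong to the next one.
rewards = np.array([1.0, 0.0, 1.0, 1.0], dtype=np.float32)
values = np.array([0.5, 0.4, 0.6, 0.3], dtype=np.float32)
dones = np.array([0.0, 1.0, 0.0, 0.0], dtype=np.float32)
last_value = 0.2  # value estimate for the state after the final step

advantages, returns = compute_gae(rewards, values, dones, last_value)
print("advantages:", advantages)
print("returns:   ", returns)
```

At the terminal step (index 1) both the bootstrapped value and the accumulated advantage are multiplied by zero, so the advantage there reduces to rewards[1] - values[1], roughly -0.4 for these inputs. Nothing from the following episode leaks backwards across the boundary.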

Callers 1

2-ppo.py · File · 0.70

Calls

no outgoing calls

Tested by

no test coverage detected
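Since no tests were detected, here is a sketch of what a small test for this function might look like. It is not taken from the repository: the test names, the helper discounted_returns, and the constant values are all hypothetical. It relies on a standard property of GAE: with lambda set to 1 and no terminal steps, the returned targets should equal the bootstrapped discounted return.

```python
import numpy as np

# Hypothetical test settings; the real constants are defined in 2-ppo.py.
GAMMA = 0.99
GAE_LAMBDA = 1.0  # lambda = 1 so GAE collapses to the discounted return minus the value


def compute_gae(rewards, values, dones, last_value):
    # Copied from the excerpt so this sketch is self-contained.
    advantages = np.zeros_like(rewards, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_v = last_value if t == len(rewards) - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        delta = rewards[t] + GAMMA * next_v * next_nonterminal - values[t]
        gae = delta + GAMMA * GAE_LAMBDA * next_nonterminal * gae
        advantages[t] = gae
    returns = advantages + values
    return advantages, returns


def discounted_returns(rewards, last_value, gamma):
    """Hypothetical reference helper: bootstrapped discounted return, no terminals."""
    out = np.zeros(len(rewards), dtype=np.float64)
    running = last_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def test_lambda_one_matches_discounted_return():
    rng = np.random.default_rng(0)
    rewards = rng.normal(size=8).astype(np.float32)
    values = rng.normal(size=8).astype(np.float32)
    dones = np.zeros(8, dtype=np.float32)
    _, returns = compute_gae(rewards, values, dones, last_value=0.5)
    expected = discounted_returns(rewards, 0.5, GAMMA)
    np.testing.assert_allclose(returns, expected, atol=1e-3)


def test_terminal_step_blocks_bootstrap():
    rewards = np.array([1.0, 2.0], dtype=np.float32)
    values = np.array([0.0, 0.0], dtype=np.float32)
    dones = np.array([1.0, 0.0], dtype=np.float32)  # step 0 ends an episode
    advantages, _ = compute_gae(rewards, values, dones, last_value=10.0)
    # With the bootstrap masked, the advantage at step 0 is just reward minus value.
    assert abs(float(advantages[0]) - 1.0) < 1e-6


if __name__ == "__main__":
    test_lambda_one_matches_discounted_return()
    test_terminal_step_blocks_bootstrap()
    print("ok")
```

Because compute_gae reads GAMMA and GAE_LAMBDA as module-level globals, a test against the real file would likely need to patch those names on the imported module rather than redefine them as done here.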