hub / github.com/tensorlayer/TensorLayer / train_critic

Method train_critic

examples/reinforcement_learning/tutorial_PPO.py:141–153

Update the critic network.
:param reward: cumulative reward batch
:param state: state batch
:return: None

(self, reward, state)
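The docstring does not say how the "cumulative reward batch" is built. One common choice in PPO-style code is the discounted return of each step, optionally bootstrapped from a value estimate for the state after the last step. The sketch below illustrates that assumption only; `discounted_returns`, `bootstrap_value`, and the default `gamma` are hypothetical names and values, not taken from the tutorial.

```python
def discounted_returns(rewards, bootstrap_value=0.0, gamma=0.9):
    """Discounted cumulative reward for each step of one trajectory.

    Hypothetical helper: `bootstrap_value` stands in for a value estimate
    of the state that follows the last recorded step.
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # [1.75, 1.5, 1.0]
```

A list built this way would then be stacked into a column and passed as `reward`, with the matching states passed as `state`.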

Source from the content-addressed store, hash-verified

def train_critic(self, reward, state):
    """
    Update critic network
    :param reward: cumulative reward batch
    :param state: state batch
    :return: None
    """
    reward = np.array(reward, dtype=np.float32)
    with tf.GradientTape() as tape:
        # Gap between the observed cumulative reward and the critic's value estimate.
        advantage = reward - self.critic(state)
        # Mean squared error of that gap is the critic loss.
        loss = tf.reduce_mean(tf.square(advantage))
    # Differentiate outside the tape context, then take one optimizer step.
    grad = tape.gradient(loss, self.critic.trainable_weights)
    self.critic_opt.apply_gradients(zip(grad, self.critic.trainable_weights))
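The update in train_critic is one gradient step on the mean squared difference between the cumulative reward and the critic's value estimate. As a rough illustration of that step without TensorFlow, the sketch below swaps in a hypothetical linear critic V(s) = s @ w + b and computes the gradients by hand with NumPy. The name `critic_step`, the linear model, the learning rate, and the synthetic data are all assumptions for illustration; the tutorial itself uses a neural network and an optimizer object.

```python
import numpy as np


def critic_step(w, b, state, reward, lr=0.1):
    """One gradient-descent step on mean((reward - V(state)) ** 2).

    Hypothetical linear critic: V(s) = s @ w + b, with w of shape (d, 1).
    """
    state = np.asarray(state, dtype=np.float32)
    reward = np.asarray(reward, dtype=np.float32).reshape(-1, 1)
    value = state @ w + b              # shape (N, 1)
    advantage = reward - value         # same quantity as in train_critic
    loss = float(np.mean(advantage ** 2))
    # Analytic gradients of the loss with respect to w and b.
    n = state.shape[0]
    grad_w = -2.0 * state.T @ advantage / n
    grad_b = -2.0 * np.mean(advantage, axis=0)
    return w - lr * grad_w, b - lr * grad_b, loss


# Synthetic batch: the targets are an exact linear function of the states.
rng = np.random.default_rng(0)
state = rng.normal(size=(32, 3)).astype(np.float32)
reward = state @ np.array([[1.0], [-2.0], [0.5]], dtype=np.float32)

w = np.zeros((3, 1), dtype=np.float32)
b = np.zeros(1, dtype=np.float32)
losses = []
for _ in range(50):
    w, b, loss = critic_step(w, b, state, reward)
    losses.append(loss)
```

Repeating the step on the same batch should drive the loss down, which mirrors calling train_critic several times per update.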

Callers 1

update · Method · 0.95

Calls 1

gradient · Method · 0.80

Tested by

no test coverage detected