```python
import numpy
import theano
from theano import config, tensor


def numpy_floatX(data):
    # Cast a value to Theano's configured float dtype (config.floatX).
    return numpy.asarray(data, dtype=config.floatX)


def rmsprop(lr, tparams, grads, x, mask, y, cost):
    """
    A variant of SGD that divides each gradient by the square root of a
    running estimate of its variance (the running average of squared
    gradients minus the square of the running average of gradients) and
    accumulates the resulting steps with momentum.

    Parameters
    ----------
    lr : Theano variable
        Learning rate. Not used by this implementation: the step size is
        hard-coded to 1e-4 in ``updir_new``.
    tparams : dict of Theano SharedVariable
        Model parameters, keyed by name
    grads : list of Theano variables
        Gradients of ``cost`` w.r.t. the parameters, in ``tparams`` order
    x : Theano variable
        Model inputs
    mask : Theano variable
        Sequence mask
    y : Theano variable
        Targets
    cost : Theano variable
        Objective function to minimize

    Returns
    -------
    f_grad_shared : Theano function
        Computes ``cost`` on a minibatch ``(x, mask, y)`` and stores the
        gradients and their running averages in shared variables.
    f_update : Theano function
        Takes ``lr`` (ignored) and applies one momentum update to the
        parameters from the stored statistics.

    Notes
    -----
    For more information, see [Hint2014]_.

    .. [Hint2014] Geoff Hinton, *Neural Networks for Machine Learning*,
       lecture 6a,
       http://cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
    """

    # One zero-initialized buffer per parameter: the latest gradient, and
    # running averages of the gradient and of its square.
    zipped_grads = [theano.shared(p.get_value() * numpy_floatX(0.),
                                  name='%s_grad' % k)
                    for k, p in tparams.items()]
    running_grads = [theano.shared(p.get_value() * numpy_floatX(0.),
                                   name='%s_rgrad' % k)
                     for k, p in tparams.items()]
    running_grads2 = [theano.shared(p.get_value() * numpy_floatX(0.),
                                    name='%s_rgrad2' % k)
                      for k, p in tparams.items()]

    zgup = [(zg, g) for zg, g in zip(zipped_grads, grads)]
    rgup = [(rg, 0.95 * rg + 0.05 * g) for rg, g in zip(running_grads, grads)]
    rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2))
             for rg2, g in zip(running_grads2, grads)]

    # Phase 1: evaluate the cost and refresh the gradient statistics.
    f_grad_shared = theano.function([x, mask, y], cost,
                                    updates=zgup + rgup + rg2up,
                                    name='rmsprop_f_grad_shared')

    # Momentum buffer holding the previous update direction.
    updir = [theano.shared(p.get_value() * numpy_floatX(0.),
                           name='%s_updir' % k)
             for k, p in tparams.items()]
    # rg2 - rg ** 2 estimates the gradient variance; the 1e-4 inside the
    # square root keeps the denominator away from zero.
    updir_new = [(ud, 0.9 * ud - 1e-4 * zg / tensor.sqrt(rg2 - rg ** 2 + 1e-4))
                 for ud, zg, rg, rg2 in zip(updir, zipped_grads, running_grads,
                                            running_grads2)]
    # udn[1] is the new-direction expression (not the shared ud), so each
    # parameter moves by the freshly computed step.
    param_up = [(p, p + udn[1])
                for p, udn in zip(tparams.values(), updir_new)]

    # Phase 2: apply the update. lr is an input but never used in the graph,
    # hence on_unused_input='ignore'.
    f_update = theano.function([lr], [], updates=updir_new + param_up,
                               on_unused_input='ignore',
                               name='rmsprop_f_update')

    return f_grad_shared, f_update
```
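Written out for a single parameter $\theta$ with current gradient $g_t$, the updates built above (`zgup`, `rgup` and `rg2up` inside `f_grad_shared`, then `updir_new` and `param_up`) amount to the following. The symbols are introduced here for readability and do not appear in the source. All buffers start at zero, and no learning-rate symbol appears because the step size is the constant $10^{-4}$ in `updir_new`.

$$
\begin{aligned}
\bar g_t &= 0.95\,\bar g_{t-1} + 0.05\,g_t \\
\overline{g^2}_t &= 0.95\,\overline{g^2}_{t-1} + 0.05\,g_t^2 \\
\Delta_t &= 0.9\,\Delta_{t-1} - 10^{-4}\,\frac{g_t}{\sqrt{\overline{g^2}_t - \bar g_t^{\,2} + 10^{-4}}} \\
\theta_t &= \theta_{t-1} + \Delta_t
\end{aligned}
$$

Because $\overline{g^2}_t - \bar g_t^{\,2}$ is a running variance estimate, this is a "centered" RMSprop with momentum. The plain form instead divides by $\sqrt{\overline{g^2}_t}$.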
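The same arithmetic can also be sketched in NumPy, which makes it possible to experiment without a Theano install. This is a hedged re-implementation, not the module's code. The class `RMSpropSketch`, its `grad_shared`/`update` methods, the `demo` helper, and its quadratic objective are hypothetical names chosen for illustration. The default constants are copied from the listing: 0.95 for the averages, 0.9 for momentum, and 1e-4 for both the step and the epsilon under the square root. The two methods mirror the split between `f_grad_shared` (`zgup + rgup + rg2up`) and the `updir_new`/`param_up` update.

```python
import numpy as np


class RMSpropSketch:
    """Hypothetical NumPy mirror of the Theano rmsprop updates (not part of the module)."""

    def __init__(self, params, step=1e-4, rho=0.95, momentum=0.9, eps=1e-4):
        # params: dict name -> float ndarray, modified in place (like shared variables).
        self.params = params
        self.step, self.rho, self.momentum, self.eps = step, rho, momentum, eps
        self.zipped_grads = self._zeros()    # latest gradient g
        self.running_grads = self._zeros()   # running mean of g
        self.running_grads2 = self._zeros()  # running mean of g**2
        self.updir = self._zeros()           # momentum buffer (previous step)

    def _zeros(self):
        return {k: np.zeros_like(v) for k, v in self.params.items()}

    def grad_shared(self, grads):
        # Phase 1, like zgup + rgup + rg2up: store g and refresh both averages.
        for k, g in grads.items():
            self.zipped_grads[k] = np.array(g, dtype=self.params[k].dtype)
            self.running_grads[k] = self.rho * self.running_grads[k] + (1 - self.rho) * g
            self.running_grads2[k] = (self.rho * self.running_grads2[k]
                                      + (1 - self.rho) * g ** 2)

    def update(self):
        # Phase 2, like updir_new + param_up: momentum step scaled by the
        # inverse square root of the running variance estimate.
        for k, p in self.params.items():
            var = self.running_grads2[k] - self.running_grads[k] ** 2
            self.updir[k] = (self.momentum * self.updir[k]
                             - self.step * self.zipped_grads[k] / np.sqrt(var + self.eps))
            p += self.updir[k]  # in place, so the caller's arrays see the update


def demo(steps=3000):
    # Toy objective 0.5 * ||theta - target||**2, whose gradient is theta - target.
    target = np.array([1.0, -2.0])
    params = {"theta": np.zeros(2)}
    opt = RMSpropSketch(params)
    for _ in range(steps):
        opt.grad_shared({"theta": params["theta"] - target})
        opt.update()
    return float(np.linalg.norm(params["theta"] - target))
```

Calling `demo()` runs the toy problem and returns the remaining distance to `target`. Early steps are small while the zero-initialized averages warm up: after one step the variance estimate is still about $0.0475\,g^2$, so the denominator is large. Once the averages settle, the denominator approaches $\sqrt{10^{-4}} = 0.01$ and each raw step grows toward $0.01\,g$.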