MCPcopy
hub / github.com/dmlc/dgl / SparseAdam

Class SparseAdam

python/dgl/optim/pytorch/sparse_optim.py:647–959  ·  view source on GitHub ↗

r"""Node embedding optimizer using the Adam algorithm. This optimizer implements a sparse version of Adagrad algorithm for optimizing :class:`dgl.nn.NodeEmbedding`. Being sparse means it only updates the embeddings whose gradients have updates, which are usually a very small portion

Source from the content-addressed store, hash-verified

645
646
647class SparseAdam(SparseGradOptimizer):
648 r"""Node embedding optimizer using the Adam algorithm.
649
650 This optimizer implements a sparse version of Adagrad algorithm for
651 optimizing :class:`dgl.nn.NodeEmbedding`. Being sparse means it only
652 updates the embeddings whose gradients have updates, which are usually
653 a very small portion of the total embeddings.
654
655 Adam maintains a :math:`Gm_{t,i,j}` and `Gp_{t,i,j}` for every parameter
656 in the embeddings, where
657 :math:`Gm_{t,i,j}=beta1 * Gm_{t-1,i,j} + (1-beta1) * g_{t,i,j}`,
658 :math:`Gp_{t,i,j}=beta2 * Gp_{t-1,i,j} + (1-beta2) * g_{t,i,j}^2`,
659 :math:`g_{t,i,j} = lr * Gm_{t,i,j} / (1 - beta1^t) / \sqrt{Gp_{t,i,j} / (1 - beta2^t)}` and
660 :math:`g_{t,i,j}` is the gradient of the dimension :math:`j` of embedding :math:`i`
661 at step :math:`t`.
662
663 NOTE: The support of sparse Adam optimizer is experimental.
664
665 Parameters
666 ----------
667 params : list[dgl.nn.NodeEmbedding]
668 The list of dgl.nn.NodeEmbeddings.
669 lr : float
670 The learning rate.
671 betas : tuple[float, float], Optional
672 Coefficients used for computing running averages of gradient and its square.
673 Default: (0.9, 0.999)
674 eps : float, Optional
675 The term added to the denominator to improve numerical stability
676 Default: 1e-8
677 use_uva : bool, Optional
678 Whether to use pinned memory for storing 'mem' and 'power' parameters,
679 when the embedding is stored on the CPU. This will improve training
680 speed, but will require locking a large number of virtual memory pages.
681 For embeddings which are stored in GPU memory, this setting will have
682 no effect.
683 Default: True if the gradients are generated on the GPU, and False
684 if the gradients are on the CPU.
685 dtype : torch.dtype, Optional
686 The type to store optimizer state with. Default: th.float32.
687
688 Examples
689 --------
690 >>> def initializer(emb):
691 th.nn.init.xavier_uniform_(emb)
692 return emb
693 >>> emb = dgl.nn.NodeEmbedding(g.num_nodes(), 10, 'emb', init_func=initializer)
694 >>> optimizer = dgl.optim.SparseAdam([emb], lr=0.001)
695 >>> for blocks in dataloader:
696 ... ...
697 ... feats = emb(nids, gpu_0)
698 ... loss = F.sum(feats + 1, 0)
699 ... loss.backward()
700 ... optimizer.step()
701 """
702
703 def __init__(
704 self,

Callers 9

run_clientFunction · 0.90
test_sparse_adamFunction · 0.90
test_sparse_adam_uvaFunction · 0.90
test_sparse_adam_dtypeFunction · 0.90
start_sparse_adam_workerFunction · 0.90
test_DeepWalkFunction · 0.90

Calls

no outgoing calls

Tested by 9

run_clientFunction · 0.72
test_sparse_adamFunction · 0.72
test_sparse_adam_uvaFunction · 0.72
test_sparse_adam_dtypeFunction · 0.72
start_sparse_adam_workerFunction · 0.72
test_DeepWalkFunction · 0.72