Repository: github.com/PRIME-RL/PRIME

Function offload_fsdp_grad

training/verl/utils/fsdp_utils.py:71–75
Signature: offload_fsdp_grad(module)


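import torch


def offload_fsdp_grad(module):
    # Move every existing gradient from its current device to host (CPU) memory.
    for _, param in module.named_parameters():
        # Parameters that received no gradient keep grad=None and are skipped.
        if param.grad is not None:
            # non_blocking=True lets the device-to-host copy run asynchronously
            # with respect to the host.
            param.grad = param.grad.to("cpu", non_blocking=True)
    # Release unoccupied blocks held by PyTorch's CUDA caching allocator, so
    # memory freed by the old GPU gradients can be returned to the device.
    torch.cuda.empty_cache()

The next definition in the file is load_fsdp_grad(module, device_id); its body is outside the lines shown.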
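The page shows only the offload direction. Below is a minimal sketch of a full offload-and-reload round trip on a plain nn.Module (not an FSDP-wrapped one), assuming the reload step simply mirrors the offload. The helpers offload_params_and_grads and load_params_and_grads are hypothetical names, not verl APIs. The sketch moves param.data before param.grad because PyTorch rejects a .grad tensor whose device differs from its parameter's. This page does not show whether offload_fsdp_grad is only called after the parameters themselves have been offloaded. Without CUDA the sketch falls back to CPU, where every move is a no-op.

```python
import torch
import torch.nn as nn


def offload_params_and_grads(module: nn.Module) -> None:
    """Hypothetical helper: move parameters and their gradients to host memory."""
    for _, param in module.named_parameters():
        # Move the parameter first: assigning a CPU .grad to a CUDA parameter raises.
        param.data = param.data.to("cpu", non_blocking=True)
        if param.grad is not None:
            param.grad = param.grad.to("cpu", non_blocking=True)
    torch.cuda.empty_cache()  # does nothing if CUDA was never initialized


def load_params_and_grads(module: nn.Module, device: torch.device) -> None:
    """Hypothetical inverse (assumption): move parameters and gradients back to `device`."""
    for _, param in module.named_parameters():
        param.data = param.data.to(device, non_blocking=True)
        if param.grad is not None:
            param.grad = param.grad.to(device, non_blocking=True)


def round_trip() -> dict:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.manual_seed(0)
    model = nn.Linear(4, 2).to(device)
    model(torch.randn(3, 4, device=device)).sum().backward()
    # Snapshot the gradients on the host so the offloaded copies can be compared.
    before = {n: p.grad.detach().cpu().clone() for n, p in model.named_parameters()}

    offload_params_and_grads(model)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for asynchronous device-to-host copies
    grads_on_cpu = all(p.grad.device.type == "cpu" for p in model.parameters())
    grads_intact = all(torch.equal(p.grad, before[n]) for n, p in model.named_parameters())

    load_params_and_grads(model, device)
    grads_on_device = all(p.grad.device.type == device.type for p in model.parameters())
    return {
        "grads_on_cpu": grads_on_cpu,
        "grads_intact": grads_intact,
        "grads_on_device": grads_on_device,
    }


if __name__ == "__main__":
    print(round_trip())
```

On a GPU, the torch.cuda.synchronize() call before reading the offloaded gradients matters. With non_blocking=True, the device-to-host copy may still be in flight when .to() returns.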

Callers (1)

init_model (method) · 0.90

Calls (2)

named_parameters (method) · 0.80
to (method) · 0.80

Tested by

no test coverage detected