hub / github.com/FareedKhan-dev/train-llm-from-scratch / whiten

Function whiten

src/post_training/ppo.py:60–65

Normalize advantages to zero mean / unit std over masked (response) positions.

(advantages: torch.Tensor, mask: torch.Tensor)
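Written out, the normalization in the docstring looks roughly like the following. This is a sketch that assumes `masked_mean(x, m)` computes the mask-weighted average, i.e. sum(x * m) / sum(m); that helper's body is not shown on this page, so the first line is an assumption. Here A_i is the advantage at position i, m_i is the mask value (1 for response tokens, 0 otherwise), and epsilon is the 1e-8 constant from the source.

```latex
\begin{aligned}
\mu      &= \frac{\sum_i m_i A_i}{\sum_i m_i}
         && \text{(assumed definition of masked\_mean)} \\
\sigma^2 &= \frac{\sum_i m_i \,(A_i - \mu)^2}{\sum_i m_i} \\
\hat{A}_i &= m_i \cdot \frac{A_i - \mu}{\sigma + \epsilon},
         \qquad \epsilon = 10^{-8}
\end{aligned}
```

Two details follow directly from the source: the variance is the population form (divided by the masked count, with no Bessel correction), and epsilon is added to the standard deviation rather than to the variance.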

Source from the content-addressed store, hash-verified

def whiten(advantages: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Normalize advantages to zero mean / unit std over masked (response) positions."""
    m = mask.float()
    mean = masked_mean(advantages, m)
    var = masked_mean((advantages - mean) ** 2, m)
    return ((advantages - mean) / (var.sqrt() + 1e-8)) * m
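A small usage sketch may help show what the function does to a batch. The `masked_mean` below is a hypothetical stand-in (the repository's own helper is not shown on this page), and the tensor values are made up for illustration. `whiten` itself is copied from the source above.

```python
import torch


def masked_mean(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the repository's helper (an assumption):
    # average of x over positions where mask == 1.
    return (x * mask).sum() / mask.sum().clamp(min=1.0)


def whiten(advantages: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Normalize advantages to zero mean / unit std over masked (response) positions."""
    m = mask.float()
    mean = masked_mean(advantages, m)
    var = masked_mean((advantages - mean) ** 2, m)
    return ((advantages - mean) / (var.sqrt() + 1e-8)) * m


# One sequence of four tokens; the last position is outside the response.
advantages = torch.tensor([[1.0, 2.0, 3.0, 100.0]])
mask = torch.tensor([[1, 1, 1, 0]])

out = whiten(advantages, mask)

# Statistics are computed over the three masked positions only, so the
# outlier at the unmasked position does not affect them.
m = mask.float()
masked_avg = masked_mean(out, m)
masked_var = masked_mean((out - masked_avg) ** 2, m)
print(out, masked_avg.item(), masked_var.item())
```

Under this stand-in, the unmasked position is zeroed in the output, and the masked positions end up with approximately zero mean and unit variance.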

Callers 3

main · Function · 0.90
test_whiten · Function · 0.90
verify_ppo_optimizes · Function · 0.90

Calls 1

masked_mean · Function · 0.90

Tested by 1

test_whiten · Function · 0.72