"""Live-progress GRPO-ascends-reward proof (prints every few iters). See verify_rl_optimizes.py."""

import torch
from src.models.transformer import Transformer
from src.post_training.grpo import group_advantages, grpo_loss