File train_reward.py

scripts/train_reward.py

Source from the content-addressed store, hash-verified

"""
Train the reward model on preference pairs with the Bradley-Terry loss.

Initializes the reward backbone from the SFT checkpoint, adds a scalar reward head, and

nothing calls this directly

mainFunction · 0.70

no test coverage detected
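
The docstring of train_reward.py names the Bradley-Terry loss on preference pairs, but the excerpt ends before any implementation is shown. As a rough illustration only, the loss for one pair is usually written as -log(sigmoid(r_chosen - r_rejected)), averaged over pairs. The sketch below uses plain Python floats; the function name `bradley_terry_loss` and its arguments are hypothetical and are not taken from the script, which presumably computes the same quantity on tensors produced by the scalar reward head.

```python
import math


def bradley_terry_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Hypothetical helper for illustration; not the function used in
    scripts/train_reward.py.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("need one rejected reward per chosen reward")
    if not chosen_rewards:
        raise ValueError("need at least one preference pair")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)). The form below avoids
        # overflow in exp() when the margin is large in either direction.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # Scalar rewards for three preference pairs (made-up numbers).
    chosen = [1.5, 0.2, -0.3]
    rejected = [0.5, 0.4, -1.0]
    print(bradley_terry_loss(chosen, rejected))
```

When the two rewards in a pair are equal the per-pair loss is log(2), and it shrinks toward zero as the chosen response is scored increasingly higher than the rejected one.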