File train_dpo.py

scripts/train_dpo.py

Source from the content-addressed store, hash-verified

"""
Direct Preference Optimization (and ORPO / KTO variants) on preference pairs.

The policy is initialized from the SFT checkpoint; a frozen deep copy of it serves as the
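
The excerpt above cuts off before the objective itself appears, so here is a minimal sketch of the standard DPO loss for a single preference pair, not the script's actual code. In that formulation a frozen reference model supplies baseline log-probabilities against which the policy is compared. The names `log_sigmoid` and `dpo_loss`, the scalar log-probability inputs, and the default `beta=0.1` are all assumptions made for illustration; the ORPO and KTO variants mentioned in the docstring use different objectives and are not shown.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, not taken from the script
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under the policy or under the frozen reference model.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy widens the chosen-vs-rejected margin.
    return -log_sigmoid(beta * (chosen_log_ratio - rejected_log_ratio))
```

When the policy is identical to the reference, both log-ratios are zero and the loss equals log 2; it falls below that value once the policy prefers the chosen response more strongly than the reference does.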

nothing calls this directly

mainFunction · 0.70

no test coverage detected