File chat_rl.py

scripts/chat_rl.py in github.com/karpathy/nanochat

"""
Reinforcement learning on GSM8K via "GRPO".

I put GRPO in quotes because we actually end up with something a lot
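
The excerpt above is cut off before it explains how the script departs from GRPO, so the sketch below is background only. It shows the textbook GRPO ingredients applied to GSM8K: a binary outcome reward read from the "#### <answer>" marker that ends GSM8K reference solutions, and advantages computed relative to the other completions sampled for the same prompt. It is not the code in chat_rl.py. The names extract_answer, gsm8k_reward and group_advantages are hypothetical, and the assumption that model completions also end with "#### <answer>" is made only for illustration.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical helper (not from chat_rl.py). GSM8K reference solutions end
# with "#### <number>"; we ASSUME completions use the same marker.
_ANSWER_RE = re.compile(r"####\s*(-?[0-9][0-9,]*(?:\.[0-9]+)?)")


def extract_answer(text: str) -> Optional[str]:
    """Return the number after the last '####' marker, with commas removed."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_reward(completion: str, reference: str) -> float:
    """Binary outcome reward: 1.0 if the final answers match, else 0.0."""
    predicted = extract_answer(completion)
    return float(predicted is not None and predicted == extract_answer(reference))


def group_advantages(rewards: List[float], scale_by_std: bool = True,
                     eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for completions sampled from one prompt.

    GRPO as usually described uses (r - mean) / std; some variants skip
    the division and use only r - mean. eps guards against std == 0.
    """
    mu = mean(rewards)
    if not scale_by_std:
        return [r - mu for r in rewards]
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "Natalia sold 48 + 24 = 72 clips.\n#### 72"
    completions = ["... so the total is 72.\n#### 72", "#### 70", "#### 72", "I am not sure."]
    rewards = [gsm8k_reward(c, reference) for c in completions]
    print(rewards)                                        # [1.0, 0.0, 1.0, 0.0]
    print(group_advantages(rewards, scale_by_std=False))  # [0.5, -0.5, 0.5, -0.5]
```

Because the advantages are centered within each group, completions that do better than their siblings are reinforced and the rest are discouraged. When every sample for a prompt receives the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.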

Callers

Nothing calls this file directly.

Calls 15

autodetect_device_type · Function · 0.90
compute_init · Function · 0.90
DummyWandb · Class · 0.90
load_model · Function · 0.90
Engine · Class · 0.90
GSM8K · Class · 0.90
print0 · Function · 0.90
get_base_dir · Function · 0.90
save_checkpoint · Function · 0.90
compute_cleanup · Function · 0.90
get_batch · Function · 0.85
run_gsm8k_eval · Function · 0.85

Tested by

No test coverage detected.