Hacker News
Does RL Incentivize Reasoning in LLMs Beyond the Base Model?
(limit-of-rlvr.github.io)
84 points
by leodriesch
7 days ago
38 comments