Hacker News
Reinforcement Learning from Human Feedback
(rlhfbook.com)
94 points by onurkanbkrc 9 hours ago | 5 comments
https://arxiv.org/abs/2504.12501