Reward Hacking in Reinforcement Learning
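Reward hacking occurs when a reinforcement learning agent exploits flaws or ambiguities in its reward function to earn high reward without actually learning the intended task. The problem increasingly affects language models trained with RL, and it calls for further research into mitigation strategies.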
Lilian Weng · Thu, 28 Nov 2024