Tag: Reinforcement Learning (3 articles)

ChatGPT voice mode is a weaker model

Simon Willison points out that ChatGPT's voice mode runs on an older, weaker GPT-4o-era model, leaving a gap between what users expect and what they actually get.

Simon Willison · 2026-04-10T15:56:02+00:00

Reward Hacking in Reinforcement Learning

Reward hacking arises in reinforcement learning when flaws in the reward function can be exploited. The problem particularly affects language models and calls for further research into mitigation strategies.

Lilian Weng · Thu, 28 Nov 2024 00:00:00 +0000

The Transformer Family Version 2.0

Lilian Weng's article explores the evolution of Transformers and their newer features, showing their continued impact on natural language processing.

Lilian Weng · Fri, 27 Jan 2023 00:00:00 +0000