ChatGPT voice mode is a weaker model
Simon Willison points out a counterintuitive fact: ChatGPT's voice mode runs on an older, weaker GPT-4o-era model, so what it delivers can fall well short of what users expect.
Simon Willison · 2026-04-10T15:56:02+00:00
Reward Hacking in Reinforcement Learning
Reward hacking occurs in reinforcement learning when an agent exploits flaws in the reward function. It is a particular concern for language models and calls for further research into mitigation strategies.
Lilian Weng · Thu, 28 Nov 2024 00:00:00 +0000
The Transformer Family Version 2.0
Lilian Weng's article surveys the evolution of the Transformer architecture and its newer features, showing its continued impact on natural language processing.
Lilian Weng · Fri, 27 Jan 2023 00:00:00 +0000