Adversarial Attacks on LLMs
This article explores adversarial attacks on large language models (LLMs), covering attack types, threat models, and their impact on the safety of generated text, and highlighting the open challenges they pose for AI safety.
Lilian Weng · Wed, 25 Oct 2023 00:00:00 +0000