Jailbreak Attacks

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

(USENIX Security 2024) This is a systematic study of jailbreak attacks against commercial large language model (LLM) systems. We analyzed existing jailbreak prompts, examined the factors that contribute to their effectiveness, and conducted user studies to explore human behavioral patterns during jailbreak attempts.

Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang
