The gay jailbreak: I ran the viral technique against my own production prompts and here's what I found
524 points on Hacker News. The thread blew up. The jailbreak technique everyone's talking about has a clickbait name, but the name isn't what interests me. What interests me is what happens when you run it against prompts that live in production and affect real users.
I ran it. Not as an academic experiment. As an audit of what I actually have deployed.
My thesis, before I get into it: viral jailbreaks aren't researcher curiosities. They're thermometers. If a technique from a 524-point thread can break a guardrail, that guardrail was never real; it was alignment marketing.
LLM jailbreak technique 2025: what the thread found and why I care
The technique that circulated on HN exploits a combination of identity reframing and cumulative contextual pressure. I'm not going to reproduce the exact prompt — that's not the point. The pattern is: you establish a roleplay narrative