Every AI agent that can read private data, fetch external content, and send
outbound messages is one injected instruction away from exfiltrating everything
it knows.
This isn't theoretical. Here's the attack in three tool calls:
Turn 0: readPrivateData() → 5 customer records loaded (SSNs, emails, phones)
fetchExternalContent(url) → attacker's webpage, payload embedded in HTML
Turn 1: sendOutboundReport() → all PII sent to attacker's address
Turn 2: "Report sent successfully!"
Total time: ~12 seconds. Cost: $0.001. No exploits. No credentials. Just a
fetched webpage and a compliant model.
We measured it. Rigorously.
30 injection payloads across 6 categories — direct injection, encoded/obfuscated
(Base64, ROT13, hex, Unicode), social engineering (CEO fraud, IT impersonation,
legal threats), multi-turn (persistent rules, delayed triggers, context poisoning),
multilingual (Spanish, Mandarin, Arabic, Russian), and advanced techniques.
Tested ag
Discussion
Don’t hold back—comment!
Don’t wait—start sharing your ideas now!