AI Agents May Always Fall for Prompt Injections

May 30, 2026

This arXiv paper (May 17, 2026) by Abdelnabi and Bagdasarian argues that prompt injection, the most critical vulnerability in deployed AI agents, may be impossible to fully prevent. The authors challenge the prevailing defense paradigm of data-instruction separation, showing that current injection classifiers perform near chance (AUROC 0.43–0.59) when attacks work through contextual manipulation rather than explicit injection vocabulary. They recast prompt injection through the lens of Contextual Integrity (CI), a privacy theory that judges whether an information flow complies with the norms of its context, which are defined by five parameters: sender, receiver, subject, information type, and transmission principle.

Using this framework, they demonstrate three classes of failure: (1) attacks that corrupt parameter inference (e.g., fabricated user quotes or prior approvals), which achieve 96.7% success against an email assistant; (2) norm grounding failures, where agents execute out-of-scope requests 29.9–36.2% of the time when no interaction history is available; and (3) flow separation failures, where agents collapse authorization across simultaneous information flows in up to 65% of cases.

The authors present an impossibility argument: either an adversary can always construct a context in which a blocked flow appears legitimate, or a defender who tightens norms to prevent this will block genuinely legitimate flows. They conclude that current safety training may degrade both security and utility, and they advocate CI-grounded red-teaming and layered architectures that verify claims against ground truth. Experiments cover frontier models including GPT-5.4, Claude Sonnet 4-6, Gemini 3-Pro, and Meta SecAlign.

https://arxiv.org/html/2605.17634v1
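
To make the five CI parameters and the "verify claims against ground truth" idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Flow` class, the norm set, the `TRUSTED_APPROVALS` log, and all example values are hypothetical and chosen only for illustration. It also assumes the five parameters have already been inferred correctly, which is exactly the step the summarized attacks target, so it illustrates the framing rather than a defense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow, described by the five CI parameters."""
    sender: str
    receiver: str
    subject: str
    info_type: str
    transmission_principle: str


# Hypothetical norms for an email assistant (illustrative values only).
ALLOWED_NORMS = {
    Flow("assistant", "user", "user", "calendar", "user_request"),
    Flow("assistant", "colleague", "user", "availability", "user_approved"),
    Flow("assistant", "vendor", "user", "invoice", "user_approved"),
}

# Hypothetical ground truth: approvals recorded through a trusted channel,
# never parsed out of email bodies or other untrusted content.
TRUSTED_APPROVALS = {("colleague", "availability")}


def is_permitted(flow, norms=ALLOWED_NORMS, approvals=TRUSTED_APPROVALS):
    """Allow a flow only if it matches a norm and any claimed approval is real."""
    if flow not in norms:
        return False
    if flow.transmission_principle == "user_approved":
        # A claim of prior approval is checked against the trusted log,
        # not against what the surrounding context asserts.
        return (flow.receiver, flow.info_type) in approvals
    return True


if __name__ == "__main__":
    ok = Flow("assistant", "colleague", "user", "availability", "user_approved")
    fabricated = Flow("assistant", "vendor", "user", "invoice", "user_approved")
    print(is_permitted(ok))          # approval exists in the trusted log
    print(is_permitted(fabricated))  # norm matches, but approval was never recorded
```

In this sketch the second flow matches a norm but is rejected because no approval exists in the trusted log, which mirrors the fabricated-approval attack class described above. Tightening `ALLOWED_NORMS` further would also start rejecting flows the user actually wants, which is the trade-off the impossibility argument points at.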
