ToolJack: Hijacking AI Agent Perception via Bridge Exploitation
ToolJack is an attack methodology that manipulates the trust boundary between AI agents and their tools. After achieving local compromise, an attacker can extract session credentials, pivot across devices, and intercept the bridge protocol between Claude Desktop and its browser extension. This enables Phantom Tab Injection (fabricating tabs only the agent sees) and Tool Relay Spoofing (replacing legitimate tool responses with attacker-controlled data), leading to Remote Listener Indirect Prompt Injection—actively constructing a poisoned environment around the agent. Testing showed complete control over the agent's perceived context, but Anthropic's model-level safety alignment consistently blocked autonomous code execution. The research concludes that infrastructure requires cryptographic tool attestation and device-bound tokens, while model alignment serves as a critical last line of defense.
Comments
Post a Comment