Posts

Showing posts from July, 2026

Breaking AI Agents: Memory Poisoning in Lakera MindfulChat

Paulo Cesar documents his journey through the first lab of Lakera's Agent Breaker simulation, focusing on a memory poisoning attack against an AI assistant with persistent memory. The challenge involves an attacker who has already compromised the application's database and can insert arbitrary entries into the AI's memory logs, with the goal of manipulating the model's trusted historical context. The objective is to poison the assistant's memory so that it becomes obsessed with Winnie the Pooh, responding with related content regardless of what users ask, across five progressively difficult levels. Cesar details his techniques for each level, evolving from simple system prompt injections at the Novice level to more sophisticated methods at higher levels, including using realistic user memories, longitudinal framing to create believable context, and ultimately behavioral preference poisoning at the Legendary level, which proved most reliable by embedding the maliciou...