Posts

Breaking AI Agents: Memory Poisoning in Lakera MindfulChat

Paulo Cesar documents his journey through the first lab of Lakera's Agent Breaker simulation, focusing on a memory poisoning attack against an AI assistant with persistent memory. The challenge involves an attacker who has already compromised the application's database and can insert arbitrary entries into the AI's memory logs, with the goal of manipulating the model's trusted historical context. The objective is to poison the assistant's memory so that it becomes obsessed with Winnie the Pooh, responding with related content regardless of what users ask, across five progressively difficult levels. Cesar details his techniques for each level, evolving from simple system prompt injections at the Novice level to more sophisticated methods at higher levels, including using realistic user memories, longitudinal framing to create believable context, and ultimately behavioral preference poisoning at the Legendary level, which proved most reliable by embedding the maliciou...

LF AI & Data Security and Compliance Work Group

This GitHub repository serves as the central hub for the LF AI & Data Foundation's Security and Compliance Work Group, which is dedicated to developing a comprehensive security and compliance strategy for AI-enabled applications. The group operates through two specialized subgroups focused on Use Cases and Threat Modeling and Risk and Compliance, and collaborates with major standards organizations like OWASP, OpenSSF, and NIST. The repository contains meeting information, project assets, whitepapers, and references to related security standards, all aimed at fostering secure AI development and reducing risk in regulated environments.  https://github.com/lfai/security-and-compliance

AVE – Agentic Vulnerability Enumeration

AVE is a behavioral classification standard for agentic AI components (skill files, MCP servers, system prompts, and plugins), providing stable identifiers and scoring for vulnerabilities that traditional CVE/OSV standards cannot describe. It assigns AVE IDs to 51 distinct attack classes (e.g., metamorphic payloads, tool poisoning, MCP tool hook hijacking), scores them using OWASP AIVSS v0.8 with a 10-factor Agentic Amplification and Reachability Score (AARS), and maps every record to frameworks like OWASP MCP Top 10 and MITRE ATLAS. The reference implementation (Bawbel Scanner) detects these vulnerabilities in CI pipelines, and the open schema (Apache 2.0) allows any security tool to integrate AVE IDs into their findings.  https://github.com/bawbel/ave

AITBM – AI Trust Benchmarking and Maturity Framework

AITBM is a bias-resistant framework for quantifying AI security risk without subjective guesswork. It uses a three-layer architecture: Intrinsic Vulnerability Profile (21 sub-metrics across 5 security axes), Operational Risk Posture (deployment context), and Assurance Confidence Index (evidence freshness). It produces a mathematically grounded composite score (ERS) that preserves multi-dimensional signal. Key features: deterministic rubrics (0–4 scoring), agentic-native threat modeling, tiered assessment pathways, and alignment with 16 external frameworks (OWASP, MITRE ATLAS, NIST AI RMF, ISO/IEC 42001, EU AI Act). Includes specification, worked examples, website with calculator, and Docker deployment.  https://github.com/ninedter/AITBM

MITRE ATLAS Agent: An Open-Source AI Assistant for Exploring the ATLAS Framework

This open-source AI assistant, built with Langflow, provides three ways to explore the MITRE ATLAS Framework: a natural-language chat interface for learning about tactics, techniques, mitigations, and case studies; an MCP server and API for integration into other tools and workflows; and full customizability of prompts and subagents. It uses a hierarchical system where an orchestrator manages three specialist agents for semantic search, structured data lookup, and knowledge graph generation. The agent can be run locally with the uv tool and configured with any compatible LLM provider, and it exposes its flows as MCP tools, making it a practical resource for security research and workflow integration.   https://github.com/mitre-atlas/atlas-knowledge-base-agent

Model Context Protocol (MCP) Explained: From Integration Problem to Production Deployment

This article explains Anthropic's Model Context Protocol (MCP) in three levels of difficulty. Level one covers why MCP matters: it solves the integration problem of connecting multiple AI clients to multiple tools by replacing M times N custom adapters with just M plus N protocol implementations. Level two details the architecture, explaining how hosts, clients, and servers work together, and the key primitives: tools, resources, and prompts. Level three addresses real-world deployment concerns, including transport options (stdio for local, HTTP for remote), security considerations like authentication and sandboxing, and decisions around local versus remote hosting. The article concludes that MCP provides a scalable, standardized foundation for building AI systems that reliably interact with external data and software.  https://machinelearningmastery.com/model-context-protocol-explained-in-3-levels-of-difficulty/

MRAgent: Graph Memory for LLM Agents

MRAgent is a retrieval-augmented QA system that builds a graph-structured episodic memory from long, multi-session dialogues instead of using simple vector retrieval. It operates in two phases: first, it rewrites dialogue turns into self-contained sentences (resolving pronouns, converting dates), extracts keywords, and stores everything as a graph with nodes for key entities, episodes, topics, and personal facts; second, it answers questions by running a tool-calling reasoning loop where the LLM uses seven specialized graph query tools (e.g., by topic, time, personal info, event context) to retrieve relevant memory and produce an answer. Evaluated on LoCoMo and LongMemEval benchmarks, the system uses OpenRouter for LLM access, caches intermediate results to avoid redundant work, and includes an LLM-as-judge evaluation script—treating memory as a reconstructed graph rather than retrieved chunks for more structured, context-aware querying in long-form conversational AI.  https://gith...