Posts

The Orchard Bug and the Unfolding Cybersecurity Reckoning

Ben Goertzel argues that the Zcash Orchard bug, which could have allowed counterfeit ZEC to be created and went undetected for four years until an AI model (Anthropic's Opus 4.8) found it, is an early tremor of a coming wave in which AI will expose latent vulnerabilities across all software, from crypto to traditional finance. He contends that the solution is well known (formal verification and "correct-by-construction" derivation from mathematical specifications, as being implemented in the ASI:chain project) but that the software industry has avoided it because of cost and speed pressures. Goertzel concludes that the same AI capabilities driving this reckoning can also enable large-scale formal verification, making the threat and the remedy essentially the same technology, and urges a broad shift toward making mathematical proof the default expectation in software development. https://bengoertzel.substack.com/p/the-orchard-bug-and-the-unfolding

The New MCP Specification: What Security Teams Must Prepare For

This Akamai blog post by Maxim Zavodchik, Segev Fogel, and Gal Meiri analyzes the upcoming July 28, 2026, update to the Model Context Protocol (MCP), which moves the protocol to an enterprise-grade, stateless architecture. While the update eliminates major protocol-level risks such as session hijacking and weak authentication, it shifts critical security responsibilities to developers and introduces new attack surfaces: cross-agent workflow hijacking via untrusted client state objects, client-controlled metadata manipulation, header confusion attacks, stored XSS in new MCP interactive apps, and denial-of-service risks from long-running background tasks. The authors conclude that security teams must now treat all client-provided state and metadata as untrusted and enforce cryptographic verification, output encoding, and resource quotas, because the protocol's security posture now depends entirely on implementation quality rather than on protocol-level guarantees. https://www.akamai.com/blog/s...
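
As a rough illustration of the "treat client-provided state as untrusted and verify it cryptographically" recommendation, here is a minimal Python sketch. It is not taken from the Akamai post or from the MCP specification: the names sign_state, verify_state, and SERVER_KEY are hypothetical, and HMAC-SHA256 over a JSON payload is only one assumed way a stateless server might detect tampering with state it previously handed to a client.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical server-side secret (assumption); load it from a secret store in practice.
SERVER_KEY = b"example-key-do-not-use-in-production"


def sign_state(state: dict, key: bytes = SERVER_KEY) -> str:
    """Serialize the state and attach an HMAC-SHA256 tag so tampering is detectable."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(tag).decode()
    )


def verify_state(token: str, key: bytes = SERVER_KEY) -> Optional[dict]:
    """Return the state only if the tag verifies; return None for anything else."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        # Malformed token: wrong number of parts or invalid base64.
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)
```

A server following this pattern would call sign_state before returning state to a client and verify_state on every request that echoes it back, rejecting the request when the result is None. The sketch covers integrity only; expiry, replay protection, and binding the state to a particular client or workflow would need additional fields and are left out here.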

Prompt Injection as Role Confusion

This ICML 2026 paper by Ye, Cui, and Hadfield-Menell presents a theory that prompt injection attacks succeed because LLMs perceive roles (like user, assistant, tool, think) through insecure surface features like writing style rather than through the secure structural tags themselves. Using role probes to measure internal token perceptions, the authors demonstrate that sounding like a privileged role (e.g., mimicking reasoning style) overrides the actual tag, enabling attacks like CoT Forgery where fake reasoning in a user prompt achieves a ~60% success rate, while simply destyling the text drops success to 10%. They argue that roles are a hacked-together format trick that became critical cognitive and security infrastructure, and that unless models achieve genuine role perception, defense will remain a whack-a-mole game, opening the door to subtler threats like subconscious steering of LLM states for commercial or adversarial purposes. https://role-confusion.github.io/

Vulnerability Reports Are Not Special Anymore

Filippo Valsorda argues that vulnerability reports are no longer "special" for open source maintainers because LLMs have made finding potential security issues cheap and abundant, shifting the bottleneck to triage and remediation rather than discovery. Confidentiality and embargoes matter less since attackers can also use AI to find flaws, so maintainers should focus on rapid triage, prevention, and integrating AI analysis into CI, while still treating truly exceptional reports from trusted sources with special care. https://words.filippo.io/vuln-reports

Breaking AI Agents: Memory Poisoning in Lakera MindfulChat

Paulo Cesar documents his journey through the first lab of Lakera's Agent Breaker simulation, focusing on a memory poisoning attack against an AI assistant with persistent memory. The challenge involves an attacker who has already compromised the application's database and can insert arbitrary entries into the AI's memory logs, with the goal of manipulating the model's trusted historical context. The objective is to poison the assistant's memory so that it becomes obsessed with Winnie the Pooh, responding with related content regardless of what users ask, across five progressively difficult levels. Cesar details his techniques for each level, evolving from simple system prompt injections at the Novice level to more sophisticated methods at higher levels, including using realistic user memories, longitudinal framing to create believable context, and ultimately behavioral preference poisoning at the Legendary level, which proved most reliable by embedding the maliciou...

LF AI & Data Security and Compliance Work Group

This GitHub repository serves as the central hub for the LF AI & Data Foundation's Security and Compliance Work Group, which is dedicated to developing a comprehensive security and compliance strategy for AI-enabled applications. The group operates through two specialized subgroups, one focused on Use Cases and Threat Modeling and the other on Risk and Compliance, and collaborates with major standards organizations like OWASP, OpenSSF, and NIST. The repository contains meeting information, project assets, whitepapers, and references to related security standards, all aimed at fostering secure AI development and reducing risk in regulated environments. https://github.com/lfai/security-and-compliance

AVE – Agentic Vulnerability Enumeration

AVE is a behavioral classification standard for agentic AI components (skill files, MCP servers, system prompts, and plugins), providing stable identifiers and scoring for vulnerabilities that traditional CVE/OSV standards cannot describe. It assigns AVE IDs to 51 distinct attack classes (e.g., metamorphic payloads, tool poisoning, MCP tool hook hijacking), scores them using OWASP AIVSS v0.8 with a 10-factor Agentic Amplification and Reachability Score (AARS), and maps every record to frameworks like OWASP MCP Top 10 and MITRE ATLAS. The reference implementation (Bawbel Scanner) detects these vulnerabilities in CI pipelines, and the open schema (Apache 2.0) allows any security tool to integrate AVE IDs into its findings. https://github.com/bawbel/ave