Project Glasswing: what Mythos showed us
This Cloudflare blog post (May 18, 2026) describes the company's experience testing Anthropic's Mythos Preview, a security-focused frontier LLM, against more than fifty of its own repositories as part of Project Glasswing. Two capabilities stood out: exploit chain construction (combining multiple low-severity bugs into a working exploit, reasoning like a senior researcher) and proof generation (writing, compiling, and running exploit code in a scratch environment, then iterating on failures). The model also showed inconsistent organic refusals: it pushed back on legitimate vulnerability research unpredictably, and semantically equivalent tasks produced opposite outcomes across runs. The post further identifies a signal-to-noise problem, made worse by memory-unsafe languages (C/C++) and by the model's bias toward speculative findings hedged with "possibly" or "could in theory."
The authors argue that pointing a generic coding agent at a repository fails because of context limits (a single-stream agent covers <0.1% of a codebase before compaction) and throughput constraints. Instead, Cloudflare built a harness with eight stages: Recon (architecture mapping), Hunt (parallel narrow tasks, ~50 concurrent agents), Validate (adversarial review that tries to disprove findings), Gapfill (re-queuing under-covered areas), Dedupe, Trace (cross-repo reachability analysis), Feedback, and Report.
Key lessons: narrow scope produces better findings, adversarial review reduces noise, splitting the chain across agents improves reasoning, and parallel narrow tasks beat one exhaustive agent. The post concludes that patching faster is not enough: teams need architectural defenses that make exploitation harder even when bugs exist, including isolation and global rollout capabilities.
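
To make the shape of three of the harness stages concrete (Hunt, Validate, Dedupe), here is a minimal Python sketch. It is not Cloudflare's code, which the post does not publish. The `Finding` type, the `hunt` and `validate` stubs, and the toy rule that rejects hedged findings are all hypothetical stand-ins for real model calls; only the stage names and the figure of roughly 50 concurrent agents come from the post.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical type: one candidate finding reported by a hunting agent.
@dataclass(frozen=True)
class Finding:
    file: str
    bug_class: str
    hedged: bool  # True when the claim leans on "possibly" / "could in theory"


async def hunt(task: str) -> list[Finding]:
    """Stub for one narrowly scoped agent run over a single file (assumption)."""
    await asyncio.sleep(0)  # placeholder for a model call
    if task.endswith(".c"):
        return [
            Finding(task, "overflow", hedged=False),
            Finding(task, "race", hedged=True),
        ]
    return []


async def validate(finding: Finding) -> bool:
    """Stub for an adversarial reviewer that tries to disprove a finding."""
    await asyncio.sleep(0)  # placeholder for a second, independent model call
    return not finding.hedged  # toy rule: speculative findings are rejected


async def run_pipeline(tasks: list[str], max_agents: int = 50) -> list[Finding]:
    limit = asyncio.Semaphore(max_agents)  # cap on concurrent hunting agents

    async def bounded_hunt(task: str) -> list[Finding]:
        async with limit:
            return await hunt(task)

    # Hunt: many narrow tasks in parallel instead of one exhaustive agent.
    batches = await asyncio.gather(*(bounded_hunt(t) for t in tasks))
    candidates = [f for batch in batches for f in batch]

    # Validate: keep only the findings that survive adversarial review.
    verdicts = await asyncio.gather(*(validate(f) for f in candidates))
    confirmed = [f for f, ok in zip(candidates, verdicts) if ok]

    # Dedupe: drop exact duplicates while preserving first-seen order.
    return list(dict.fromkeys(confirmed))


if __name__ == "__main__":
    for finding in asyncio.run(run_pipeline(["parser.c", "parser.c", "config.py"])):
        print(finding)
```

Keeping the reviewer separate from the hunter mirrors the post's point that splitting work across agents and reviewing adversarially reduces noise. A real harness would also need the Recon, Gapfill, Trace, Feedback, and Report stages, which this sketch leaves out.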