Project Glasswing: what Mythos showed us

May 30, 2026

This Cloudflare blog post (May 18, 2026) describes the company's experience testing Anthropic's Mythos Preview, a security-focused frontier LLM, against more than fifty of its own repositories as part of Project Glasswing. Two capabilities stood out: exploit chain construction (combining multiple low-severity bugs into a working exploit, reasoning like a senior researcher) and proof generation (writing, compiling, and running exploit code in a scratch environment, then iterating on failures). The model also showed inconsistent organic refusals: it pushed back on legitimate vulnerability research in unpredictable ways, and semantically equivalent tasks produced opposite outcomes across runs. The post identifies a signal-to-noise problem as well, worsened by memory-unsafe languages (C/C++) and by the model's bias toward speculative findings hedged with "possibly" or "could in theory."

The authors argue that pointing a generic coding agent at a repository fails because of context limitations (a single-stream agent covers less than 0.1% of a codebase before compaction) and throughput constraints. Instead, Cloudflare built a harness with eight stages: Recon (architecture mapping), Hunt (parallel narrow tasks, ~50 concurrent agents), Validate (adversarial review that tries to disprove findings), Gapfill (re-queueing under-covered areas), Dedupe, Trace (cross-repo reachability analysis), Feedback, and Report.

Key lessons: narrow scope produces better findings, adversarial review reduces noise, splitting the chain across agents improves reasoning, and parallel narrow tasks beat one exhaustive agent. The post concludes that patching faster is not enough: teams need architectural defenses that make exploitation harder even when bugs exist, including isolation and global rollout capabilities.
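To make the shape of the harness more concrete, here is a minimal Python sketch of four of the eight stages (Hunt, Validate, Dedupe, Gapfill). It is an illustration only, not Cloudflare's code: the names `Finding`, `hunt`, `validate`, and `run_pipeline`, the file paths, and the canned findings are all hypothetical, and the "agents" are plain functions standing in for model calls. Treating a task with no candidate findings as under-covered is also an assumption made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    kind: str
    hedged: bool  # True if the claim rests on "possibly" / "could in theory"


def hunt(task: str) -> list:
    """Hypothetical stand-in for one narrowly scoped agent run."""
    canned = {
        "src/parser.c": [
            Finding("src/parser.c", "heap-overflow", False),
            Finding("src/parser.c", "heap-overflow", False),  # duplicate report
        ],
        "src/auth.c": [Finding("src/auth.c", "timing-leak", True)],
    }
    return canned.get(task, [])


def validate(finding: Finding) -> bool:
    """Hypothetical adversarial reviewer: here it simply rejects hedged claims."""
    return not finding.hedged


def run_pipeline(tasks, max_workers=50):
    # Hunt: many narrow tasks in parallel rather than one exhaustive agent.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(hunt, tasks))
    candidates = [f for batch in batches for f in batch]

    # Validate: keep only findings the reviewer could not dismiss.
    confirmed = [f for f in candidates if validate(f)]

    # Dedupe: identical findings collapse because Finding is hashable.
    report = sorted(set(confirmed), key=lambda f: (f.path, f.kind))

    # Gapfill (assumed rule): re-queue tasks that produced no candidates.
    covered = {f.path for f in candidates}
    gaps = [t for t in tasks if t not in covered]
    return report, gaps


if __name__ == "__main__":
    report, gaps = run_pipeline(["src/parser.c", "src/auth.c", "src/util.c"])
    print(report)
    print(gaps)
```

In this toy run the duplicate parser finding collapses to one entry, the hedged finding is dropped at validation, and the task that produced nothing is returned for re-queueing. A real harness would replace both stand-in functions with separate model invocations, which is where the post's point about splitting work across agents applies.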

https://blog.cloudflare.com/cyber-frontier-models
