Our First Outage from LLM-Written Code

The Sketch team described how a series of outages in July 2025 was caused by a subtle bug introduced when code was refactored with the help of a large language model. After deployment, the system ran normally at first but soon suffered CPU spikes and slowdowns, and the problem seemed to be triggered whenever the CEO logged in. While diagnosing it, the team temporarily blocked the CEO's account, which appeared to solve the issue until it happened again. The root cause was traced to a small change made during an automated file move: a break statement had been replaced with continue, turning a loop that should have exited into an infinite one. The alteration slipped past human review because it was buried among otherwise harmless changes. In response, the team improved their agent so that it preserves code exactly when moving it, and suggested that better tooling, such as cross-hunk change detection in Git, could help catch similar issues in the future.
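To make the failure mode concrete, here is a minimal Python sketch of how swapping break for continue can turn a terminating loop into a spinning one, along with a naive check that compares a moved block against its original. This is not Sketch's code: the function names, the `fetch_page` callback, and the `max_iterations` cap (added only so the demonstration terminates) are all hypothetical.

```python
import difflib


def drain_with_break(fetch_page):
    """Intended behavior: stop as soon as a fetch yields nothing."""
    pages = []
    while True:
        page = fetch_page(len(pages))
        if page is None:
            break  # exit the loop on the first empty result
        pages.append(page)
    return pages


def drain_with_continue(fetch_page, max_iterations=1000):
    """Same loop with `continue` in place of `break` (hypothetical).

    The empty result no longer ends the loop, so the same fetch is
    retried forever. The iteration cap is an assumption added so this
    demo raises instead of hanging.
    """
    pages = []
    iterations = 0
    while True:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError("loop did not terminate")
        page = fetch_page(len(pages))
        if page is None:
            continue  # bug: retries the same index without progress
        pages.append(page)


def moved_block_changes(removed, added):
    """Return (old_lines, new_lines) pairs where a moved block differs
    from the block it was supposed to be copied from."""
    matcher = difflib.SequenceMatcher(a=removed, b=added, autojunk=False)
    return [
        (removed[i1:i2], added[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Comparing the removed and re-added lines of a move directly, rather than reading them as two unrelated hunks, is roughly the kind of check the post asks tooling to provide. In this sketch, a pure move yields an empty list and the break/continue swap shows up as a single differing pair.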

https://sketch.dev/blog/our-first-outage-from-llm-written-code
