Our First Outage from LLM-Written Code

August 24, 2025

The Sketch team described how a series of outages in July 2025 was caused by a subtle bug in code refactored with the help of a large language model. After deployment, the system ran normally at first but soon suffered CPU spikes and slowdowns, and the problem seemed to be triggered whenever the CEO logged in. While diagnosing it, the team temporarily blocked the CEO’s account, which appeared to fix the issue until it happened again. The root cause was traced to a small change made during an automated file move: a break statement had been replaced with continue, creating an infinite loop. The alteration slipped past human review because it was buried among otherwise harmless changes. In response, the team improved their agent to preserve code exactly during moves and suggested that better tooling, such as cross-hunk change detection in Git, could help catch similar issues in the future.
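
To make the failure mode and the proposed check concrete, here is a minimal Python sketch of what comparing the two halves of a code move might look like. It is not Sketch's tooling or an existing Git feature: the function name altered_lines_in_move and the loop snippets are hypothetical, invented for illustration. The idea is that the lines deleted from the old file and the lines added to the new file should match exactly in a pure move, so any leftover line on either side deserves a closer look.

```python
from collections import Counter


def altered_lines_in_move(removed, added):
    """Compare lines deleted from one file with lines added to another.

    For a pure move, both sides contain the same lines (ignoring
    indentation and blank lines). Returns a pair of sorted lists:
    lines found only on the removed side, and only on the added side.
    """
    removed_counts = Counter(line.strip() for line in removed if line.strip())
    added_counts = Counter(line.strip() for line in added if line.strip())
    only_removed = sorted((removed_counts - added_counts).elements())
    only_added = sorted((added_counts - removed_counts).elements())
    return only_removed, only_added


# Hypothetical loop as it looked in the old file (assumed, not the real code).
before = """\
while True:
    item = queue.next()
    if item is None:
        break
    process(item)
""".splitlines()

# The same loop after the "move": break has silently become continue,
# so the loop never exits once the queue keeps returning None.
after = """\
while True:
    item = queue.next()
    if item is None:
        continue
    process(item)
""".splitlines()

lost, gained = altered_lines_in_move(before, after)
print("only in removed hunk:", lost)   # ['break']
print("only in added hunk:", gained)   # ['continue']
```

In a review full of relocated code, a one-word difference like this is easy to miss by eye, while a check of this kind reduces the move to the two lines that actually changed. A real tool would need to work on diff hunks across files and tolerate legitimate edits such as renamed imports, which this sketch does not attempt.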

https://sketch.dev/blog/our-first-outage-from-llm-written-code
