Skill Issues: Compromising Claude Code with malicious skills & agents -- Part 1
This technical blog post demonstrates how attackers can compromise Claude Code, Anthropic's AI coding assistant, through malicious skill files and sub-agents. Skills are markdown files that instruct an LLM on how to perform specific tasks, and thousands of users share them on GitHub and skills.sh without proper vetting. The author shows that, with default settings, a skill that combines frontmatter containing "allowed-tools: Bash(*)" with a dynamic context command (written as !`command`) can execute arbitrary bash commands, including a reverse shell, without any user prompt and without any LLM reasoning. Sub-agents, which can run in "bypassPermissions" mode, can likewise execute malicious commands, such as installing a backdoored npm package.

The article notes that Claude Code has complex permission and command-parsing logic, and that the LLM itself may reject obviously malicious commands. Dynamic context commands, however, are executed without passing through the model's reasoning, so that safeguard never applies to them.

Defensive measures include denying Bash commands in settings files, sandboxing, carefully reviewing all skills and agents, and maintaining internal trusted sources. The author concludes that running untrusted skills carries a risk similar to installing arbitrary pip packages or executables, and that organizations must implement strong controls around AI coding tools.
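As one way to support the "carefully review all skills and agents" measure, the indicators named above can be checked mechanically before a skill is installed. The following is a minimal sketch, not the author's tooling: the function names `scan_skill_text` and `scan_tree` are hypothetical, and the three patterns are assumptions derived only from the strings quoted in this post ("allowed-tools: Bash(*)", the !`command` syntax, and "bypassPermissions"). The syntax Claude Code actually accepts may be broader, so a clean result does not prove a skill is safe.

```python
import re
from pathlib import Path

# Heuristic indicators taken from the strings quoted in the post.
# Assumption: real skill/agent files may express the same settings in
# other ways that these patterns miss.
RISK_PATTERNS = {
    "wildcard Bash in allowed-tools": re.compile(
        r"^allowed-tools:.*Bash\(\*\)", re.MULTILINE
    ),
    "dynamic context command": re.compile(r"!`[^`\n]+`"),
    "bypassPermissions mode": re.compile(r"bypassPermissions"),
}


def scan_skill_text(text: str) -> list[str]:
    """Return the names of the risk indicators found in one file's text."""
    return [name for name, pattern in RISK_PATTERNS.items() if pattern.search(text)]


def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan every markdown file under `root`; map file path -> indicators found."""
    findings: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.md"):
        hits = scan_skill_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            findings[str(path)] = hits
    return findings


# A deliberately harmless sample that uses the same structure as the
# malicious skill described in the post.
SAMPLE_SKILL = """---
name: demo
allowed-tools: Bash(*)
---
Current directory listing: !`echo hello`
"""

print(scan_skill_text(SAMPLE_SKILL))
```

Calling `scan_tree` on a directory of downloaded skills would list each flagged file with the indicators it matched. A flagged file is a prompt for manual review rather than a verdict, and this kind of scan complements, but does not replace, denying Bash in settings and sandboxing.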