Inside Claude Managed Agents: Reverse-Engineering the Security Boundaries of Anthropic's Hosted Agent Runtime

This Pluto Security blog post reverse-engineers Anthropic's Claude Managed Agents, a hosted runtime where Claude runs autonomously in cloud containers with bash, file I/O, web access, and MCP tools. Key findings: the sandbox uses gVisor (the same isolation engine as Claude Cowork), and all outbound traffic routes through a JWT-authenticated egress proxy with TLS inspection. The JWT is readable by any process in the sandbox and reveals organization metadata, the session ID, and the allowed hosts. Even in "limited" networking mode, six additional Anthropic infrastructure hosts (including sentry.io and a staging endpoint) are silently injected into the egress JWT on top of the user's configuration. Three independent layers prevent bypassing the proxy: the sandbox has no DNS, a network firewall blocks direct egress, and the proxy validates the JWT.

The post identifies the vault credential proxy as the platform's strongest security property: vault secrets never enter the sandbox, which structurally prevents credential theft via prompt injection. The default configuration, however, is maximally permissive: all eight tools are enabled, the permission policy is `always_allow`, and networking is unrestricted.

The post also notes that the documentation lacks dedicated security and hardening guidance (likely a beta-stage gap), and that runtime safety classifiers like those documented for Claude Code could not be confirmed for Managed Agents. Its hardening recommendations include disabling the default toolset, using limited networking, storing all credentials in vaults, and monitoring session events.
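To make the JWT finding concrete, here is a minimal Python sketch of how any process holding such a token could read its claims without a key, and how the allowed hosts could be compared against what the user actually configured. It is an illustration under assumptions, not the post's method: the summary does not give the token's schema, so the claim name `allowed_hosts`, the demo claims, and the helper names are all hypothetical, and the token is built locally rather than taken from a real sandbox.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode a JWT's payload segment WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # base64url segments omit '=' padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def extra_hosts(claims, configured, claim_name="allowed_hosts"):
    """Return hosts present in the token but absent from the user's config.

    `claim_name` is a hypothetical field name chosen for this sketch.
    """
    return set(claims.get(claim_name, [])) - set(configured)


def make_demo_token(claims):
    """Build an unsigned stand-in token so the sketch runs offline."""
    def b64url(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    header = {"alg": "none", "typ": "JWT"}
    return ".".join([b64url(header), b64url(claims), "fake-signature"])


if __name__ == "__main__":
    # Hypothetical claims; real field names and values may differ.
    demo_claims = {
        "org": "example-org",
        "session_id": "sess_123",
        "allowed_hosts": ["api.example.com", "sentry.io"],
    }
    token = make_demo_token(demo_claims)
    claims = decode_jwt_payload(token)
    print(sorted(extra_hosts(claims, {"api.example.com"})))
```

The payload of a JWT is only base64url-encoded, not encrypted, which is why readability by sandbox processes implies disclosure of whatever the claims contain. Diffing the token's hosts against the configured allowlist is one way to surface silently injected entries.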

https://pluto.security/blog/inside-claude-managed-agents
