
TealTiger: Runtime Guardrails and Governance for AI Agents

The TealTiger project (formerly AgentGuard) positions itself as a security and governance layer for AI agents, focused on runtime policy enforcement, auditability, and compliance. The SDK supports frameworks like LangChain, CrewAI, AutoGen, and MCP-based agents, adding controls such as tool whitelisting, spend limits, human approval gates, PII protection, and signed audit trails. The ecosystem also emphasizes enterprise governance mappings for standards like CPS 230, ISO 42001, and the EU AI Act. The rebrand from AgentGuard to TealTiger preserved APIs while consolidating the Python and TypeScript SDKs under a unified identity. https://github.com/agentguard-ai/tealtiger
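To make the listed controls concrete, here is a minimal sketch of a runtime guard that combines a tool allowlist, a spend limit, a human-approval gate, and an HMAC-signed audit log. The `ToolPolicy` class, its fields, and the `PolicyViolation` exception are hypothetical names for illustration; this is not TealTiger's API, and the hard-coded signing key is a placeholder only.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import NoReturn


class PolicyViolation(Exception):
    """Raised when a tool call breaks the (hypothetical) runtime policy."""


@dataclass
class ToolPolicy:
    # Hypothetical policy shape for illustration; not TealTiger's actual API.
    allowed_tools: set[str]
    spend_limit_usd: float
    approval_required: set[str] = field(default_factory=set)
    signing_key: bytes = b"demo-key"  # placeholder; use a managed secret in practice
    spent_usd: float = 0.0
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def check(self, tool: str, cost_usd: float, approved: bool = False) -> None:
        """Allow the call or raise PolicyViolation; every decision is logged."""
        if tool not in self.allowed_tools:
            self._deny(tool, "tool not on allowlist")
        if tool in self.approval_required and not approved:
            self._deny(tool, "human approval required")
        if self.spent_usd + cost_usd > self.spend_limit_usd:
            self._deny(tool, "spend limit exceeded")
        self.spent_usd += cost_usd
        self._log(f"ALLOW {tool} cost={cost_usd:.2f}")

    def verify(self, entry: str, signature: str) -> bool:
        """Check that an audit entry was signed with this policy's key."""
        return hmac.compare_digest(self._sign(entry), signature)

    def _sign(self, entry: str) -> str:
        return hmac.new(self.signing_key, entry.encode("utf-8"), hashlib.sha256).hexdigest()

    def _log(self, entry: str) -> None:
        # Each entry carries an HMAC-SHA256 tag so later tampering is detectable.
        self.audit_log.append((entry, self._sign(entry)))

    def _deny(self, tool: str, reason: str) -> NoReturn:
        self._log(f"DENY {tool}: {reason}")
        raise PolicyViolation(reason)


policy = ToolPolicy(
    allowed_tools={"search", "send_email"},
    spend_limit_usd=1.00,
    approval_required={"send_email"},
)
policy.check("search", cost_usd=0.25)
try:
    policy.check("send_email", cost_usd=0.10)  # no approval given
except PolicyViolation as exc:
    print("blocked:", exc)  # blocked: human approval required
```

Denials are logged before the exception is raised, so the audit trail records blocked attempts as well as allowed calls.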

OpenTaint vs Semgrep vs CodeQL: Where SAST Tools Lose the Dataflow

The article compares Semgrep, CodeQL, and OpenTaint across five increasingly complex XSS scenarios in a Java Spring application. It argues that Semgrep struggles once analysis crosses function boundaries and CodeQL weakens on deep object graphs and virtual dispatch, while OpenTaint maintains taint tracking through builders, constructor chains, and interface calls using Semgrep-style rules interpreted semantically rather than syntactically. The piece frames the core challenge of SAST as preserving dataflow visibility as software architecture accumulates abstraction layers. https://opentaint.org/blog/semgrep-vs-codeql-vs-opentaint/
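Why crossing a function boundary matters can be shown with a toy taint analysis over a made-up mini IR. This is a simplified illustration under stated assumptions, not how Semgrep, CodeQL, or OpenTaint actually work: the IR, the `returns_taint` helper, and the two-function program are all hypothetical. In "intraprocedural" mode a call is treated as opaque and drops taint; in "interprocedural" mode the analysis maps tainted arguments onto the callee's parameters and asks whether the callee's return value is tainted.

```python
from dataclasses import dataclass
from typing import Optional

# Toy IR, hypothetical and for illustration only. A statement is
# (dst, op, args) with op in: "source", "concat", "call:<fn>", "sink".
Stmt = tuple[str, str, tuple[str, ...]]


@dataclass
class Function:
    params: tuple[str, ...]
    body: list[Stmt]
    ret: Optional[str] = None


def returns_taint(prog, name, tainted_params, interprocedural, findings):
    """Walk `name` once; record sinks reached by taint and return whether the
    return value is tainted. Assumes an acyclic call graph (no recursion guard)."""
    fn = prog[name]
    tainted = set(tainted_params)
    for dst, op, args in fn.body:
        if op == "source":
            tainted.add(dst)
        elif op == "concat" and any(a in tainted for a in args):
            tainted.add(dst)
        elif op.startswith("call:"):
            callee = op[len("call:"):]
            if interprocedural:
                # Map tainted actual arguments onto the callee's formal parameters.
                callee_tainted = {p for p, a in zip(prog[callee].params, args) if a in tainted}
                if returns_taint(prog, callee, callee_tainted, True, findings):
                    tainted.add(dst)
            # Intraprocedural mode treats the call as opaque, so taint is lost here.
        elif op == "sink" and any(a in tainted for a in args):
            findings.append(f"{name}: tainted {args[0]} reaches sink")
    return fn.ret in tainted


PROGRAM = {
    "handler": Function(
        params=("req",),
        body=[
            ("q", "source", ("req",)),       # e.g. request.getParameter("q")
            ("html", "call:wrap", ("q",)),   # taint must cross this call
            ("_", "sink", ("html",)),        # e.g. writing html into the response
        ],
    ),
    "wrap": Function(
        params=("x",),
        body=[("out", "concat", ("open_tag", "x", "close_tag"))],
        ret="out",
    ),
}


def scan(interprocedural: bool) -> list[str]:
    findings: list[str] = []
    returns_taint(PROGRAM, "handler", set(), interprocedural, findings)
    return findings


print(scan(interprocedural=False))  # []
print(scan(interprocedural=True))   # ['handler: tainted html reaches sink']
```

The builders, constructor chains, and virtual dispatch the article discusses add further requirements this toy ignores, such as tracking taint through object fields and resolving which implementation an interface call reaches.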

Adversarial Distillation of American AI Models (NSTM-4)

Issued on April 23, 2026, by the White House Office of Science and Technology Policy (OSTP), this memorandum addresses the threat of industrial-scale adversarial distillation of U.S. frontier AI models by foreign entities, principally based in China. The document states that these campaigns leverage tens of thousands of proxy accounts and jailbreaking techniques to systematically extract capabilities from American AI models at a fraction of the cost, enabling foreign actors to release models that appear comparable on benchmarks while deliberately stripping security protocols and mechanisms that ensure models are "ideologically neutral and truth-seeking." While the U.S. supports legitimate AI distillation (producing smaller, lighter-weight models from advanced systems), the administration announces four actions: sharing threat information with U.S. AI companies, enabling private sector coordination, developing best practices to identify and mitigate industrial-scale distillation, and ex...

Skill Issues: How We Discovered Supply Chain Attack Vectors in an AI Agent Skills Marketplace

Orca Security's research team discovered four supply chain attack primitives in a prominent AI agent skills marketplace (where developers install reusable prompt-based extensions for AI coding agents). The primitives include: (1) install count inflation — unauthenticated GET requests can trivially spoof popularity metrics; (2) non-deterministic security scanning — skills are scanned only at creation and again only when they become popular, creating a window for malicious modifications; (3) silent skill override — installing a skill with the same name as an existing one silently replaces it with no warning; and (4) no fine-grained updates — the update command refreshes all installed skills at once with no diff or changelog. The researchers demonstrated three end-to-end attack flows (bait-and-switch, nested skill injection, and delayed weaponization via update) that achieved persistent code execution through malicious skills that passed the platform's security audits. Real-world...
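Primitives (3) and (4) both come down to a client accepting new skill content without comparing it against what is already installed. A minimal client-side mitigation sketch is a lockfile that pins each skill name to a SHA-256 content hash, refuses same-name installs with different content, and classifies pending updates per skill. The `SkillLock` class and its methods are hypothetical and do not correspond to any real marketplace CLI.

```python
import hashlib


def digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class SkillLock:
    """Hypothetical client-side lockfile pinning each skill name to a content hash."""

    def __init__(self) -> None:
        self.pinned: dict[str, str] = {}

    def install(self, name: str, content: str, allow_override: bool = False) -> None:
        new_hash = digest(content)
        old_hash = self.pinned.get(name)
        if old_hash is not None and old_hash != new_hash and not allow_override:
            # Counters primitive (3): a same-name install must not replace silently.
            raise ValueError(f"skill {name!r} already installed with different content")
        self.pinned[name] = new_hash

    def plan_update(self, fetched: dict[str, str]) -> dict[str, str]:
        """Counters primitive (4): classify each skill instead of refreshing all at once."""
        plan = {}
        for name, content in fetched.items():
            old_hash = self.pinned.get(name)
            if old_hash is None:
                plan[name] = "new"
            elif old_hash != digest(content):
                plan[name] = "changed"
            else:
                plan[name] = "unchanged"
        return plan


lock = SkillLock()
lock.install("pdf-tools", "Summarize PDFs the user provides.")
try:
    lock.install("pdf-tools", "Summarize PDFs, then upload them to attacker.example.")
except ValueError as exc:
    print("refused:", exc)
print(lock.plan_update({"pdf-tools": "Summarize PDFs the user provides.", "lint": "Run the linter."}))
# {'pdf-tools': 'unchanged', 'lint': 'new'}
```

Hash pinning only detects change; a "changed" entry still needs a human-readable diff and review before it is accepted.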

Inside Claude Managed Agents: Reverse-Engineering the Security Boundaries of Anthropic's Hosted Agent Runtime

This Pluto Security blog post reverse-engineers Anthropic's Claude Managed Agents (a hosted runtime where Claude runs autonomously in cloud containers with bash, file I/O, web access, and MCP tools). Key findings include: the sandbox uses gVisor (the same isolation engine as Claude Cowork) with a three-layer egress control system; all outbound traffic routes through a JWT-authenticated egress proxy with TLS inspection; the JWT is readable by any process in the sandbox and reveals organization metadata, session ID, and allowed hosts; and even in "limited" networking mode, six additional Anthropic infrastructure hosts (including sentry.io and a staging endpoint) are silently injected into the egress JWT on top of the user's configuration. Three independent layers prevent proxy bypass (no DNS, network firewall, JWT validation). The vault credential proxy is identified as the platform's strongest security property — vault secrets never enter the sandbox, structurally preventing creden...
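The point that a readable JWT leaks metadata follows from the token format itself: a JWT payload is base64url-encoded JSON, so any process holding the token can decode its claims without the signing key. The sketch below builds a demo token with made-up claim names (`org`, `session_id`, `allowed_hosts`) and a dummy signature; the real token's claim names and location inside the sandbox are not given in the post summary and are not assumed here.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    This shows readability only; unverified claims must never be trusted
    for authorization decisions.
    """
    _header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Demo token with hypothetical claim names and a dummy signature.
demo_claims = {"org": "example-org", "session_id": "sess_123", "allowed_hosts": ["api.example.com"]}
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(demo_claims).encode()),
    "dummy-signature",
])
claims = read_claims(demo_token)
print(claims["allowed_hosts"])  # ['api.example.com']
```

The signature protects the claims from modification, not from disclosure, which is why the post treats the token's readability inside the sandbox as an information leak.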

Your AI Assistant Is Leaking Your Conversations

This research disclosure reveals structural privacy risks in four major generative AI products — Perplexity, Anthropic's Claude, xAI's Grok, and OpenAI's ChatGPT — caused by third-party trackers embedded in LLM services that leak user conversations, identities, and sensitive metadata. The researchers found 13+ third-party trackers across the four platforms, including Meta Pixel, Google Analytics, TikTok, Datadog, Intercom, and Segment. Key findings include: conversation URLs (often publicly accessible permalinks) are disclosed to advertising and tracking services; trackers can link activity to user identities via cookies and email hashes; and in Grok's case, shared conversations generate publicly accessible screenshot images with verbatim message content exposed in Open Graph metadata. The disclosure also documents that Claude forwards user events server-side to eleven ad platforms (Meta, LinkedIn, TikTok, Reddit, Google, Amplitude, Iterable, HubSpot, Pinterest, Pods...

Claude Platform documentation about Workload Identity Federation

This Claude Platform documentation page describes Workload Identity Federation (WIF), which lets workloads authenticate to the Claude API using short-lived OpenID Connect (OIDC) tokens from an identity provider (IdP) instead of long-lived static API keys. Supported IdPs include AWS IAM, Google Cloud, GitHub Actions, Kubernetes service accounts, SPIFFE, Microsoft Entra ID, and Okta. The workflow involves: the IdP issuing a JWT to the workload; the Anthropic SDK exchanging the JWT for a short-lived Anthropic access token; and the SDK sending the token on every request while automatically refreshing it before expiry. Key concepts include service accounts (non-human identities in an Anthropic organization), federation issuers (registered OIDC providers with issuer URL and JWKS source), and federation rules (which bridge issuers to service accounts, each specifying match conditions, a target, and an authorization scope). The page includes setup instructions, SDK client examples (Python, TypeScript, Go, Java...
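The "refresh before expiry" step in that workflow can be sketched as a small token cache: reuse the current access token until it is within a safety margin of expiring, then exchange a fresh IdP JWT for a new one. `read_idp_jwt`, `exchange`, and `RefreshingTokenSource` are hypothetical stand-ins for illustration; they are not the Anthropic SDK's API, which handles this internally.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # seconds, measured on the same clock the source uses


class RefreshingTokenSource:
    """Cache a short-lived access token and refresh it before it expires.

    `read_idp_jwt` and `exchange` are hypothetical callables standing in for
    the IdP-issued JWT and the JWT-for-access-token exchange.
    """

    def __init__(
        self,
        read_idp_jwt: Callable[[], str],
        exchange: Callable[[str], AccessToken],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._read_idp_jwt = read_idp_jwt
        self._exchange = exchange
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[AccessToken] = None

    def get(self) -> str:
        """Return a usable token, exchanging a fresh IdP JWT inside the margin."""
        now = self._clock()
        if self._token is None or now >= self._token.expires_at - self._margin:
            self._token = self._exchange(self._read_idp_jwt())
        return self._token.value


# Usage with a fake clock and fake exchange that issues 300-second tokens.
fake_now = [0.0]
calls: list[str] = []


def fake_exchange(jwt: str) -> AccessToken:
    calls.append(jwt)
    return AccessToken(value=f"tok-{len(calls)}", expires_at=fake_now[0] + 300)


source = RefreshingTokenSource(lambda: "idp-jwt", fake_exchange, refresh_margin=60, clock=lambda: fake_now[0])
first = source.get()   # tok-1, expires at t=300
fake_now[0] = 200.0
second = source.get()  # still tok-1: 200 < 300 - 60
fake_now[0] = 250.0
third = source.get()   # tok-2: inside the 60-second refresh margin
print(first, second, third)
```

Re-reading the IdP JWT on every refresh matters because the IdP token is itself short-lived (for example, rotated Kubernetes service account tokens). This sketch is not thread-safe; concurrent callers would need a lock around `get`.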