Posts

Showing posts from May, 2026

CLIProxyAPI Turns AI CLIs into a Unified API

CLIProxyAPI is an open-source proxy that converts tools like Claude Code, Gemini CLI, and Codex into OpenAI-compatible APIs. Built in Go, it provides unified routing, streaming, multi-account management, and provider abstraction for AI agent workflows. The project is gaining attention as a way to centralize access to multiple LLM ecosystems while simplifying orchestration and rate-limit management.  https://github.com/router-for-me/CLIProxyAPI

Microsoft Open-Sources RAMPART and Clarity for AI Agent Safety

Microsoft introduced two open-source tools, RAMPART and Clarity, aimed at embedding safety and security into the AI agent development lifecycle. RAMPART is a pytest-native framework that converts red-team findings into repeatable CI/CD safety tests, helping developers continuously evaluate agent behavior against adversarial and benign scenarios. Clarity focuses earlier in the process, helping teams formalize assumptions, risks, and design intent before implementation. The initiative reflects a broader “shift-left” approach to AI security, where safety becomes part of everyday engineering workflows rather than a post-deployment audit. Microsoft positions the tools as practical defenses for increasingly autonomous AI agents that can execute code, access sensitive systems, and trigger real-world actions.  https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/

TealTiger: Runtime Guardrails and Governance for AI Agents

The TealTiger project (formerly AgentGuard) positions itself as a security and governance layer for AI agents, focused on runtime policy enforcement, auditability, and compliance. The SDK supports frameworks like LangChain, CrewAI, AutoGen, and MCP-based agents, adding controls such as tool whitelisting, spend limits, human approval gates, PII protection, and signed audit trails. The ecosystem also emphasizes enterprise governance mappings for standards like CPS 230, ISO 42001, and the EU AI Act. The rebrand from AgentGuard to TealTiger preserved APIs while consolidating the Python and TypeScript SDKs under a unified identity.  https://github.com/agentguard-ai/tealtiger

OpenTaint vs Semgrep vs CodeQL: Where SAST Tools Lose the Dataflow

The article compares Semgrep, CodeQL, and OpenTaint across five increasingly complex XSS scenarios in a Java Spring application. It argues that Semgrep struggles once analysis crosses function boundaries, CodeQL weakens on deep object graphs and virtual dispatch, while OpenTaint maintains taint tracking through builders, constructor chains, and interface calls using Semgrep-style rules interpreted semantically rather than syntactically. The piece frames the core challenge of SAST as preserving dataflow visibility as software architecture accumulates abstraction layers. https://opentaint.org/blog/semgrep-vs-codeql-vs-opentaint/

Adversarial Distillation of American AI Models (NSTM-4)

This April 23, 2026 memorandum from the White House Office of Science and Technology Policy (OSTP) addresses the threat of industrial-scale adversarial distillation of U.S. frontier AI models by foreign entities, principally based in China. The document states that these campaigns leverage tens of thousands of proxy accounts and jailbreaking techniques to systematically extract capabilities from American AI models at a fraction of the cost, enabling foreign actors to release models that appear comparable on benchmarks while deliberately stripping security protocols and mechanisms that ensure models are "ideologically neutral and truth-seeking." While the U.S. supports legitimate AI distillation (producing smaller, lighter-weight models from advanced systems), the administration announces four actions: sharing threat information with U.S. AI companies, enabling private sector coordination, developing best practices to identify and mitigate industrial-scale distillation, and ex...

Skill Issues: How We Discovered Supply Chain Attack Vectors in an AI Agent Skills Marketplace

 Orca Security's research team discovered four supply chain attack primitives in a prominent AI agent skills marketplace (where developers install reusable prompt-based extensions for AI coding agents). The primitives include: (1) install count inflation — unauthenticated GET requests can trivially spoof popularity metrics; (2) non-deterministic security scanning — skills are scanned only at creation and again only when they become popular, creating a window for malicious modifications; (3) silent skill override — installing a skill with the same name as an existing one silently replaces it with no warning; and (4) no fine-grained updates — the update command refreshes all installed skills at once with no diff or changelog. The researchers demonstrated three end-to-end attack flows (bait-and-switch, nested skill injection, and delayed weaponization via update) that achieved persistent code execution through malicious skills that passed the platform's security audits. Real-world...

Inside Claude Managed Agents: Reverse-Engineering the Security Boundaries of Anthropic's Hosted Agent Runtime

This Pluto Security blog post reverse-engineers Anthropic's Claude Managed Agents (a hosted runtime where Claude runs autonomously in cloud containers with bash, file I/O, web access, and MCP tools). Key findings include: the sandbox uses gVisor with a three-layer egress control system (the same isolation engine as Claude Cowork); all outbound traffic routes through a JWT-authenticated egress proxy with TLS inspection; the JWT is readable by any process in the sandbox and reveals organization metadata, session ID, and allowed hosts; even in "limited" networking mode, six additional Anthropic infrastructure hosts (including sentry.io and a staging endpoint) are silently injected into the egress JWT beyond user configuration. Three independent layers prevent proxy bypass (no DNS, network firewall, JWT validation). The vault credential proxy is identified as the platform's strongest security property — vault secrets never enter the sandbox, structurally preventing creden...

Your AI Assistant Is Leaking Your Conversations

This research disclosure reveals structural privacy risks in four major generative AI products — Perplexity, Anthropic's Claude, xAI's Grok, and OpenAI's ChatGPT — caused by third-party trackers embedded in LLM services that leak user conversations, identities, and sensitive metadata. The researchers found 13+ third-party trackers across the four platforms, including Meta Pixel, Google Analytics, TikTok, Datadog, Intercom, and Segment. Key findings include: conversation URLs (often publicly accessible permalinks) are disclosed to advertising and tracking services; trackers can link activity to user identities via cookies and email hashes; and in Grok's case, shared conversations generate publicly accessible screenshot images with verbatim message content exposed in Open Graph metadata. The disclosure also documents that Claude forwards user events server-to-side to eleven ad platforms (Meta, LinkedIn, TikTok, Reddit, Google, Amplitude, Iterable, HubSpot, Pinterest, Pods...

Claude Platform documentation about Workload Identity Federation

This Claude Platform documentation page describes Workload Identity Federation (WIF), which lets workloads authenticate to the Claude API using short-lived OpenID Connect (OIDC) tokens from an identity provider (IdP) instead of long-lived static API keys. Supported IdPs include AWS IAM, Google Cloud, GitHub Actions, Kubernetes service accounts, SPIFFE, Microsoft Entra ID, and Okta. The workflow involves: the IdP issuing a JWT to the workload; the Anthropic SDK exchanging the JWT for a short-lived Anthropic access token; and the SDK sending the token on every request while automatically refreshing it before expiry. Key concepts include service accounts (non-human identities in an Anthropic organization), federation issuers (registered OIDC providers with issuer URL and JWKS source), and federation rules (which bridge issuers to service accounts with match conditions, target, and authorization scope). The page includes setup instructions, SDK client examples (Python, TypeScript, Go, Java...

Replaced all Chrome extensions with own vibe-coded ones for safety

Pieter Levels (@levelsio) posted that within 1.5 hours he replaced all his Chrome extensions with his own "vibe-coded" extension called SuperLevels, after one of his existing extensions updated and suddenly wanted to read his entire browser history (which he suspected was to sell to an ad company). The SuperLevels extension includes: Tab Cleaner (auto-closes tabs after inactivity with host-based exclusions), Cookie Editor (nuke all cookies or edit any), Redirect Tracer (view redirect chains), Dark Mode (per-site or all sites), X Dim (changes X background back to dark blue), Music Finder (records and identifies songs), and restores Maps and View Image links that are hidden in the EU. He stated he deleted all other extensions except uBlock Origin, because controlling the source code is much safer.  https://x.com/levelsio/status/2046271694042505451 (I completely understand this guy. Bob has brought back that lovely feeling of coding again.)

Behind the Scenes Hardening Firefox with Claude Mythos Preview

This Mozilla Hacks article details how the Firefox team used AI models, particularly Claude Mythos Preview, to identify and fix an unprecedented number of latent security bugs. The authors explain that the dynamic shifted dramatically over a few months due to more capable models and improved techniques for harnessing them — moving from AI-generated "slop" to a scalable hardening pipeline using agentic harnesses that can create and run reproducible test cases. The article provides a sample of 12 discovered bugs (from a total of 271 fixed in Firefox 150), including 15-year-old XSLT bugs, race conditions over IPC leading to sandbox escapes, JIT optimization flaws, and RLBox sandbox bypasses. The pipeline involved parallelized scanning across VMs, integration with the full security bug lifecycle, and iteration with Firefox engineers. The article notes that the models were unable to circumvent Firefox's layered defenses (e.g., frozen prototypes), demonstrating the payoff of pr...

Microsoft AntiSSRF

Microsoft AntiSSRF is a security-developed, exhaustively-tested secure code library that provides robust URL validation to mitigate Server-Side Request Forgery (SSRF) vulnerabilities. It is available as an easy-to-use drop-in library for both .NET (NuGet package: Microsoft.Security.AntiSSRF) and Node.js (npm package: @microsoft/antissrf) applications. The library automatically validates URLs and network connections, rejecting unsafe input, and provides an agent that ensures HTTP requests cannot reach internal or sensitive IP addresses. The repository emphasizes that all incoming HTTP requests are untrusted, including user-provided URLs, data from external APIs, configuration values, and even requests from backend applications. Microsoft also provides Dusseldorf, an open-source dynamic SSRF testing tool, as a complementary testing resource. The library was released publicly in May 2026 with version 1.0.0 for .NET.  https://github.com/microsoft/AntiSSRF

Dirty Frag: Universal Linux LPE

Dirty Frag is a vulnerability class discovered and reported by Hyunwoo Kim (@V4bel) that chains two Linux kernel vulnerabilities — CVE-2026-43284 (xfrm-ESP Page-Cache Write) and CVE-2026-43500 (RxRPC Page-Cache Write) — to obtain root privileges on major Linux distributions. The vulnerabilities have an effective lifetime of approximately 9 years. Unlike race-condition exploits, Dirty Frag is a deterministic logic bug with no timing window, no kernel panic on failure, and a very high success rate. The two vulnerabilities are chained because xfrm-ESP (present on most distributions) requires namespace creation privileges, which Ubuntu sometimes blocks via AppArmor, while RxRPC (loaded by default on Ubuntu) does not require namespace privileges — together they cover each other's blind spots across all major distributions. Tested distributions include Ubuntu 24.04.4, RHEL 10.1, openSUSE Tumbleweed, CentOS Stream 10, AlmaLinux 10, and Fedora 44. The repository includes proof-of-concept e...

The FTC Is Already Regulating AI — Most Companies Just Haven't Noticed

This article argues that the FTC is already actively regulating AI using existing authority under Section 5 of the FTC Act (prohibition on unfair or deceptive practices), without needing new laws from Congress. The enforcement sweep, called Operation AI Comply (launched September 2024), targets three violation categories: unsubstantiated performance claims (e.g., Workado claimed 98% accuracy but tested at 53%), capability claims that don't hold up (e.g., DoNotPay's "robot lawyer" resulted in a $193,000 settlement), and AI tools that enable deception (e.g., Rytr's testimonial generator). The largest penalty so far was an $18 million judgment against Air AI in March 2026. The March 2026 FTC AI Policy Statement extends enforcement to AI agents and automated decision-making, with key provisions including disclosure requirements (consumers must know they're interacting with AI), logging of decision-making criteria and inputs/outputs, substantiation for all "AI...

Announcing CISCO Foundry Security Spec

Cisco announced the open-source Foundry Security Spec, a battle-tested blueprint for building an agentic security evaluation system. The specification is model-agnostic and stack-agnostic, designed to help organizations shift from noisy, hallucinated alerts to verifiable security findings. Foundry is published as two main artifacts: the "spec" (eight core agent roles, five extension roles, a finding lifecycle, a coordination substrate, and roughly 130 functional requirements with rationale) and the "constitution" (eleven inviolable principles each based on real production failures). The system wraps frontier LLMs in orchestration, roles, and guardrails to produce bounded, prioritized, verifiable findings with a clear "done" signal and auditable provenance. Foundry is meant to be used with GitHub's spec-kit and pairs with Cisco's previously open-sourced Project CodeGuard (donated to CoSAI) to create a self-improving flywheel: CodeGuard rules provide...

China's first policy framework for AI agents.

On May 8, 2026, China's Cyberspace Administration, National Development and Reform Commission, and Ministry of Industry and Information Technology jointly released the "Implementation Opinions on the Standardized Application and Innovative Development of Intelligent Agents" — China's first policy framework specifically for Agentic AI. The document treats intelligent agents as a future digital infrastructure and governance object, recognizing they are fundamentally different from traditional chatbots due to capabilities like autonomous perception, long-term memory, tool use, cross-platform execution, and multi-agent coordination. The framework balances development with governance, emphasizing "safe and controllable, reliable and trustworthy" principles. Key provisions include: distinguishing decision boundaries (human-only decisions, user-authorized decisions, and autonomous agent decisions); preventing anthropomorphism and emotional dependency (especially fo...