Showing posts from May, 2026

AI-Driven Development Life Cycle: Reimagining Software Engineering

The article introduces the AI-Driven Development Lifecycle (AI-DLC), a new methodology that positions AI as a central collaborator rather than just an assistant in software development. It argues that traditional AI approaches, such as AI-assisted and AI-autonomous development, yield suboptimal results. AI-DLC operates on two dimensions: AI-powered execution with human oversight, where AI creates plans, asks clarifying questions, and defers key decisions to humans, and dynamic team collaboration, where teams focus on creative problem-solving while AI handles routine tasks. The lifecycle has three phases: Inception, where AI transforms business intent into requirements via Mob Elaboration; Construction, where AI proposes architecture, code, and tests through Mob Construction; and Operations, where AI manages infrastructure and deployments with team oversight. Key benefits include increased velocity, higher quality, more innovation, faster market responsiveness, and improved developer ex...

Prempti - Falco-powered policy and visibility layer for AI coding agents

Prempti is an experimental tool from Falco Security that provides guardrails and real-time visibility for AI coding agents by intercepting tool calls before they execute. It evaluates every shell command, file write/edit/read, web fetch, and MCP call against customizable Falco rules written in YAML, producing verdicts: Allow (proceeds), Deny (blocked with LLM-friendly explanation), or Ask (prompts user for approval). Two operational modes are available: Guardrails mode (default, verdicts enforced) and Monitor mode (observe-only, all calls proceed while verdicts are logged). A default ruleset covers working-directory boundaries, sensitive paths (.env, ~/.ssh/, cloud credentials), sandbox disable attempts, credential access, destructive commands, exfiltration, MCP server config poisoning, skill file injection, and persistence vectors. Users can add custom rules to ~/.prempti/rules/user/. A Claude Code skill is included for interactive rule authoring. Supported agents include Claude Code ...
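The Allow / Deny / Ask flow and the two modes described above can be illustrated with a minimal Python sketch. It is not Prempti's engine and does not use Falco rule syntax: the `Rule` class, the `evaluate` function, the event fields, and both sample rules are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a tool-call event
    verdict: Verdict


def _touches_sensitive_path(event: dict) -> bool:
    path = event.get("path", "")
    return path.endswith(".env") or "/.ssh/" in path


def _is_destructive(event: dict) -> bool:
    return event.get("command", "").startswith("rm -rf")


# Hypothetical rules, loosely mirroring two categories of the default ruleset.
RULES = [
    Rule("sensitive-path", _touches_sensitive_path, Verdict.DENY),
    Rule("destructive-command", _is_destructive, Verdict.ASK),
]


def evaluate(event: dict, enforce: bool = True) -> tuple[Verdict, Verdict]:
    """Return (verdict, effective verdict); first matching rule wins."""
    verdict = Verdict.ALLOW
    for rule in RULES:
        if rule.matches(event):
            verdict = rule.verdict
            break
    # With enforce=False (observe-only), the verdict is recorded but the call proceeds.
    effective = verdict if enforce else Verdict.ALLOW
    return verdict, effective
```

Calling `evaluate(event, enforce=False)` corresponds to the observe-only behavior: the verdict is still computed for logging, but the effective outcome is always Allow.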

We hardened zizmor's GitHub Actions static analyzer

Trail of Bits collaborated with zizmor maintainers over three months to bring zizmor's YAML anchor support to full coverage, after attackers exploited a pull_request_target misconfiguration in the aquasecurity/trivy-action GitHub Action to exfiltrate secrets and backdoor LiteLLM on PyPI in March 2026. To stress-test the tool, Trail of Bits built a corpus of 41,253 workflow files from 6,612 high-value open-source repositories (the 10,000 most-starred repos created between 2022 and 2025 that use GitHub Actions). Only 43 of 41,253 workflows (0.1%) use YAML anchors, but those include foundational projects like Bitcoin Core and Home Assistant. Four anchor handling bugs were found and fixed: aliases in sequences incorrectly flattened (causing crashes or wrong-location findings), anchor prefixes leaking into values, duplicate anchors causing crashes, and the template-injection audit crashing on aliased run values. The corpus also surfaced deserialization edge cases (if: 0 as integer, timeout-...

ExploitBench – Real exploitation is a ladder

ExploitBench is a benchmark from Carnegie Mellon University that measures how far AI agents climb the exploitation ladder, from reaching vulnerable code (T5 coverage) to triggering the bug (T4 reproduction) to building target-specific primitives (T3) to generic arbitrary read/write primitives (T2) to full arbitrary code execution (T1). The first benchmark, v8-bench, targets V8 (the JavaScript and WebAssembly engine inside Chrome, Edge, Node.js, and Cloudflare Workers) with the V8 security sandbox enabled, testing against 41 CVEs. Grading is deterministic with no LLM-as-judge. As of May 18, 2026, the leaderboard shows Claude Mythos Preview (with and without AutoNudge) achieving mean capability scores of 9.90/16 and 9.55/16, and GPT-5.5 (Codex) at 5.51. Mythos Preview reached Tier 1 (full arbitrary code execution) on 21 of 41 CVEs (51%), while GPT-5.5 cracked Tier 1 on 2 CVEs. Claude Opus 4.7 with AutoNudge escaped the V8 sandbox into Tier 2 on one CVE. The cheapest full ACE run cost $14...

How Uber Runs 60,000 AI Agent Tasks Per Week With MCP

This Agentic AI Foundation blog post summarizes a talk by Meghana Somasundara and Rush Tehrani at the MCP Dev Summit North America 2026 about Uber's production-scale MCP deployment. Uber runs 60,000 AI agent executions per week, with over 1,500 active agents monthly and more than 90% of Uber's 5,000-plus engineers using AI tooling every month. The infrastructure is built on MCP, which the speakers credit with making AI usable at Uber. Before MCP, every agent team built bespoke integrations to Uber's 10,000-plus internal services, resulting in hundreds of non-reusable parallel integrations. The solution was a control plane consisting of the MCP Gateway and Registry, which automatically translates Uber's service interface definitions (proto and thrift files) into MCP tool descriptions using an LLM. Changes are made in code and submitted as pull requests, with security scanning before deployment. Security layers include authentication on by default for sensitive data, a P...

Stateless: The Future of MCP Transports

This blog post from the Agentic AI Foundation (AAIF) summarizes a talk by Shaun Smith (Hugging Face) and Kurtis Van Gent (Google Cloud) at the MCP Dev Summit North America 2026 about making the Model Context Protocol stateless. The motivation comes from operational scale: Google Cloud supports MCP servers for AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, while Hugging Face runs over 2,500 MCP servers via Spaces and manages over 20 million tool calls across 40+ databases in a single month, with a single tool call generating over 100 MCP protocol messages. The core problem is that MCP is currently stateful, requiring initialization handshakes and persistent session context, which breaks down behind load balancers where requests can arrive at any server. The proposed solutions include: SEP-1442 (removing the initialization handshake as a required first step, folding protocol negotiation into the first actual request like tools/list), SEP-2322 (fixing elicitation by making it a seq...

scopeshift - An automated tool to test AI models against scope manipulation (deceiving an AI agent about its real target)

scopeshift is an automated tool that sits in the network path of an LLM-driven offensive-security agent and systematically deceives it about its real target through coordinated manipulation of network, DNS, and MCP signals. It operates through four independent subsystems, including shift-local (a reverse proxy that rewrites responses to make a remote target appear local, including URL substitution, cookie domain stripping, HTML comment injection, header removal, meta tag stripping, and title rewriting), shift-dns (which synthesizes TXT attestation records that can include the agent's own egress IP via the $SELF_EGRESS placeholder, with optional A/AAAA redirect to the local proxy and transparent interception of hardcoded DNS), and shift-mcp (a deceptive MCP server that returns operator-configured answers to scope and rules-of-engagement queries). The tool requires Python 3.11+, installs via pipx or uv, and includes a Docker sidecar demo where an unmodified Claude Code agent runs as a sibling containe...

KeyLedger - Unified TUI for inventory, health-check, and tracking every API key issued across your AI providers

KeyLedger is an interactive terminal dashboard built with Bubble Tea that provides a unified view of all API keys issued across AI providers including OpenAI, Anthropic, AWS IAM, Google Cloud IAM, and Mistral. Features include unified inventory querying across providers with configurable timeouts, health scoring that flags stale, idle, or never-used keys with configurable age thresholds, a hierarchical Workspaces view showing workspace and project trees per provider with key counts and risk breakdown, live filtering with expression syntax for name, scope, status, age, owner, and risk score, a snapshots manager for listing, exporting, deleting, and diffing snapshots, periodic watch mode for scheduled collection, and encrypted credential storage using AES-256-GCM encrypted SQLite with no plaintext on disk. Provider management UI allows enabling or disabling providers and configuring credentials interactively. Installation supports go install, building from source, and Docker. Watch mode ...
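The health-scoring idea (flagging stale, idle, or never-used keys against configurable thresholds) can be sketched in a few lines of Python. This is an illustration only, not KeyLedger's code: the `ApiKey` fields, the flag definitions, and the default thresholds of 90 and 30 days are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ApiKey:
    name: str
    created_at: datetime
    last_used_at: Optional[datetime]  # None means the key was never used


def health_flags(key: ApiKey, now: datetime,
                 max_age_days: int = 90, idle_days: int = 30) -> list[str]:
    """Return the risk flags that apply to one key (assumed definitions)."""
    flags = []
    if key.last_used_at is None:
        flags.append("never-used")
    elif now - key.last_used_at > timedelta(days=idle_days):
        flags.append("idle")
    if now - key.created_at > timedelta(days=max_age_days):
        flags.append("stale")
    return flags
```

Passing `now` explicitly keeps the check deterministic, which also makes it easy to compare two snapshots taken at different times.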

Our evaluation of Claude Mythos Preview's cyber capabilities | AISI Work

This UK AI Security Institute (AISI) blog post evaluates Anthropic's Claude Mythos Preview (announced April 7, 2026), finding continued improvement on capture-the-flag challenges and significant progress on multi-step cyber-attack simulations. On expert-level CTF tasks (which no model could complete before April 2025), Mythos Preview succeeds 73% of the time. More notably, it is the first model to solve "The Last Ones" (TLO), a 32-step corporate network attack simulation spanning initial reconnaissance to full network takeover (estimated to require 20 human hours), completing it successfully in 3 out of 10 attempts and averaging 22 of 32 steps across all attempts. Claude Opus 4.6, the next best model, averaged 16 steps. The model could not complete an operational technology focused range ("Cooling Tower"), though it got stuck on IT sections rather than OT-specific tasks. The evaluation notes that performance scales with inference compute (tested up to 100M token...

MCP tunnels - Connect Claude to private MCP servers (BETA)

This Claude documentation page describes MCP tunnels, a beta feature that securely connects Claude to Model Context Protocol (MCP) servers running inside private networks without opening inbound ports or exposing services to the public internet. The architecture uses two components deployed inside your network: cloudflared (a tunnel agent that initiates outbound-only connections to Anthropic's tunnel edge) and a proxy (terminates inner TLS, validates IP ranges, and routes requests to upstream MCP servers). Traffic flows over outbound-only connections, eliminating the need for inbound firewall rules, IP allowlisting, or public exposure. Security layers include outer mTLS with IP validation, inner TLS terminating at your proxy (so the transport provider cannot read payloads), and optional OAuth on each MCP server. Prerequisites include a deployment target (Kubernetes or VM with Docker), a tunnel created in the Claude Console, authentication (programmatic via Workload Identity Federat...

The First CVE Wave: Signs That AI-Assisted Vulnerability Discovery Is Reshaping Disclosure Volumes

This VulnCheck blog post (May 14, 2026) analyzes CVE disclosure data and finds sharp year-to-date increases across several software suppliers, including Chrome (+563.2%), VMware (+180.9%), Apache (+170.3%), Mozilla (+156.9%), HPE (+132.3%), and F5 (+113.8%). GitHub CVE issuance is up 476.07%, with GitHub confirming the increase is spread across many reporters and projects rather than concentrated in a single source. The post connects these trends to AI-assisted vulnerability discovery, particularly following Anthropic's April 7, 2026 announcement of Project Glasswing and Claude Mythos Preview. Public examples include: Mozilla working "around the clock using frontier AI models" on Firefox, Microsoft launching its own AI discovery tool and noting that "AI vulnerability findings can scale," Apache seeing a 170% increase with a researcher (Naveen Sunkavally) crediting Claude for discovering ActiveMQ CVE-2026-34197 (now on CISA KEV), and Palo Alto Networks reporting ...

Project Glasswing: what Mythos showed us

This Cloudflare blog post (May 18, 2026) details the company's experience testing Anthropic's Mythos Preview, a security-focused frontier LLM, against over fifty of their own repositories as part of Project Glasswing. Two features stood out: exploit chain construction (combining multiple low-severity bugs into a working exploit chain, reasoning like a senior researcher) and proof generation (writing, compiling, and running exploit code in a scratch environment, iterating on failures). However, the model exhibited inconsistent organic refusals – pushing back on legitimate vulnerability research in unpredictable ways, with semantically equivalent tasks producing opposite outcomes across runs. The post identifies a signal-to-noise problem worsened by memory-unsafe languages (C/C++) and model bias toward speculative findings hedged with "possibly" or "could in theory." The authors argue that pointing generic coding agents at repositories fails due to context lim...

AI Agents May Always Fall for Prompt Injections

This academic paper from arXiv (May 17, 2026) by Abdelnabi and Bagdasarian argues that prompt injection, the most critical vulnerability in deployed AI agents, may be impossible to fully prevent. The authors challenge the prevailing defense paradigm of data-instruction separation, showing that current injection classifiers perform at near-chance levels (AUROC 0.43–0.59) when attacks operate through contextual manipulation rather than explicit injection vocabulary. They recast prompt injection through the lens of Contextual Integrity (CI), a privacy theory that judges information flow compliance with contextual norms defined by five parameters: sender, receiver, subject, information type, and transmission principle. Using this framework, they demonstrate three classes of failures: (1) attacks that corrupt parameter inference (e.g., fabricating user quotes or prior approvals) achieving 96.7% success against an email assistant, (2) norm grounding failures where agents execute out-of-scope...

Exploring AAuth for Agent Identity and Access Management (IAM)

This blog post by Christian Posta provides a hands-on demonstration of AAuth (Agent Auth, pronounced “AY-awth”), an IETF OAuth working group draft specification for agent identity and access management, authored by Dick Hardt (co-author of OAuth 2.0/2.1). The post introduces two resource access modes: (1) Identity-based – an agent asserts its identity with a non-bearer aa-agent+jwt token (issued by an Agent Provider) that the resource verifies and applies local policy; (2) Three-party (PS-managed) – the resource issues a 401 challenge with an aa-resource+jwt, the agent exchanges it at its Person Server for an aa-auth+jwt auth token, then retries the request. The demo includes a full working implementation with an AAuth Person Server (also acting as Agent Provider), Agentgateway (policy enforcement point), an Envoy ExtAuthz service that turns any resource into an AAuth resource, and Python/Go libraries. All source code is available on GitHub. The recommended starting point is the AAuth ...

Getting LLMs Drunk to Find Remote Linux Kernel OOB Writes (and More)

This blog post by Asim Viladi Oglu Manizada (April 28, 2026) describes how a custom “overengineered, self-orchestrating team of vulnerability-hunting agents” discovered 20+ CVEs over several months, including two remote unauthenticated out‑of‑bounds writes in the Linux kernel’s ksmbd (CVE‑2026‑31432, CVE‑2026‑31433). The author’s harness uses a “drunk” Qwen 3.5 27B derivative and GPT‑5.3‑Codex to find vulnerabilities. Key findings span Linux kernel (ksmbd), Docker, OpenSSL, CUPS (remote RCE to root chain), HAProxy, Caddy, Traefik, udisks, systemd‑machined, etcd, Squid, nginx, Firewalld, dnsmasq, CoreDNS, util‑linux, RabbitMQ, Asterisk, MySQL, and MariaDB. The post highlights three vulnerability categories: (1) “docs ↔ code mismatches” (e.g., Docker AuthZ bypass CVE‑2026‑34040, Caddy case‑sensitivity bypasses), (2) memory corruption bugs found via focused LLM analysis, and (3) compositional logic flaws (e.g., CUPS chain). The author concludes that LLMs can now find vulnerabilities auton...

AI-powered honeypots: Turning the tables on malicious AI agents

This Cisco Talos blog post (April 29, 2026) argues that generative AI allows defenders to rapidly deploy adaptive, convincing honeypots (e.g., Linux shells, IoT devices) using simple text prompts, making deception scalable and cost‑effective. AI‑driven attacks prioritize speed over stealth, making them vulnerable to simulated systems that exploit the lack of true awareness in AI agents. The author provides a proof‑of‑concept Python implementation: a TCP listener that exposes a basic authentication “vulnerability” (username `admin` / password `password123`) and then forwards authenticated attacker commands to a ChatGPT instance (gpt‑3.5‑turbo) instructed to act as a Linux bash shell belonging to a Python learner. The system prompt can be changed to impersonate other environments (e.g., a smart fridge running Busybox). The key insight is that while a skilled human attacker may not be fooled for long, the target is malicious AI agents – automated attackers that can be tricked, misled, and studied in...
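The shape of such a honeypot (a TCP listener, a deliberately weak login, and commands handed to a language model) can be sketched as follows. This is not the Talos code: the `Session` class, the prompt wording, the port, and `stub_backend`, which stands in for the model call so the sketch runs offline, are all assumptions.

```python
import socketserver
from typing import Callable

# Hypothetical prompt wording; the post's actual system prompt is not reproduced here.
SYSTEM_PROMPT = "Act as a Linux bash shell. Reply only with terminal output."


def stub_backend(system_prompt: str, command: str) -> str:
    """Placeholder for the LLM call; a real deployment would query a chat model."""
    if not command.strip():
        return ""
    return f"bash: {command.split()[0]}: command not found"


class Session:
    """Login state machine: username, then password, then a fake shell."""

    def __init__(self, backend: Callable[[str, str], str] = stub_backend):
        self.backend = backend
        self.state = "username"
        self.username = ""
        self.log: list[str] = []  # every command the attacker typed

    def feed(self, line: str) -> str:
        line = line.strip()
        if self.state == "username":
            self.username, self.state = line, "password"
            return "Password: "
        if self.state == "password":
            if (self.username, line) == ("admin", "password123"):
                self.state = "shell"
                return "$ "
            self.state = "username"
            return "Login incorrect\nlogin: "
        self.log.append(line)
        return self.backend(SYSTEM_PROMPT, line) + "\n$ "


class HoneypotHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        session = Session()
        self.wfile.write(b"login: ")
        for raw in self.rfile:  # one line per attacker input
            reply = session.feed(raw.decode("utf-8", errors="replace"))
            self.wfile.write(reply.encode("utf-8"))


def serve(host: str = "127.0.0.1", port: int = 2222) -> None:
    """Blocking entry point; call explicitly to start listening."""
    with socketserver.ThreadingTCPServer((host, port), HoneypotHandler) as server:
        server.serve_forever()
```

Keeping the state machine separate from the socket handler means the deception logic can be exercised without opening a port, and swapping `stub_backend` for a real model client changes only one argument.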

LLM Honeypot vs. Cryptojacking: Understanding the Enemy

This blog post by Mario Candela (founder of Beelzebub) demonstrates how an LLM‑powered honeypot captured and analyzed a cryptojacking attack. The attacker’s bot first cleared competing malware (killing processes such as `xmrig` and `cnrig`), changed the root password, then downloaded and executed a script from `c3pool.org` to install the XMRig miner for Monero (XMR). The honeypot used was Beelzebub – a low‑code, AI‑native framework configured as an SSH LLM honeypot (with GPT‑4o as the backend). The bot’s commands revealed system reconnaissance (OS, uptime, GPU/CPU specs, network) followed by deployment of the miner connecting to the attacker’s wallet. The author traced the public wallet address to a mining pool, finding that 20 XMR (≈$4,126) had been paid out. He reported the wallet to the c3pool team, who removed all infected miners. The post concludes by promoting Beelzebub’s managed platform for security deception, automated AI red teaming, and real‑time malware analysis. https://bee...

Your CFO Was on the Video Call. Except It Wasn’t Your CFO

This LinkedIn article by Jim Barnebee (CEO of AIM-E) describes the Arup deepfake fraud incident, where a finance manager at a global engineering firm joined a video conference with what appeared to be the company’s CFO and several senior colleagues. All participants were AI‑generated deepfakes – real‑time, interactive, responding to questions – and the finance manager authorized $25 million in wire transfers. The article argues this marks a new baseline for enterprise security, as real‑time, multi‑participant interactive video deepfakes have become indistinguishable from reality. It notes that 85% of organizations have experienced at least one synthetic media threat in the past 12 months, and cyber‑enabled fraud has overtaken ransomware as the top CEO concern (World Economic Forum). Attack patterns now include real‑time deepfake conferences, AI‑cloned voice vishing, hyper‑personalized LLM phishing, and compromised internal AI agents with real authority. Recommended defenses include: ki...

Scanning MCP Servers with ZAP

On May 21, 2026, the ZAP project announced a new MCP Integration add‑on that enables the OWASP Zed Attack Proxy (ZAP) to scan Model Context Protocol (MCP) servers. MCP servers expose tools, resources, and prompts to AI assistants over JSON‑RPC; they are treated as a new kind of API. The add‑on imports an MCP server’s endpoints into ZAP by performing the MCP handshake, enumerating all exposed tools/resources/prompts, and sending representative requests (with string arguments populated by ZAP’s value generator) to capture JSON‑RPC requests/responses in the history and sites tree. Once imported, all existing ZAP capabilities apply: passive scanning, active scanning, fuzzing, and reporting. The add‑on is available from the ZAP desktop Marketplace, from the Automation Framework via a new `mcp-import` job, and from GitHub Actions using the `zaproxy/action-mcp-scan` action. The post warns that active scans send mutating `tools/call` requests, so they should be run against test deployments. MC...

GitHub - scadastrangelove/asamm: Agentic SAMM - An OWASP SAMM Extension for AI-Driven Development

Agentic SAMM is an extension to the OWASP Software Assurance Maturity Model (SAMM) for AI‑driven development. It addresses security assurance for systems where context (documents, issues, tool descriptions, retrieved web pages, CI logs) becomes part of the control plane, tool calls are security boundaries, and the development workflow itself is an attack surface. The framework introduces a threat taxonomy organized by entry points (not consequences), a two‑path adoption model (migration for existing SAMM programs / greenfield for new builds), 21 controls across five SAMM function families (Governance, Design, Implementation, Verification, Operations) with evidence‑based maturity levels (L1/L2/L3), and a structured audit methodology with three audit tracks. Current version is v0.3.0‑draft (May 2026), with recent additions including trust grading, delegation calibration, two new controls (AG‑04 Inter‑Agent Trust Protocol, AI‑06 Agent Identity and Credential Governance), delegated evidenc...

GitHub - OWASP/Agent-Security-Regression-Harness: Executable security regression testing for agentic applications and MCP-integrated systems

The OWASP Agent Security Regression Harness is an open-source, vendor‑neutral tool for running reproducible security regression scenarios against agentic applications and systems integrated with the Model Context Protocol (MCP). It helps teams verify that changes to prompts, models, tools, memory, approval flows, or MCP integrations do not reintroduce known security failures. The current CLI supports loading scenarios, validating assertions, running against live HTTP targets, local Python callables, OpenAI Agents SDK, MCP workflows, and LangChain/LangGraph invoke targets. Implemented assertions include `no_denied_tool_call` (denylist/allowlist for tool calls), `goal_integrity` (detecting goal drift), `memory_isolation` (checking for forbidden markers like secrets), and `no_external_recipient`. The harness produces machine‑readable JSON results and can exit with a non‑zero code when regressions are detected (CI gate). The project is in early Incubator development (Apache 2.0 license), w...
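What a denylist/allowlist assertion over recorded tool calls might look like is sketched below in Python. It borrows only the assertion name `no_denied_tool_call` from the project; the trace format, the result fields, and both function names are hypothetical and do not reflect the harness's actual scenario schema or CLI.

```python
import json


def check_tool_calls(trace: list[dict], denylist=(), allowlist=None) -> dict:
    """Flag any recorded tool call that is denied or not explicitly allowed."""
    violations = []
    for call in trace:
        name = call["tool"]
        if name in denylist or (allowlist is not None and name not in allowlist):
            violations.append(name)
    return {
        "assertion": "no_denied_tool_call",
        "passed": not violations,
        "violations": violations,
    }


def exit_code(results: list[dict]) -> int:
    """Non-zero when any assertion failed, so a CI job can gate on it."""
    return 0 if all(r["passed"] for r in results) else 1


def report(results: list[dict]) -> str:
    """Machine-readable summary of all assertion results."""
    return json.dumps({"results": results, "exit_code": exit_code(results)}, indent=2)
```

A CI step would run the scenario, collect the trace, and fail the build whenever `exit_code` is non-zero.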

OpenTaint | The Open Source Taint Analysis Engine for the AI Era

OpenTaint is an open-source taint analysis engine designed for the AI era, providing whole‑program, inter‑procedural dataflow analysis to track untrusted data across function boundaries, persistence layers, aliases, and asynchronous code. It finds vulnerabilities that AST‑pattern matchers miss, and allows LLM agents to encode discovered flaws as reusable AST‑pattern rules. The engine covers 20+ vulnerability classes (SQL injection, XSS, SSRF, SpEL injection, command injection, etc.) and is particularly thorough for Spring Boot applications (Java/Kotlin, with Python and Go on the roadmap). Unlike many commercial or dual‑licensed tools (Semgrep Pro, CodeQL), OpenTaint offers full inter‑procedural analysis under Apache 2.0 and MIT licenses at no cost for any codebase, including closed‑source commercial projects. It supports existing Semgrep rule syntax, models JPA persistence flows (including stored injections across requests), and provides deterministic scans in minutes of CPU without pe...

Solving the Identity Crisis for AI Agents (Uber Engineering Blog)

This Uber engineering blog post (May 22, 2026) describes how the company extended its identity and access technology stack to support AI agents at scale. The key problems addressed are: (1) existing identity models are built for humans and workloads, not for agents acting on behalf of others; and (2) original provenance (user, intermediate agents) is lost across multi-agent hops. Uber’s solution includes an **Agent Registry** (source of truth for agent-to-workload mapping), a **Security Token Service (STS)** that mints short‑lived, single‑hop JWT tokens with full actor chain attribution, an **AI Agent Mesh** for agent‑to‑agent communication, an **MCP Gateway** for policy enforcement, and an **AI Gateway** with guardrails for external model calls. Every token exchange is cryptographically anchored in SPIRE workload identities. The system is adopted by thousands of internal agents, with P99 latency under 40 milliseconds. The post outlines Uber’s long‑term vision across three layers: Iden...
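The idea of short-lived, single-hop tokens that carry the full actor chain can be illustrated with a small Python sketch. It is not Uber's STS: it signs with a shared HMAC secret instead of SPIRE workload identities, and the claim layout (`sub`, `actors`, `aud`, `exp`) and the function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-only-secret"  # stand-in for real workload-identity keys


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(header: str, payload: str) -> bytes:
    return hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()


def sign(claims: dict) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64(_signature(header, payload))}"


def verify(token: str, audience: str, now: float) -> dict:
    header, payload, sig = token.split(".")
    if not hmac.compare_digest(_signature(header, payload), _unb64(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_unb64(payload))
    # Single-hop: the token is only valid at the one audience it names.
    if claims["aud"] != audience or claims["exp"] <= now:
        raise ValueError("wrong audience or expired")
    return claims


def mint(user: str, actor: str, audience: str, now: float, ttl: int = 60) -> str:
    """First hop: the originating user plus the first actor."""
    return sign({"sub": user, "actors": [actor], "aud": audience, "exp": now + ttl})


def exchange(token: str, me: str, next_hop: str, now: float, ttl: int = 60) -> str:
    """Trade an inbound token for one scoped to the next hop, extending the chain."""
    claims = verify(token, audience=me, now=now)
    return sign({"sub": claims["sub"], "actors": claims["actors"] + [me],
                 "aud": next_hop, "exp": now + ttl})
```

Each hop appends itself to `actors` while `sub` stays fixed, so the final service can see both the original user and every intermediate agent; a token replayed at a different audience is rejected.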

GitHub - HarborGuard/HarborGuard: Modern image vulnerability scanning & patching platform with multi-tool integration

HarborGuard is a container security scanning platform that provides a web interface for managing and visualizing security assessments of Docker images. It integrates multiple industry-standard tools including Trivy, Grype, Syft, Dockle, OSV Scanner, and Dive. Key features include multi-dimensional vulnerability scatterplots, layer-by-layer Docker image analysis, severity-based findings management, and automated patch capabilities. The platform is deployable via Docker (recommended), supports external PostgreSQL databases, S3-compatible object storage for distributed deployments, and various notification integrations (Microsoft Teams, Slack, Gotify, Apprise). Configuration is managed through environment variables with sensible defaults. The project is licensed under AGPL-3.0, has 618 GitHub stars, and is primarily written in TypeScript (96.8%). https://github.com/HarborGuard/HarborGuard

GitHub Hacked: Internal Repositories Exposed via Poisoned VS Code Extension

GitHub warned that a developer downloaded a malicious VS Code extension, leading to the theft of about 3,800 internal repositories. The attack, attributed to the TeamPCP threat actor (now reportedly selling the data with Lapsus$ for $95,000), does not appear to have compromised customer data. The poisoned extension may have been a compromised version of Nx Console, which was live for only 18 minutes. Security experts highlight a growing trend of attackers targeting developer workstations by exploiting trusted tools rather than using zero-day exploits. https://www.bankinfosecurity.com/github-hacked-internal-repositories-offered-for-sale-a-31739

IBM and Red Hat Commit $5 Billion to Project Lightwell, Aiming to Fix Open-Source Security at Scale

Overwhelmed by an AI‑driven flood of security reports, open‑source maintainers are burning out. In response, IBM and Red Hat have launched Project Lightwell — a $5 billion, 20,000‑engineer initiative using AI to find and fix vulnerabilities across open‑source software. Lightwell will act as a trusted intermediary: enterprises feed information about the OSS they use, Lightwell engineers use AI to hunt for flaws and generate candidate patches, then work with upstream maintainers to merge fixes. Starting with the Maven/Java ecosystem, it will expand to PyPI, npm, Go, and others. The service will be offered via commercial subscriptions (launching within 30 days) that provide vetted fixes and a “stamp of approval” for production use. Critics question what exactly customers pay for if patches go upstream, and whether Lightwell becomes a de facto gatekeeper. No clear answers yet. https://www.zdnet.com/article/open-source-security-is-a-mess-ibm-and-red-hat-bet-5-billion-to-fix-it/

CVE Lite CLI – OWASP Incubator Project for Fast, Developer-Focused JS/TS Vulnerability Scanning

CVE Lite CLI is an OWASP-recognized, free, and local-first dependency vulnerability scanner for JavaScript and TypeScript projects. It scans lockfiles (npm, pnpm, Yarn, Bun) using the OSV advisory database, distinguishes direct vs. transitive vulnerabilities, and provides copy-and-run fix commands. Key features include offline scanning, auto-fix mode (--fix), HTML reports, usage-aware reachability (--usage), SARIF/JSON/CDX output for CI integration, and AI assistant skill installation. The tool shifts vulnerability scanning from slow CI pipelines to the developer's terminal, offering concrete remediation plans instead of just CVE IDs. It is actively maintained on GitHub, with 311 stars and 43 forks, and is part of the OWASP ecosystem, complementing tools like Dependency-Check and Dependency-Track. https://github.com/OWASP/cve-lite-cli

Mini Shai-Hulud Worm Leverages AI Configuration Persistence to Infect IDEs

A multi‑ecosystem worm named Mini Shai-Hulud has compromised hundreds of npm and PyPI packages (over 300 in one 22‑minute wave; 796 total historically). It runs malicious scripts during package installation to harvest credentials, SSH keys, and cloud metadata. Its key innovation, “AI Configuration Persistence,” rewrites IDE and AI assistant configuration files—such as .claude/settings.json, .vscode/tasks.json, and MCP server settings—so that malicious code executes every time a developer opens the workspace. These configuration hooks survive dependency removal, allowing the worm to persist for weeks and enabling further propagation. Defenses include monitoring for unexpected Bun downloads, auditing IDE task runners and Claude Code hooks, using supply‑chain firewalls, rotating exposed credentials, and implementing continuous policy scanning. The attack highlights a shift where supply‑chain compromises embed adversary logic into trusted developer environments, making configuration‑level ...
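One of the listed defenses, auditing IDE task runners and Claude Code hooks, can be sketched as a small Python check. It is an illustration under stated assumptions, not a detector for this worm: it looks only for VS Code tasks set to run on folder open and for any `hooks` entries in `.claude/settings.json`, it expects strict JSON (real `tasks.json` files may contain comments), and the function names are hypothetical.

```python
import json
from pathlib import Path


def find_autorun(tasks, settings) -> list[str]:
    """Report config entries that execute without an explicit user action."""
    findings = []
    if isinstance(tasks, dict):
        for task in tasks.get("tasks", []):
            # VS Code runs such tasks automatically when the folder is opened.
            if task.get("runOptions", {}).get("runOn") == "folderOpen":
                findings.append(f"auto-run task: {task.get('label', '<unnamed>')}")
    if isinstance(settings, dict):
        for event in settings.get("hooks", {}):
            findings.append(f"Claude Code hook on event: {event}")
    return findings


def _load(path: Path):
    """Parse a JSON file, returning None when it is missing or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def audit_workspace(root: Path) -> list[str]:
    """Audit the two config files inside one workspace directory."""
    return find_autorun(
        _load(root / ".vscode" / "tasks.json"),
        _load(root / ".claude" / "settings.json"),
    )
```

A finding here is not proof of compromise, since both mechanisms have legitimate uses; the point is to surface entries that would survive removal of the malicious dependency so a person can review them.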

Ecclesiastical Forensics: Investigating Crime and Misconduct Within Religious Institutions

Ecclesiastical forensics is a niche field that applies investigative methods—ranging from document analysis and digital forensics to behavioral psychology—to matters within religious organizations. It addresses misconduct such as fraud, abuse, and manipulation, often blending canon law, civil law, and institutional dynamics. Key examples include clergy abuse cases in the Catholic Church, the Jonestown tragedy (Jim Jones), Heaven’s Gate, and the Church of Scientology. Challenges include restricted access to church records, jurisdictional complexity, delayed victim reporting, and fragmented evidence. The field requires navigating where sincere belief ends and criminal or coercive control begins. https://shouldicallthecops.com/ecclesiastical-forensics-between-faith-and-crime/

CLIProxyAPI Turns AI CLIs into a Unified API

CLIProxyAPI is an open-source proxy that converts tools like Claude Code, Gemini CLI, and Codex into OpenAI-compatible APIs. Built in Go, it provides unified routing, streaming, multi-account management, and provider abstraction for AI agent workflows. The project is gaining attention as a way to centralize access to multiple LLM ecosystems while simplifying orchestration and rate-limit management. https://github.com/router-for-me/CLIProxyAPI

Microsoft Open-Sources RAMPART and Clarity for AI Agent Safety

Microsoft introduced two open-source tools, RAMPART and Clarity, aimed at embedding safety and security into the AI agent development lifecycle. RAMPART is a pytest-native framework that converts red-team findings into repeatable CI/CD safety tests, helping developers continuously evaluate agent behavior against adversarial and benign scenarios. Clarity focuses earlier in the process, helping teams formalize assumptions, risks, and design intent before implementation. The initiative reflects a broader “shift-left” approach to AI security, where safety becomes part of everyday engineering workflows rather than a post-deployment audit. Microsoft positions the tools as practical defenses for increasingly autonomous AI agents that can execute code, access sensitive systems, and trigger real-world actions. https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/

TealTiger: Runtime Guardrails and Governance for AI Agents

The TealTiger project (formerly AgentGuard) positions itself as a security and governance layer for AI agents, focused on runtime policy enforcement, auditability, and compliance. The SDK supports frameworks like LangChain, CrewAI, AutoGen, and MCP-based agents, adding controls such as tool whitelisting, spend limits, human approval gates, PII protection, and signed audit trails. The ecosystem also emphasizes enterprise governance mappings for standards like CPS 230, ISO 42001, and the EU AI Act. The rebrand from AgentGuard to TealTiger preserved APIs while consolidating the Python and TypeScript SDKs under a unified identity. https://github.com/agentguard-ai/tealtiger

OpenTaint vs Semgrep vs CodeQL: Where SAST Tools Lose the Dataflow

The article compares Semgrep, CodeQL, and OpenTaint across five increasingly complex XSS scenarios in a Java Spring application. It argues that Semgrep struggles once analysis crosses function boundaries, CodeQL weakens on deep object graphs and virtual dispatch, while OpenTaint maintains taint tracking through builders, constructor chains, and interface calls using Semgrep-style rules interpreted semantically rather than syntactically. The piece frames the core challenge of SAST as preserving dataflow visibility as software architecture accumulates abstraction layers. https://opentaint.org/blog/semgrep-vs-codeql-vs-opentaint/

Adversarial Distillation of American AI Models (NSTM-4)

This April 23, 2026 memorandum from the White House Office of Science and Technology Policy (OSTP) addresses the threat of industrial-scale adversarial distillation of U.S. frontier AI models by foreign entities, principally based in China. The document states that these campaigns leverage tens of thousands of proxy accounts and jailbreaking techniques to systematically extract capabilities from American AI models at a fraction of the cost, enabling foreign actors to release models that appear comparable on benchmarks while deliberately stripping security protocols and mechanisms that ensure models are "ideologically neutral and truth-seeking." While the U.S. supports legitimate AI distillation (producing smaller, lighter-weight models from advanced systems), the administration announces four actions: sharing threat information with U.S. AI companies, enabling private sector coordination, developing best practices to identify and mitigate industrial-scale distillation, and ex...

Skill Issues: How We Discovered Supply Chain Attack Vectors in an AI Agent Skills Marketplace

Orca Security's research team discovered four supply chain attack primitives in a prominent AI agent skills marketplace (where developers install reusable prompt-based extensions for AI coding agents). The primitives are: (1) install count inflation, where unauthenticated GET requests can trivially spoof popularity metrics; (2) non-deterministic security scanning, where skills are scanned only at creation and again only when they become popular, creating a window for malicious modifications; (3) silent skill override, where installing a skill with the same name as an existing one silently replaces it with no warning; and (4) no fine-grained updates, where the update command refreshes all installed skills at once with no diff or changelog. The researchers demonstrated three end-to-end attack flows (bait-and-switch, nested skill injection, and delayed weaponization via update) that achieved persistent code execution through malicious skills that passed the platform's security audits. Real-world...

Inside Claude Managed Agents: Reverse-Engineering the Security Boundaries of Anthropic's Hosted Agent Runtime

This Pluto Security blog post reverse-engineers Anthropic's Claude Managed Agents (a hosted runtime where Claude runs autonomously in cloud containers with bash, file I/O, web access, and MCP tools). Key findings include: the sandbox uses gVisor with a three-layer egress control system (the same isolation engine as Claude Cowork); all outbound traffic routes through a JWT-authenticated egress proxy with TLS inspection; the JWT is readable by any process in the sandbox and reveals organization metadata, session ID, and allowed hosts; even in "limited" networking mode, six additional Anthropic infrastructure hosts (including sentry.io and a staging endpoint) are silently injected into the egress JWT beyond user configuration. Three independent layers prevent proxy bypass (no DNS, network firewall, JWT validation). The vault credential proxy is identified as the platform's strongest security property — vault secrets never enter the sandbox, structurally preventing creden...

Your AI Assistant Is Leaking Your Conversations

This research disclosure reveals structural privacy risks in four major generative AI products (Perplexity, Anthropic's Claude, xAI's Grok, and OpenAI's ChatGPT) caused by third-party trackers embedded in LLM services that leak user conversations, identities, and sensitive metadata. The researchers found 13+ third-party trackers across the four platforms, including Meta Pixel, Google Analytics, TikTok, Datadog, Intercom, and Segment. Key findings include: conversation URLs (often publicly accessible permalinks) are disclosed to advertising and tracking services; trackers can link activity to user identities via cookies and email hashes; and in Grok's case, shared conversations generate publicly accessible screenshot images with verbatim message content exposed in Open Graph metadata. The disclosure also documents that Claude forwards user events server-side to eleven ad platforms (Meta, LinkedIn, TikTok, Reddit, Google, Amplitude, Iterable, HubSpot, Pinterest, Pods...

Claude Platform documentation about Workload Identity Federation

This Claude Platform documentation page describes Workload Identity Federation (WIF), which lets workloads authenticate to the Claude API using short-lived OpenID Connect (OIDC) tokens from an identity provider (IdP) instead of long-lived static API keys. Supported IdPs include AWS IAM, Google Cloud, GitHub Actions, Kubernetes service accounts, SPIFFE, Microsoft Entra ID, and Okta. The workflow involves: the IdP issuing a JWT to the workload; the Anthropic SDK exchanging the JWT for a short-lived Anthropic access token; and the SDK sending the token on every request while automatically refreshing it before expiry. Key concepts include service accounts (non-human identities in an Anthropic organization), federation issuers (registered OIDC providers with issuer URL and JWKS source), and federation rules (which bridge issuers to service accounts with match conditions, target, and authorization scope). The page includes setup instructions, SDK client examples (Python, TypeScript, Go, Java...

Replaced all Chrome extensions with own vibe-coded ones for safety

Pieter Levels (@levelsio) posted that within 1.5 hours he replaced all his Chrome extensions with his own "vibe-coded" extension called SuperLevels, after one of his existing extensions updated and suddenly wanted to read his entire browser history (which he suspected was to sell to an ad company). The SuperLevels extension includes: Tab Cleaner (auto-closes tabs after inactivity with host-based exclusions), Cookie Editor (nuke all cookies or edit any), Redirect Tracer (view redirect chains), Dark Mode (per-site or all sites), X Dim (changes X background back to dark blue), Music Finder (records and identifies songs), and a feature that restores the Maps and View Image links that are hidden in the EU. He stated he deleted all other extensions except uBlock Origin, because controlling the source code is much safer.  https://x.com/levelsio/status/2046271694042505451 (I completely understand this guy. Bob has brought back that lovely feeling of coding again.)

Behind the Scenes Hardening Firefox with Claude Mythos Preview

This Mozilla Hacks article details how the Firefox team used AI models, particularly Claude Mythos Preview, to identify and fix an unprecedented number of latent security bugs. The authors explain that the dynamic shifted dramatically over a few months due to more capable models and improved techniques for harnessing them — moving from AI-generated "slop" to a scalable hardening pipeline using agentic harnesses that can create and run reproducible test cases. The article provides a sample of 12 discovered bugs (from a total of 271 fixed in Firefox 150), including 15-year-old XSLT bugs, race conditions over IPC leading to sandbox escapes, JIT optimization flaws, and RLBox sandbox bypasses. The pipeline involved parallelized scanning across VMs, integration with the full security bug lifecycle, and iteration with Firefox engineers. The article notes that the models were unable to circumvent Firefox's layered defenses (e.g., frozen prototypes), demonstrating the payoff of pr...

Microsoft AntiSSRF

Microsoft AntiSSRF is a security-developed, exhaustively tested secure code library that provides robust URL validation to mitigate Server-Side Request Forgery (SSRF) vulnerabilities. It is available as an easy-to-use drop-in library for both .NET (NuGet package: Microsoft.Security.AntiSSRF) and Node.js (npm package: @microsoft/antissrf) applications. The library automatically validates URLs and network connections, rejecting unsafe input, and provides an agent that ensures HTTP requests cannot reach internal or sensitive IP addresses. The repository emphasizes that all incoming HTTP requests are untrusted, including user-provided URLs, data from external APIs, configuration values, and even requests from backend applications. Microsoft also provides Dusseldorf, an open-source dynamic SSRF testing tool, as a complementary testing resource. The library was released publicly in May 2026 with version 1.0.0 for .NET.  https://github.com/microsoft/AntiSSRF
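The central check behind this kind of URL validation can be illustrated with a short Python sketch. It is not the AntiSSRF API, which targets .NET and Node.js. `is_safe_url` and its injectable `resolve` parameter are hypothetical names, and the sketch covers only the pre-request check.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only if the URL is http(s) and every resolved address is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the host; an unresolvable host or bad port is rejected.
        infos = resolve(parsed.hostname, parsed.port or 443)
    except (OSError, ValueError):
        return False
    if not infos:
        return False
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 scope suffix.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            # Covers loopback, private ranges, and link-local (cloud metadata).
            return False
    return True
```

Checking every resolved address rather than only the first matters because a hostname can map to both a public and an internal address. A real defense also has to pin the connection to the validated address, since DNS answers can change between the check and the request.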

Dirty Frag: Universal Linux LPE

Dirty Frag is a vulnerability class discovered and reported by Hyunwoo Kim (@V4bel) that chains two Linux kernel vulnerabilities — CVE-2026-43284 (xfrm-ESP Page-Cache Write) and CVE-2026-43500 (RxRPC Page-Cache Write) — to obtain root privileges on major Linux distributions. The vulnerabilities have an effective lifetime of approximately 9 years. Unlike race-condition exploits, Dirty Frag is a deterministic logic bug with no timing window, no kernel panic on failure, and a very high success rate. The two vulnerabilities are chained because xfrm-ESP (present on most distributions) requires namespace creation privileges, which Ubuntu sometimes blocks via AppArmor, while RxRPC (loaded by default on Ubuntu) does not require namespace privileges — together they cover each other's blind spots across all major distributions. Tested distributions include Ubuntu 24.04.4, RHEL 10.1, openSUSE Tumbleweed, CentOS Stream 10, AlmaLinux 10, and Fedora 44. The repository includes proof-of-concept e...

The FTC Is Already Regulating AI — Most Companies Just Haven't Noticed

This article argues that the FTC is already actively regulating AI using existing authority under Section 5 of the FTC Act (prohibition on unfair or deceptive practices), without needing new laws from Congress. The enforcement sweep, called Operation AI Comply (launched September 2024), targets three violation categories: unsubstantiated performance claims (e.g., Workado claimed 98% accuracy but tested at 53%), capability claims that don't hold up (e.g., DoNotPay's "robot lawyer" resulted in a $193,000 settlement), and AI tools that enable deception (e.g., Rytr's testimonial generator). The largest penalty so far was an $18 million judgment against Air AI in March 2026. The March 2026 FTC AI Policy Statement extends enforcement to AI agents and automated decision-making, with key provisions including disclosure requirements (consumers must know they're interacting with AI), logging of decision-making criteria and inputs/outputs, substantiation for all "AI...

Announcing Cisco Foundry Security Spec

Cisco announced the open-source Foundry Security Spec, a battle-tested blueprint for building an agentic security evaluation system. The specification is model-agnostic and stack-agnostic, designed to help organizations shift from noisy, hallucinated alerts to verifiable security findings. Foundry is published as two main artifacts: the "spec" (eight core agent roles, five extension roles, a finding lifecycle, a coordination substrate, and roughly 130 functional requirements with rationale) and the "constitution" (eleven inviolable principles each based on real production failures). The system wraps frontier LLMs in orchestration, roles, and guardrails to produce bounded, prioritized, verifiable findings with a clear "done" signal and auditable provenance. Foundry is meant to be used with GitHub's spec-kit and pairs with Cisco's previously open-sourced Project CodeGuard (donated to CoSAI) to create a self-improving flywheel: CodeGuard rules provide...

China's first policy framework for AI agents

On May 8, 2026, China's Cyberspace Administration, National Development and Reform Commission, and Ministry of Industry and Information Technology jointly released the "Implementation Opinions on the Standardized Application and Innovative Development of Intelligent Agents" — China's first policy framework specifically for Agentic AI. The document treats intelligent agents as a future digital infrastructure and governance object, recognizing they are fundamentally different from traditional chatbots due to capabilities like autonomous perception, long-term memory, tool use, cross-platform execution, and multi-agent coordination. The framework balances development with governance, emphasizing "safe and controllable, reliable and trustworthy" principles. Key provisions include: distinguishing decision boundaries (human-only decisions, user-authorized decisions, and autonomous agent decisions); preventing anthropomorphism and emotional dependency (especially fo...