Appsec adventures

This repository hosts Mantis, a modular and stack-agnostic toolkit developed by Google, designed for AI coding agents to autonomously find, reproduce, and patch vulnerabilities. It provides a sequential set of skills (e.g., `/mantis-plan`, `/mantis-researcher`, `/mantis-reproduce`, `/mantis-patch`) that can be adapted to various domains like hardware, infrastructure as code, or ML pipelines. The toolkit emphasizes a structured pipeline including repository history analysis, architecture summarization, threat modeling, multi-threaded security scanning, deduplication, review, critic validation, crash reproduction in sandboxes, exploit chaining, patching, risk calibration, reflection, and report generation. It comes with strong safety warnings, recommending use only in isolated environments and requiring manual verification of all findings. The project is not an official Google product and is intended for demonstration and adaptation, with future plans for skill self-improvement and integ...

Harnessing Harnesses - Climbing the LLM Hills

This blog post explores the concept of "harnesses" — the orchestration layer around large language models that controls inputs, tools, prompts, models, state, validation, and outputs. The author argues that while model selection and prompt engineering matter, the orchestration layer is where the biggest improvements in capability, cost, and reliability come from. The post reviews several open-source harness frameworks useful for offensive security research, including RAPTOR (which builds a structured pipeline with validation stages), Anthropic's Code Reference (for execution-verified C/C++ findings using ASAN), Baby Naptime (a runtime exploitation loop), Evil Socket's Audit Framework (an eight-stage pipeline), and Visa's Vulnerability Agentic Harness (focused on threat modeling). It also provides guidance on designing custom harnesses, emphasizing stage-specific prompts, context window management, model routing, and the importance of memory and retrieval-augmented...

datadog-saist: AI-Native Static Application Security Testing (SAST) Tool

This repository hosts datadog-saist, an AI-native Static Application Security Testing tool developed by Datadog. Unlike traditional SAST tools that rely on parsing and rule-based analysis, it uses large language models from Anthropic, OpenAI, or Google to detect security vulnerabilities in source code. Currently in preview, it supports Go, Java, Python, C#, JavaScript, TypeScript, and Kotlin. The tool can be used standalone on a laptop and requires an API key for one of the supported LLM providers, but does not require a Datadog account. It generates industry-standard SARIF reports, builds project context for accurate analysis, and offers features like cross-file indexing and configurable concurrency. The project is written in Go, uses Tree-sitter for parsing, and is available under an open-source license. https://github.com/DataDog/datadog-saist

darknet-mcp-server: 66-Tool MCP Server for Dark Web Intelligence

This repository hosts the darknet-mcp-server, a Model Context Protocol server that unifies dark web and threat intelligence into a single platform for AI agents. It provides 66 tools across 16 data sources for breach data lookup, ransomware tracking, Tor .onion access, malware analysis, blockchain intelligence, exploit searching, and stealer log analysis. The server allows AI agents to query all sources in parallel and correlate data, replacing manual workflows across multiple platforms. It can be run with no install using npx, requires optional API keys for premium sources, and includes per-provider rate limiting and TTL caching. The project is part of a broader MCP security suite and is available under an MIT License. https://github.com/badchars/darknet-mcp-server
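
The TTL caching mentioned above can be illustrated with a minimal sketch. This is not the server's own code: the `TTLCache` class, its `get` method, and the injectable `clock` parameter are hypothetical names chosen for the example, and the real project may structure its cache quite differently.

```python
import time


class TTLCache:
    """Tiny time-to-live cache: a value is reused until its entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the expiry logic is easy to test
        self._entries = {}          # key -> (expiry time, cached value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value
```

A caller would wrap each upstream lookup as `cache.get(("provider", query), lambda: query_provider(query))`, where `query_provider` stands in for whatever function actually contacts the data source.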

Cloudflare Security Audit Skill: Multi-Phase AI-Powered Security Audits for Code Repositories

Cloudflare's security-audit-skill is an open-source skill for AI coding agents that performs structured, multi-phase security assessments of software repositories. Rather than relying on ad hoc vulnerability searches, it guides the agent through a six-phase workflow consisting of reconnaissance, parallel vulnerability hunting across multiple attack classes, adversarial validation by independent agents, deduplication, machine-readable report generation, and schema validation. The methodology emphasizes reporting only demonstrably exploitable vulnerabilities with concrete attack scenarios, while reducing false positives through independent verification. Designed as the foundation of Cloudflare's Vulnerability Discovery Harness (VDH), the skill supports coding agents capable of tool use and parallel sub-agents, producing actionable security findings instead of generic best-practice recommendations. https://github.com/cloudflare/security-audit-skill

GhostApproval: A Trust Boundary Gap in AI Coding Assistants

Wiz Research introduces GhostApproval, a category-wide vulnerability affecting multiple AI coding assistants, including Amazon Q Developer, Claude Code, Cursor, Google Antigravity, Augment, and Windsurf. The attack exploits symbolic links (CWE-61) to trick AI agents into writing to sensitive files outside the project workspace, while approval dialogs often display only the benign-looking workspace path rather than the actual target (CWE-451: User Interface Misrepresentation of Critical Information). In some cases, the agent internally recognizes the true destination but presents misleading information to the user, undermining the "human-in-the-loop" security model; in Windsurf, writes were observed to occur before user approval. The research recommends resolving canonical paths before prompting users, clearly displaying the real destination of file operations, enforcing workspace boundary validation, and ensuring that no filesystem changes occur until explicit authorization ...
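
The first and third recommendations (resolve the canonical path, then enforce the workspace boundary) can be sketched in a few lines. This is an illustrative sketch only, not code from the Wiz write-up or from any of the affected assistants; the function name `resolve_write_target` is hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_write_target(workspace, requested):
    """Return the canonical path a write would hit, or raise if it leaves the workspace."""
    root = Path(workspace).resolve()
    # resolve() follows symbolic links, so a link that points outside the
    # workspace is exposed here instead of being hidden behind its in-workspace name.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested} resolves to {target}, outside {root}")
    return target
```

An approval dialog built on this would show the returned path rather than the path the agent asked for, and would run the check before any bytes are written. Note that a check-then-write sequence can still race with an attacker who swaps the link in between, so a real implementation would need to account for that as well.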

GitLost: How We Tricked GitHub's AI Agent Into Leaking Private Repositories

Noma Security introduces GitLost, an indirect prompt injection attack against GitHub Agentic Workflows that can coerce AI-powered GitHub agents into disclosing data from private repositories. By embedding malicious instructions inside a seemingly legitimate public GitHub issue, an attacker can exploit agents that have overly broad repository permissions, causing them to retrieve confidential files and publish them in public comments. The research demonstrates that prompt-based guardrails can be bypassed with minor linguistic changes, highlighting that the root cause is architectural rather than a simple implementation bug. The authors recommend enforcing least-privilege permissions, isolating agents that process untrusted content from those with access to sensitive repositories, limiting public output channels, and treating all externally supplied content as untrusted input. https://noma.security/blog/gitlost-how-we-tricked-githubs-ai-agent-into-leaking-private-repos

Build Your Own Vulnerability Harness

Cloudflare presents the architecture behind its AI-powered Vulnerability Discovery Harness (VDH), a model-agnostic framework for large-scale vulnerability research. Rather than relying on a single coding agent, the system orchestrates multiple specialized AI agents across stages including reconnaissance, vulnerability hunting, adversarial validation, deduplication, dependency tracing, feedback, and reporting. Findings are independently verified using a separate Vulnerability Validation System (VVS) powered by a different model, reducing false positives through adversarial cross-checking. The post emphasizes that orchestration—not any specific LLM—is the key to scalable AI-assisted security research, and describes techniques for persistent state management, automated triage, cross-repository analysis, and continuous vulnerability discovery. Cloudflare also releases its initial security-audit skill as an open-source starting point for building similar security workflows. https://bl...

Solving the Identity Crisis for AI Agents

This Uber Engineering blog describes how the company redesigned its identity and access management architecture to securely support production AI agents. Instead of treating agents as generic service accounts, Uber gives each agent a unique cryptographic identity, issues short-lived JWTs through a Security Token Service (STS), and propagates an actor chain that preserves the originating user and every intermediary agent involved in a workflow. The architecture leverages SPIFFE/SPIRE workload identities, scoped credentials, MCP-aware authorization, and end-to-end audit trails, enabling fine-grained access control, accountability, and secure delegation across multi-agent systems while reducing the risks of overprivileged agents and poor attribution. https://www.uber.com/us/en/blog/solving-the-agent-identity-crisis/
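
To make the actor-chain idea concrete, here is a minimal sketch of how delegation could be recorded in token claims. It assumes the nested `act` claim convention from OAuth 2.0 Token Exchange (RFC 8693); the summary above does not say which claim layout Uber uses, and the `delegate` and `actor_chain` helpers, the identifiers, and the 300-second lifetime are all hypothetical. Signing the claims into a JWT is left out.

```python
import time


def delegate(claims, actor, ttl_seconds=300, now=None):
    """Return new short-lived claims in which `actor` acts for the original subject."""
    issued = int(time.time()) if now is None else now
    new_claims = {"sub": claims["sub"], "iat": issued, "exp": issued + ttl_seconds}
    # The newest actor sits outermost; earlier actors stay nested beneath it,
    # so the full delegation path survives every hop.
    act = {"sub": actor}
    if "act" in claims:
        act["act"] = claims["act"]
    new_claims["act"] = act
    return new_claims


def actor_chain(claims):
    """List the actors from the most recent back to the first delegate."""
    chain = []
    act = claims.get("act")
    while act:
        chain.append(act["sub"])
        act = act.get("act")
    return chain
```

Starting from `{"sub": "user:alice"}`, two successive `delegate` calls for a planner agent and then a booking agent yield claims whose `sub` is still the user, while `actor_chain` returns the booking agent followed by the planner agent. An audit log can then attribute an action to both the originating user and every agent in between.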

Metano SkillTracer: Free AI Agent Skill Security Scanner

SkillTracer is a free security scanner from Metano Labs that analyzes AI agent skills before they are installed or executed. It combines static analysis (SAST) with dynamic sandbox execution (DAST) to detect malicious behaviors such as credential theft, prompt injection, remote code execution, data exfiltration, MCP/tool poisoning, and obfuscated code. The scanner produces transparent risk and threat scores aligned with the OWASP Agentic Top 10, OWASP AIVSS, and MITRE ATLAS, along with evidence-backed reports to help developers evaluate the safety of third-party AI skills. https://labs.metano.ai/scanner

NIST Enrichment Reductions Impact CVE Coverage, Accuracy

This article examines the impact of NIST's decision to prioritize enrichment for only a subset of CVEs in the National Vulnerability Database (NVD). While the change helps the agency address a growing backlog by focusing on high-impact vulnerabilities—such as those in CISA's Known Exploited Vulnerabilities (KEV) catalog, federal software, and critical infrastructure—it also leaves many newly disclosed CVEs without NIST-provided CVSS scores, CPE mappings, or additional analysis. Researchers warn that organizations relying heavily on NVD enrichment may face reduced visibility and less accurate vulnerability prioritization, increasing the need for alternative intelligence sources and risk-based vulnerability management practices. https://www.darkreading.com/vulnerabilities-threats/nist-enrichment-reductions-cve-coverage-accuracy

SecureAI-Scan: Static Analysis for AI and LLM Security Vulnerabilities

SecureAI-Scan is an open-source, local-first static analysis tool designed to identify security issues unique to AI-powered applications that traditional SAST tools often miss. It analyzes JavaScript, TypeScript, and Python codebases for vulnerabilities such as prompt injection, insecure prompt construction, excessive data exposure, unsafe MCP configurations, and other risks mapped to the OWASP LLM Top 10. The tool performs dataflow analysis, generates AI Bills of Materials (AI-BOMs), supports SARIF output and GitHub Code Scanning integration, and is designed to run entirely offline for privacy-sensitive environments. https://github.com/akanthed/SecureAI-Scan

Benchmarking Coding Agents on Databricks' Multi-Million-Line Codebase

This blog presents Databricks' methodology for evaluating AI coding agents against a large-scale, production-grade codebase containing millions of lines of code. Rather than relying on synthetic benchmarks, the evaluation measures how well agents understand complex repositories, navigate dependencies, generate accurate code changes, and solve real engineering tasks. The post discusses the benchmarking framework, key performance metrics, and lessons learned from comparing frontier coding agents, providing practical guidance for organizations looking to assess AI-assisted software development in enterprise environments. https://www.databricks.com/blog/benchmarking-coding-agents-databricks-multi-million-line-codebase

Promptfoo Red Team Quickstart: Automated Security Testing for LLM Applications

This guide walks through using Promptfoo to perform automated red teaming of LLM applications. It demonstrates how to configure targets, generate adversarial prompts, execute security evaluations, and measure vulnerabilities such as prompt injection, jailbreaks, data leakage, and unsafe tool use. The quickstart also shows how Promptfoo integrates into CI/CD pipelines, enabling continuous AI security testing and helping organizations validate the robustness of generative AI applications before deployment. https://www.promptfoo.dev/docs/red-team/quickstart/

Micro-Agent: Frontier Model Performance with Efficient Small Language Models

This blog introduces Micro-Agent, a lightweight agent framework that enables small language models to achieve frontier-level performance on complex reasoning and software engineering tasks. By combining structured workflows, iterative planning, and tool use, Micro-Agent demonstrates that well-orchestrated small models can rival much larger LLMs while significantly reducing inference costs and latency. The post presents benchmark results, discusses the underlying agentic architecture, and highlights the potential for deploying capable AI agents in resource-constrained environments. https://vllm.ai/blog/2026-06-29-micro-agent-frontier-models

LLM Jailbreak Testing with Jailbreaker

This article introduces Jailbreaker, an open-source framework from SpecterOps for systematically testing the resilience of large language models against jailbreak attacks. It explains how security teams can automate prompt injection and jailbreak evaluations, measure the effectiveness of model guardrails, and identify weaknesses before deployment. The post covers the framework's architecture, testing methodology, and integration into AI security assessments, positioning Jailbreaker as a practical tool for red teaming LLM-powered applications and continuously validating AI safety controls. https://specterops.io/blog/2026/06/29/llm-jailbreak-testing-with-jailbreaker

Clone This Repo and I Own Your Machine: Exploiting Git Clone for Remote Code Execution

This research demonstrates how specially crafted Git repositories can abuse features such as submodules, hooks, symbolic links, and filesystem quirks to achieve remote code execution or other unintended behavior on a victim's machine during or shortly after cloning. The article analyzes multiple attack techniques, discusses platform-specific nuances, and highlights how developer workflows can become an attack vector in software supply chain compromises. It also provides practical mitigations, emphasizing the importance of keeping Git up to date, disabling unnecessary features, and exercising caution when cloning untrusted repositories. https://0din.ai/blog/clone-this-repo-and-i-own-your-machine

Bullying LLMs into Submission: Building an Autonomous AI-Powered Zero-Day Hunting Pipeline

This technical deep dive describes how a security researcher built an autonomous vulnerability discovery platform that combines Claude Code, Model Context Protocol (MCP), fuzzing, reverse engineering, retrieval-augmented generation (RAG), and custom tooling to identify zero-day vulnerabilities at scale. The article details a workflow that treats every AI-generated finding as a potential hallucination until validated through multiple verification gates, integrates historical knowledge and bug bounty intelligence to prioritize targets, and continuously improves through feedback from previous campaigns. The result is an AI-assisted vulnerability research system that reduces manual overhead while maintaining rigorous human validation to minimize false positives. https://blog.zsec.uk/bullyingllms

GuardDog 3.0: Smarter Detection for Malicious Open-Source Packages

GuardDog 3.0 is a major update to Datadog's open-source supply chain security scanner for npm, PyPI, and other package ecosystems. The release replaces Semgrep with a YARA-based scanning engine for faster, more scalable static analysis, introduces a new risk scoring system that correlates multiple indicators into an overall maliciousness score, and adds built-in sandboxing using nono-py to safely analyze untrusted packages. The new architecture aims to improve detection accuracy while reducing false positives, making GuardDog more effective at identifying modern software supply chain threats without relying on LLMs or dynamic code execution. https://securitylabs.datadoghq.com/articles/guarddog-3-0-release

Exploitarium: Public Archive of Zero-Day Exploit PoCs and Vulnerability Research

Exploitarium is a curated collection of proof-of-concept (PoC) exploits and vulnerability research targeting a wide range of open-source software. The repository publishes detailed technical write-ups and exploit code, many of which were released before coordinated disclosure to affected maintainers, making it both controversial and highly educational. According to the author, the goal is to lower the barrier to entry for vulnerability research by demonstrating real-world exploitation techniques while encouraging responsible use. The project has attracted significant attention from the cybersecurity community due to its AI-assisted vulnerability discovery workflow and the publication of previously undisclosed security issues. https://github.com/bikini/exploitarium

The Orchard Bug and the Unfolding Cybersecurity Reckoning

Ben Goertzel argues that the Zcash Orchard bug, which allowed potential counterfeit ZEC creation and was undetected for four years until an AI model (Anthropic's Opus 4.8) found it, is an early tremor of a coming wave where AI will expose latent vulnerabilities across all software, from crypto to traditional finance. He contends the solution is well-known—formal verification and "correct-by-construction" derivation from mathematical specifications, as being implemented in the ASI:chain project—but the software industry has avoided it due to cost and speed pressures. Goertzel concludes that the same AI capabilities driving this reckoning can also enable large-scale formal verification, making the threat and remedy essentially the same technology, and urges a broad shift toward making mathematical proof the default expectation in software development. https://bengoertzel.substack.com/p/the-orchard-bug-and-the-unfolding

The New MCP Specification: What Security Teams Must Prepare For

This Akamai blog post by Maxim Zavodchik, Segev Fogel, and Gal Meiri analyzes the upcoming July 28, 2026, update to the Model Context Protocol (MCP), which transitions it to an enterprise-grade, stateless architecture. While the update eliminates major protocol-level risks like session hijacking and weak authentication, it shifts critical security responsibilities to developers, introducing new attack surfaces including cross-agent workflow hijacking via untrusted client state objects, client-controlled metadata manipulation, header confusion attacks, stored XSS in new MCP interactive apps, and denial-of-service risks from long-running background tasks. The authors conclude that security teams must now treat all client-provided state and metadata as untrusted, enforce cryptographic verification, output encoding, and resource quotas, as the protocol's security posture now depends entirely on implementation quality rather than protocol-level guarantees. https://www.akamai.com/blog/s...
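
One common way to apply the "cryptographic verification" advice to state that round-trips through a client is to have the server attach an HMAC and check it on return. The sketch below shows that generic pattern only; it is not taken from the MCP specification or the Akamai post, and `seal_state`, `open_state`, and the `SECRET` value are hypothetical names for illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"replace-with-a-server-side-secret"  # hypothetical key, never sent to clients


def seal_state(state, key=SECRET):
    """Serialize state and append an HMAC-SHA256 tag so tampering is detectable."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(tag).decode())


def open_state(token, key=SECRET):
    """Verify the tag before trusting anything in the payload; raise ValueError otherwise."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError as exc:
        raise ValueError("malformed state token") from exc
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state token failed verification")
    return json.loads(payload)
```

The tag proves only that the server issued this exact payload. It does not stop a client from replaying an old but valid token, so a fuller design would also bind the state to a session or task identifier and give it an expiry.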

Prompt Injection as Role Confusion

This ICML 2026 paper by Ye, Cui, and Hadfield-Menell presents a theory that prompt injection attacks succeed because LLMs perceive roles (like `user`, `assistant`, `tool`, `think`) through insecure surface features like writing style rather than through the secure structural tags themselves. Using role probes to measure internal token perceptions, the authors demonstrate that sounding like a privileged role (e.g., mimicking reasoning style) overrides the actual tag, enabling attacks like CoT Forgery where fake reasoning in a user prompt achieves a ~60% success rate, while simply destyling the text drops success to 10%. They argue that roles are a hacked-together format trick that became critical cognitive and security infrastructure, and that unless models achieve genuine role perception, defense will remain a whack-a-mole game, opening the door to subtler threats like subconscious steering of LLM states for commercial or adversarial purposes. https://role-confusion.github.io/

Vulnerability Reports Are Not Special Anymore