Posts

MRAgent: Graph Memory for LLM Agents

MRAgent is a retrieval-augmented QA system that builds a graph-structured episodic memory from long, multi-session dialogues instead of relying on plain vector retrieval. It works in two phases. First, it rewrites dialogue turns into self-contained sentences (resolving pronouns, converting dates), extracts keywords, and stores the result as a graph with nodes for key entities, episodes, topics, and personal facts. Second, it answers questions through a tool-calling reasoning loop in which the LLM uses seven specialized graph query tools (e.g., by topic, time, personal info, event context) to retrieve relevant memory and produce an answer. The system is evaluated on the LoCoMo and LongMemEval benchmarks, uses OpenRouter for LLM access, caches intermediate results to avoid redundant work, and includes an LLM-as-judge evaluation script. The core idea is to treat memory as a reconstructed graph rather than as retrieved chunks, which allows more structured, context-aware querying in long-form conversational AI.  https://gith...
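As a rough illustration of the graph-memory idea summarized above, here is a minimal Python sketch. Every name in it (`GraphMemory`, `Episode`, `add_episode`, `query_by_topic`, `query_by_entity`, `query_by_time`) is hypothetical and is not MRAgent's actual API; it imitates only three lookups, not the seven tools the project describes, and it skips the LLM rewriting and tool-calling loop entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    """One rewritten, self-contained dialogue sentence (hypothetical schema)."""
    episode_id: int
    text: str
    when: date
    topic: str
    entities: tuple[str, ...]


class GraphMemory:
    """Toy episodic graph: episode nodes linked to topic and entity nodes."""

    def __init__(self) -> None:
        self.episodes: dict[int, Episode] = {}
        self.by_topic: dict[str, set[int]] = defaultdict(set)
        self.by_entity: dict[str, set[int]] = defaultdict(set)

    def add_episode(self, text: str, when: date, topic: str,
                    entities: list[str]) -> Episode:
        episode = Episode(len(self.episodes), text, when, topic, tuple(entities))
        self.episodes[episode.episode_id] = episode
        # Edges are stored as index sets keyed by lower-cased node names.
        self.by_topic[topic.lower()].add(episode.episode_id)
        for entity in entities:
            self.by_entity[entity.lower()].add(episode.episode_id)
        return episode

    def query_by_topic(self, topic: str) -> list[Episode]:
        ids = self.by_topic.get(topic.lower(), set())
        return [self.episodes[i] for i in sorted(ids)]

    def query_by_entity(self, entity: str) -> list[Episode]:
        ids = self.by_entity.get(entity.lower(), set())
        return [self.episodes[i] for i in sorted(ids)]

    def query_by_time(self, start: date, end: date) -> list[Episode]:
        return [e for e in self.episodes.values() if start <= e.when <= end]


# Invented example data, for illustration only.
memory = GraphMemory()
memory.add_episode("Alice adopted a dog named Rex on 2023-05-02.",
                   date(2023, 5, 2), "pets", ["Alice", "Rex"])
memory.add_episode("Alice started a new job at a bakery on 2023-06-10.",
                   date(2023, 6, 10), "work", ["Alice"])

pets = memory.query_by_topic("Pets")
about_alice = memory.query_by_entity("alice")
in_june = memory.query_by_time(date(2023, 6, 1), date(2023, 6, 30))
```

In a tool-calling setup, each `query_*` method would be exposed to the LLM as one tool, and the model would decide which to call for a given question.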

Un-Jailbreakable AI Doesn't Exist—But Open, Neural-Symbolic Gets Closest

Perfectly "un-jailbreakable" AI models don't exist; that goal is unrealistic. The best way to get close is neural-symbolic AI combined with open-source models, not closed proprietary systems. The real threat isn't simple prompt injection but "capability-elicitation attacks," in which an AI that follows instructions is gradually coaxed over hundreds of prompts into producing something dangerous. The proposed solution is a "generate, then verify" pipeline: let the neural model generate outputs, but quarantine risky ones and pass them through a symbolic verification layer (formal analyzers, sandboxes, logic engines) that rigorously judges what the output actually does. This is more reliable than using another LLM as the checker. Openness helps because an open ecosystem can field a diverse ensemble of specialist verifiers, more than any single company could, and independent verifiers with different blind spots make the system harder to game. While bad actors can...
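The "generate, then verify" idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the candidate output is Python source, the "symbolic" layer is two small static checks built on the standard `ast` module, and the names `check_imports`, `check_calls`, `verify`, and the deny-lists are all invented for illustration. Real formal analyzers and sandboxes would be far stronger than this.

```python
import ast
from typing import Callable

# A verifier maps candidate source code to a list of findings (empty = clean).
Verifier = Callable[[str], list[str]]

# Hypothetical deny-lists, chosen only for this example.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}
FORBIDDEN_CALLS = {"eval", "exec"}


def check_imports(source: str) -> list[str]:
    """Flag imports whose top-level module is on the deny-list."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                findings.append(f"forbidden import: {name}")
    return findings


def check_calls(source: str) -> list[str]:
    """Flag direct calls to deny-listed builtins such as eval()."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            findings.append(f"forbidden call: {node.func.id}")
    return findings


def verify(candidate: str, verifiers: list[Verifier]) -> tuple[str, list[str]]:
    """Release the candidate only if every independent verifier is clean."""
    try:
        findings = [f for check in verifiers for f in check(candidate)]
    except SyntaxError as exc:
        # Output that cannot even be parsed is quarantined, not released.
        return "QUARANTINE", [f"unparseable output: {exc.msg}"]
    return ("RELEASE", []) if not findings else ("QUARANTINE", findings)


VERIFIERS: list[Verifier] = [check_imports, check_calls]

safe = verify("print(1 + 1)", VERIFIERS)
risky = verify("import os\nos.remove('x')", VERIFIERS)
dynamic = verify("eval('2 + 2')", VERIFIERS)
```

The point of the ensemble is that each check is deterministic and independent, so adding a verifier with a different blind spot can only tighten the decision.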

MLSec Application Security Testing Guide (MLASTG)

The MLASTG is an open-source framework for security testing of machine learning (ML) and large language model (LLM) systems, designed for enterprise and defense-grade verification. Inspired by OWASP standards and aligned with MITRE ATLAS, NIST AI RMF, and the EU AI Act, it provides three core components: a verification standard (MLASVS) with 168 verifiable controls across seven categories (e.g., data, model, LLM-specific, supply chain), a testing guide with detailed test cases and Python scripts, and a weakness enumeration (MLASWE). It defines two testing levels, L1 (Standard) and L2 (Defense-in-Depth), for different risk profiles. The project is in active development (v0.1) and includes an executable CLI and a website deployment. It welcomes community contributions of test cases, translations, and new coverage areas.  https://github.com/bb1nfosec/MLASTG

The Jinn Guard: Kernel-Aware Agent Governance Daemon

The Jinn Guard is a research prototype for a kernel-aware governance daemon that enforces safety constraints on autonomous AI agents before they execute any action. It operates over Unix domain sockets, using a multi-stage decision pipeline that includes HMAC-based authentication, agent identity verification, intent allowlisting, behavioral drift detection, and a Z3 SMT solver to check formal policy invariants. The system integrates with eBPF-LSM for kernel-level telemetry and enforcement, and maintains a tamper-evident, hash-chained audit log. The provided benchmarks claim high performance (sub-millisecond decisions) and demonstrate resilience against various attacks (replay, forgery, quota exhaustion). It includes a Python SDK for agent integration, a systemd service, and a Docker-based sandbox for mandatory mediation testing. The project is positioned as a validated prototype with a clear security model, but notes limitations regarding filesystem path resolution and interpreter chai...
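Two of the building blocks named above, HMAC-based request authentication and a tamper-evident hash-chained audit log, can be illustrated with the Python standard library. This is a sketch of the general techniques, not The Jinn Guard's code: `sign_request`, `verify_request`, `AuditLog`, the record fields, and the key are all made up for the example.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sign_request(key: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 tag over a request body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_request(key: bytes, body: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign_request(key, body), tag)


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"record": record, "prev": prev,
                 "hash": _entry_hash(prev, record)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


# Invented key and records, for illustration only.
key = b"demo-shared-secret"
body = b'{"agent": "a1", "intent": "read_file"}'
tag = sign_request(key, body)

log = AuditLog()
log.append({"agent": "a1", "intent": "read_file", "decision": "ALLOW"})
log.append({"agent": "a1", "intent": "delete_file", "decision": "DENY"})
intact = log.verify()

# Rewriting history invalidates the chain.
log.entries[1]["record"]["decision"] = "ALLOW"
tampered_ok = log.verify()
```

Note that a hash chain alone only detects edits to earlier entries; an attacker who can rewrite the whole log could recompute it, which is why such logs are usually anchored or signed externally.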

Firewall/MDM/EDR for Coding Agents

This is a security product designed to enforce organization-wide guardrails on AI coding agents. Security leaders set policies scoped by team, which are then enforced across all agents, while engineers can tailor policies to their individual workflows with inline monitors. The product comes pre-tuned against more than 40 real-world failure modes, giving a security baseline that can be adapted to an organization's specific environment. The page offers a guided deployment option in which the provider assists with setup and configuration.  https://watcher.apolloresearch.ai/landing/index.html

Securing the Nation Against Advanced Cryptographic Attacks

This executive order establishes a national policy to transition U.S. federal information systems to post-quantum cryptography (PQC) to protect against the threat of future quantum computer attacks. It mandates that all agencies designate a PQC migration lead and sets specific deadlines: high-value assets and high-impact systems must transition to PQC for key establishment by December 31, 2030, and for digital signatures by December 31, 2031. The order directs NIST to initiate a pilot project, requires the Federal Acquisition Regulatory (FAR) Council to propose rules mandating contractor compliance by 2030, and calls for public guidance on a "cryptographic bill of materials." It also tasks relevant agencies with assisting critical infrastructure owners, engaging international partners, and accelerating the validation of cryptographic modules through the NIST program.  https://www.whitehouse.gov/presidential-actions/2026/06/securing-the-nation-against-advanced-cryptographic-at...

Protect U Back: A Local Pre-I/O Audit Gate for AI Agents

Protect U Back (PUB) is a local pre-I/O audit gate and supervisor for AI coding agents, designed to enforce a simple rule: any agent action must leave observable evidence before it is allowed to affect the real world. It operates by intercepting proposed tool calls and filesystem or shell actions, normalizing them into auditable "envelopes," observing the state of a protected surface before and after the action, and deciding to `PASS`, `HOLD`, `KILL`, or `QUARANTINE` the action. The system uses an "X-ray" layer to take snapshots and compute residuals based on a process equation, ensuring that any unobserved or mutated state triggers a `HOLD`. It is not a prompt filter but an action inspector, designed to prevent silent data exfiltration or system modification. The project provides a launcher to run Claude Code or Codex CLI through this gate, and on Linux/WSL2 can additionally confine the agent inside a `bwrap` cage. The repository includes a reproducible credential-...
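The before/after observation step can be sketched as a snapshot-and-compare gate. The summary does not give PUB's process equation, so this toy makes an assumption: the "residual" is simply the set of changed paths that the action did not declare in advance. The functions `snapshot`, `changed_paths`, and `gate` are hypothetical names, only the `PASS` and `HOLD` verdicts are modeled, and the action runs directly in a temporary directory rather than behind a real interception layer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Paths that were created, deleted, or modified between snapshots."""
    return {p for p in before.keys() | after.keys()
            if before.get(p) != after.get(p)}


def gate(root: Path, declared: set[str],
         action: Callable[[], object]) -> tuple[str, set[str]]:
    """Run an action and HOLD if it touched anything it did not declare."""
    before = snapshot(root)
    action()
    after = snapshot(root)
    # Assumed definition of "residual": observed changes minus declared ones.
    residual = changed_paths(before, after) - declared
    return ("PASS" if not residual else "HOLD", residual)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("v1")
    declared = {"notes.txt"}

    # An action that changes only what it declared.
    clean = gate(root, declared,
                 lambda: (root / "notes.txt").write_text("v2"))

    # An action that also writes an undeclared file.
    def sneaky() -> None:
        (root / "notes.txt").write_text("v3")
        (root / ".env").write_text("SECRET=1")

    dirty = gate(root, declared, sneaky)
```

Because the check looks at what actually changed on disk instead of at the prompt or the agent's stated intent, it behaves as an action inspector in the sense the summary describes.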