Brex Open-Sources CrabTrap, an LLM Security Proxy for AI Agents

Brex has released CrabTrap, an open-source HTTP proxy that uses LLMs as judges to monitor and control AI agent behavior in production environments, providing a new layer of security for autonomous systems.

TL;DR

Brex has open-sourced CrabTrap, an HTTP proxy that sits between your orchestration layer and the outside world, using an LLM to judge whether an agent's actions fall within policy before they execute. It is not a firewall in the traditional sense. It is a semantic filter, and the distinction matters for how you deploy it.

Your agent just tried to send an email to a vendor outside your approved supplier list. The action looked syntactically valid. The API call would have gone through. Nothing in your rate limiter, your schema validator, or your auth layer would have caught it. CrabTrap did. That is the problem Brex is solving with CrabTrap, which they released publicly in mid-2025. It is an LLM-as-a-judge HTTP proxy designed to sit inline between an AI agent and the services it calls, evaluating each request against a policy before letting it through. The tool reflects something Brex learned the hard way running financial AI agents in production: traditional security primitives were not built for systems that reason about intent.

The mechanism underneath the proxy

The core idea is straightforward. CrabTrap intercepts HTTP traffic from an AI agent, packages the request into a context bundle, and sends that bundle to a separate LLM configured as a judge. The judge evaluates whether the request violates a policy you have defined in natural language. If it does, CrabTrap blocks the request and logs the decision. If it passes, the request continues. The "LLM-as-a-judge" framing has been circulating in evaluation research for a while now, mostly applied to output quality scoring. Brex applied it to runtime enforcement, which is a different problem with different latency constraints.

Latency is the real engineering constraint

Every request that goes through CrabTrap incurs an additional LLM call. For agents running tight loops, like a coding agent issuing dozens of tool calls per minute, that overhead adds up. Brex's design assumes you configure CrabTrap selectively, not as a universal interceptor on every HTTP call your agent makes.

The proxy is language-agnostic at the transport layer. If your agent speaks HTTP, CrabTrap can sit in front of it. That covers most production agent frameworks, including anything built on top of Claude Code, Goose, or n8n. The policy definitions live outside the agent itself, which is the right call architecturally. Policies that live inside the agent can be overridden by the agent. Policies that live in an external proxy cannot. The judge model does not have to be the same model running the agent. You could use a cheaper, faster model for routine checks and escalate to a heavier model only when the initial judgment is uncertain. Brex has not prescribed a specific model, which leaves the latency and cost tradeoffs open for teams to optimize themselves.

Where teams get this wrong

The most common mistake with any new security layer is treating it as a substitute for the layers that already exist. CrabTrap is not a replacement for your existing auth, your rate limiting, or your schema validation. It is additive. Teams that deploy it as a primary control and thin out their existing controls are creating a single point of failure that depends on an LLM making correct judgments under adversarial conditions. LLM judges can be fooled. Prompt injection is a real attack vector, and an agent that has been compromised through injection may craft requests specifically designed to look policy-compliant to a judge. This is not a hypothetical. Researchers have demonstrated that judge models are not immune to the same injection techniques that affect the models they are judging. The second mistake is writing policies in a way that mirrors natural language ambiguity. A policy that says "do not share sensitive financial data with external parties" sounds clear. In practice, it leaves enormous room for edge cases. What counts as financial data? What counts as external? The judge will interpret those terms, and its interpretation will not always match yours. Policies need to be written with the same precision you would apply to access control rules, not the same looseness you would apply to a Slack message. The third mistake is not logging the judge's reasoning. CrabTrap logs decisions, but if you are not capturing why a request was blocked, you cannot audit the system, tune the policy, or catch cases where the judge is being too aggressive. Treat the judge's reasoning traces as operational data, not noise.

Where this fits in the security timeline for AI agents

Agent security tooling is roughly where API security tooling was in 2010. The threat model was understood in theory. The tools were immature. Teams were either ignoring the problem or building bespoke solutions that did not generalize. The comparison is useful because it tells you how the market typically resolves. By 2015, API gateways had consolidated around a handful of patterns: centralized policy enforcement, traffic inspection, rate limiting, and anomaly detection. Those patterns existed before the gateways did. The gateways made them operationally tractable. CrabTrap is attempting something similar for agent HTTP traffic. The pattern, semantic policy enforcement at the transport layer, is not new. LLM-as-a-judge research has been building toward this application for two years. What Brex is contributing is an open-source implementation that teams can actually deploy, rather than a research paper describing the pattern in the abstract. The historical precedent also suggests a caution. Early API security tools were frequently misconfigured, running in "log only" mode for months before teams had enough confidence in the policies to start blocking. Expect CrabTrap deployments to follow a similar curve. The gap between "installed" and "enforcing" will be significant for many teams.

The case for skepticism

The strongest objection to CrabTrap is not that the approach is wrong. It is that the approach adds an LLM-shaped failure mode to a system that already has too many LLM-shaped failure modes. Your agent fails non-deterministically. You now have a judge that also fails non-deterministically. The intersection of those two failure surfaces is not obviously smaller than the problem you started with. A judge that incorrectly blocks legitimate requests is not a neutral outcome. It breaks your agent's task completion, and in a financial context specifically, that has real operational cost. There is also a question of attacker adaptation. If CrabTrap becomes a standard layer in agent deployments, attackers will start optimizing their injection payloads specifically to pass LLM judge evaluations. The arms race dynamic that has played out in spam filtering and adversarial ML will play out here too. Brex knows this, but the open-source release does give attackers a clear look at the system they are trying to circumvent. The skeptic's version of this argument concludes that deterministic controls, strict capability scoping, minimal permissions, isolated execution environments, are more reliable than probabilistic ones. That argument is correct as far as it goes. It does not go far enough. Deterministic controls cannot evaluate intent. An agent with read access to your CRM and email access to your customers can cause significant damage without ever exceeding its permission scope, if it decides to do something you did not anticipate. That is precisely the gap CrabTrap is trying to address. Whether a probabilistic solution to a probabilistic problem nets out to better security depends entirely on how well you tune it.

Before you mark this as production-ready

Confirm your policy definitions have been tested against at least a week of production traffic logs, not just against synthetic examples you constructed yourself
Verify the judge model you have chosen has a latency profile that does not break your agent's task completion SLAs
Check that CrabTrap is deployed as an additive layer, with existing auth and schema validation still in place underneath it
Ensure judge reasoning traces are being captured and stored somewhere you can query them during an incident
Test prompt injection scenarios against your judge configuration before enabling blocking mode
Set a review cadence for your policies, at minimum monthly, since the agent behaviors you need to constrain will drift as you update the underlying models

If you are running agentic workflows through tools like Claude Code or building orchestration pipelines in n8n, the security gap CrabTrap is targeting is real. The question is whether a semantic proxy that depends on LLM judgment is the right primary control for your threat model, or a useful secondary one. For most production environments, the answer is secondary. That is still worth deploying.