Brex Open-Sources CrabTrap, an LLM Security Proxy for AI Agents
Brex has released CrabTrap, an open-source HTTP proxy that uses LLMs as judges to monitor and control AI agent behavior in production environments, providing a new layer of security for autonomous systems.
April 26, 2026
TL;DR
Brex has open-sourced CrabTrap, an HTTP proxy that sits between your orchestration layer and the outside world, using an LLM to judge whether an agent's actions fall within policy before they execute. It is not a firewall in the traditional sense. It is a semantic filter, and the distinction matters for how you deploy it.
The mechanism underneath the proxy
The core idea is straightforward. CrabTrap intercepts HTTP traffic from an AI agent, packages the request into a context bundle, and sends that bundle to a separate LLM configured as a judge. The judge evaluates whether the request violates a policy you have defined in natural language. If it does, CrabTrap blocks the request and logs the decision. If it passes, the request continues. The "LLM-as-a-judge" framing has been circulating in evaluation research for a while now, mostly applied to output quality scoring. Brex applied it to runtime enforcement, which is a different problem with different latency constraints.Latency is the real engineering constraint
Every request that goes through CrabTrap incurs an additional LLM call. For agents running tight loops, like a coding agent issuing dozens of tool calls per minute, that overhead adds up. Brex's design assumes you configure CrabTrap selectively, not as a universal interceptor on every HTTP call your agent makes.
Where teams get this wrong
The most common mistake with any new security layer is treating it as a substitute for the layers that already exist. CrabTrap is not a replacement for your existing auth, your rate limiting, or your schema validation. It is additive. Teams that deploy it as a primary control and thin out their existing controls are creating a single point of failure that depends on an LLM making correct judgments under adversarial conditions. LLM judges can be fooled. Prompt injection is a real attack vector, and an agent that has been compromised through injection may craft requests specifically designed to look policy-compliant to a judge. This is not a hypothetical. Researchers have demonstrated that judge models are not immune to the same injection techniques that affect the models they are judging. The second mistake is writing policies in a way that mirrors natural language ambiguity. A policy that says "do not share sensitive financial data with external parties" sounds clear. In practice, it leaves enormous room for edge cases. What counts as financial data? What counts as external? The judge will interpret those terms, and its interpretation will not always match yours. Policies need to be written with the same precision you would apply to access control rules, not the same looseness you would apply to a Slack message. The third mistake is not logging the judge's reasoning. CrabTrap logs decisions, but if you are not capturing why a request was blocked, you cannot audit the system, tune the policy, or catch cases where the judge is being too aggressive. Treat the judge's reasoning traces as operational data, not noise.Where this fits in the security timeline for AI agents
Agent security tooling is roughly where API security tooling was in 2010. The threat model was understood in theory. The tools were immature. Teams were either ignoring the problem or building bespoke solutions that did not generalize. The comparison is useful because it tells you how the market typically resolves. By 2015, API gateways had consolidated around a handful of patterns: centralized policy enforcement, traffic inspection, rate limiting, and anomaly detection. Those patterns existed before the gateways did. The gateways made them operationally tractable. CrabTrap is attempting something similar for agent HTTP traffic. The pattern, semantic policy enforcement at the transport layer, is not new. LLM-as-a-judge research has been building toward this application for two years. What Brex is contributing is an open-source implementation that teams can actually deploy, rather than a research paper describing the pattern in the abstract. The historical precedent also suggests a caution. Early API security tools were frequently misconfigured, running in "log only" mode for months before teams had enough confidence in the policies to start blocking. Expect CrabTrap deployments to follow a similar curve. The gap between "installed" and "enforcing" will be significant for many teams.The case for skepticism
The strongest objection to CrabTrap is not that the approach is wrong. It is that the approach adds an LLM-shaped failure mode to a system that already has too many LLM-shaped failure modes. Your agent fails non-deterministically. You now have a judge that also fails non-deterministically. The intersection of those two failure surfaces is not obviously smaller than the problem you started with. A judge that incorrectly blocks legitimate requests is not a neutral outcome. It breaks your agent's task completion, and in a financial context specifically, that has real operational cost. There is also a question of attacker adaptation. If CrabTrap becomes a standard layer in agent deployments, attackers will start optimizing their injection payloads specifically to pass LLM judge evaluations. The arms race dynamic that has played out in spam filtering and adversarial ML will play out here too. Brex knows this, but the open-source release does give attackers a clear look at the system they are trying to circumvent. The skeptic's version of this argument concludes that deterministic controls, strict capability scoping, minimal permissions, isolated execution environments, are more reliable than probabilistic ones. That argument is correct as far as it goes. It does not go far enough. Deterministic controls cannot evaluate intent. An agent with read access to your CRM and email access to your customers can cause significant damage without ever exceeding its permission scope, if it decides to do something you did not anticipate. That is precisely the gap CrabTrap is trying to address. Whether a probabilistic solution to a probabilistic problem nets out to better security depends entirely on how well you tune it.Before you mark this as production-ready
- Confirm your policy definitions have been tested against at least a week of production traffic logs, not just against synthetic examples you constructed yourself
- Verify the judge model you have chosen has a latency profile that does not break your agent's task completion SLAs
- Check that CrabTrap is deployed as an additive layer, with existing auth and schema validation still in place underneath it
- Ensure judge reasoning traces are being captured and stored somewhere you can query them during an incident
- Test prompt injection scenarios against your judge configuration before enabling blocking mode
- Set a review cadence for your policies, at minimum monthly, since the agent behaviors you need to constrain will drift as you update the underlying models
Comments
Some links in this article are affiliate links. Learn more.