January 30, 2026

Hooks Won't Secure Your AI Agent

AI agents require strict network controls to keep the data they handle secure.
By Michael Saah

We built an allowlist that blocked curl and wget from reaching unauthorized domains. It seemed to work. Then we asked the agent to use dig to resolve the IP and nc to open a raw TCP socket. The request sailed right through. Our hook never saw it.

This is the fundamental problem with securing AI agents at the application layer. You can't enumerate every way an LLM can put packets on the wire. It will find workarounds you didn't think of, because that's what AI agents do: they're creative problem-solvers, and your security controls are just another problem.

At Cleric, we build an AI SRE that connects to your Kubernetes clusters, cloud providers, observability systems, and code repos. That makes it critical to reliably limit the agent's external communication: we need to prevent it from ingesting untrusted content and block access to malicious infrastructure (see the Lethal Trifecta). This is how we do it.

The First Layer: Hooks

The Claude Agent SDK (one of the frameworks in our agent stack) has a hooks system for tool-level access control. You configure a script that runs before each tool call:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/check-curl-allowlist.sh"
          }
        ]
      }
    ]
  }
}

The script extracts the target domain from curl or wget commands and checks it against an allowlist. Blocked domains get denied before execution:
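A minimal version of that check, sketched here in Python (an illustrative sketch, not our production hook: Claude Code runs the hook with the tool call's JSON on stdin, and exit code 2 blocks the call, feeding stderr back to the model):

```python
import json
import re
import shlex
import sys

ALLOWED = {"code.claude.com", "docs.claude.com"}


def hostname(url):
    # Strip scheme, path, and port to get the bare hostname.
    host = re.sub(r"^[a-z]+://", "", url).split("/")[0].split(":")[0]
    return host


def check(command):
    """Return the first disallowed hostname in a curl/wget command, or None."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in ("curl", "wget"):
        return None  # not a command this hook inspects
    for tok in tokens[1:]:
        if tok.startswith("-"):
            continue  # skip flags
        host = hostname(tok)
        if "." in host and host not in ALLOWED:
            return host
    return None


if __name__ == "__main__":
    payload = json.load(sys.stdin)  # PreToolUse hook input from Claude Code
    bad = check(payload.get("tool_input", {}).get("command", ""))
    if bad:
        print(
            f"Hostname '{bad}' is not in the allowlist. "
            f"Allowed hosts: {' '.join(sorted(ALLOWED))}",
            file=sys.stderr,
        )
        sys.exit(2)  # exit code 2 blocks the tool call
```

With this wired up, the straightforward attempts are denied: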

> hey claude, can you try running `curl google.com`?

⏺ Bash(curl google.com)
  ⎿  PreToolUse:Bash hook returned blocking error
  ⎿  Error: Hostname 'google.com' is not in the allowlist. Allowed hosts: code.claude.com docs.claude.com

wget too:

> thanks, can you now try it with `wget`?

⏺ Bash(wget google.com)
  ⎿  PreToolUse:Bash hook returned blocking error
  ⎿  Error: Hostname 'google.com' is not in the allowlist. Allowed hosts: code.claude.com docs.claude.com

And then:

> what if you try using `dig` to get the IP address and `nc` to make the HTTP GET request?

⏺ Bash(dig +short google.com)
  ⎿  173.194.219.100
     173.194.219.102
     173.194.219.139
     … +3 lines

⏺ Bash(printf "GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n" | nc 173.194.219.100 80)
  ⎿  HTTP/1.1 301 Moved Permanently
     Location: http://www.google.com/
     Content-Type: text/html; charset=UTF-8
     … +18 lines

Hooks are a useful defense-in-depth layer. We still run them. But as a primary control? You'd be playing whack-a-mole against a system that invents new moles.

The Second Layer: Egress Firewalls

If the application layer won't cut it, go deeper. Network-level firewalls are the next stop, and the foundation everything else sits on top of. They operate on packet metadata (source/destination addresses, ports, protocol) and don't care which tool generated the request. dig + nc gets caught just like curl.

We use GCP's Next Gen Firewall with FQDN support (fully qualified domain names, so you write rules against api.datadog.com instead of tracking rotating IPs). If you're on GKE with Cilium (a networking layer that uses eBPF for programmable packet processing at the kernel level), you can get similar FQDN-based policies.
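On GKE with Cilium, for example, an FQDN allowlist looks roughly like this (an illustrative sketch; the labels and names are assumptions, and the policy must also permit the DNS lookups Cilium uses to resolve the FQDN):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: agent-egress-allowlist
spec:
  endpointSelector:
    matchLabels:
      app: agent          # assumed label on the agent's pods
  egress:
    # Allow DNS to kube-dns so Cilium can observe and resolve lookups.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Allow HTTPS only to the named FQDN; everything else is dropped.
    - toFQDNs:
        - matchName: "api.datadoghq.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

The same rule catches dig + nc, curl, or anything else: the packets simply never leave the pod unless the destination resolved from an allowed name.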

This works. But it has two real problems.

The agent gets confused. When Cleric hits a service not in the allowlist, the request times out or gets a TCP RST (the network's way of saying "connection refused"). The agent can't tell whether the service is down or it's hitting the firewall. We catch these in our post-run analysis pipeline, but the agent itself is flying blind.

Domain-level granularity is too coarse. coolapp.com/mycompany gets treated identically to coolapp.com/yourcompany. The firewall sees the IP and port, not the URL path or headers. Scoped credentials help, but they're not always available.

Both problems have the same root: firewalls operate at layers 3 and 4 (IP addresses and TCP ports). They see packets, not HTTP requests. (If your instinct is "so proxy it," keep reading.)

The Third Layer: Speaking HTTP

Layer 7 (the application layer) is where HTTP lives, and it's what gives us the finest-grained control and visibility. If you route all outbound agent traffic through an HTTP-aware proxy, you can enforce policy on the full request: method, path, headers, body.

We're wiring Caddy (an HTTP reverse proxy) to OPA (Open Policy Agent, a policy engine that evaluates rules written in Rego, a declarative policy language). All outbound agent traffic is routed through the proxy, with the firewall layer guaranteeing there's no way around it. On each request, Caddy extracts the request attributes and asks OPA: should this go through?
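Concretely, that question is an HTTP call to OPA's Data API, with the request attributes as the input document (a sketch; the package path and field names here are our assumptions):

```
POST /v1/data/cleric/egress/allow
Content-Type: application/json

{
  "input": {
    "method": "GET",
    "host": "api.datadoghq.com",
    "path": "/api/v1/monitors"
  }
}
```

OPA evaluates the policy against the input document and responds with {"result": true} or {"result": false}; the proxy forwards or rejects the request accordingly.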

The upside is that the policies you can write are orders of magnitude more expressive than firewall rules. A firewall rule says:

allow traffic to api.datadoghq.com:443

An OPA policy says:

package cleric.egress

default allow := false

allow if {
    input.method == "GET"
    input.host == "api.datadoghq.com"
    startswith(input.path, "/api/v1/")
}

That's read-only access scoped to a specific API path. The firewall can't see the difference between a GET and a DELETE, or between /api/v1/our-org and /api/v1/someone-elses-org. The proxy can.
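To make the decision logic concrete, here is the same rule expressed as a plain Python predicate (illustrative only; in our setup the evaluation happens inside OPA):

```python
def allow(method: str, host: str, path: str) -> bool:
    """Permit only read-only calls to the Datadog v1 API."""
    return (
        method == "GET"
        and host == "api.datadoghq.com"
        and path.startswith("/api/v1/")
    )


# A read is allowed; a destructive call to the same host is not.
print(allow("GET", "api.datadoghq.com", "/api/v1/monitors"))     # True
print(allow("DELETE", "api.datadoghq.com", "/api/v1/monitors"))  # False
```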

This approach also gives us robust audit logs (including the full content of every request the agent makes) and lets us filter requests containing credentials or other sensitive data before they leave our network.

The catch is TLS. All public APIs use HTTPS, so the proxy can't inspect request contents without terminating TLS, effectively becoming a man-in-the-middle. We control the agent's certificate trust store, so this is manageable, but it's the messiest part of the setup architecturally.
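Managing the trust store mostly means pointing each TLS stack in the agent's environment at the proxy's CA certificate. For example (the paths and proxy address are illustrative):

```shell
# Trust the proxy's CA across the common TLS stacks the agent might use.
export SSL_CERT_FILE=/etc/cleric/proxy-ca.pem        # OpenSSL, Python ssl
export REQUESTS_CA_BUNDLE=/etc/cleric/proxy-ca.pem   # Python requests
export NODE_EXTRA_CA_CERTS=/etc/cleric/proxy-ca.pem  # Node.js
export CURL_CA_BUNDLE=/etc/cleric/proxy-ca.pem       # curl
# Route outbound HTTPS through the intercepting proxy.
export HTTPS_PROXY=http://egress-proxy.internal:3128
```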

Leash from strongDM takes a similar approach with Cedar policies (AWS's authorization policy language) instead of OPA. Worth watching.

The Unsolved Problems

Three layers of defense get you a long way. But nobody in this space has the full picture yet. Here are three open questions we keep running into:

Dynamic permissions. Can the agent's access change mid-task based on what it's learned? If Cleric discovers during investigation that the problem involves a service it doesn't currently have access to, does a human need to intervene, or can the system escalate and grant scoped access programmatically? Right now, we require human approval. That's safe but slow. (The agents are not thrilled about this.)

Graceful denial. When the agent hits a policy boundary, the current experience is bad: a timeout or a cryptic error. The agent should understand it was denied, know why, and adapt. That feedback loop between policy engine and agent doesn't really exist in any framework yet. We're working on it.

Audit logs as context. Audit logs are typically valued most by security folks who need to know who’s doing what across the organization. But they’re also a rich source of data for the agent. Complex investigations require the agent to maintain large amounts of context on what it’s tried and what it hasn’t. What if it could reconstruct that on-demand via its audit logs?

Security is a process, and for agents we're still early in it. This three-layered stack gives us what we need to defend our customers' critical production data from prompt-injection and exfiltration attacks. Of course, none of these tools mean anything without proper monitoring and detection practices. The ability of LLMs to generate novel exploits is fuel for attackers, which makes quality security practices and posture more important than ever.

Ready to give your on-call a head start?

Start for free, or talk to us about a plan built for your team’s scale and security needs.
Book a Demo