Container Hardening Against Agentic AI

Oct 29, 2025

tl;dr: Container privilege is a lattice you can compose. Capabilities, seccomp, LSMs, and cgroups each form their own partial orders of “allowed behaviors.” By taking meets (intersections) across these lattices, and avoiding dangerous joins (unions), we can systematically squeeze an agentic AI’s room to act, even if it’s reflective, tool-using, and self-adapting.

Why “agentic AI” changes the threat model

Traditional container hardening assumes:

deterministic binaries,
narrow I/O channels,
predictable syscall envelopes.

Agentic AI breaks those assumptions. An agent can autonomously:

explore system APIs (files, sockets, procfs, cgroups),
chain tools (package managers, CLIs),
prompt humans or other services for help (lateral socio-technical movement).

If the container’s “behavior budget” is leaky, the model will find the leak. Linux capabilities give us a principled control surface to shrink this budget.

Capabilities as a lattice

Linux capabilities split root’s powers into about 40 named privileges (41 as of Linux 5.9; e.g., CAP_NET_ADMIN, CAP_SYS_PTRACE, CAP_BPF). A capability set is any subset of these names.

The family of all capability sets with the subset order (⊆) forms a Boolean lattice:

Bottom (⊥): ∅ (no capabilities).
Top (⊤): the full set (every capability, effectively root-ish).
Meet (∧): set intersection (“only powers present in both”).
Join (∨): set union (“anything allowed by either”).

This is powerful because:

Lockdown is a sequence of meets across different policies (cap drop, seccomp allow-list, LSM denials).
Any “temporary exception” is a join. You should make joins as local, short-lived, and observable as possible.
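As a minimal sketch of the two operations, capability sets can be modeled as Python frozensets. The four-name universe and the policy names (image_request, org_ceiling) are made up for illustration and are not part of any runtime API.

```python
# Capability sets as frozensets; the subset relation is the lattice order.
# ALL_CAPS is a deliberately tiny, illustrative universe.
ALL_CAPS = frozenset({"NET_BIND_SERVICE", "NET_ADMIN", "SYS_ADMIN", "SYS_PTRACE"})

def meet(*policies):
    """Intersection: only powers present in every policy."""
    result = ALL_CAPS
    for policy in policies:
        result = result & policy
    return result

def join(*policies):
    """Union: anything allowed by at least one policy."""
    result = frozenset()
    for policy in policies:
        result = result | policy
    return result

baseline = frozenset({"NET_BIND_SERVICE"})
image_request = frozenset({"NET_BIND_SERVICE", "NET_ADMIN"})  # hypothetical
org_ceiling = frozenset({"NET_BIND_SERVICE", "SYS_PTRACE"})   # hypothetical

# Lockdown: the grant is whatever survives every policy.
granted = meet(image_request, org_ceiling)

# Exception: a join climbs back up the lattice.
exception = join(baseline, frozenset({"NET_ADMIN"}))

print(sorted(granted), sorted(exception))
```

The meet can never contain more than any of its inputs, which is why stacking policies only shrinks the result, while every join has to be tracked because it can only grow it.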

A tiny Hasse diagram (subset order) for illustration

Consider S = {CAP_NET_BIND_SERVICE, CAP_NET_ADMIN, CAP_SYS_ADMIN}, abbreviated below as BIND, NET, and SYS.

            {BIND, NET, SYS}
           /        |        \
  {BIND, NET}  {BIND, SYS}  {NET, SYS}
       |    \ /          \ /     |
       |    / \          / \     |
    {BIND}        {NET}        {SYS}
           \        |        /
                    ∅

Edges indicate “covers” in the subset order.
Hardening is moving down the diagram (toward ∅).
Any time you “just add CAP_NET_ADMIN,” you’re climbing up.

In practice you won’t draw the whole lattice (it’s enormous). But this mental model clarifies why even “small” adds—like NET_ADMIN—are big structural jumps.
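To see how fast the structure grows, the cover edges of the small diagram can be enumerated directly. This sketch uses the shorthand names BIND, NET, and SYS for the three capabilities in S; powerset and cover_edges are illustrative helpers.

```python
from itertools import combinations

S = ["BIND", "NET", "SYS"]  # shorthand for the three capabilities in S

def powerset(items):
    """Every subset of items, as frozensets."""
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]

def cover_edges(items):
    """Pairs (lower, upper) where upper is lower plus exactly one element."""
    nodes = powerset(items)
    return [
        (lower, upper)
        for lower in nodes
        for upper in nodes
        if lower < upper and len(upper) - len(lower) == 1
    ]

edges = cover_edges(S)
print(len(powerset(S)), "nodes,", len(edges), "cover edges")  # 8 nodes, 12 cover edges
```

With all 41 capabilities the same construction has 2^41 nodes (about 2.2 trillion). Each capability you add doubles the number of sets that sit at or below your current position.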

“Gluing” lattices: composing defenses

Capabilities don’t act alone. You have other controls that also form partial orders:

Seccomp: the set of allowed syscalls (subset order).
LSM (SELinux/AppArmor): the set of allowed label transitions & accesses.
Mount namespaces: the set of visible paths and mount flags.
cgroups: the set of resource actions (e.g., device access).

Each is a lattice of behaviors. The container’s effective power is the product lattice:

L_effective = L_caps × L_seccomp × L_lsm × L_mounts × L_cgroups × ...

Your runtime policy is the meet across components:

Policy_effective = Policy_caps ∧ Policy_seccomp ∧ Policy_lsm ∧ ...

A compromise or misconfig in one layer won’t explode the total surface as long as other layers keep meeting it down. This is the “glue.”
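One way to picture the product lattice is a record with one set per layer, where the meet is taken component by component. The sketch below is an assumption-laden toy: the Policy class, its three fields, and the example values are hypothetical and cover only a slice of what real layers express.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """One point in the product lattice; each field is ordered by subset."""
    caps: frozenset
    syscalls: frozenset
    writable_paths: frozenset

    def meet(self, other):
        """Componentwise intersection: the meet in the product lattice."""
        return Policy(
            self.caps & other.caps,
            self.syscalls & other.syscalls,
            self.writable_paths & other.writable_paths,
        )

    def leq(self, other):
        """True if self allows no more than other in every component."""
        return (
            self.caps <= other.caps
            and self.syscalls <= other.syscalls
            and self.writable_paths <= other.writable_paths
        )

# A misconfigured request asks for SYS_ADMIN and the mount syscall.
requested = Policy(
    caps=frozenset({"NET_BIND_SERVICE", "SYS_ADMIN"}),
    syscalls=frozenset({"read", "write", "bind", "mount"}),
    writable_paths=frozenset({"/tmp"}),
)
# The platform ceiling is the most the platform will ever grant.
ceiling = Policy(
    caps=frozenset({"NET_BIND_SERVICE"}),
    syscalls=frozenset({"read", "write", "bind"}),
    writable_paths=frozenset({"/tmp", "/var/cache/app"}),
)

effective = requested.meet(ceiling)
print(sorted(effective.caps), sorted(effective.syscalls))
```

The over-broad request is trimmed in each component independently, and the result sits below both inputs in the product order.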

Tactics: from lattice theory to YAML and flags

1) Start from ⊥: drop everything, add back surgically

Docker / nerdctl

# NET_BIND_SERVICE is only needed to bind :80/:443; omit it otherwise.
# (A comment after a trailing backslash would break the line continuation.)
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --pids-limit=128 \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=16m \
  --mount type=bind,src=/var/run/sockets/app.sock,dst=/app.sock,ro \
  -e SYSLOG_ADDR=unixgram:///app.sock \
  myimage:latest

Kubernetes (container-level securityContext)

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]   # only if absolutely required

Notes:

Prefer NET_BIND_SERVICE over NET_ADMIN. Binding to 80/443 does not require full network admin.
Avoid CAP_SYS_ADMIN (the “god” capability). If you see it, ask why exactly.

2) Bound the bounding set and ambient set

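At execve time, the kernel recomputes a thread's Permitted, Effective, and Ambient sets from its current sets and the file's capabilities; the Inheritable and Bounding sets carry over unchanged. The bounding set (CapBnd in /proc/self/status) limits which file capabilities can enter the Permitted set, so a capability dropped from it cannot be regained that way.

Minimize the bounding set at the entrypoint (init/launcher), before starting any AI agent subprocesses. Many runtimes expose this, but even in raw Linux an early prctl(PR_CAPBSET_DROP, …) can ratchet power down. The call itself requires CAP_SETPCAP, so make it before that capability is dropped. Clear the ambient set at the same point with prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL, 0, 0, 0).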
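The execve rules can be written down as a small model. This sketch follows the transformation documented in capabilities(7) for a non-root process. It ignores the root-user special cases, set-user-ID handling, and the kernel's EPERM safety check for files with the effective bit set. ProcCaps, FileCaps, and execve_caps are illustrative names, not a real API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcCaps:
    permitted: frozenset
    effective: frozenset
    inheritable: frozenset
    ambient: frozenset
    bounding: frozenset

@dataclass(frozen=True)
class FileCaps:
    permitted: frozenset = frozenset()
    inheritable: frozenset = frozenset()
    effective: bool = False  # the file "effective" flag is a single bit

def execve_caps(proc, file, file_is_privileged):
    """Simplified capabilities(7) transformation for a non-root process."""
    # Ambient caps are cleared when the file carries capabilities or is set-ID.
    ambient = frozenset() if file_is_privileged else proc.ambient
    permitted = (
        (proc.inheritable & file.inheritable)
        | (file.permitted & proc.bounding)   # the bounding set acts here
        | ambient
    )
    effective = permitted if file.effective else ambient
    # Inheritable and bounding sets pass through unchanged.
    return ProcCaps(permitted, effective, proc.inheritable, ambient, proc.bounding)

NBS = frozenset({"NET_BIND_SERVICE"})
launcher = ProcCaps(permitted=NBS, effective=NBS, inheritable=frozenset(),
                    ambient=frozenset(), bounding=NBS)

# A helper binary whose file capabilities ask for more than the bounding set.
helper = FileCaps(permitted=frozenset({"NET_ADMIN", "NET_BIND_SERVICE"}))
after = execve_caps(launcher, helper, file_is_privileged=True)
print(sorted(after.permitted))
```

NET_ADMIN never reaches the new Permitted set because it was outside the bounding set before the exec. That is the ratchet this step relies on.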

3) Use seccomp as an orthogonal meet

Capabilities gate who may do powerful syscalls. Seccomp gates which syscalls exist at all. An agent that can’t call ptrace, bpf, mount, clone3, userfaultfd, or keyctl is much less agentic.

Start from a tight allow-list (e.g., a distro’s “default” profile) and add as needed.
Remember: seccomp ∧ capabilities is never weaker than either layer alone, and it is strictly tighter whenever each layer blocks something the other permits.
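A toy model of the two layers makes the meet concrete. The requirement table below is a simplified, illustrative subset (real kernels have sysctls and alternate code paths that change what a given operation needs), and allowed_ops is a hypothetical helper.

```python
# Illustrative requirement table: operation -> (syscall, capability or None).
REQUIRES = {
    "read_file": ("read", None),
    "bind_low_port": ("bind", "NET_BIND_SERVICE"),
    "mount_filesystem": ("mount", "SYS_ADMIN"),
    "set_system_clock": ("clock_settime", "SYS_TIME"),
}

def allowed_ops(syscalls, caps):
    """An operation survives only if both layers permit it."""
    return {
        op
        for op, (syscall, cap) in REQUIRES.items()
        if syscall in syscalls and (cap is None or cap in caps)
    }

ALL_SYSCALLS = {syscall for syscall, _ in REQUIRES.values()}
ALL_CAPS = {cap for _, cap in REQUIRES.values() if cap}

seccomp_allow = {"read", "bind", "clock_settime"}  # forgot to block clock_settime
cap_allow = {"NET_BIND_SERVICE", "SYS_ADMIN"}      # forgot to drop SYS_ADMIN

seccomp_only = allowed_ops(seccomp_allow, ALL_CAPS)
caps_only = allowed_ops(ALL_SYSCALLS, cap_allow)
both = allowed_ops(seccomp_allow, cap_allow)

print(sorted(both))  # ['bind_low_port', 'read_file']
```

Each layer has a mistake in it, yet the combined result equals the intersection of the two single-layer results, so neither mistake is reachable on its own.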

4) Seal the filesystem & IPC envelopes

read-only rootfs + tmpfs for write points.
noexec where feasible (/tmp, app workdirs).
private /proc and /sys subsets via mount namespaces and maskedPaths/readonlyPaths (K8s).
Consider CAP_IPC_LOCK removal and device cgroup deny-lists.

5) Be wary of the “new” caps

Modern kernels split powers into narrower caps:

CAP_BPF, CAP_PERFMON, CAP_CHECKPOINT_RESTORE
CAP_AUDIT_READ

These look “niche” but are high-leverage for exploration, memory scraping, and persistence. Default to drop.

A concrete capability recipe (web service)

Goal: TLS-terminating HTTP service on :443, no shelling out, no package installs, no tracing, no eBPF.

Capability set:

Add: NET_BIND_SERVICE
Drop (explicit, belt-and-suspenders): SYS_ADMIN, NET_ADMIN, SYS_PTRACE, BPF, PERFMON, SYS_MODULE, SYS_TIME, SYS_BOOT, SYS_NICE, SYS_PACCT, SYS_TTY_CONFIG, SYS_RESOURCE, AUDIT_*, MKNOD, SETPCAP, MAC_*, CHECKPOINT_RESTORE, and anything not explicitly needed.

Seccomp allow:

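Basic I/O and timing (illustrative, not exhaustive; a real server also needs socket, bind, listen, accept4, and memory-management calls): read, write, close, recvfrom, sendto, epoll_*, futex, clock_*, nanosleep.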
No ptrace, bpf, keyctl, mount, umount2, clone3, setns, io_uring_*, userfaultfd.

FS:

readOnlyRootFilesystem: true
tmpfs at /tmp with noexec
Bind-mount only the certs dir as ro.

This lands you near the bottom of the capability lattice while staying functional.
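The seccomp part of this recipe can be emitted as a profile in the JSON layout Docker accepts via --security-opt seccomp=<file>. Treat this as a sketch rather than a working profile: the wildcard entries (epoll_*, clock_*, io_uring_*) are expanded to specific syscall names as an assumption, and a real service needs more syscalls than the short list in the recipe (process startup alone uses execve, mmap, and others). build_profile is a hypothetical helper.

```python
import json

# Allow-list from the recipe, with wildcards expanded (assumed expansion).
ALLOW = [
    "read", "write", "close", "recvfrom", "sendto",
    "epoll_create1", "epoll_ctl", "epoll_wait", "epoll_pwait",
    "futex", "clock_gettime", "clock_getres", "clock_nanosleep", "nanosleep",
]
# Syscalls the recipe says must never be reachable.
DENY = [
    "ptrace", "bpf", "keyctl", "mount", "umount2", "clone3", "setns",
    "io_uring_setup", "io_uring_enter", "io_uring_register", "userfaultfd",
]

def build_profile(allow, deny):
    """Default-deny profile; refuses to build if a denied syscall is allowed."""
    overlap = sorted(set(allow) & set(deny))
    if overlap:
        raise ValueError(f"denied syscalls present in allow-list: {overlap}")
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(set(allow)), "action": "SCMP_ACT_ALLOW"}],
    }

profile = build_profile(ALLOW, DENY)
print(json.dumps(profile, indent=2))
```

Because the default action is to return an error, the deny list is enforced by omission; the overlap check exists only to catch an allow-list edit that would quietly reintroduce one of those syscalls.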

Observability: prove you’re near ⊥

Inside the container:

# Show all five capability sets
grep -E 'Cap(Prm|Eff|Inh|Amb|Bnd)' /proc/self/status

# Translate to names
capsh --print  # if available

# Show seccomp mode (2 = filter)
grep Seccomp /proc/self/status
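When capsh is not in the image, the hex masks can be decoded with a short script. This is a minimal sketch: the bit order follows the kernel's capability numbering (0 = CAP_CHOWN through 40 = CAP_CHECKPOINT_RESTORE), SAMPLE is made-up status text, and decode and parse_status are illustrative helper names.

```python
# Capability names indexed by bit number, per linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]

def decode(mask_hex):
    """Turn a hex mask such as '0000000000000400' into capability names."""
    mask = int(mask_hex, 16)
    return {name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1}

def parse_status(text):
    """Extract the Cap* and Seccomp fields from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Cap") or key == "Seccomp":
            fields[key] = value.strip()
    return fields

# Made-up sample; inside a container use open("/proc/self/status").read().
SAMPLE = "Name:\tapp\nCapInh:\t0000000000000000\nCapPrm:\t0000000000000400\n" \
         "CapEff:\t0000000000000400\nCapBnd:\t0000000000000400\n" \
         "CapAmb:\t0000000000000000\nSeccomp:\t2\n"

status = parse_status(SAMPLE)
for key in ("CapPrm", "CapEff", "CapBnd", "CapAmb"):
    print(key, sorted(decode(status[key])))
print("seccomp filter active:", status["Seccomp"] == "2")
```

A mask of 0000000000000400 has only bit 10 set, which is CAP_NET_BIND_SERVICE, so this sample describes the web-service recipe's target state.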

Automate checks in CI: fail builds if any image requires CAP_SYS_ADMIN, NET_ADMIN, or BPF unless an allowlist entry explains why and links to a design doc.
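A CI gate of this kind can be a few lines. The sketch below assumes the requested capabilities per image have already been extracted from manifests into a dict. The image names, the allowlist shape, and the example URL are all hypothetical.

```python
# Capabilities that require a documented exception.
FORBIDDEN = {"SYS_ADMIN", "NET_ADMIN", "BPF"}

def normalize(cap):
    """Accept both 'CAP_NET_ADMIN' and 'NET_ADMIN' spellings."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap

def gate(requested, allowlist):
    """Return one message per forbidden capability lacking a design-doc link.

    requested: {image: [capability, ...]}
    allowlist: {(image, capability): design_doc_url}
    """
    problems = []
    for image in sorted(requested):
        caps = {normalize(cap) for cap in requested[image]}
        for cap in sorted(caps & FORBIDDEN):
            if not allowlist.get((image, cap)):
                problems.append(f"{image}: {cap} has no allowlist entry with a design doc")
    return problems

requested = {
    "web:1.4": ["NET_BIND_SERVICE"],
    "netfix:0.2": ["CAP_NET_ADMIN"],
    "tracer:2.0": ["BPF"],
}
allowlist = {("netfix:0.2", "NET_ADMIN"): "https://example.invalid/design/netfix"}

problems = gate(requested, allowlist)
for message in problems:
    print(message)
```

In a pipeline, a non-empty result would translate into a non-zero exit status so the build fails.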

Policy patterns that map cleanly to lattice ops

Least privilege by construction: generate pod specs from a capability blueprint; services declare intents (“bind low port”) that compile to named meets (drop-all ∧ add NET_BIND_SERVICE).
Ephemeral joins with leases: for maintenance, issue time-limited capability grants (e.g., via a sidecar that re-execs with a bounded set, then exits). The lease expiry is a forced meet back to baseline.
Cross-lattice guardrails: deny ptrace in both caps and seccomp; deny BPF in both caps and LSM. Dual denial reduces the blast radius of a single misconfig.
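The first pattern, intents compiling to a drop-all baseline plus named adds, might look like the sketch below. The intent names and the INTENT_CAPS table are hypothetical; the output keys mirror the container securityContext fields used earlier in this post.

```python
# Hypothetical mapping from declared intents to the capabilities they justify.
INTENT_CAPS = {
    "bind-low-port": {"NET_BIND_SERVICE"},
    "no-special-needs": set(),
}

def compile_security_context(intents):
    """Start from drop-all and add only what the declared intents require."""
    unknown = sorted(set(intents) - set(INTENT_CAPS))
    if unknown:
        raise ValueError(f"unknown intents: {unknown}")
    add = sorted(set().union(*(INTENT_CAPS[intent] for intent in intents)))
    context = {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"]},
    }
    if add:
        context["capabilities"]["add"] = add
    return context

web = compile_security_context(["bind-low-port"])
worker = compile_security_context([])
print(web["capabilities"], worker["capabilities"])
```

Because services can only name intents from the table, a new capability enters the fleet by editing INTENT_CAPS, which gives reviewers one place to challenge each join.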

Checklist: agentic-AI ready

cap-drop=ALL (or equivalent) and explicit minimal cap-add.
No SYS_ADMIN, NET_ADMIN, SYS_PTRACE, BPF, PERFMON, CHECKPOINT_RESTORE.
Seccomp: allow-list, deny ptrace, bpf, keyctl, mount, clone3, userfaultfd, io_uring_*.
no_new_privileges, read-only rootfs, tmpfs with noexec.
Bounding set minimized at entrypoint; ambient cleared.
CI gate on cap set; regression test (e.g., a small probe that reads /proc/self/status and fails when Cap* or Seccomp drifts from the expected values).
Observability: export Cap* from /proc/self/status and seccomp mode.

Closing thought

Thinking in lattices forces discipline: every exception is a join that must be justified; every layer is another meet that buys safety. In an era of exploratory, tool-using AI, this algebra is the difference between “a clever model wandered off” and “it couldn’t—there was nowhere to go.”

Brandon's Substack
