Container Hardening Against Agentic AI
tl;dr: Container privilege is a lattice you can compose. Capabilities, seccomp, LSMs, and cgroups each form their own partial orders of “allowed behaviors.” By taking meets (intersections) across these lattices, and avoiding dangerous joins (unions), we can systematically squeeze an agentic AI’s room to act, even if it’s reflective, tool-using, and self-adapting.
Why “agentic AI” changes the threat model
Traditional container hardening assumes:
deterministic binaries,
narrow I/O channels,
predictable syscall envelopes.
Agentic AI breaks those assumptions. An agent can autonomously:
explore system APIs (files, sockets, procfs, cgroups),
chain tools (package managers, CLIs),
prompt humans or other services for help (lateral socio-technical movement).
If the container’s “behavior budget” is leaky, the model will find the leak. Linux capabilities give us a principled control surface for shrinking that budget.
Capabilities as a lattice
Linux capabilities split root’s powers into about 40 named privileges (41 as of Linux 5.9), such as CAP_NET_ADMIN, CAP_SYS_PTRACE, and CAP_BPF. A capability set is any subset of these names.
The family of all capability sets with the subset order (⊆) forms a Boolean lattice:
Bottom (⊥): ∅, no capabilities.
Top (⊤): All, every capability (effectively root-ish).
Meet (∧): set intersection, “only powers present in both.”
Join (∨): set union, “anything allowed by either.”
This is powerful because:
Lockdown is a sequence of meets across different policies (cap drop, seccomp allow-list, LSM denials).
Any “temporary exception” is a join. You should make joins as local, short-lived, and observable as possible.
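As a minimal sketch, the two operations can be modelled in Python with a capability set as a frozenset of names. The variable names (baseline, requested) are illustrative assumptions, not part of any runtime's API.

```python
# Capability sets as frozensets; the subset order gives the lattice.
BOTTOM = frozenset()
baseline = frozenset({"CAP_NET_BIND_SERVICE"})
requested = frozenset({"CAP_NET_BIND_SERVICE", "CAP_NET_ADMIN"})


def meet(a: frozenset, b: frozenset) -> frozenset:
    """Greatest lower bound: only powers present in both sets."""
    return a & b


def join(a: frozenset, b: frozenset) -> frozenset:
    """Least upper bound: anything allowed by either set."""
    return a | b


# A meet never exceeds either operand, so stacking meets can only lock down.
granted = meet(baseline, requested)
assert granted <= baseline and granted <= requested

# A join is at least as large as both operands: an exception widens power.
widened = join(baseline, requested)
assert baseline <= widened and requested <= widened

print(sorted(granted))  # ['CAP_NET_BIND_SERVICE']
print(sorted(widened))  # ['CAP_NET_ADMIN', 'CAP_NET_BIND_SERVICE']
```

The asserts are the point: whatever the inputs, a meet can only shrink the set and a join can only grow it.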
A tiny Hasse diagram (subset order) for illustration
Consider S = {CAP_NET_BIND_SERVICE, CAP_NET_ADMIN, CAP_SYS_ADMIN}, abbreviated below as BIND, NET, and SYS.
           {BIND, NET, SYS}
         /         |         \
{BIND, NET}   {BIND, SYS}   {NET, SYS}
     |   \     /       \     /   |
     |      X             X      |
     |   /     \       /     \   |
   {BIND}        {NET}         {SYS}
          \        |        /
                   ∅
Edges indicate “covers” in the subset order.
Hardening is moving down the diagram (toward ∅).
Any time you “just add CAP_NET_ADMIN,” you’re climbing up.
In practice you won’t draw the whole lattice (with roughly 40 capabilities it has on the order of 2^40 nodes). But this mental model clarifies why even “small” adds, like NET_ADMIN, are big structural jumps.
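To make the size argument concrete, here is a small sketch that enumerates the lattice for the three-capability example and counts its covering edges. The short labels are just abbreviations for the capabilities in S, and the figure of 41 capabilities assumes a Linux 5.9 or newer kernel.

```python
from itertools import combinations

S = ("BIND", "NET", "SYS")  # short labels for the three capabilities in S

# Every subset of S is a node of the lattice.
nodes = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# b covers a when a is a proper subset of b and b has exactly one extra element.
covers = [(a, b) for a in nodes for b in nodes if a < b and len(b) - len(a) == 1]

print(len(nodes), len(covers))  # 8 12

# Scale of the real lattice, assuming 41 defined capabilities.
print(2 ** 41)  # 2199023255552
```

Each covering edge is one "just add X" decision, which is why a single cap-add is a whole step up the diagram rather than a small nudge.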
“Gluing” lattices: composing defenses
Capabilities don’t act alone. You have other controls that also form partial orders:
Seccomp: the set of allowed syscalls (subset order).
LSM (SELinux/AppArmor): the set of allowed label transitions & accesses.
Mount namespaces: the set of visible paths and mount flags.
cgroups: the set of resource actions (e.g., device access).
Each is a lattice of behaviors. The container’s effective power is the product lattice:
L_effective = L_caps × L_seccomp × L_lsm × L_mounts × L_cgroups × ...
An action succeeds only if every layer permits it, so your runtime policy is the meet across components:
Policy_effective = Policy_caps ∧ Policy_seccomp ∧ Policy_lsm ∧ ...
A compromise or misconfiguration in one layer won’t blow up the total surface as long as the other layers still deny the behavior. This is the “glue.”
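A toy model of that composition, assuming just two layers and a deliberately simplified view in which each action names one syscall and at most one required capability. The Action type and the allow-sets are hypothetical; real kernels decide this per call site, and ptrace does not always need CAP_SYS_PTRACE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """A hypothetical attempted action and what it needs from each layer."""
    syscall: str
    capability: Optional[str] = None  # None means no capability is required


# Illustrative policies. Note the seccomp list wrongly lets ptrace through.
ALLOWED_CAPS = frozenset({"CAP_NET_BIND_SERVICE"})
ALLOWED_SYSCALLS = frozenset({"read", "write", "bind", "ptrace"})


def caps_permit(a: Action) -> bool:
    return a.capability is None or a.capability in ALLOWED_CAPS


def seccomp_permits(a: Action) -> bool:
    return a.syscall in ALLOWED_SYSCALLS


LAYERS = (caps_permit, seccomp_permits)


def permitted(a: Action) -> bool:
    """Effective policy: the meet of all layers, so every layer must agree."""
    return all(layer(a) for layer in LAYERS)


print(permitted(Action("bind", "CAP_NET_BIND_SERVICE")))  # True
print(permitted(Action("ptrace", "CAP_SYS_PTRACE")))      # False
```

The second call is the interesting one: the seccomp layer is misconfigured, yet the capability layer still denies the action, so the effective policy holds.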
Tactics: from lattice theory to YAML and flags
1) Start from ⊥: drop everything, add back surgically
Docker / nerdctl
# NET_BIND_SERVICE is only needed if you bind :80/:443
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --pids-limit=128 \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=16m \
  --mount type=bind,src=/var/run/sockets/app.sock,dst=/app.sock,ro \
  -e SYSLOG_ADDR=unixgram:///app.sock \
  myimage:latest
Kubernetes (container securityContext)
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]  # only if absolutely required
Notes:
Prefer NET_BIND_SERVICE over NET_ADMIN. Binding to 80/443 does not require full network admin.
Avoid CAP_SYS_ADMIN (the “god” capability). If you see it, ask why exactly.
2) Bound the bounding set and ambient set
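Each thread carries five capability sets: Permitted, Effective, Inheritable, Ambient, and Bounding. At exec time the kernel recomputes the permitted, effective, and ambient sets. The bounding set (CapBnd in /proc/self/status) limits which file capabilities an exec can hand to the process, and a capability dropped from it cannot be regained by that process or its descendants. The ambient set is what survives an exec of an ordinary binary (no set-user-ID bit, no file capabilities), so clear it unless a child genuinely needs it.
Minimize the bounding set at the entrypoint (init/launcher), before starting any AI agent subprocesses. Many runtimes expose this, but even in raw Linux an early prctl(PR_CAPBSET_DROP, …) can ratchet power down (the caller needs CAP_SETPCAP at that point).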
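To see why the bounding set works as a ratchet, here is a sketch of the exec-time transition rules as documented in capabilities(7), with sets as integer bitmasks. It is a simplified model: it ignores set-user-ID-root handling and the EPERM safety check for files with the effective bit set, and the ProcCaps and FileCaps types are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcCaps:
    permitted: int
    effective: int
    inheritable: int
    ambient: int
    bounding: int


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False  # the file "effective" flag is a single bit


def execve_caps(p: ProcCaps, f: FileCaps) -> ProcCaps:
    """New capability sets after execve (simplified, non-root case)."""
    # A file with capabilities attached is "privileged" and resets ambient.
    privileged_file = bool(f.permitted or f.inheritable or f.effective)
    ambient = 0 if privileged_file else p.ambient
    # File permitted caps are masked by the bounding set before being granted.
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    # Inheritable and bounding sets pass through exec unchanged.
    return ProcCaps(permitted, effective, p.inheritable, ambient, p.bounding)


NET_BIND, SYS_ADMIN = 1 << 10, 1 << 21  # bit positions from linux/capability.h
before = ProcCaps(0, 0, 0, 0, bounding=NET_BIND)
after = execve_caps(before, FileCaps(permitted=NET_BIND | SYS_ADMIN))
print(hex(after.permitted))  # 0x400: SYS_ADMIN was masked by the bounding set
```

Even though the executed file asks for CAP_SYS_ADMIN, the process never receives it, because the launcher had already removed it from the bounding set.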
3) Use seccomp as an orthogonal meet
Capabilities gate who may do powerful syscalls. Seccomp gates which syscalls exist at all. An agent that can’t call ptrace, bpf, mount, clone3, userfaultfd, or keyctl is much less agentic.
Start from a tight allow-list (e.g., a distro’s “default” profile) and add as needed.
Remember: seccomp ∧ capabilities is at least as restrictive as either alone, and strictly more restrictive whenever each layer blocks something the other would allow.
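One way to keep an allow-list honest is to generate it. The sketch below builds a profile in the JSON layout that Docker accepts via --security-opt seccomp=, and refuses to emit one that allows any of the syscalls named above. The build_profile helper and the sample allow-list are assumptions for illustration; a real service needs many more syscalls than shown.

```python
import json

# Syscalls the text singles out as ones an agent should not have.
DENY = frozenset({"ptrace", "bpf", "mount", "clone3", "userfaultfd", "keyctl"})


def build_profile(allowed: set[str]) -> dict:
    """Build an allow-list seccomp profile, failing closed on risky syscalls."""
    leaked = allowed & DENY
    if leaked:
        raise ValueError(f"refusing to allow: {sorted(leaked)}")
    return {
        # Anything not explicitly listed fails with an errno.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"}],
    }


# Illustrative only: far too small to run a real program.
profile = build_profile({"read", "write", "close", "futex", "epoll_wait"})
print(json.dumps(profile, indent=2))
```

Writing the output to a file and passing it with --security-opt seccomp=profile.json would apply it to a container.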
4) Seal the filesystem & IPC envelopes
read-only rootfs + tmpfs for write points.
noexec where feasible (/tmp, app workdirs).
private /proc and /sys subsets via mount namespaces and maskedPaths/readonlyPaths (K8s).
Consider CAP_IPC_LOCK removal and device cgroup deny-lists.
5) Be wary of the “new” caps
Modern kernels split powers into narrower caps:
CAP_BPF, CAP_PERFMON, CAP_CHECKPOINT_RESTORE
CAP_AUDIT_READ
These look “niche” but are high-leverage for exploration, memory scraping, and persistence. Default to drop.
A concrete capability recipe (web service)
Goal: TLS-terminating HTTP service on :443, no shelling out, no package installs, no tracing, no eBPF.
Capability set:
Add: NET_BIND_SERVICE
Drop (explicit, belt-and-suspenders): SYS_ADMIN, NET_ADMIN, SYS_PTRACE, BPF, PERFMON, SYS_MODULE, SYS_TIME, SYS_BOOT, SYS_NICE, SYS_PACCT, SYS_TTY_CONFIG, SYS_RESOURCE, AUDIT_*, MKNOD, SETPCAP, MAC_*, CHECKPOINT_RESTORE, and anything not explicitly needed.
Seccomp allow:
Basic I/O: read, write, close, recvfrom, sendto, epoll_*, futex, clock_*, nanosleep.
No ptrace, bpf, keyctl, mount, umount2, clone3, setns, io_uring_*, userfaultfd.
FS:
readOnlyRootFilesystem: true
tmpfs at /tmp with noexec
Bind-mount only the certs dir as ro.
This lands you near the bottom of the capability lattice while staying functional.
Observability: prove you’re near ⊥
Inside the container:
# Show all five capability sets (hex bitmasks)
grep -E 'Cap(Prm|Eff|Inh|Amb|Bnd)' /proc/self/status
# Translate to names
capsh --print # if available
# Show seccomp mode (2 = filter)
grep Seccomp /proc/self/status
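When capsh is not in the image, the hex masks can be decoded by hand. The sketch below assumes the capability numbering from linux/capability.h up to CAP_CHECKPOINT_RESTORE (bit 40); the decode and read_cap_sets helpers are names made up for this example.

```python
# Capability names in bit order, as numbered in linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]
CAP_KEYS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def decode(mask_hex: str) -> list[str]:
    """Turn a hex bitmask from /proc/<pid>/status into capability names."""
    mask = int(mask_hex, 16)
    names = [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]
    # Bits beyond the table would belong to capabilities newer than this list.
    if mask >> len(CAP_NAMES):
        names.append("UNKNOWN_HIGH_BITS")
    return names


def read_cap_sets(status_text: str) -> dict[str, list[str]]:
    """Extract and decode the Cap* lines from the text of a status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_KEYS:
            sets[key] = decode(value.strip())
    return sets


sample = "Name:\tapp\nCapPrm:\t0000000000000400\nCapBnd:\t0000000000000400\n"
print(read_cap_sets(sample))
```

Inside a container you would pass the contents of /proc/self/status instead of the sample string. A hardened service should decode to a very short list.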
Automate checks in CI: fail builds if any image requires CAP_SYS_ADMIN, NET_ADMIN, or BPF unless an allowlist entry explains why and links to a design doc.
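A CI gate along those lines might look like the following sketch. It assumes the pod spec has already been parsed into a dict (for example from YAML), and that the allowlist maps a "container/CAPABILITY" key to a design-doc link. That allowlist format is invented here, not an established convention.

```python
GATED = frozenset({"SYS_ADMIN", "NET_ADMIN", "BPF"})


def normalise(cap: str) -> str:
    """Accept both 'CAP_NET_ADMIN' and 'NET_ADMIN' spellings."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def violations(pod_spec: dict, allowlist: dict[str, str]) -> list[str]:
    """Return one message per gated capability that lacks a justification."""
    problems = []
    for container in pod_spec.get("containers", []):
        caps = container.get("securityContext", {}).get("capabilities", {})
        for cap in map(normalise, caps.get("add", [])):
            key = f"{container['name']}/{cap}"
            # Adding ALL is treated as gated too, since it includes every cap.
            if (cap in GATED or cap == "ALL") and not allowlist.get(key):
                problems.append(f"{key}: gated capability without justification")
    return problems


spec = {
    "containers": [{
        "name": "web",
        "securityContext": {
            "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE", "CAP_NET_ADMIN"]},
        },
    }]
}
print(violations(spec, allowlist={}))
# ['web/NET_ADMIN: gated capability without justification']
```

In a pipeline, a non-empty result would fail the build. An allowlist entry such as {"web/NET_ADMIN": "<link to design doc>"} would let it pass.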
Policy patterns that map cleanly to lattice ops
Least privilege by construction: generate pod specs from a capability blueprint; services declare intents (“bind low port”) that compile to named meets (drop-all ∧ add NET_BIND_SERVICE).
Ephemeral joins with leases: for maintenance, issue time-limited capability grants (e.g., via a sidecar that re-execs with a bounded set, then exits). The lease expiry is a forced meet back to baseline.
Cross-lattice guardrails: deny ptrace in both caps and seccomp; deny BPF in both caps and LSM. Dual denial reduces the blast radius of a single misconfig.
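The first pattern, compiling declared intents into a security context, can be sketched in a few lines. The intent names and the INTENT_CAPS table are hypothetical; a real blueprint would define its own vocabulary.

```python
# Hypothetical intent vocabulary mapping each intent to the caps it needs.
INTENT_CAPS = {
    "bind-low-port": {"NET_BIND_SERVICE"},
    "plain-worker": set(),
}


def compile_security_context(intents: list[str]) -> dict:
    """Compile declared intents into a container securityContext dict."""
    add = set()
    for intent in intents:
        # Unknown intents raise KeyError, so a typo cannot grant anything.
        add |= INTENT_CAPS[intent]
    capabilities = {"drop": ["ALL"]}
    if add:
        capabilities["add"] = sorted(add)
    return {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": capabilities,
    }


print(compile_security_context(["bind-low-port"])["capabilities"])
# {'drop': ['ALL'], 'add': ['NET_BIND_SERVICE']}
```

Because services never write capability names directly, every climb up the lattice has to pass through the intent table, which is a natural place for review.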
Checklist: agentic-AI ready
cap-drop=ALL (or equivalent) and explicit minimal cap-add.
No SYS_ADMIN, NET_ADMIN, SYS_PTRACE, BPF, PERFMON, CHECKPOINT_RESTORE.
Seccomp: allow-list, deny ptrace, bpf, keyctl, mount, clone3, userfaultfd, io_uring_*.
no_new_privileges, read-only rootfs, tmpfs with noexec.
Bounding set minimized at entrypoint; ambient cleared.
CI gate on cap set; regression test (e.g., the Rust 2024 probe above).
Observability: export Cap* from /proc/self/status and seccomp mode.
Closing thought
Thinking in lattices forces discipline: every exception is a join that must be justified; every layer is another meet that buys safety. In an era of exploratory, tool-using AI, this algebra is the difference between “a clever model wandered off” and “it couldn’t—there was nowhere to go.”

