Picture the scene. You self-host. You read a blog post that said “Docker is secure by default,” you ran docker run -d, you went to bed feeling like a sysadmin from the future. Twelve months later, someone shows you a one-line CVE that lets any process inside any container on your box read /etc/shadow from the host. You spit out your coffee.
Here’s the truth nobody puts on the official Docker page: a default Docker container is barely a container at all. It’s a process running as root, with most Linux capabilities, on a writable filesystem, sharing a kernel with everything else on your box. If something inside the container goes wrong — a vulnerable nginx, a php-fpm worker that got popped, a forgotten debug endpoint — the blast radius is wider than you think.
The good news: a hardened container is genuinely close to bulletproof. The bad news: you have to opt in to every single layer of hardening. This guide is the checklist I wish someone had handed me when I started self-hosting WordPress, Vaultwarden, Roundcube, and the other ten things that live on my home server.
No jargon without an explanation. Pretend you’ve never written a Dockerfile in your life. By the end you’ll have a hardened compose file for NGINX/Angie and PHP-FPM, and you’ll understand why every flag is there.
Why default Docker is not actually safe
If you run docker run -d nginx right now, you get all of the following, whether you wanted them or not:
- The container’s PID 1 runs as root. If anything inside the container is exploited and the attacker gets a shell, that shell is root inside the container.
- The dockerd daemon itself runs as root on the host. A container escape (and there have been several historic ones — CVE-2019-5736 was the famous runc escape) lands the attacker straight at host-root.
- The container has most Linux capabilities — CAP_NET_RAW, CAP_NET_BIND_SERVICE, CAP_CHOWN, CAP_SETUID, CAP_SETGID, CAP_KILL, CAP_AUDIT_WRITE and others. Far more than nginx actually needs.
- The container filesystem is writable. An attacker can drop a webshell wherever the running user can write, and it sticks around for the life of the container.
- Process privilege escalation is allowed. Without no-new-privileges, a setuid binary inside the container can elevate.
- The kernel is shared. A kernel exploit (you’ll see “container escape via dirty-pipe” or similar in the headlines every couple of years) hits everyone on the box.
None of this is a Docker bug. It’s a deliberate choice to default to “convenient” over “secure.” The hardening flags exist; you just have to use them. Let’s walk through them, in order of how much they help.
Layer 1: don’t run the daemon as root (rootless mode)
The single biggest hardening win is moving the Docker daemon out of root. Two ways to do it:
- Docker rootless mode: ships with Docker 20.10+. Run dockerd-rootless-setuptool.sh install as your normal user, and from then on docker commands talk to your personal daemon, owned by your user. A container escape now lands at your user, not at root.
- Podman: designed rootless from day one, no daemon at all (each podman run spawns a normal user process). Drop-in compatible with most docker run and Compose files.
I run a mix on my own machine: Podman for one-shot containers, Docker rootless for the long-running compose stacks. Both deliver the same security win: the privileged Docker daemon is gone.
Trade-offs to know:
- Rootless containers can’t bind to ports below 1024 by default. Solution: put a single privileged reverse proxy in front (or use net.ipv4.ip_unprivileged_port_start=80 as a sysctl).
- Some volume-mount and overlay-network features need extra setup (slirp4netns, fuse-overlayfs).
- docker run --privileged is mostly meaningless rootless, which is exactly what you want.
If you take one thing from this whole post, take this: run Docker rootless or use Podman. The other layers are nice; this one is foundational.
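A quick way to confirm which kind of daemon you are talking to is to look at the security options it reports. The sketch below assumes the string comes from docker info --format '{{.SecurityOptions}}' and that a rootless daemon lists name=rootless there; the function name is_rootless is made up for this example.

```shell
# Returns 0 when the SecurityOptions string reported by `docker info`
# contains the rootless marker, 1 otherwise.
is_rootless() {
  case "$1" in
    *name=rootless*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a host with Docker installed (not executed here):
#   is_rootless "$(docker info --format '{{.SecurityOptions}}')" && echo "rootless daemon"
```

If the check fails on a box you believed was rootless, your docker CLI is probably still pointed at the system-wide socket rather than your per-user one.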
Layer 2: make the container filesystem read-only
Most well-behaved server processes never need to write to their own filesystem. nginx serves static files. PHP-FPM reads scripts and writes to a session store and a log. Vaultwarden writes to a single SQLite file. If you flip the container filesystem to read-only and then mount just the paths that need writes, an attacker who drops a webshell into /usr/local/bin/ finds out that file system is read-only the hard way.
In Compose:
services:
  web:
    image: nginx:1.27-alpine
    read_only: true
    tmpfs:
      - /tmp
      - /var/cache/nginx
      - /var/run
    volumes:
      - ./site:/usr/share/nginx/html:ro
The tmpfs mounts give the container a few writable scratch directories backed by RAM — they vanish on restart. Anything else the process tries to write fails with EROFS. nginx, php-fpm, postgres, redis, vaultwarden, postfix, dovecot — all of them will run happily in this mode once you’ve identified the directories they legitimately need writable.
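How to find out? Run the container without read_only first, hit it with normal traffic, then run docker diff <name>: it lists every path the container has added (A), changed (C) or deleted (D) in its own filesystem. Each of those paths is a candidate for “either move to tmpfs or live without.” Once read_only is on, docker exec -it <name> sh -c 'mount | grep "(rw"' shows which mounts are still writable; anything in that list that isn’t a tmpfs or a bind-mount you put there on purpose deserves a second look.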
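If you want something tidier than eyeballing raw mount output, a small filter helps. This is a minimal sketch that assumes the usual Linux mount line format (device on mountpoint type fstype (options)) and mount points without spaces; writable_mounts is a hypothetical helper name.

```shell
# Reads `mount` output on stdin and prints "mountpoint fstype" for every
# mount whose option list starts with rw.
writable_mounts() {
  awk '$6 ~ /^\(rw[,)]/ { print $3, $5 }'
}

# Usage against a running container (not executed here):
#   docker exec <name> mount | writable_mounts
```

Expect /proc, /dev and friends to show up as well; the interesting entries are the ones under your application's own paths.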
Layer 3: drop every capability, add back only what you need
Linux capabilities are the modern way to slice up what used to be “all of root or none of it.” A normal process running as root has 40+ capabilities. A normal container gets a curated default subset, but it’s still wildly over-privileged for what most apps actually do.
The right approach: drop all of them, then add back only the specific ones the app needs. For nginx in a container that listens on port 80:
services:
  web:
    image: nginx:1.27-alpine
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]
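That’s it. NET_BIND_SERVICE is the only capability nginx needs to bind to port 80, provided the image starts nginx as a non-root user. An image whose master process starts as root and then switches its workers to the nginx user also needs CHOWN, SETUID and SETGID, and fails at startup without them. With that single allow, an attacker who takes over the nginx process has the privileges of “a process that can bind to a low port.” No chown, no setuid, no net_raw, no kill.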
For PHP-FPM that listens on a Unix socket or a TCP port above 1024:
services:
  php:
    image: php:8.4-fpm-alpine
    cap_drop: [ALL]
    # NO cap_add at all: PHP-FPM doesn't need a single capability
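That feels wrong the first time you see it. It’s correct, as long as the FPM master process already runs as a non-root user (see Layer 5); a master that starts as root and switches its workers to www-data still needs SETUID and SETGID. Read the capabilities(7) man page if you’re unsure: nothing a non-root PHP-FPM does at runtime requires any capability.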
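To check that cap_drop actually took effect, you can read the capability bounding set of PID 1 and test individual bits. This is a sketch under a few assumptions: the mask is the hex value from the CapBnd line of /proc/1/status, the bit numbers are the ones from linux/capability.h (NET_BIND_SERVICE is 10, NET_RAW is 13), and has_cap is a helper name invented here.

```shell
# has_cap MASK BIT: succeeds when bit number BIT is set in the hex MASK
# taken from a CapBnd or CapEff line in /proc/<pid>/status.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_BIND_SERVICE=10   # bit numbers as defined in linux/capability.h
CAP_NET_RAW=13

# Usage against a running container (not executed here):
#   mask="$(docker exec <name> awk '/^CapBnd:/ { print $2 }' /proc/1/status)"
#   has_cap "$mask" "$CAP_NET_RAW" && echo "NET_RAW is still available"
```

With cap_drop: [ALL] and cap_add: [NET_BIND_SERVICE] the mask should read 0000000000000400; with no cap_add at all it should be all zeros. A stock container typically reports 00000000a80425fb.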
Layer 4: no-new-privileges
This one is a single line and a huge win. It tells the kernel: “no process inside this container can ever gain more privileges than it started with.” Setuid binaries are neutered. su stops working. sudo stops working. pkexec stops working.
services:
  web:
    image: nginx:1.27-alpine
    security_opt:
      - no-new-privileges:true
Combine this with running as a non-root user inside the container (Layer 5) and you’ve removed an entire family of attack chains: the ones that start from an unprivileged shell and climb to root through a setuid binary. Be clear about the limits, though. The 2019 runc escape (CVE-2019-5736) overwrote the host’s runc binary and needed root inside the container, so what stops it is the non-root user, user namespaces and a patched runc, not no-new-privileges by itself.
There is no good reason ever to leave this off for a normal application container.
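The kernel exposes this flag per process, so you can verify it rather than trust the compose file. A minimal sketch, assuming a kernel recent enough to print a NoNewPrivs line in /proc/<pid>/status; nnp_enabled is a hypothetical helper name.

```shell
# Reads a /proc/<pid>/status dump on stdin and succeeds only when it
# contains "NoNewPrivs: 1".
nnp_enabled() {
  awk '$1 == "NoNewPrivs:" { v = $2 } END { exit (v == 1 ? 0 : 1) }'
}

# Usage against a running container (not executed here):
#   docker exec <name> cat /proc/1/status | nnp_enabled && echo "no-new-privileges is on"
```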
Layer 5: don’t run as root inside the container
Even with the host-level Docker daemon rootless, an application running as UID 0 inside the container is needlessly powerful. Run as a normal user.
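Many modern images get you part of the way. The official nginx:alpine image starts its master process as root and runs only the workers as the nginx user (UID 101); the nginxinc/nginx-unprivileged variant runs everything as UID 101. The official php:8.4-fpm-alpine image likewise starts the FPM master as root and runs the workers as www-data (UID 82). If you’re writing your own Dockerfile, you should always include: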
# BusyBox (Alpine) adduser syntax; pick a UID the base image does not already use
RUN adduser -D -u 10001 appuser
USER appuser
And in Compose, you can override the runtime user explicitly:
services:
  php:
    image: deb.myguard.nl/php-fpm:8.4
    user: "33:33" # www-data
For an extra layer, enable Docker’s user-namespace remapping (userns-remap in /etc/docker/daemon.json). Even if the process inside the container thinks it’s UID 0, the kernel sees it as an unprivileged UID on the host (e.g. UID 100000). This breaks a class of file-permission attacks across the container/host boundary.
Layer 6: pick a base image that isn’t a Swiss army knife
“Use Alpine for security” is one of those folk-wisdom statements that’s half right. Smaller images do have fewer CVEs by sheer surface-area, but the modern landscape has three serious contenders:
| Base | Size | Has a shell? | Use when |
|---|---|---|---|
| distroless (gcr.io/distroless) | ~20 MB | No (and no package manager) | You ship a single static or interpreted binary. Best for Go, Rust, Java. |
| alpine | ~5 MB | Yes (busybox) | Small, musl-based. Watch out for glibc assumptions in some apps. |
| debian:trixie-slim / ubuntu:24.04 | ~30 MB | Yes (bash) | You need glibc, full apt repos, and predictable behaviour. The most production-friendly of the three. |
The honest answer: match the base to the app. WordPress + PHP-FPM with our myguard packages? debian:trixie-slim, every time — we test against it, the packages install cleanly, the apt sources just work. A Go static binary you wrote yourself? Distroless. A small Node.js script? Alpine.
What matters more than the base image, every single time: rebuild and patch on a schedule. A distroless image built on a six-month-old base layer is less secure than yesterday’s debian-slim with apt-updated packages.
Layer 7: secrets don’t belong in your image
The most common own-goal in container security: baking a DATABASE_PASSWORD=hunter2 into a Dockerfile ENV, or worse, into a COPY .env /app/. That credential is now part of the image, in its metadata or in one of its layers, for as long as the image exists, and deleting the file in a later layer does not remove it. docker history will show an ENV value, and anyone who pulls the image can extract a copied file with docker save.
Three good ways to ship secrets to a container, ranked by my preference for self-hosters:
- Bind-mount a file the container reads at startup. Keep the file outside git, lock it down to mode 600.
  volumes: [./secrets/db.env:/run/secrets/db.env:ro]
- Compose secrets: declare the secret under a top-level secrets: key and list it on the service. This works with plain docker compose, no Swarm required. Secrets land as files under /run/secrets/ inside the container.
- External secret store (Bitwarden CLI, age-encrypted files, HashiCorp Vault). Overkill for most homelabs but worth knowing about.
What never to do: ENV DATABASE_PASSWORD=... in the Dockerfile, or --env DATABASE_PASSWORD=... on the command line (it leaks to ps on the host).
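Many images accept a VARIABLE_FILE setting that points at a mounted secret, in the style of MARIADB_ROOT_PASSWORD_FILE. If your own image needs the same trick, an entrypoint can resolve it at startup. This is a rough sketch, not how any particular official image does it: load_secret is a hypothetical helper, and the resulting value lives only in the container process's environment, not in the image or on the docker command line.

```shell
# load_secret NAME: if NAME_FILE points at a readable file, export NAME
# with that file's contents (the trailing newline is stripped).
load_secret() {
  var="$1"
  eval "file=\${${var}_FILE:-}"
  if [ -n "$file" ] && [ -r "$file" ]; then
    value="$(cat "$file")"
    export "$var=$value"
  fi
}

# Usage inside an entrypoint script (not executed here):
#   load_secret DATABASE_PASSWORD
#   exec "$@"
```

Where the application can read the file itself, prefer that: it keeps the secret out of the process environment altogether.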
Layer 8: network segmentation by default
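Every container started with a plain docker run joins the default bridge network and can talk to every other container on that network. Compose is a little better, since each project gets its own default network, but every service in the project can still reach every other one. That’s the opposite of what you want.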
The right pattern: each compose stack gets its own network. The reverse-proxy container joins both the public-facing network and the stack’s internal network. The application containers (PHP-FPM, database, cache) join only the internal network. Now an attacker who pops PHP-FPM can’t ping a neighbouring stack’s postgres.
services:
  web:
    image: nginx:1.27-alpine
    networks: [public, internal]
    ports: ["80:80"]
  php:
    image: deb.myguard.nl/php-fpm:8.4
    networks: [internal] # NO public exposure
  db:
    image: postgres:17-alpine
    networks: [internal]
networks:
  public:
    driver: bridge
  internal:
    driver: bridge
    internal: true # blocks egress to the internet too
The internal: true on the database network is the cherry on top: a database container on an internal-only network cannot exfiltrate data to the public internet even if it’s owned. Most data-stealer malware fails silently against this.
Layer 9: scan your images, every build
Static analysis of container images takes thirty seconds and catches an embarrassing number of issues. The three tools I actually use:
- Trivy (github.com/aquasecurity/trivy): one binary, scans an image for CVEs in OS packages and language deps. trivy image nginx:1.27-alpine prints a table you can act on in minutes.
- Grype (github.com/anchore/grype): similar idea, different vuln database. I run both because they don’t always agree.
- Syft (github.com/anchore/syft) — generates a software bill of materials (SBOM) in SPDX/CycloneDX format. Useful when a new CVE drops and you need to know which of your images are affected without re-scanning.
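Wire a scanner into your CI and reject builds with critical CVEs (trivy image --exit-code 1 --severity CRITICAL <image> exits non-zero when it finds one). For self-hosters without a CI, even a manual for i in $(docker images -q | head); do trivy image "$i"; done on a Sunday afternoon catches the worst stuff. The loop is there because trivy image takes one image per invocation.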
Layer 10: logging that survives the container
When (not if) something goes wrong inside a container, you want logs you can read. Two rules:
- Log to stdout/stderr, not to a file inside the container. The Docker logging driver captures stdout and ships it where you want.
- Pick a real logging driver: json-file with rotation, journald, or push to Loki / Vector. The default unbounded json-file driver will fill your disk eventually.
In /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}
Per-container override in Compose:
services:
  web:
    logging:
      driver: journald
      options:
        tag: "{{.Name}}"
Forensics are only possible if the evidence still exists when you go looking.
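It helps to know what the rotation settings cost in disk space before you pick them. With json-file, each container keeps at most max-file files of max-size each, so the worst case is a simple product. The sketch below uses the daemon.json values from this section; the count of 12 containers and the helper name log_budget_mb are assumptions for illustration.

```shell
# Worst-case json-file log usage in MB:
# max-size (in MB) x max-file x number of containers.
log_budget_mb() {
  echo $(( $1 * $2 * $3 ))
}

log_budget_mb 10 5 12   # 10m x 5 files x 12 containers, prints 600
```

Containers switched to journald are not covered by this figure; their retention is governed by journald's own limits.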
Putting it all together: a hardened nginx + PHP-FPM stack
Here’s a complete, real-world docker-compose.yml for a hardened WordPress / PHP stack using our myguard packages. Every flag is one of the ten layers above; no flag is decorative.
services:
  web:
    image: deb.myguard.nl/angie:1.11.5-alpine
    read_only: true
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]
    security_opt:
      - no-new-privileges:true
    user: "101:101" # nginx
    tmpfs:
      - /tmp
      - /var/cache/angie
      - /var/run
    ports: ["80:80", "443:443"]
    networks: [public, internal]
    volumes:
      - ./conf/angie.conf:/etc/angie/angie.conf:ro
      - certs:/etc/letsencrypt:ro
      - wp-content:/var/www/html/wp-content:ro
    logging:
      driver: journald
  php:
    image: deb.myguard.nl/php-fpm:8.4-snuf
    read_only: true
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    user: "33:33" # www-data
    tmpfs:
      - /tmp
      - /var/run
    networks: [internal] # no public exposure
    volumes:
      - ./conf/wordpress-strict.rules:/etc/php/8.4/php-snuffleupagus/active.rules:ro
      - ./conf/www.conf:/etc/php/8.4/fpm/pool.d/www.conf:ro
      - wp-content:/var/www/html/wp-content
    logging:
      driver: journald
  db:
    image: mariadb:11-noble
    read_only: false # mariadb really does need to write
    cap_drop: [ALL]
    cap_add: [CHOWN, SETUID, SETGID]
    security_opt:
      - no-new-privileges:true
    networks: [internal]
    environment:
      MARIADB_ROOT_PASSWORD_FILE: /run/secrets/db_root_password
    secrets: [db_root_password]
    volumes:
      - db-data:/var/lib/mysql
    logging:
      driver: journald
networks:
  public:
    driver: bridge
  internal:
    driver: bridge
    internal: true
volumes:
  certs:
  wp-content:
  db-data:
secrets:
  db_root_password:
    file: ./secrets/db_root_password
The PHP container in this stack also loads Snuffleupagus via the rules file mount — so even if the attacker bypasses all ten Docker layers and gets PHP execution, Snuffleupagus blocks the function calls they’d need to actually do damage. Defence in depth, in three independent layers.
The 60-second checklist
If you remember nothing else from this post, run through this list on every container you deploy:
- Daemon running rootless? (or Podman)
- read_only: true + tmpfs for the writable bits?
- cap_drop: [ALL] with only the specific caps added back?
- security_opt: [no-new-privileges:true]?
- user: set to a non-root UID?
- Base image fresh, scanned with Trivy or Grype?
- Secrets via files, not ENV?
- App container on an internal: true network?
- Logging driver with rotation set?
- Restart policy chosen deliberately (unless-stopped for services, not always)?
Ten flags. Five minutes per stack. A real attacker has to find a real kernel CVE to escape this. Worth the effort.
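Part of that checklist can be checked mechanically from docker inspect. The sketch below covers only four of the ten items and assumes the input line was produced with the format string shown in the usage comment; audit_line is a hypothetical helper, and it treats any no-new-privileges entry as enabled.

```shell
# audit_line 'ReadonlyRootfs|CapDrop|SecurityOpt|User'
# Prints one line per failed check and returns non-zero if any check failed.
audit_line() {
  IFS='|' read -r ro capdrop secopt user <<EOF
$1
EOF
  fails=0
  [ "$ro" = "true" ] || { echo "root filesystem is writable"; fails=1; }
  case "$capdrop" in
    *ALL*) ;;
    *) echo "capabilities not dropped"; fails=1 ;;
  esac
  case "$secopt" in
    *no-new-privileges*) ;;
    *) echo "no-new-privileges missing"; fails=1 ;;
  esac
  case "$user" in
    ""|0|0:*|root|root:*) echo "runs as root"; fails=1 ;;
  esac
  return "$fails"
}

# Usage on a host with Docker installed (not executed here):
#   fmt='{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.CapDrop}}|{{.HostConfig.SecurityOpt}}|{{.Config.User}}'
#   audit_line "$(docker inspect --format "$fmt" <name>)"
```

An empty user field is reported as root because that is what the container gets when neither the image nor the compose file sets one; an image with its own USER line fills the field in.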
Frequently asked questions
[...] docker run is significantly less isolated than a default KVM VM. A hardened container (rootless + read-only + cap-drop + no-new-privileges) is comparable to a VM for most threat models, and uses a fraction of the resources. For really high-value workloads where you want a hardware-enforced isolation boundary, use Firecracker or Kata Containers: they wrap each container in a micro-VM.
[...] restricted profile.
[...] read_only?
[...] privileged: true for that one container that needs it?
[...] cap_add. If it needs a specific device, mount only that device. privileged: true gives the container almost root-on-host access; reserve it for containers that genuinely manage Docker itself (and even then, look hard for an alternative).
[...] read_only + cap_drop limit what the malicious code can do at runtime; SBOM scanning (Syft) catches some known-malicious package signatures. The strongest defence is pinning images by digest (image: nginx@sha256:abc...) rather than tag. That way a re-tagged malicious image can’t quietly slip in on your next pull.
Related posts
- PHP Snuffleupagus tutorial — the PHP-interpreter-layer hardening that pairs perfectly with this container-layer hardening.
- How to install ModSecurity and OWASP CRS on NGINX — the HTTP-layer WAF that sits in front.
- Self-hosted Vaultwarden — a real-world hardened compose example end-to-end.
- docker-cms: hardened PHP 8.5 image for WordPress — our pre-hardened image so you don’t have to start from scratch.
- Angie and NGINX Docker images — daily-rebuilt, full-modules, ready to drop into a hardened compose.
See also: Hardened OpenSSH 10.3 for Debian and Ubuntu for the SSH side of host hardening, and the Docker packages overview.