Docker Hardening: Rootless, Read-Only, Distroless (2026)

Docker hardening is the missing manual. Picture the scene: you self-host, you read a blog post that said “Docker is secure by default,” you ran docker run -d, you went to bed feeling like a sysadmin from the future. Twelve months later, someone shows you a one-line CVE that lets any process inside any container on your box read /etc/shadow from the host. You spit out your coffee.

Here’s the truth nobody puts on the official Docker page: a default Docker container is barely a container at all. It’s a process running as root, with a generous default set of Linux capabilities, on a writable filesystem, sharing a kernel with everything else on your box. If something inside the container goes wrong (a vulnerable nginx, a php-fpm worker that got popped, a forgotten debug endpoint), the blast radius is wider than you think.

The good news: a hardened container is genuinely hard to break out of. The bad news: you have to opt in to every single layer of hardening. This guide is the checklist I wish someone had handed me when I started self-hosting WordPress, Vaultwarden, Roundcube, and the other ten things that live on my home server.

No jargon without an explanation. Pretend you’ve never written a Dockerfile in your life. By the end you’ll have a hardened compose file for NGINX/Angie and PHP-FPM, and you’ll understand why every flag is there.

Why default Docker is not actually safe (and what Docker hardening means)

If you run docker run -d nginx right now, you get all of the following, whether you wanted them or not:

The container’s PID 1 runs as root. If anything inside the container is exploited and the attacker gets a shell, that shell is root inside the container.
The dockerd daemon itself runs as root on the host. A container escape (and there have been several historic ones: CVE-2019-5736 was the famous runc escape) lands the attacker straight at host-root.
The container keeps a default set of 14 Linux capabilities, including CAP_NET_RAW, CAP_NET_BIND_SERVICE, CAP_CHOWN, CAP_SETUID, CAP_SETGID, CAP_KILL and CAP_AUDIT_WRITE. Far more than nginx actually needs.
The container filesystem is writable. An attacker can drop a webshell wherever the running user can write, and it sticks around for the life of the container.
Process privilege escalation is allowed. Without no-new-privileges, a setuid binary inside the container can elevate.
The kernel is shared. A kernel exploit (you’ll see “container escape via dirty-pipe” or similar in the headlines every couple of years) hits everyone on the box.

None of this is a Docker bug. It’s a deliberate choice to default to “convenient” over “secure.” The hardening flags exist; you just have to use them. Let’s walk through them, in order of how much they help.

Layer 1: don’t run the daemon as root (rootless mode)

The single biggest hardening win is moving the Docker daemon out of root. Two ways to do it:

Docker rootless mode: ships with Docker 20.10+. Run dockerd-rootless-setuptool.sh install as your normal user, and from then on docker commands talk to your personal daemon, owned by your user. A container escape now lands at your user, not at root.
Podman: designed rootless from day one, no daemon at all (each podman run spawns a normal user process). Drop-in compatible with most docker run and Compose files.

I run a mix on my own machine: Podman for one-shot containers, Docker rootless for the long-running compose stacks. Both deliver the same security win: the privileged Docker daemon is gone.

Trade-offs to know:

Rootless containers can’t bind to ports below 1024 by default. Solution: put a single privileged reverse proxy in front (or use net.ipv4.ip_unprivileged_port_start=80 as a sysctl).
Some volume-mount and overlay-network features need extra setup (slirp4netns, fuse-overlayfs).
docker run --privileged is mostly meaningless rootless, which is exactly what you want: it cannot hand out more than your own user already has.
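
For reference, the switch to rootless Docker can be sketched roughly like this. It is a minimal sketch, assuming a systemd-based distro with Docker’s rootless extras and uidmap already installed; the function names setup_rootless and is_rootless are mine, not part of Docker.

```shell
#!/bin/sh
# One-time rootless setup for the current (non-root) user.
# Assumption: dockerd-rootless-setuptool.sh is on PATH (rootless extras installed).
setup_rootless() {
    dockerd-rootless-setuptool.sh install      # creates a per-user daemon
    systemctl --user enable --now docker       # start it now and on every login
    sudo loginctl enable-linger "$(id -un)"    # keep it running after logout
}

# Returns 0 when the daemon the CLI talks to reports rootless mode.
is_rootless() {
    docker info --format '{{.SecurityOptions}}' | grep -q 'name=rootless'
}

# Usage:
#   setup_rootless
#   export DOCKER_HOST="unix://${XDG_RUNTIME_DIR}/docker.sock"
#   is_rootless && echo "daemon is rootless"
```

If is_rootless fails after setup, the CLI is probably still talking to the system-wide socket, so check DOCKER_HOST first.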

If you take one thing from this whole post, take this: run Docker rootless or use Podman. The other layers are nice; this one is foundational.

Layer 2: make the container filesystem read-only

Most well-behaved server processes never need to write to their own filesystem. nginx serves static files. PHP-FPM reads scripts and writes to a session store and a log. Vaultwarden writes to a single SQLite file. If you flip the container filesystem to read-only and then mount just the paths that need writes, an attacker who drops a webshell into /usr/local/bin/ finds out that file system is read-only the hard way.

In Compose:

services:
  web:
    image: nginx:1.27-alpine
    read_only: true
    tmpfs:
      - /tmp
      - /var/cache/nginx
      - /var/run
    volumes:
      - ./site:/usr/share/nginx/html:ro

The tmpfs mounts give the container a few writable scratch directories backed by RAM; they vanish on restart. Anything else the process tries to write fails with EROFS. nginx, php-fpm, postgres, redis, vaultwarden, postfix, dovecot: all of them will run happily in this mode once you’ve identified the directories they legitimately need writable.

How to find out? Run the container without read_only, hit it with normal traffic, then run docker diff <name>. It lists every path the container added (A), changed (C), or deleted (D) in its own writable layer. Each of those is a candidate for “either move to tmpfs or a volume, or live without.”
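
To turn that question into a short list, docker diff output can be filtered down to the paths that were actually written, skipping parent directories that are only listed because something beneath them changed. This is a rough sketch; written_paths is a hypothetical helper name, and it assumes the container ran without read_only while you exercised it.

```shell
#!/bin/sh
# Print the leaf paths a container added (A) or changed (C) in its writable layer.
# docker diff prints lines such as "C /var/cache" or "A /run/nginx.pid".
written_paths() {
    docker diff "$1" | awk '
        $1 == "A" || $1 == "C" { sub(/^[AC] /, ""); paths[$0] = 1 }
        END {
            for (p in paths) {
                leaf = 1
                # a path is not a leaf if another listed path lives beneath it
                for (q in paths)
                    if (q != p && index(q, p "/") == 1) { leaf = 0; break }
                if (leaf) print p
            }
        }' | sort
}

# Usage:
#   written_paths web
```

Each path it prints either goes onto the tmpfs list, moves to a volume, or gets switched off in the app’s config.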

Layer 3: drop every capability, add back only what you need

Linux capabilities are the modern way to slice up what used to be “all of root or none of it.” A normal process running as root has 40+ capabilities. A normal container gets a curated default subset, but it’s still wildly over-privileged for what most apps actually do.

The right approach: drop all of them, then add back only the specific ones the app needs. For nginx in a container that listens on port 80:

services:
  web:
    image: nginx:1.27-alpine
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]

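That’s it. NET_BIND_SERVICE is the only capability nginx needs to bind to port 80. With that single allow, an attacker who takes over the nginx process has the privileges of “a process that can bind to a low port” and nothing more: no chown, no setuid, no net_raw, no kill. One caveat: the stock nginx image starts its master process as root and drops to the nginx user for its workers, which also requires CHOWN, SETUID and SETGID. Either add those three back, or run an image that starts as non-root (Layer 5) so the short list holds.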

For PHP-FPM that listens on a Unix socket or a TCP port above 1024:

services:
  php:
    image: php:8.4-fpm-alpine
    cap_drop: [ALL]
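    # No cap_add at all: PHP-FPM started as a non-root user needs no capabilities

That feels wrong the first time you see it. It’s correct, with one condition. Read the capabilities(7) man page if you’re unsure: nothing PHP-FPM does at runtime requires any capability, as long as the master process already starts as a non-root user (Layer 5). A master started as root needs at least SETUID and SETGID to switch its workers to www-data.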
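
To check what a running container actually ended up with, you can read the effective capability mask of its PID 1 and test individual bits. This is a sketch under a few assumptions: the image ships grep, the helper names cap_eff and has_cap are mine, and the bit numbers (10 for NET_BIND_SERVICE, 13 for NET_RAW) come from capabilities(7).

```shell
#!/bin/sh
# Print the effective capability mask (hex) of PID 1 inside a container.
cap_eff() {
    docker exec "$1" grep CapEff /proc/1/status | awk '{ print $2 }'
}

# has_cap <hex-mask> <bit>: succeeds if that capability bit is set in the mask.
has_cap() {
    [ $(( 0x$1 >> $2 & 1 )) -eq 1 ]
}

# Usage:
#   mask=$(cap_eff web)
#   has_cap "$mask" 10 && echo "NET_BIND_SERVICE is effective"
#   has_cap "$mask" 13 && echo "NET_RAW is still there, cap_drop is not applied"
```

A container started with cap_drop: [ALL] and no cap_add should report a mask of all zeros. A process running as a non-root user typically reports zeros too, whatever cap_add says.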

Layer 4: `no-new-privileges`

This one is a single line and a huge win. It tells the kernel: “no process inside this container can ever gain more privileges than it started with.” Setuid binaries are neutered. su stops working. sudo stops working. pkexec stops working.

services:
  web:
    image: nginx:1.27-alpine
    security_opt:
      - no-new-privileges:true

Combine this with running as a non-root user inside the container (most well-built images do this already) and you’ve removed an entire family of attack chains. Anything that relies on a setuid binary to climb to root, such as the 2021 pkexec bug (CVE-2021-4034), is blocked outright. It does not stop runtime escapes like the 2019 runc one (CVE-2019-5736), which is why Layers 1 and 5 still matter.

There is no good reason ever to leave this off for a normal application container.

Layer 5: don’t run as root inside the container

Even with the host-level Docker daemon rootless, an application running as UID 0 inside the container is needlessly powerful. Run as a normal user.

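Most well-built modern images already do this, at least for their workers. The official nginx:alpine image starts its master process as root and runs its workers as the nginx user (UID 101); the nginxinc/nginx-unprivileged variant runs entirely as a non-root user. The official php:8.4-fpm-alpine image runs FPM workers as www-data (UID 82). If you’re writing your own Dockerfile, you should always include: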

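# BusyBox/Alpine syntax; on Debian or Ubuntu use: useradd --uid 10001 --create-home appuser
# A high UID avoids clashing with accounts the base image already ships
RUN adduser -D -u 10001 appuser
USER appuser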

And in Compose, you can override the runtime user explicitly:

services:
  php:
    image: deb.myguard.nl/php-fpm:8.4
    user: "33:33"   # www-data

For an extra layer on a rootful daemon, enable Docker’s user-namespace remapping (userns-remap in /etc/docker/daemon.json). Even if the process inside the container thinks it’s UID 0, the kernel sees it as an unprivileged UID on the host (e.g. UID 100000). This breaks a class of file-permission attacks across the container/host boundary. Rootless mode (Layer 1) already gives you this mapping.

Layer 6: pick a base image that isn’t a Swiss army knife

“Use Alpine for security” is one of those folk-wisdom statements that’s half right. Smaller images do have fewer CVEs by sheer surface-area, but the modern landscape has three serious contenders:

Base	Size	Has a shell?	Use when
distroless (gcr.io/distroless)	~20 MB	No (and no package manager)	You ship a single static or interpreted binary. Best for Go, Rust, Java.
alpine	~5 MB	Yes (busybox)	Small, musl-based. Watch out for `glibc` assumptions in some apps.
debian:trixie-slim / ubuntu:24.04	~30 MB	Yes (bash)	You need glibc, full apt repos, and predictable behaviour. The most production-friendly of the three.

The honest answer: match the base to the app. WordPress + PHP-FPM with our myguard packages? debian:trixie-slim, every time: we test against it, the packages install cleanly, and the apt sources just work. A Go static binary you wrote yourself? Distroless. A small Node.js script? Alpine.

What matters more than the base image, every single time: rebuild and patch on a schedule. A distroless image sitting on a six-month-old base layer is less secure than yesterday’s debian-slim with freshly updated packages.

Layer 7: secrets don’t belong in your image

The most common own-goal in container security: baking a DATABASE_PASSWORD=hunter2 into a Dockerfile ENV, or worse, into a COPY .env /app/. That credential is now part of every copy of the published image, forever. docker history will show the ENV value, and anyone who pulls the image can extract the copied file with docker save.

Three good ways to ship secrets to a container, ranked by my preference for self-hosters:

Bind-mount a file the container reads at startup. Keep the file outside git, lock it down to mode 600.
volumes: [./secrets/db.env:/run/secrets/db.env:ro]
Compose secrets: these work without Swarm when you declare them under the top-level secrets: key with a file: source. Secrets land as files under /run/secrets/ inside the container.
External secret store (Bitwarden CLI, age-encrypted files, HashiCorp Vault). Overkill for most homelabs but worth knowing about.

What never to do: ENV DATABASE_PASSWORD=... in the Dockerfile, or --env DATABASE_PASSWORD=... on the command line (it leaks to ps on the host).
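
Creating the secret file itself is where permissions most often go wrong. Here is a minimal sketch that generates a random value and writes it with mode 600 from the start; make_secret is a hypothetical helper, and the path simply mirrors the ./secrets/ layout used in this post.

```shell
#!/bin/sh
# make_secret <path>: write 32 random bytes (base64-encoded) to a new file
# that only the current user can read. The subshell keeps the umask change local.
make_secret() {
    (
        umask 077                          # new files 600, new directories 700
        mkdir -p "$(dirname "$1")"
        head -c 32 /dev/urandom | base64 > "$1"
    )
}

# Usage:
#   make_secret ./secrets/db_root_password
#   echo 'secrets/' >> .gitignore
```

Note that the umask only applies to files the function creates. An existing file keeps whatever mode it already had, so chmod 600 any secrets you created earlier.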

Layer 8: network segmentation by default

Every container you start with a plain docker run joins the default bridge network and can talk to every other container on that network. That’s the opposite of what you want.

The right pattern: each compose stack gets its own networks (Compose already creates one per project, but every service in the stack shares it). The reverse-proxy container joins both the public-facing network and the stack’s internal network. The application containers (PHP-FPM, database, cache) join only the internal network. Now an attacker who pops PHP-FPM can’t ping a neighbouring stack’s postgres.

services:
  web:
    image: nginx:1.27-alpine
    networks: [public, internal]
    ports: ["80:80"]
  php:
    image: deb.myguard.nl/php-fpm:8.4
    networks: [internal]   # NO public exposure
  db:
    image: postgres:17-alpine
    networks: [internal]

networks:
  public:
    driver: bridge
  internal:
    driver: bridge
    internal: true   # blocks egress to the internet too

The internal: true on the internal network is the cherry on top: a database container on an internal-only network has no route to the public internet, so it cannot exfiltrate data directly even if it’s owned. Malware that tries to phone home simply fails to connect.
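
It is worth confirming that the flag really landed on the network Docker created, since a typo in the compose file fails silently. A small sketch, assuming the network already exists; is_internal_network is my own helper name, and mystack is a placeholder for your Compose project name (Compose prefixes network names with it).

```shell
#!/bin/sh
# Succeeds only if the named Docker network was created with internal: true.
is_internal_network() {
    [ "$(docker network inspect --format '{{.Internal}}' "$1")" = "true" ]
}

# Usage:
#   docker network ls                      # find the real network name
#   is_internal_network mystack_internal && echo "no egress from this network"
```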

Layer 9: scan your images, every build

Static analysis of container images takes thirty seconds and catches an embarrassing number of issues. The three tools I actually use:

Trivy (github.com/aquasecurity/trivy): one binary, scans an image for CVEs in OS packages and language deps. trivy image nginx:1.27-alpine prints a table you can act on in minutes.
Grype (github.com/anchore/grype): similar idea, different vuln database. I run both because they don’t always agree.
Syft (github.com/anchore/syft): generates a software bill of materials (SBOM) in SPDX/CycloneDX format. Useful when a new CVE drops and you need to know which of your images are affected without re-scanning.

Wire any of these into your CI and reject builds with critical CVEs. For self-hosters without a CI, even a manual for i in $(docker images -q | head); do trivy image "$i"; done on a Sunday afternoon catches the worst stuff (trivy image takes one image at a time).
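
As a starting point for that gate, here is a sketch that scans every tagged local image and fails if any of them has a critical finding. The function name scan_local_images is mine; the Trivy flags used (--severity, --exit-code, --quiet) are standard ones, but adjust the severity list to taste.

```shell
#!/bin/sh
# Scan every tagged local image; return non-zero if any has a CRITICAL finding.
scan_local_images() {
    failed=0
    for img in $(docker images --format '{{.Repository}}:{{.Tag}}' \
                     | grep -v '<none>' | sort -u); do
        # --exit-code 1 makes trivy fail when findings match the severity filter
        trivy image --quiet --severity CRITICAL --exit-code 1 "$img" || failed=1
    done
    return "$failed"
}

# Usage (CI step or weekly cron job):
#   scan_local_images || echo "at least one image needs a rebuild"
```

The loop deliberately keeps going after the first failure, so one bad image doesn’t hide the others.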

Layer 10: logging that survives the container

When (not if) something goes wrong inside a container, you want logs you can read. Two rules:

Log to stdout/stderr, not to a file inside the container. The Docker logging driver captures stdout and ships it where you want.
Pick a real logging driver: json-file with rotation, journald, or push to Loki / Vector. The default unbounded json-file driver will fill your disk eventually.

In /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}

Per-container override in Compose:

services:
  web:
    logging:
      driver: journald
      options:
        tag: "{{.Name}}"

Forensics are only possible if the evidence still exists when you go looking.

Putting it all together: a hardened nginx + PHP-FPM stack

Here’s a complete, real-world docker-compose.yml for a hardened WordPress / PHP stack using our myguard packages. Every flag is one of the ten layers above; no flag is decorative.

services:
  web:
    image: deb.myguard.nl/angie:1.11.5-alpine
    read_only: true
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]
    security_opt:
      - no-new-privileges:true
    user: "101:101"   # nginx
    tmpfs:
      - /tmp
      - /var/cache/angie
      - /var/run
    ports: ["80:80", "443:443"]
    networks: [public, internal]
    volumes:
      - ./conf/angie.conf:/etc/angie/angie.conf:ro
      - certs:/etc/letsencrypt:ro
      - wp-content:/var/www/html/wp-content:ro
    logging:
      driver: journald

  php:
    image: deb.myguard.nl/php-fpm:8.4-snuf
    read_only: true
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    user: "33:33"   # www-data
    tmpfs:
      - /tmp
      - /var/run
    networks: [internal]   # no public exposure
    volumes:
      - ./conf/wordpress-strict.rules:/etc/php/8.4/php-snuffleupagus/active.rules:ro
      - ./conf/www.conf:/etc/php/8.4/fpm/pool.d/www.conf:ro
      - wp-content:/var/www/html/wp-content
    logging:
      driver: journald

  db:
    image: mariadb:11-noble
    read_only: false   # mariadb really does need to write
    cap_drop: [ALL]
    cap_add: [CHOWN, SETUID, SETGID]
    security_opt:
      - no-new-privileges:true
    networks: [internal]
    environment:
      MARIADB_ROOT_PASSWORD_FILE: /run/secrets/db_root_password
    secrets: [db_root_password]
    volumes:
      - db-data:/var/lib/mysql
    logging:
      driver: journald

networks:
  public:
    driver: bridge
  internal:
    driver: bridge
    internal: true

volumes:
  certs:
  wp-content:
  db-data:

secrets:
  db_root_password:
    file: ./secrets/db_root_password

The PHP container in this stack also loads Snuffleupagus via the rules file mount, so even if an attacker gets PHP code execution, Snuffleupagus can block many of the function calls they’d need to actually do damage, and the container layers above contain whatever is left. Defence in depth, in independent layers.

The 60-second Docker hardening checklist

If you remember nothing else from this post, run through this list on every container you deploy:

Daemon running rootless? (or Podman)
read_only: true + tmpfs for the writable bits?
cap_drop: [ALL] with only the specific caps added back?
security_opt: [no-new-privileges:true]?
user: set to a non-root UID?
Base image fresh, scanned with Trivy or Grype?
Secrets via files, not ENV?
App container on an internal: true network?
Logging driver with rotation set?
Restart policy chosen deliberately (unless-stopped for services, not always)?

Ten checks. Five minutes per stack. An attacker who gets in now most likely needs a kernel or container-runtime vulnerability to get back out. Worth the effort.
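
Part of the checklist can be automated against containers that are already running. The sketch below reads a handful of docker inspect fields and prints one line per missing hardening flag. audit_container is a hypothetical helper and covers only the per-container items (read-only, cap-drop, no-new-privileges, user, privileged), not the daemon, network, or logging checks.

```shell
#!/bin/sh
# Print one warning per missing hardening flag for the given container.
audit_container() {
    docker inspect --format \
        '{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.CapDrop}}|{{.HostConfig.SecurityOpt}}|{{.Config.User}}|{{.HostConfig.Privileged}}' \
        "$1" |
    while IFS='|' read -r ro capdrop secopt user priv; do
        if [ "$ro" != "true" ]; then echo "$1: root filesystem is writable"; fi
        case "$capdrop" in
            *ALL*|*all*) ;;
            *) echo "$1: cap_drop does not include ALL" ;;
        esac
        case "$secopt" in
            *no-new-privileges:false*) echo "$1: no-new-privileges is off" ;;
            *no-new-privileges*) ;;
            *) echo "$1: no-new-privileges is off" ;;
        esac
        case "$user" in
            ""|0|0:*|root|root:*) echo "$1: runs as root" ;;
        esac
        if [ "$priv" = "true" ]; then echo "$1: privileged mode is on"; fi
    done
}

# Usage:
#   for c in $(docker ps --format '{{.Names}}'); do audit_container "$c"; done
```

No output means the container passes these five checks. An empty user field means neither the image nor the compose file set one, so the process runs as root.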

Frequently asked questions

Is Docker actually less secure than a VM?

It depends on the hardening. A default docker run is significantly less isolated than a default KVM VM. A hardened container (rootless + read-only + cap-drop + no-new-privileges) closes most of that gap for typical self-hosting threat models, and uses a fraction of the resources. It still shares the host kernel, though. For really high-value workloads where you want a hardware-enforced isolation boundary, use Firecracker or Kata Containers; they wrap each container in a micro-VM.

Should I switch from Docker to Podman?

If you’re starting fresh, yes: Podman is rootless-first and the CLI is a near drop-in replacement. If you have an existing Docker stack that works, Docker rootless mode gives you most of the security benefit without the migration. Either choice is fine; the wrong choice is “daemon as root.”

What about Kubernetes? Doesn’t it handle all this for me?

Kubernetes ships Pod Security Standards and a built-in Pod Security Admission controller that can enforce a lot of this at the cluster level, if you configure them. Out of the box, a default Pod is just as permissive as a default Docker container. For a homelab, K8s is overkill; for production, look up “Pod Security Standards” and apply the restricted profile.

My app needs to write to its own filesystem for caching. Can I still use `read_only`?

Yes, mount the specific writable path as a tmpfs (RAM-backed, vanishes on restart) or as a named volume (persistent on disk). Lots of legitimate apps need scratch space; the trick is to whitelist which paths, not flip the whole filesystem writable.

Is Alpine really insecure because of musl?

“Insecure” is the wrong word. Musl has subtly different behaviour from glibc in edge cases (locale, DNS resolution, threading). Some apps assume glibc and break on Alpine in non-obvious ways. The security story is fine; the compatibility story is what bites you. If your app is well-tested on Alpine, ship Alpine. If not, ship debian-slim.

Should I scan images in production, or only at build time?

Both. Build-time scanning catches “this base image already had a CVE when you pulled it.” Runtime scanning catches “a new CVE was published last week against a package in your already-running image.” Trivy can do both; for self-hosters, a weekly cron job that emails you Trivy’s report is enough.

What about `privileged: true` for that one container that needs it?

There’s almost always a less-bad option. If a container needs a specific capability, add only that capability with cap_add. If it needs a specific device, pass through only that device (devices: in Compose). privileged: true gives the container almost root-on-host access; reserve it for containers that genuinely manage Docker itself (and even then, look hard for an alternative).

Does any of this help against supply-chain attacks (a malicious package in my base image)?

Partly. read_only + cap_drop limit what the malicious code can do at runtime; an SBOM (Syft) fed into a scanner such as Grype flags packages with known vulnerabilities, though not a brand-new backdoor. The strongest defence is pinning images by digest (image: nginx@sha256:abc...) rather than tag. That way a re-tagged malicious image can’t quietly slip in on your next pull.
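
Looking up the digest for an image you already trust takes one command. A small sketch, assuming the image has been pulled from a registry (locally built images have no repo digest, so the lookup fails for them); image_digest is my own helper name.

```shell
#!/bin/sh
# Print the repo@sha256:... reference for an image that was pulled from a registry.
image_digest() {
    docker inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Usage:
#   docker pull nginx:1.27-alpine
#   image_digest nginx:1.27-alpine    # paste the output into image: in your compose file
```

Pinning this way means updates become a deliberate act: you pull the new tag, look at it, and then change the digest.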

Related posts

PHP Snuffleupagus tutorial: the PHP-interpreter-layer hardening that pairs perfectly with this container-layer hardening.
How to install ModSecurity and OWASP CRS on NGINX: the HTTP-layer WAF that sits in front.
Self-hosted Vaultwarden: a real-world hardened compose example end-to-end.
docker-cms: hardened PHP 8.5 image for WordPress: our pre-hardened image so you don’t have to start from scratch.
Angie and NGINX Docker images: daily-rebuilt, full-modules, ready to drop into a hardened compose.

See also: Hardened OpenSSH 10.3 for Debian and Ubuntu for the SSH side of host hardening, and the Docker packages overview.
