Using Privileged Ports with Docker Rootless on NixOS

TL;DR: To run Docker in rootless mode on NixOS while still letting your containerized workloads bind to [privileged ports](https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html), put the following in your `configuration.nix` and rebuild:

```nix
virtualisation.docker.enable = true;
virtualisation.docker.rootless = {
  enable = true;
  setSocketVariable = true;
};
security.wrappers = {
  docker-rootlesskit = {
    owner = "root";
    group = "root";
    capabilities = "cap_net_bind_service+ep";
    source = "${pkgs.rootlesskit}/bin/rootlesskit";
  };
};
```

The purpose of this post is mainly to document how I poked around my NixOS installation to figure out the solution. Until this post was written, this had been an unanswered [question](https://discourse.nixos.org/t/rootless-docker-with-ports-1024/27037).

## Rootless Docker on NixOS

This blog site runs on a few Docker containers for ease of management, because Dockerized applications are very easy to back up, migrate and upgrade. During the process of [migrating](https://im.salty.fish/index.php/archives/site-upgrade.html) my blog server, I switched from Debian to NixOS to make my configuration more reproducible and hence further reduce the cost of future migrations. However, to keep the same level of flexibility should I want to reconfigure the blog service, I don't want to let NixOS manage Caddy (which powers this website), and instead intend to keep the actual workload containerized.

Luckily NixOS has a nice wrapper around Docker that makes it easy:

```nix
virtualisation.docker.enable = true;
```

Having this line alone in your `configuration.nix` enables Docker for users in the `docker` group, just like a standard installation of Docker on Debian.
I could have stopped here, added myself to the `docker` group and called it a day, but when I was [searching](https://search.nixos.org/options?channel=23.11&from=0&size=50&sort=relevance&type=packages&query=virtualisation.docker) for NixOS options I noticed that NixOS also supports running Docker in [rootless mode](https://docs.docker.com/engine/security/rootless/), using the following configuration:

```nix
virtualisation.docker.rootless = {
  enable = true;
  setSocketVariable = true;
};
```

## Problem: Using Privileged Ports

Going rootless is a very nice bonus for security, and I quickly decided that this website does not require root privileges to work, except when it tries to listen on ports 80 (merely for redirecting visitors to port 443) and 443. Docker lists this as an (obvious) limitation of the rootless mode and provides a workaround:

> To expose privileged ports (< 1024), set `CAP_NET_BIND_SERVICE` on rootlesskit binary and restart the daemon ... Or add `net.ipv4.ip_unprivileged_port_start=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl --system`.
> [name=[Docker documentation](https://docs.docker.com/engine/security/rootless/#exposing-privileged-ports)] [color=#5468ff]

However, it is not trivial to apply this trick on NixOS in a reproducible way...

## Podman

Some Podman fans might want to shout: *Just use Podman! It is rootless by default!* Yes, I know that, and I have tried Podman. Since privileged ports pose a very fundamental problem, Podman also needs its own workaround. Unfortunately, Podman's [official instructions](https://github.com/containers/podman/blob/main/rootless.md) simply tell you to disregard the concept of "privileged ports" in the first place, by taking the second approach suggested by Docker and setting `ip_unprivileged_port_start` to zero.
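To make concrete what setting `ip_unprivileged_port_start` to zero changes, here is a minimal shell sketch. The helper name `is_privileged_port` is hypothetical and not part of Docker, Podman, or NixOS. It assumes the threshold can be read from `/proc/sys/net/ipv4/ip_unprivileged_port_start` and falls back to the traditional 1024 when that file is unavailable.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.
# Usage: is_privileged_port PORT [START]
# Succeeds (exit 0) when PORT is below the unprivileged-port threshold.
is_privileged_port() {
  local port=$1
  local start=${2:-}
  local sysctl_file=/proc/sys/net/ipv4/ip_unprivileged_port_start
  if [ -z "$start" ]; then
    if [ -r "$sysctl_file" ]; then
      start=$(cat "$sysctl_file")   # value currently used by the kernel
    else
      start=1024                    # assumed traditional default
    fi
  fi
  [ "$port" -lt "$start" ]
}

# With the traditional threshold, port 80 needs extra privileges...
is_privileged_port 80 1024 && echo "80 is privileged when start=1024"
# ...but once the threshold is zero, no port is privileged any more.
is_privileged_port 80 0 || echo "80 is unprivileged when start=0"
```

In other words, the sysctl approach does not grant a capability to one program; it removes the restriction for every process on the host.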
Some people believe that [privileged ports must die](https://news.ycombinator.com/item?id=32669838), and I have to admit this makes some sense to me. However, I am a very conventional security paranoid, and I want to do the thing *right*. It is just a personal opinion, but disregarding privileged ports entirely does not feel right to me.

## The NixOS Way

Apparently Docker relies on the rootlesskit utility to perform certain namespace operations. Thus, if rootlesskit has the capability to bind to privileged ports (`CAP_NET_BIND_SERVICE`), Docker will also be able to. Let's see why we cannot just directly run `setcap cap_net_bind_service=ep $(which rootlesskit)`:

```
$ which rootlesskit
which: no rootlesskit in (/run/wrappers/bin:...)
```

NixOS does not expose rootlesskit to the user environment directly because the user did not explicitly install it (nor Docker - we installed Docker using the NixOS-managed `virtualisation.docker`). But how is Docker able to make use of it then?

NixOS's official [build script for Docker](https://github.com/NixOS/nixpkgs/blob/4298c7f1bcfd465d807d9fc099ab905c58dc5af1/pkgs/applications/virtualization/docker/default.nix) (commit `4298c7f`, current at the time of writing) pulls rootlesskit in as a dependency for [`moby`](https://mobyproject.org/), and generates a wrapper script named `dockerd-rootless` by:

```nix
extraUserPath = lib.optionals (stdenv.isLinux && !clientOnly) (lib.makeBinPath [
  rootlesskit
  slirp4netns
  fuse-overlayfs
]);
```

```bash
install -Dm755 ./contrib/dockerd-rootless.sh $out/libexec/docker/dockerd-rootless.sh
makeWrapper $out/libexec/docker/dockerd-rootless.sh $out/bin/dockerd-rootless \
  --prefix PATH : "$out/libexec/docker:$extraPath:$extraUserPath"
```

As we see in the code, the wrapper has a prefixed `$PATH` that contains the location of the `rootlesskit` executable.
We can further verify this:

```
$ which dockerd-rootless
/run/current-system/sw/bin/dockerd-rootless
```

This wrapper script is just a shebang line followed by around 10 chunks that look like:

```bash
PATH=${PATH:+':'$PATH':'}
PATH=${PATH/':''/nix/store/2xf0l5hw5cjd0b7v5af3lzw6afn4ybrc-rootlesskit-1.1.1/bin'':'/':'}
PATH='/nix/store/2xf0l5hw5cjd0b7v5af3lzw6afn4ybrc-rootlesskit-1.1.1/bin'$PATH
PATH=${PATH#':'}
PATH=${PATH%':'}
export PATH
```

and finally this line:

```bash
exec "/nix/store/miacnd4f2k0d08gyfc2q0bfm3rv1fvq5-moby-24.0.5/libexec/docker/dockerd-rootless.sh" "$@"
```

However, since the Nix store is read-only (NixOS needs this to ensure reproducibility, the most important guarantee it provides), we cannot simply `setcap` the binary:

```
# setcap cap_net_bind_service=ep /nix/store/2xf0l5hw5cjd0b7v5af3lzw6afn4ybrc-rootlesskit-1.1.1/bin/rootlesskit
Failed to set capabilities on file '/nix/store/2xf0l5hw5cjd0b7v5af3lzw6afn4ybrc-rootlesskit-1.1.1/bin/rootlesskit': Read-only file system
```

## Capabilities on NixOS: Security Wrappers

Occasionally a user would want to use capabilities, and NixOS does support that use case with [security wrappers](https://search.nixos.org/options?channel=23.11&show=security.wrappers&from=0&size=50&sort=relevance&type=packages&query=security.wrappers). These are simple binary wrappers that directly `execve` the target. They have to be binary files because capabilities set on text scripts, just like the `setuid` bit, will not be honored by Linux for [security reasons](https://unix.stackexchange.com/questions/364/allow-setuid-on-shell-scripts). To create such a wrapper, you would do:

```nix
security.wrappers = {
  rootlesskit = {
    owner = "root";
    group = "root";
    capabilities = "cap_net_bind_service+ep";
    source = "${pkgs.rootlesskit}/bin/rootlesskit";
  };
};
```

Since rootlesskit is, after all, installed as a NixOS package, you can use `${pkgs.rootlesskit}` to refer to its installation root.
After applying your new configuration, you should be able to see your newly created wrapper:

```
$ which rootlesskit
/run/wrappers/bin/rootlesskit
```

## Making Docker Use the Wrapper: NixOS Overlays (and why it will not work)

However, Docker will still not pick this wrapper up, and will continue to use the vanilla unwrapped rootlesskit. Why? Because from `dockerd-rootless.sh`'s perspective, `${pkgs.rootlesskit}/bin`, as a *prefix* to its `$PATH`, takes precedence over `/run/wrappers/bin` in the normal `$PATH`.

To change this behavior, it seems that we need to edit the NixOS package. NixOS provides three ways to do so:

1. Grab the official build script, modify it and use the modified version from then on. This is similar to maintaining your own fork of some open-source software and never upstreaming your changes. The maintenance burden grows over time, and eventually you exhaust yourself just keeping up-to-date. Also, this is not helpful to the open source community, and I consider it unhealthy.
2. Grab the official build script, modify it and upstream your changes. This is the best option, but I am a very new NixOS user and don't have much time to look into it to create an acceptable patch. If you are experienced with NixOS, I recommend doing this whenever possible.
3. Use an ["overlay"](https://nixos.wiki/wiki/Overlays) to stack your changes on top of the official build script.

Let's take a look at an overlay. It uses a syntax resembling this:

```nix
final: prev: {
  whatever = prev.whatever.overrideAttrs(...);
}
```

This basically allows you to take some attributes from the official build script (`prev`), override them, and create a new object (`final`) identical to the original but with your changes applied. Theoretically, we can find the code that generates the wrapper script, and remove `rootlesskit` from the `$PATH` prefix applied there.
Let's read a simplified version of the official build script at commit `4298c7f`, and see why we cannot do that:

```nix
{ lib, callPackage, fetchFromGitHub }:

rec {
  dockerGen = {
      version
      , ...
    }:
  let
    docker-runc = ...;
    docker-containerd = ...;
    docker-tini = ...;

    moby = buildGoPackage (lib.optionalAttrs stdenv.isLinux rec {
      ...
      installPhase = ''
        ...
        # rootless Docker
        install -Dm755 ./contrib/dockerd-rootless.sh $out/libexec/docker/dockerd-rootless.sh
        makeWrapper $out/libexec/docker/dockerd-rootless.sh $out/bin/dockerd-rootless \
          --prefix PATH : "$out/libexec/docker:$extraPath:$extraUserPath"
      '';
      ...
  in
    buildGoPackage (lib.optionalAttrs (!clientOnly) {
      # allow overrides of docker components
      # TODO: move packages out of the let...in into top-level to allow proper overrides
      inherit docker-runc docker-containerd docker-tini moby;
    } // rec {
      pname = "docker";
      inherit version;

      src = fetchFromGitHub {
        owner = "docker";
        repo = "cli";
        rev = cliRev;
        hash = cliHash;
      };
      ...
    });
  ...
}
```

There literally is a comment that reads "TODO: move packages out of the let...in into top-level to allow proper overrides". Since the code we want to override resides in a `let` expression, we don't even have a handle to refer to it, let alone override it. If we did it anyway, we would have to override a large portion of this script - so much code that it almost amounts to maintaining our own version of the build script.

## The "Hack": Make Docker Recognize Our Wrapper, "Properly"

Does that mean we are out of options? No! If you paid close attention to the `dockerd-rootless` wrapper, you may have noticed that it invokes `/nix/store/miacnd4f2k0d08gyfc2q0bfm3rv1fvq5-moby-24.0.5/libexec/docker/dockerd-rootless.sh` at the end - it is a *script* rather than a *binary*! And this is the logic it uses to determine which `rootlesskit` to use:
```bash
rootlesskit=""
for f in docker-rootlesskit rootlesskit; do
	if command -v $f > /dev/null 2>&1; then
		rootlesskit=$f
		break
	fi
done
```

Interestingly, `docker-rootlesskit` takes precedence over `rootlesskit`, *probably* to allow for the co-existence of a separate `rootlesskit` exclusively used by Docker. Since this name is not yet used anywhere, no directory in the prefixed `$PATH` provides it, so the lookup falls through to `/run/wrappers/bin`. We can leverage this finding by creating a wrapper named `docker-rootlesskit`, which `dockerd-rootless.sh` will choose over the standard `rootlesskit`. This takes us to the final solution documented at the beginning of this post (the TL;DR).
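The precedence argument can be checked without touching Docker at all. The following is a minimal sketch that reruns the same lookup loop against stand-in files. The function name `pick_rootlesskit` and the directory names `store-bin` and `wrappers-bin` are made up for this demonstration; they merely play the roles of the prefixed Nix store path and `/run/wrappers/bin`.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction of the lookup loop in dockerd-rootless.sh.
# Prints the first candidate name that resolves through $PATH.
pick_rootlesskit() {
  local f
  for f in docker-rootlesskit rootlesskit; do
    if command -v "$f" > /dev/null 2>&1; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Stand-in directories: store-bin mimics the prefixed Nix store path,
# wrappers-bin mimics /run/wrappers/bin.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/store-bin" "$demo/wrappers-bin"
for stub in \
  "$demo/store-bin/rootlesskit" \
  "$demo/wrappers-bin/rootlesskit" \
  "$demo/wrappers-bin/docker-rootlesskit"; do
  printf '#!/bin/sh\n' > "$stub"   # empty executable placeholder
  chmod +x "$stub"
done

# The store directory comes first, as in the generated wrapper script.
search_path="$demo/store-bin:$demo/wrappers-bin"
chosen=$(PATH="$search_path"; pick_rootlesskit)
echo "name chosen: $chosen"
echo "resolves to: $(PATH="$search_path"; command -v "$chosen")"
```

Even though `store-bin` is searched first, the name `docker-rootlesskit` only exists in `wrappers-bin`, so that is where the chosen name resolves. If you remove the `docker-rootlesskit` stub, the loop falls back to `rootlesskit` and the copy in `store-bin` wins, which is the behavior described in the overlay section.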