Docker rootless - one masquerading bug to rule them all

Preferably I use podman, but sometimes you are forced to use docker. Ideally you then switch to docker rootless, because of, well, security.

I noticed that the docker rootless installation instructions are not ideal (i.e. they contain a bug). It took me a while to figure out, so I wanted to share it, especially since this might mean that more people who explicitly want or need to run docker can run it more safely (rootless).

Note: I’m assuming you have already set up subuids, subgids, net.ipv4.ip_unprivileged_port_start and systemd lingering to suit your situation, and that you have already installed slirp4netns for the networking part.
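These prerequisites can be checked up front with a small sketch like the one below. It assumes the usual locations /etc/subuid and /etc/subgid and entries keyed by user name (entries keyed by numeric UID are not matched). The helper name has_subid_range is made up for illustration.

```shell
#!/bin/sh
# Sketch: has_subid_range is a hypothetical helper, not part of docker.

# Succeeds if FILE (e.g. /etc/subuid) has an entry "USER:start:count" for USER.
# $1 = user name, $2 = subuid/subgid file
has_subid_range() {
    grep -q "^$1:[0-9][0-9]*:[0-9][0-9]*\$" "$2" 2>/dev/null
}

# Example checks (commented out so that sourcing this file has no side effects):
# me=$(id -un)
# has_subid_range "$me" /etc/subuid || echo "no subuid range for $me"
# has_subid_range "$me" /etc/subgid || echo "no subgid range for $me"
# command -v slirp4netns >/dev/null || echo "slirp4netns is not installed"
# cat /proc/sys/net/ipv4/ip_unprivileged_port_start
# loginctl show-user "$me" --property=Linger
```

Anything the checks complain about has to be fixed as root before running the installer as the unprivileged user.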

installation

try #1

So for the installation of docker rootless, I’m just following the official documentation. I’m also unsetting any DOCKER_HOST environment variable here (a leftover from an old installation, for example), because it can trigger a warning during installation.

user@host:~$ unset DOCKER_HOST ; curl -fsSL https://get.docker.com/rootless | sh
# Installing stable version 24.0.4
# Executing docker rootless install script, commit: acabc63
# Missing system requirements. Please run following commands to
# install the requirements and run this installer again.
# Alternatively iptables checks can be disabled with SKIP_IPTABLES=1

cat <<EOF | sudo sh -x

modprobe ip_tables
EOF

Apparently it is checking for iptables, which I’m not using, because I’m using firewalld. And even if I were using iptables, the user running docker rootless would not get permission to change the host’s iptables rules ;-)

try #2

So, now we’ll skip iptables and see if that works.

user@host:~$ export SKIP_IPTABLES=1  ; curl -fsSL https://get.docker.com/rootless | sh
# Installing stable version 24.0.4
# Executing docker rootless install script, commit: acabc63
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 66.3M  100 66.3M    0     0  78.0M      0 --:--:-- --:--:-- --:--:-- 78.0M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
								 Dload  Upload   Total   Spent    Left  Speed
100 19.4M  100 19.4M    0     0  91.7M      0 --:--:-- --:--:-- --:--:-- 91.2M
+ PATH=/opt/user/bin:/opt/user/.local/bin:/opt/user/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
+ /opt/user/bin/dockerd-rootless-setuptool.sh install --skip-iptables
[INFO] Creating /opt/user/.config/systemd/user/docker.service
[INFO] starting systemd service docker.service
+ systemctl --user start docker.service
+ sleep 3
+ systemctl --user --no-pager --full status docker.service
● docker.service - Docker Application Container Engine (Rootless)
   Loaded: loaded (/opt/user/.config/systemd/user/docker.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2023-07-20 21:07:04 CEST; 3s ago
	 Docs: https://docs.docker.com/go/rootless/
 Main PID: 328829 (rootlesskit)
   CGroup: /user.slice/user-700.slice/user@700.service/docker.service
		   ├─328829 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /opt/user/bin/dockerd-rootless.sh --iptables=false
		   ├─328839 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /opt/user/bin/dockerd-rootless.sh --iptables=false
		   ├─328853 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 328839 tap0
		   ├─328861 dockerd --iptables=false
		   └─328878 containerd --config /run/user/700/docker/containerd/containerd.toml
+ DOCKER_HOST=unix:///run/user/700/docker.sock
+ /opt/user/bin/docker version
Client:
 Version:           24.0.4
 API version:       1.43
 Go version:        go1.20.5
 Git commit:        3713ee1
 Built:             Fri Jul  7 14:49:50 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.4
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.5
  Git commit:       4ffc614
  Built:            Fri Jul  7 14:51:12 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.1
  GitCommit:        1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
 rootlesskit:
  Version:          1.1.0
  ApiVersion:       1.1.1
  NetworkDriver:    slirp4netns
  PortDriver:       builtin
  StateDir:         /tmp/rootlesskit2737621506
 slirp4netns:
  Version:          1.2.0
  GitCommit:        656041d45cfca7a4176f6b7eed9e4fe6c11e8383
+ systemctl --user enable docker.service
Created symlink /opt/user/.config/systemd/user/default.target.wants/docker.service → /opt/user/.config/systemd/user/docker.service.
[INFO] Installed docker.service successfully.
[INFO] To control docker.service, run: `systemctl --user (start|stop|restart) docker.service`
[INFO] To run docker.service on system startup, run: `sudo loginctl enable-linger user`

[INFO] Creating CLI context "rootless"
Successfully created context "rootless"
[INFO] Using CLI context "rootless"
Current context is now "rootless"
Warning: DOCKER_HOST environment variable overrides the active context. To use "rootless", either set the global --context flag, or unset DOCKER_HOST environment variable.

[INFO] Make sure the following environment variable(s) are set (or add them to ~/.bashrc):
export PATH=/opt/user/bin:$PATH

[INFO] Some applications may require the following environment variable too:
export DOCKER_HOST=unix:///run/user/700/docker.sock

Docker is now installed nicely in the user’s home directory. We just need to set the DOCKER_HOST variable, so we can actually use the docker commands:

echo 'export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock' >> ~/.bashrc
. ~/.bashrc
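The line above relies on XDG_RUNTIME_DIR being set when ~/.bashrc is read. A slightly more defensive variant is sketched below. The fallback to /run/user/<uid> is an assumption (it is the usual systemd location), and rootless_docker_host is a made-up helper name.

```shell
#!/bin/sh
# Sketch: rootless_docker_host is a hypothetical helper name.
# Prints the rootless socket URL, falling back to /run/user/<uid>
# when XDG_RUNTIME_DIR is not set (assumed default location).
rootless_docker_host() {
    printf 'unix://%s/docker.sock\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Usage, e.g. in ~/.bashrc:
# export DOCKER_HOST="$(rootless_docker_host)"
```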

finished?

The installation is finished and docker rootless is working perfectly. So we’re done, right?

Well, not quite. All the containers can be accessed remotely (from the network), but the containers themselves do not seem to be able to make outgoing connections.

no outgoing connections

From a container, I could not connect to the outside world.

user@host:~$ docker run -ti alpine ping -c1 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

But when I entered the namespace, I could connect to the outside world:

user@host:~$ nsenter -U --preserve-credentials -n -t $(pgrep dockerd) ping -c1 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=255 time=4.20 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.501/4.501/4.501/0.000 ms

Then I started to look at the networking:

user@host:~$ nsenter -U --preserve-credentials -n -t $(pgrep dockerd) ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 scope host lo
	   valid_lft forever preferred_lft forever
	inet6 ::1/128 scope host 
	   valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 1000
	link/ether 5e:4f:2d:09:95:12 brd ff:ff:ff:ff:ff:ff
	inet 10.0.2.100/24 scope global tap0
	   valid_lft forever preferred_lft forever
	inet6 fe80::5c4f:2dff:fe09:9612/64 scope link 
	   valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
	link/ether 02:42:0f:38:47:00 brd ff:ff:ff:ff:ff:ff
	inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
	   valid_lft forever preferred_lft forever
user@host:~$ nsenter -U --preserve-credentials -n -t $(pgrep dockerd) ip route show
default via 10.0.2.2 dev tap0 
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 

In that namespace, you can see the tap0 NIC (slirp4netns) and the docker0 NIC for your containers, so essentially the namespace acts as a router. Looking at the iptables rules there, I noticed they were empty. So I enabled masquerading (a blast from the past, i.e. I have not used this in a long time):

nsenter -U --preserve-credentials -n -t $(pgrep dockerd) iptables -t nat -A POSTROUTING -o tap0 -j MASQUERADE

And that worked! Containers now have outgoing connection capabilities!

user@host:~$ docker run -ti alpine ping -c1 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=254 time=4.366 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 4.366/4.366/4.366 ms
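For poking around in that namespace, the nsenter call used above can be wrapped in a small helper. This is only a sketch: in_dockerd_netns and the NSENTER override are made up for illustration, and pgrep -x is used in the usage lines so that only a process named exactly dockerd is matched. iptables -S lists the rules of a chain and iptables -C checks whether a rule exists, so the masquerade rule is not appended twice.

```shell
#!/bin/sh
# Sketch: in_dockerd_netns is a hypothetical wrapper around the nsenter call.
# NSENTER can be overridden, e.g. for a dry run.
NSENTER=${NSENTER:-nsenter}

# Run a command inside the user and network namespace of the given PID.
# $1 = PID of dockerd, remaining arguments = command to run
in_dockerd_netns() {
    pid=$1
    shift
    "$NSENTER" -U --preserve-credentials -n -t "$pid" "$@"
}

# Usage:
# pid=$(pgrep -x dockerd)
# in_dockerd_netns "$pid" iptables -t nat -S POSTROUTING
# in_dockerd_netns "$pid" iptables -t nat -C POSTROUTING -o tap0 -j MASQUERADE ||
#     in_dockerd_netns "$pid" iptables -t nat -A POSTROUTING -o tap0 -j MASQUERADE
```

This remains a manual stop-gap: the rule lives in the namespace and is gone as soon as that namespace is recreated.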

Unfortunately this configuration does not survive a reboot. It won’t even survive a systemctl --user restart docker.service.

And to be honest, I don’t like the default IP ranges, not at all. So I will modify them.

the docker.service file

So docker rootless creates this service file: ~/.config/systemd/user/docker.service

[Unit]
Description=Docker Application Container Engine (Rootless)
Documentation=https://docs.docker.com/go/rootless/

[Service]
Environment=PATH=/opt/user/bin:/sbin:/usr/sbin:/opt/user/bin:/opt/user/.local/bin:/opt/user/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
ExecStart=/opt/user/bin/dockerd-rootless.sh  --iptables=false
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
Type=notify
NotifyAccess=all
KillMode=mixed

[Install]
WantedBy=default.target

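Immediately I noticed the iptables bit: SKIP_IPTABLES=1 made the installer write --iptables=false into ExecStart, which tells dockerd not to create any iptables rules itself, including the NAT (masquerading) rules. Wait, this couldn’t be, could it?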
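To see quickly whether an existing installation is affected, the unit file can be checked for that flag. A minimal sketch, with unit_disables_iptables as a made-up helper name:

```shell
#!/bin/sh
# Sketch: unit_disables_iptables is a hypothetical helper name.
# Succeeds if the ExecStart line of the unit file passes --iptables=false.
# $1 = path to the unit file
unit_disables_iptables() {
    grep -Eq '^ExecStart=.*--iptables=false([[:space:]]|$)' "$1"
}

# Usage:
# unit_disables_iptables ~/.config/systemd/user/docker.service &&
#     echo "dockerd was told not to manage iptables"
```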

the solution

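The fix for the masquerading is to remove --iptables=false from ExecStart, so dockerd sets up its own NAT rules inside the namespace again. Looking through the code, I also noticed I could use an environment variable (DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS) to pass extra flags to rootlesskit. I’ve used it for some extra configuration, because I wanted to use different IP ranges.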

docker.service (with a custom IP range for tap0):

[Unit]
Description=Docker Application Container Engine (Rootless)
Documentation=https://docs.docker.com/go/rootless/

[Service]
Environment=PATH=/opt/user/bin:/sbin:/usr/sbin:/opt/user/bin:/opt/user/.local/bin:/opt/user/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
Environment=DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="--cidr=192.168.1.0/24"
ExecStart=/opt/user/bin/dockerd-rootless.sh
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
Type=notify
NotifyAccess=all
KillMode=mixed

[Install]
WantedBy=default.target
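A possible alternative to editing the generated file is a systemd drop-in, which keeps the changes separate from what the setup tool wrote. This is a sketch of that alternative, not what was done above. write_dockerd_override is a made-up helper name, and the drop-in directory ~/.config/systemd/user/docker.service.d is the standard systemd location for user unit overrides.

```shell
#!/bin/sh
# Sketch: write_dockerd_override is a hypothetical helper name.
# $1 = drop-in directory, $2 = path to dockerd-rootless.sh, $3 = rootlesskit flags
write_dockerd_override() {
    mkdir -p "$1" || return 1
    cat > "$1/override.conf" <<EOF
[Service]
Environment=DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="$3"
# An empty ExecStart= clears the generated command line before a new one is set.
ExecStart=
ExecStart=$2
EOF
}

# Usage:
# write_dockerd_override ~/.config/systemd/user/docker.service.d \
#     /opt/user/bin/dockerd-rootless.sh "--cidr=192.168.1.0/24"
```

As with a direct edit, a systemctl --user daemon-reload and a restart of docker.service are needed before the override takes effect.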

~/.config/docker/daemon.json (with a custom IP/range for docker0):

{
  "bip": "192.168.2.1/24"
}

Now we only need to restart docker:

systemctl --user stop docker.service
systemctl --user daemon-reload
systemctl --user start docker.service
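After the restart, it is worth confirming that the new ranges are in use and that containers can reach the outside world. A small sketch for the first part: iface_addr is a made-up helper that picks the IPv4 address of one interface out of the one-line-per-address output of ip -o -4 addr show.

```shell
#!/bin/sh
# Sketch: iface_addr is a hypothetical helper name.
# Reads `ip -o -4 addr show` output on stdin and prints the address/prefix
# of the interface named in $1 (nothing if the interface is not listed).
iface_addr() {
    awk -v dev="$1" '$2 == dev { print $4 }'
}

# Usage:
# pid=$(pgrep -x dockerd)
# nsenter -U --preserve-credentials -n -t "$pid" ip -o -4 addr show | iface_addr docker0
# nsenter -U --preserve-credentials -n -t "$pid" ip -o -4 addr show | iface_addr tap0
# docker run --rm alpine ping -c1 1.1.1.1
```

With the configuration above, docker0 should report 192.168.2.1/24 and tap0 an address inside 192.168.1.0/24.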

PROFIT!