`tar` and Symlinks
Alternate title: “How to fix a failing Docker build that relies on extracting tar contents into a directory that exists as a symlink”. Rejected for length… 🙃
TL;DR: use the `-h` option when extracting a tarball if you want symlinks at the extraction destination to be treated as actual directories.
Read on for details…
So, it turns out that extracting a tar file containing a directory that already exists at the extraction location behaves differently depending on whether the existing target is a real directory or a symlink to one.
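The difference is easy to reproduce outside Docker. Here is a minimal sketch, assuming GNU tar (the tar Debian ships) and a throwaway directory from `mktemp`. The `plain` and `deref` roots and the `overlay-tool` file are hypothetical stand-ins for an image root and the tarball's contents, not anything taken from the real images.

```shell
#!/usr/bin/env bash
# Reproduce the tar/symlink behaviour in a throwaway directory.
# Assumes GNU tar; all paths and file names here are hypothetical stand-ins.
set -euo pipefail

WORK="$(mktemp -d)"
trap 'rm -rf "$WORK"' EXIT

# Fake a bookworm-style root: bin is a symlink to usr/bin, which holds sh.
make_root() {
  mkdir -p "$1/usr/bin"
  touch "$1/usr/bin/sh"
  ln -s usr/bin "$1/bin"
}

# Fake an overlay tarball that ships its own bin/ directory entry.
mkdir -p "$WORK/payload/bin"
touch "$WORK/payload/bin/overlay-tool"
tar -C "$WORK/payload" -cf "$WORK/overlay.tar" bin

# Without -h: tar does not follow the existing bin symlink.
make_root "$WORK/plain"
tar -C "$WORK/plain" -xf "$WORK/overlay.tar"

# With -h: tar follows the existing bin symlink into usr/bin.
make_root "$WORK/deref"
tar -C "$WORK/deref" -h -xf "$WORK/overlay.tar"

# Report what bin turned into and what it now contains.
for root in plain deref; do
  if [ -L "$WORK/$root/bin" ]; then kind="symlink"; else kind="real directory"; fi
  echo "$root: bin is a $kind containing: $(ls "$WORK/$root/bin/" | tr '\n' ' ')"
done
```

With GNU tar, `plain` should report a real directory holding only `overlay-tool` (its `sh` now stranded in `usr/bin`), while `deref` keeps the symlink and lists both `overlay-tool` and `sh`.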
Now, having made such a statement, I’m sure you’re thinking, “ahh, there must be a story behind this revelation.” And you would be right to think that. So let me tell you…
I’ve been putting off using Debian bookworm as the base image in my various Docker builds because it was causing build errors. Specifically:
`runc run failed: unable to start container process: error during container init: exec: "/bin/sh": stat /bin/sh: no such file or directory`
That error made no sense to me. How can `/bin/sh` be missing? This started happening when I updated to a newer PHP Docker image. I don’t remember which version, or when, but it must have been shortly after Debian bookworm was released and the default base image for a tag like, say, `php:8.2.28-fpm` became bookworm.
Back then, after about five minutes, I gave up and specified the Debian bullseye tag, `php:8.2.28-fpm-bullseye`. The build worked, and I’ve been carrying on that way ever since, kicking the can down the road.
Well, today, as I was updating a project to PHP 8.4, I initially specified `php:8.4.5-fpm` in the Dockerfile, and of course, lo and behold, it was based on bookworm, and it failed with the error above. I confirmed, briefly, that the build would work if I specified the bullseye tag. But then I decided it was time to stop kicking this can down the road.
I will cut out the fumbling intermediate debugging steps and get to the juicy part. As it turns out, one of the first things my Docker builds do is extract a tarball from the S6 Overlay into the root of the Docker image’s filesystem. Among other things, this tarball contains a `bin` directory with some executables.
What I discovered was this: in the bullseye-based Debian image, `/bin` is an actual directory.
The output of `docker run -it --rm --entrypoint "" php:8.4.5-fpm-bullseye bash -c "ls -lah /"` looks normal. In particular, `bin` is a directory, not a symlink.
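If you only care about that one entry, a small shell check can tell you what `bin` is without reading a full listing. This is a sketch; `describe_bin` is a hypothetical helper name, and its root argument defaults to `/`. Note that the symlink test has to come first, because a directory test also succeeds for a symlink that points at a directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether <root>/bin is a real directory or a symlink.
set -euo pipefail

describe_bin() {
  local root="${1:-/}"
  local bin="${root%/}/bin"

  if [ -L "$bin" ]; then
    # Checked first: [ -d ] would also be true for a symlink to a directory.
    printf 'symlink -> %s\n' "$(readlink "$bin")"
  elif [ -d "$bin" ]; then
    echo "directory"
  else
    echo "missing"
  fi
}
```

Calling `describe_bin /` inside a bullseye-based container should print `directory`, while on a bookworm-based image it should print a line starting with `symlink ->`.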
Ahh, but in bookworm, it is a symlink to `/usr/bin`, as running `docker run -it --rm --entrypoint "" php:8.4.5-fpm bash -c "ls -lah /"` shows.
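So in the bookworm-based image, `bin` is a symlink. This is the root of the problem. I discovered this because I noted, based on the error, that `sh` was not in the `bin` directory after running my tar extraction. In fact, the only things in `bin` were the contents of the `bin` directory from the S6 tarball. I looked at the contents of `bin` before running the tar extraction and saw it containing all the things it should. That is also when I noticed it was a symlink.

To shorten my sleuthing once again: that eventually led me to this answer on StackExchange. After a little more research, sure enough, adding the `-h` option to my tar extraction command merged the contents of the S6 tarball’s `bin` directory into the existing `/bin` (really `/usr/bin`), just as I wanted. The reason is that, without `-h`, GNU tar (which is what Debian ships) does not follow the existing `bin` symlink. Since what it finds there is not a directory, it removes the symlink and creates a fresh, empty `bin` directory in its place before extracting the tarball’s files into it. The real `/usr/bin`, `sh` included, is left untouched, but it is no longer reachable through `/bin`.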
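In a build, the fix comes down to adding `-h` to the extraction and, optionally, checking afterwards that `/bin/sh` survived. Docker runs shell-form `RUN` instructions through `/bin/sh -c`, which is why the error above complains about `/bin/sh` specifically. This is a hedged sketch, assuming GNU tar. `extract_overlay` and its arguments are hypothetical, and the tarball path is whatever your build downloads, since S6 Overlay's file names and layout vary between releases.

```shell
#!/usr/bin/env bash
# Sketch of an overlay extraction step that keeps bin -> usr/bin intact.
# Assumes GNU tar; extract_overlay and its arguments are hypothetical.
set -euo pipefail

extract_overlay() {
  local tarball="$1"
  local root="${2:-/}"

  # -h (--dereference): follow an existing symlink such as bin -> usr/bin
  # instead of replacing it with a new, empty directory.
  # GNU tar detects gzip/xz compression on its own when reading from a file.
  tar -C "$root" -h -xf "$tarball"

  # Shell-form RUN steps need /bin/sh, so fail early if it has vanished.
  if [ ! -x "${root%/}/bin/sh" ]; then
    echo "extract_overlay: ${root%/}/bin/sh is missing after extraction" >&2
    return 1
  fi
}
```

The part that matters is the `-h` on the `tar` line; the `/bin/sh` check simply turns a confusing `runc` error in a later step into an explicit failure at the extraction step.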
The Docker build stopped failing, and life could go on with the more recent bookworm-based Docker image, rather than continuing to kick the can down the road with bullseye until the day Docker PHP images are no longer built and tagged for bullseye at all.
Mystery solved. And hopefully, if anyone else runs into this edge case behavior, I’ll have enough Google-foo search juice to lead them here.