Docker has become one of my favorite tools for dealing with the challenges of different build and run environments. When I create even a small example project, I usually start by writing the Dockerfile for the build environment. And when I have had to distribute more complex work to a client, images/containers have made the process almost painless.
I wanted to write down some of the things that have made my life much easier while using Docker.
Basic recap
This post is not intended as a full tutorial / getting started guide, but let’s recap quickly. A basic Dockerfile usually looks something like this:
# base image on top of which we build our own
FROM ubuntu:20.04
# Fetch package index and install packages
RUN apt-get update && apt-get install -y nano
# The directory in which the following commands are applied.
# The last WORKDIR will also be the CWD when the container is run
WORKDIR /data
# Copy data from the build directory to the image
COPY . /data
# Entrypoint is the default executable to be run when
# using the image
ENTRYPOINT ["ls", "-la"]
I recommend looking through the Dockerfile reference in the Docker documentation for a lot more options.
To build the image we usually run:
docker build -t name-of-the-image .
And finally we run the image:
# --rm = remove container when it is stopped
docker run --rm name-of-the-image
Some additional useful commands
# List running and stopped containers
docker ps -a
# List disk space usage of Docker
docker system df -v
# Remove stopped containers, dangling images, unused networks and build cache. Does not touch volumes (add --volumes to prune volumes too)
# Prompts for confirmation
docker system prune
# Monitor what is happening in the Docker daemon by printing events for started/exited containers, networks, etc.
docker system events
Mounting volumes with -v
One of the first things I wanted to do with Docker was to modify my host machine's file system. This can easily be achieved by mounting volumes / bind mounts. An easy mistake to make is to assume that if you have a directory `data` in your current directory and you want to mount it to `/data`, you could do it like this:
# Current PWD directory structure
# . .. data build
# Following will not mount `data` directory, but will create a
# named volume `data`:
docker run --rm -v data:/data name-of-the-image
# Clean up
docker volume ls
docker volume rm data
The correct way to bind directories is to always use full paths. When using scripts to run the container, this gets a bit more tricky, as the user can run the script from any directory. Fortunately Unix tools help with this:
docker run --rm -v `readlink -f data`:/data name-of-the-image
# Or when running git bash in Windows
docker run --rm -v `pwd -W`/data:/data name-of-the-image
`readlink -f relative-path` returns the full path of `relative-path`. On Windows, when using git bash, the paths returned by `readlink` are mangled and it is better to use PowerShell, or, if only subdirectories need to be mounted, `pwd -W` will work.
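For reference, a rough PowerShell equivalent could look like the following. This is only a sketch, assuming Docker Desktop; in PowerShell `${PWD}` expands to the current directory:
# PowerShell
docker run --rm -v "${PWD}\data:/data" name-of-the-image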
More robust way to apt-get update && apt-get install
Using just `apt-get update && apt-get install` will leave files in the final image, cause problems with caching and increase the image size. It is recommended to clean up after installing packages. Another good option to try out is adding `--no-install-recommends` to minimize the number of additionally installed packages.
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential && \
    rm -rf /var/lib/apt/lists/*
BuildKit
BuildKit is something that has been integrated into newer versions of Docker to help with the build process. The most notable feature, at least for me, has been how you can handle secrets with it. To use the BuildKit features, `docker build` needs to be run with:
DOCKER_BUILDKIT=1 docker build .
Or alternatively, you can enable it in /etc/docker/daemon.json.
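The daemon config (/etc/docker/daemon.json) would then look roughly like this; restart the Docker daemon after editing:
{
    "features": {
        "buildkit": true
    }
}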
In addition to the env variable, the Dockerfile itself needs to start with the following line:
#syntax = docker/dockerfile:1.2
FROM .....
On some earlier versions of Docker you might need to use the line:
# syntax = docker/dockerfile:1.0-experimental
FROM .....
Secrets
A sure way to shoot yourself in the foot is to copy credentials into the image while building it. For example, the following will leak your ssh keys:
COPY .ssh /home/root/
RUN git clone git@github.com:some/internal_repo.git && \
    <build> && \
    rm -rf /home/root/.ssh
The correct way to use secrets while building the image is to use the built-in secret sharing. This complicates the build command a bit, but things are easily fixed with a small bash script:
#!/bin/bash
if [[ $_ == $0 ]]
then
echo "Script must be sourced"
exit 1
fi
# To use ssh credentials (--mount=type=ssh), we need
# ssh-agent. Many desktop environments already have the
# service running in the background, but for more bare
# bones desktop envs might need to start the service
# separately.
if [[ -z "${SSH_AUTH_SOCK}" ]]; then
    echo Starting ssh-agent
    eval `ssh-agent -s`
    ssh-add
fi
# If your organization uses custom DNS, copying the resolv.conf
# could help with the build.
cp /etc/resolv.conf .
# The `--ssh=default` arg is the most important one
# and works 90% of the cases when using git clone.
# .netrc many times contains tokens which are used with
# services such as Artifactory.
# .gitconfig helps with repo tool.
DOCKER_BUILDKIT=1 docker build \
    --secret id=gitconfig,src=$(readlink -f ~/.gitconfig) \
    --secret id=netrc,src=$(readlink -f ~/.netrc) \
    --ssh=default \
    -t image-name .
This script can then be run with:
source name_of_the_script.sh
To use the secrets we need to make tiny modifications to the Dockerfile, mostly to the lines which need the secrets:
#syntax = docker/dockerfile:1.2
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y git && \
    rm -rf /var/lib/apt/lists/*
# If we need custom DNS. This is not considered a secret and is
# left in the final image
COPY resolv.conf /etc/resolv.conf
# To prevent cli asking if we trust the host,
# which requires interaction, we should add
# new hosts to ~/.ssh/known_hosts
RUN mkdir -p ~/.ssh/ && ssh-keyscan -t rsa github.com > ~/.ssh/known_hosts
RUN --mount=type=secret,id=netrc,dst=/root/.netrc \
    --mount=type=secret,id=gitconfig,dst=/root/.gitconfig \
    --mount=type=ssh \
    git clone git@github.com:some/internal_repo.git
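To gain some confidence that nothing leaked, you can list the layers of the final image and check that no suspicious COPY layers or unexpectedly large layers show up:
# Shows each layer, its size and the command that created it
docker history image-name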
For more information, the Docker docs help 🙂.
This post also had a quite concise explanation of the alternatives and why not to use them.
Caching
To speed up the build process, Docker caches every layer. Sometimes this causes problems and confusion when the newest data is not used for the build. For example, when cloning a git repo and building its contents, it is easy to forget that the `RUN git clone git@github.com:buq2/cpp_embedded_python.git` layer is already in the cache and no new commits will be fetched into the image.
One way to solve this is to completely bypass the cache and run the build with
docker build --no-cache -t name-of-the-image .
Usually this is too harsh an option, and it can be better to just trash the cache by changing something before the clone command, forcing Docker to re-clone the source. Changing the value of a build argument invalidates the cache for that step and every step after it.
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y build-essential git && \
    rm -rf /var/lib/apt/lists/*
ARG CACHE_TRASH=0
RUN git clone git@github.com:buq2/cpp_embedded_python.git
Now we can build the image with:
docker build --build-arg CACHE_TRASH=$RANDOM -t name-of-the-image .
Multi-stage builds
One of the best ways to clean up the image and save space is multi-stage builds. In a multi-stage build, data from a previously built stage is used to construct the final image.
# First stage containing common components
FROM ubuntu:20.04 as base
RUN apt-get update && apt-get install -y \
    <some dependencies> && \
    rm -rf /var/lib/apt/lists/*
# Second stage that builds the binaries
FROM base as build
RUN apt-get update && apt-get install -y \
    build-essential cmake git && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /src
RUN git clone git@github.com:some/repo.git && \
    cmake -S repo -B build && \
    cmake --build build --parallel 12
# Third stage that is exported to final image
FROM base as final
COPY --from=build /src/build/bin /build/bin
ENTRYPOINT ["/build/bin/executable"]
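A nice bonus of naming the stages is that the build can be stopped at a specific stage with `--target`, which helps when debugging a single stage:
# Build and tag only the `build` stage
docker build --target build -t name-of-the-image:build .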
Minimizing image size vs minimizing download time
To make sure that the final image contains only the data that is visible in the last layer, we can use `--squash` when building the image (the flag requires experimental features to be enabled in the Docker daemon). This will produce an image with only a single layer.
docker build --squash -t name-of-the-image .
Unfortunately, even though the image size is smaller, it is often slower to download than a multilayer image, as Docker downloads multiple layers in parallel.
In one case, when trying to minimize the download time of an image which contained a 15GB layer produced by a single RUN command, I created a Python script which sliced the huge directory structure into multiple smaller directories and then used a multi-stage build to copy and reconstruct the directory structure. This sped up the 15GB download significantly.
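I no longer have that script at hand, but the idea was roughly the following sketch. The paths, the slice count and the slice.py helper are made up for illustration:
# Stage that produces the huge directory and slices it into
# roughly equal sized pieces (slice.py is a hypothetical helper)
FROM ubuntu:20.04 as build
RUN <build the huge /opt/big directory> && \
    python3 slice.py /opt/big /slices 8
# Each COPY becomes its own layer, so Docker can download
# the pieces in parallel when the final image is pulled
FROM ubuntu:20.04 as final
COPY --from=build /slices/0 /opt/big
COPY --from=build /slices/1 /opt/big
COPY --from=build /slices/2 /opt/big
# ... and so on for the rest of the slices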
No, I don’t want to choose the keyboard layout
Sometimes when installing packages the build process stops and prompts for the keyboard layout. To avoid this, the DEBIAN_FRONTEND env variable should be set:
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
    i3 && \
    rm -rf /var/lib/apt/lists/*
Running graphical apps inside of the container
Sometimes it would be nice to run graphical apps inside the container. Fortunately this is quite easy with x11docker.
curl -fsSL https://raw.githubusercontent.com/mviereck/x11docker/master/x11docker | sudo bash -s -- --update
x11docker x11docker/xfce xfce4-terminal
On Windows GWSL seems to be very easy to use.
GUI apps should also be possible on macOS, but for me XQuartz has not worked really well even without Docker, so I have not tried this out.
GUI apps without x11docker on Linux
It’s possible to run GUI apps in Linux without x11docker, but it’s much more brittle:
docker run -it --rm --net=host -e DISPLAY=$DISPLAY --volume="$HOME/.Xauthority:/root/.Xauthority:ro" x11docker/xfce xfce4-terminal
Packaging complex application with Docker and Python script
When packaging an application with Docker, it would be nice to just send the pull and run commands to the customer. But when the application requires mounting/binding volumes, opening ports or running a GUI, it soon becomes a nightmare to document and explain the run command for all the use cases.
I have started to write small Python scripts for running the container. My bash skills are a bit limited, and every time I have written a bash script for running the container, I have ended up converting it to a Python script.
Here is a quick example:
#!/usr/bin/env python3
import argparse
import os

def get_image_name(args):
    return 'name-of-the-image'

def get_volumes(args):
    # Convert the user supplied path to an absolute path before mounting
    path = os.path.abspath(os.path.realpath(os.path.expanduser(args.data_path)))
    if not os.path.exists(path):
        raise RuntimeError('Path {} does not exist'.format(path))
    cmd = '-v {path}:/data:ro'.format(path=path)
    return cmd

def get_command(args):
    return 'ls -la'

def run_docker(args):
    cmd = 'docker run {volumes} {image_name} {command}'.format(
        volumes=get_volumes(args),
        image_name=get_image_name(args),
        command=get_command(args))
    os.system(cmd)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', required=True, help='Mounted path')
    args = parser.parse_args()
    run_docker(args)

if __name__ == '__main__':
    main()
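Assuming the script is saved as, say, run.py (the filename here is just an example), the customer only needs a single command:
python3 run.py --data_path ./data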
Inspecting images and linting
Sometimes you want to get a bit more information about how an image has been built and what kind of data it contains. Fortunately there are some tools for this.
Dive is a command line tool that processes the image and can display what changed in each of the layers. As a Docker tool, it is of course available as a Docker image:
docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest name-of-the-image-to-analyze
A second useful tool is Hadolint, which lets you know how poorly you made your Dockerfile.
docker run --rm -i hadolint/hadolint < Dockerfile
This Hacker News post listed a few other interesting looking tools, but I have had no chance to really test them out.