At first glance, writing Dockerfiles appears to be a straightforward process. After all, most basic examples reflect the same set of steps. However, not all Dockerfiles are created equal. There is an optimal way of writing these files to produce the kind of Docker images you want for your final product. If you were to pop the hood, you’d see that Docker images actually consist of file system layers that correlate to the individual build steps involved in the creation of the image.
To build an efficient Docker image, you want to eliminate some of these layers from the final output. Optimizing an image can take a lot of effort as you attempt to filter out what you don’t need. Building these kinds of images has a lot to do with keeping them as small as possible, because large images lead to a host of other issues.
One approach to keeping Docker images small is using multistage builds. A multistage build allows you to use multiple images to build a final product. In a multistage build, you have a single Dockerfile, but can define multiple images inside it to help build the final image.
In this post, you’ll learn about the core concepts of multistage builds in Docker and how they help to create production-grade images. In addition, I will detail how you can create these types of files, as well as highlight some challenges that they present.
Core Concepts of Docker Multistage Builds
Before getting to the details of multistage builds, it’s good to have an understanding of the main idea behind them. You’re already familiar with the starting point of container images, the Dockerfile. Dockerfiles are text files that make it easy to assemble all the relevant commands required to create your container image. These commands in the Dockerfile should be combined whenever possible for the sake of optimization.
When I first came across this idea of building the right kind of Docker image, I asked myself a few questions that may be crossing your mind as you read this. What are the implications of not optimizing an image? Can it really have that much of a negative impact on your application? The answer to the latter question is yes.
As for the implications, you typically end up with bigger images. The reason big images should be avoided is because they increase both potential security vulnerabilities and the surface area for attack. You definitely want to keep things lean by ensuring you only have what your application needs to run successfully in a production environment.
One way of reducing the size of your Docker images is through the use of what is informally known as the builder pattern. The builder pattern uses two Docker images to create a base image for building assets and the second to run it. This pattern was previously implemented through the use of multiple Dockerfiles. It has become an uncommon practice since the introduction and support of multistage builds in Docker 17.06 CE.
For context, it’s good to understand how this was typically done before multistage builds. The following example makes use of a basic React application that is first built and then has its static content served by an Nginx virtual server. Following are the two Dockerfiles used to create the optimized image. In addition, you’ll see a shell script that demonstrates the Docker CLI commands that have to be run in order to achieve this outcome. You can find the source code for this example in this repository.
Dockerfile.build FROM node:12.13.0-alpine WORKDIR /app COPY package*.json ./ RUN npm install COPY . . RUN npm run build
Dockerfile.main FROM nginx EXPOSE 3000 COPY ./nginx/default.conf /etc/nginx/conf.d/default.conf COPY /app/build /usr/share/nginx/html
Build.sh #!/bin/sh echo Building lukondefmwila/react:build docker build -t lukondefmwila:build . -f Dockerfile.build docker create --name extract lukondefmwila:build docker cp extract:/app/build ./app docker rm -f extract echo Building lukondefmwila/react:latest docker build --no-cache -t lukondefmwila/react:latest . -f Dockerfile.main
While using the builder pattern does give you the desired outcome, it presents additional challenges. This process introduces the management overhead that comes with maintaining multiple Dockerfiles—not to mention the cumbersome procedure of running through several Docker CLI commands, even if this can be streamlined by a shell script.
How to Use Docker Multistage Builds
Now that you get the underlying concept, turn your attention to how this translates to the modern implementation of the builder pattern. What the former approach accomplishes with multiple Dockerfiles, the multistage feature does in one. You can get the same results with your builds without the added complexity.
Multistage builds make use of one Dockerfile with multiple FROM instructions. Each of these FROM instructions is a new build stage that can COPY artifacts from the previous stages. By going and copying the build artifact from the build stage, you eliminate all the intermediate steps such as downloading of code, installing dependencies, and testing. All these steps create additional layers, and you want to eliminate them from the final image.
The build stage is named by appending AS name-of-build to the FROM instruction. The name of the build stage can be used in a subsequent FROM and COPY command by providing a convenient way to identify the source layer for files brought into the image build. The final image is produced from the last stage executed in the Dockerfile.
Try taking the example from the previous section that used more than one Dockerfile for the React application and replacing the solution with one file that uses a multistage build.
Dockerfile FROM node:12.13.0-alpine as build WORKDIR /app COPY package*.json ./ RUN npm install COPY . . RUN npm run build FROM nginx EXPOSE 3000 COPY ./nginx/default.conf /etc/nginx/conf.d/default.conf COPY --from=build /app/build /usr/share/nginx/html
This Dockerfile has two FROM commands, with each one constituting a distinct build stage. These distinct commands are numbered internally, stage 0 and stage 1 respectively. However, stage 0 is given a friendly alias of
build. This stage builds the application and stores it in the directory specified by the WORKDIR command. The resultant image is over 420 MB in size.
The second stage starts by pulling the official Nginx image from Docker Hub. It then copies the updated virtual server configuration to replace the default Nginx configuration. Then the COPY –from command is used to copy only the production-related application code from the image built by the previous stage. The final image is approximately 127 MB.