How the Docker Image Is Stored on the Host Machine

22 minute read     Updated:

Sriram Ramanujam %
Sriram Ramanujam

This article explains how to create Docker images. Earthly optimizes the build process for Docker images, ensuring reproducibility and cache efficiency. Learn more about Earthly.

Developers and system administrators can create, deploy, and run distributed applications using the Docker platform. Docker containers, which offer a constant and isolated environment for applications to execute in, are running instances of Docker images. They make it simple to maintain, scale, and guarantee that applications function reliably across many contexts.

However, what exactly is an image? What does it include, and where does it reside after we have built it? This article will answer all these questions by explaining the internals of Docker images and how these images are stored on the host machine.

Docker Images

DockerImages

Docker images are pre-built packages (portable artifacts) that contain all the files, configuration, and dependencies needed to run a piece of software. They are used to build and run Docker containers, which are isolated environments that allow applications to run in a predictable and consistent way, regardless of the host environment.

Images are created using a layered file structure, making it simple to share, reuse, and update them. It is the simple cut-down version of the operating system, however it doesn’t include a kernel or a driver. This is because the host system on which Docker Engine is installed provides these vital elements for the proper functioning of containers. They can be pulled from a centralized image repository such as Docker Hub.

For instance, the containerized version of the Ubuntu 22.04 image is 77.8 MB, whilst the official ISO image weighs 3.6 GB. The size has shrunk by about 98%.

$ docker images

REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
ubuntu                   22.04     a8780b506fa4   5 weeks ago     77.8MB

Docker Layers

DockerLayers

A Docker layer is a modification to an image file system. The addition of each command in the Dockerfile forms a new image layer. Docker images are formed by stacking multiple read-only layers on top of each other. The following example shows layers for each executed command:

$ docker history openjdk:8-jdk-alpine-with-bash-cli-mode

IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
be9564ccef2f   3 months ago   /bin/sh                                         5.43MB    
a3562aa0b991   3 years ago    /bin/sh -c set -x  && apk add --no-cache   o…   99.3MB

Platforms like Docker and Podman bring the layers together and show them as a single unified object. Each layer also gets its own unique hash identifier for references. The image manifest file contains the layers details associated with the image in the registry.

If multiple images share the same read-only layer, the layer will only be downloaded once. This approach saves space, reduces network load, and reuses parts of images wherever possible.

The snippets below show that the two images (edc5a3f3b57b, d181adc2b1e1) share a common first layer. Here, we use the docker inspect command to get a detailed view of the Docker objects configuration and state. The output also includes information such as layer ID, name, labels, network settings, and so on:

$ docker inspect --format='{{json .RootFS}}' edc5a3f3b57b | jq
{
  "Type": "layers",
  "Layers": [
    "sha256:994393dc58e7931862558d06e46aa2bb17487044f670f310dffe1d24e4d1eec7",
    "sha256:4ccc88f068b780237695d1377f4b6f218e5cc3a31ca3c32e4d1d6394fc46991a"
  ]
}

$ docker inspect --format='{{json .RootFS}}' d181adc2b1e1 | jq
{
  "Type": "layers",
  "Layers": [
    "sha256:994393dc58e7931862558d06e46aa2bb17487044f670f310dffe1d24e4d1eec7",
    "sha256:265ed93d82c9d370b4dc31431c1691d7019667b203745ceaf733cae0daba8374",
    "sha256:d956d5eab6ace64c1fe66c1e5080f39ef5cb35fa632d85376067852a9bbabd2d"
  ]
}

When we start a container, Docker uses its basic image and adds an interim read/write layer on top of it. It helps to store the changes made to the container’s file system. It also makes the containers so flexible and portable. Here, only the temporary layer gets either created or destroyed, the base image remains the same. This way,, it is possible to reuse the same basic image repeatedly without mutating its state.

If we spin up three instances of an image, Docker only uses one copy of the base image to create three temporary layers instead of three copies of the base image. These temporary layers in Docker help to store information about the environment variables, files, and other data that are added to the image during the build process. This layered approach is secure as each layer is separate from the others and cannot see the contents of the other layers.

We generally pull the images from the Docker Hub registry. Though Docker Hub is one of the most popular registries, you can also use others such as Amazon ECR, Redhat Quay, and more.

Let’s use the docker pull command to pull the latest MySQL image from DockerHub.

$ docker pull mysql:latest
latest: Pulling from library/mysql
0ed027b72ddc: Pull complete
0296159747f1: Pull complete
3d2f9b664bd3: Pull complete
df6519f81c26: Pull complete
36bb5e56d458: Pull complete
054e8fde88d0: Pull complete
f2b494c50c7f: Pull complete
132bc0d471b8: Pull complete
135ec7033a05: Pull complete
5961f0272472: Pull complete
75b5f7a3d3a4: Pull complete
Digest: sha256:3d7ae561cf6095f6aca8eb7830e1d14734227b1fb4748092f2be2cfbccf7d614
Status: Downloaded newer image for mysql:latest
docker.io/library/mysql:latest

As we see, the image is downloaded from the Docker Hub blob store in numerous layers. The lines with pull complete in the output denote the image layers.

Here, our mysql:latest image has eleven read-only layers stacked on top of each other to form a single, cohesive image object. Further, let’s get the list of images in our local repository using the docker images command.

$ docker images

REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
mysql              latest    7484689f290f   2 days ago      538MB
ubuntu             22.04     a8780b506fa4   5 weeks ago     77.8MB

Yet another method to view the layers of an image is by using the docker inspect command.

Now let’s examine the mysql:latest image using the docker inspect command and the image ID and obtain the SHA256 hashes for each layer.

$ docker inspect 7484689f290f

Here, the first line in the layer’s section represents the base layer of the image. In the below snippet the first layer or base layer has an SHA256 hash of d3cc7b6aa7bc. The second layer’s hash is a7f421510691 and so on.

[
    {
      "Id": "sha256:7484689f290f1defe06b65befc54cb6ad91a667cf0af59a265ffe76c46bd0478",
       "RepoTags": [
            "mysql:latest"
        ],
..
.. output truncated ..
..
       "RootFS": {
            "Type": "layers",
            "Layers": [
"sha256:d3cc7b6aa7bc15725c1a856ce06fe436da3fbccf0c9c06b04e45f79b3439c154",
"sha256:a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c",
"sha256:6ac2db160c6cf3dcf1aff0ced069aa98da28c50cff5cd3c8881c04f42e3ef1fe",
"sha256:7fe65049a2a940ab927d3f5b2cf0687ecffbdf9d7e9df1daaeddb83bc601f3cb",
"sha256:da1824686db37bbf1ffbffea53295aa853731531a14e70bca24eeb6d91fd6327",
"sha256:d410d4efd0e75456011f265fa113b206dd4da9dccf5151bca714ef6c69a3b8cd",
"sha256:60c4dab21dc337e912d518acc56e5b776e3de4f1d277d074831bb678089b87a6",
"sha256:d00057f8969283fed84044f6103036e18a9d776579d705a85472535ba321df25",
"sha256:2f42ce9d7b80a286af13410c0b64e94c90eca3e7597f7fd82a783aa1f68c2373",
"sha256:8408fed6a9d685236cb024ceea39692743b6c52ea6c4c068b22a6475f742e24a",
"sha256:336175ddf157a8f50c0aae8c0726b1462fd41f30f0b7f84caf6bf5cd02f8de77"
            ]
        },
..
.. output truncated ..
..
]

We’ll take a closer look at the specifics of Docker image storage in the next section.

Docker Image Storage

Docker Storage Drivers are the one which controls the storage of container’s writable layers and image layers on the Docker host machine. The writable layer of the container is ideal for storing the ephemeral data that is created during the runtime; however, it does not survive after container deletion. Further, the storage drivers are also knowledgeable about the mechanics of how these levels interact with one another and its arrangements using the manifest files.

Additionally, Docker supports a wide range of storage drivers, including overlay2, fuse-overlay2, btrfs, zfs, aufs, overlay, devicemapper, and vfs. Let’s obtain the storage driver information from the host machine using the docker info command:

$ docker info | grep -i "Storage Driver"
Storage Driver: overlay2

In Docker, the root directory is one that hosts the entire data of Docker images and containers. Let’s identify Docker’s root directory information on the host machine using the docker info command.

$ docker info | grep "Root Dir"
Docker Root Dir: /var/lib/docker

We’ve identified that the root directory of Docker is /var/lib/docker.

Image Internals

Every Docker image has a corresponding JSON structure that contains information on the image’s essential attributes, such as the date, the creator, and runtime configuration like entrypoint, networking, and volumes. Use docker inspect command to view the image attributes:

$ docker inspect alpine:latest
[
    {
      "Id": "sha256:9c6f0724472873bb50a2ae67a9e7adcb57673a183cea8b06eb778dca859181b5",
        "RepoTags": [
            "alpine:latest"
        ],
        "RepoDigests": [
         "alpine@sha256:bc41182d7ef5ffc53a40b044e725193bc10142a1243f395ee852a8d9730fc2ad"
        ],
        "Parent": "",

…
… output truncated …
…

       },
        "Metadata": {
            "LastTagTime": "0001-01-01T00:00:00Z"
        }
    }
]

It is an immutable JSON object that contains the historical information and a cryptographic hash of each image layer. Now let’s try to identify the image files and their contents in the root folder of the Docker Engine.

Docker Image Content View Workflow

Image ID Extraction

The ImageID in Docker is an SHA-256 hash of the image’s content. It also includes all the metadata, directories, and files associated with the image. It is typically used by the Docker engine to identify an image, despite its name or tag. This means that even if two images have the same name and tag, they will have different Image IDs if their contents are different.

To get the Image ID from the local repository, we use the docker images command:

$ docker images
REPOSITORY          TAG       IMAGE ID       CREATED         SIZE
mysql               latest    7484689f290f   2 days ago      538MB
ubuntu              22.04     a8780b506fa4   5 weeks ago     77.8MB

DiffID Identification

The Differential ID or DiffID is a unique identifier for all the Docker image layers. Each layer in a Docker image has its own DiffID, which is calculated based on the contents of the specific image layer. It is used to verify the integrity of an image. Also, when two Docker images share the same set of layers, they will have the same DiffIDs for those layers, which allows for efficient storage. We can use the docker inspect command to get the diffIDs of all layers under the RootFS section:

$ docker inspect 7484689f290f
[
..
.. output truncated ..
..
        "RootFS": {
            "Type": "layers",
            "Layers": [
"sha256:d3cc7b6aa7bc15725c1a856ce06fe436da3fbccf0c9c06b04e45f79b3439c154",
"sha256:a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c",
"sha256:6ac2db160c6cf3dcf1aff0ced069aa98da28c50cff5cd3c8881c04f42e3ef1fe",
"sha256:7fe65049a2a940ab927d3f5b2cf0687ecffbdf9d7e9df1daaeddb83bc601f3cb",
"sha256:da1824686db37bbf1ffbffea53295aa853731531a14e70bca24eeb6d91fd6327",
"sha256:d410d4efd0e75456011f265fa113b206dd4da9dccf5151bca714ef6c69a3b8cd",
"sha256:60c4dab21dc337e912d518acc56e5b776e3de4f1d277d074831bb678089b87a6",
"sha256:d00057f8969283fed84044f6103036e18a9d776579d705a85472535ba321df25",
"sha256:2f42ce9d7b80a286af13410c0b64e94c90eca3e7597f7fd82a783aa1f68c2373",
"sha256:8408fed6a9d685236cb024ceea39692743b6c52ea6c4c068b22a6475f742e24a",
"sha256:336175ddf157a8f50c0aae8c0726b1462fd41f30f0b7f84caf6bf5cd02f8de77"
            ]
        },
..
.. output truncated ..
..
]

ChainID Calculation

Calculation

The ChainID is calculated by concatenating the DiffIDs of the layers in the image and hashing the result using the SHA256 algorithm. It is also used to verify the integrity of an image during transmission. Using the SHA256 values for the current and previous layers to determine the chainID. The formula is shown below.

DiffID = ChainID if the layer is the lowest layer among other layers.

If not, ChainID(n) = sha256sum [DiffID(n-1), DiffID(n)]

Let’s calculate the ChainID for Layer 1:

ChainID(Layer 1) = DiffID(Layer 1)

ChainID(Layer 1) = d3cc7b6aa7bc15725c1a856ce06fe436da3fbccf0c9c06b04e45f79b3439c154

Let’s calculate the ChainID for Layer-2:

ChainID(Layer 2) = sha256sum [DiffID(Layer 1), DiffID(Layer 2)

DiffID(Layer 1) = d3cc7b6aa7bc15725c1a856ce06fe436da3fbccf0c9c06b04e45f79b3439c154

DiffID(Layer 2) = a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c


$ echo -n 'sha256:d3cc7b6aa7bc15725c1a856ce06fe436da3fbccf0c9c06b04e45f79b3439c154 sha256:a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c' | sha256sum

29bd3d7c6e1683e422776b9d3285e8a3f1272f07656fc63a941cb7729a169100

ChainID(Layer 2) = 29bd3d7c6e1683e422776b9d3285e8a3f1272f07656fc63a941cb7729a169100

Next, let’s calculate the ChainID for Layer 3:

ChainID(Layer 3) = sha256sum [DiffID(Layer 2), DiffID(Layer 3)

DiffID(Layer 2) = a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c

DiffID(Layer 3) = 6ac2db160c6cf3dcf1aff0ced069aa98da28c50cff5cd3c8881c04f42e3ef1fe


$ echo -n "sha256:a7f421510691bf6a7b344d1efb738b3d343e252e7dde114a0dd86d432ef6000c sha256:6ac2db160c6cf3dcf1aff0ced069aa98da28c50cff5cd3c8881c04f42e3ef1fe" | sha256sum

9b3abbf0ab6402c9bcb9cce411268ffe24573d790f0333d8ae06794313295dbd

ChainID(Layer 3) = 9b3abbf0ab6402c9bcb9cce411268ffe24573d790f0333d8ae06794313295dbd

CacheID

The CacheID is another unique identifier that is assigned to each layer of the image cache. The CacheID is used to determine whether a layer can be reused from the cache instead of being rebuilt during a subsequent build.

The cacheID facilitates retrieval of the real contents for each layer that are indexed. So using the calculated chainID, navigate to the /var/lib/docker/image/overlay2/layerdb/sha256/<ChainID>/ directory to obtain the content index known as cacheID. Also, the directory includes the parent information of the layer and its size.


$ cat /var/lib/docker/image/overlay2/layerdb/sha256/29bd3d7c6e1683e422776b9d3285e8a3f1272f07656fc63a941cb7729a169100/cache-id
5b05639e794f7b0074d8d622843f8816d9e78ac25b6f6f97c49dfda1a39ecd24

Lastly, let’s use the CacheID to navigate to the storage driver path [/var/lib/docker/overlay2/]. To access all the files and directories for that layer, navigate to the diff directory:


$ tree /var/lib/docker/overlay2/5b05639e794f7b0074d8d622843f8816d9e78ac25b6f6f97c49dfda1a39ecd24

/var/lib/docker/overlay2/5b05639e794f7b0074d8d622843f8816d9e78ac25b6f6f97c49dfda1a39ecd24
.
├── committed
├── diff
│   └── etc
│       ├── group
│       ├── group-
│       ├── gshadow
│       ├── gshadow-
│       ├── passwd
│       ├── passwd-
│       ├── shadow
│       └── shadow-
├── link
├── lower
└── work

To sum up, the Docker images are stored in layers, each of which is identified by a unique ID called a DiffID. These layers are tied together with a ChainID. The location of the Docker image files on the host machine depends on the storage driver used by Docker. By default, Docker uses the Overlay2 driver, which stores images in a directory called /var/lib/docker/overlay2. However, other storage drivers may use different directories or file systems.

Conclusion

In summary, understanding how Docker images are stored on the host machine is crucial for developers who work with Docker on a daily basis. They can also gain more insights on how Docker is handling image management internally.

With this in-depth understanding of how Docker images are saved on the host machines, developers can better manage their Docker environments and resolve any problems that may arise. Further, developers may enhance their Docker processes and fully utilize the strength and adaptability of Docker containerization.

And if you’re looking to further boost your Docker workflows, you might want to give Earthly a shot. It offers simpler, more reliable builds, enhancing your productivity and efficiency in managing containerized environments.

Earthly Cloud: Consistent, Fast Builds, Any CI
Consistent, repeatable builds across all environments. Advanced caching for faster builds. Easy integration with any CI. 6,000 build minutes per month included.

Get Started Free

Sriram Ramanujam %
Sriram Ramanujam
A seasoned Global Network & Security Automation Architect with 10+ years of rich industry experience in developing scalable solutions for network automation, engineering design, operation and telemetry. Realizing the projects from a technical concept, design, architecture leads to a full-fledged development, deployment and lifecycle management.
Writers at Earthly work closely with our talented editors to help them create high quality content. This article was edited by:
Bala Priya C %

Bala is a technical writer who enjoys creating long-form content. Her areas of interest include math and programming. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.

Updated:

Published:

Get notified about new articles!
We won't send you spam. Unsubscribe at any time.