How to Use Linux Namespaces and cgroups to Control Docker Performance

19 minute read     Updated:

Avi Singh %
Avi Singh

This article discusses how docker uses Linux namespaces. Earthly uses namespaces to make your build reproducible. Check it out.

Docker is a popular containerization solution for packaging, distributing, and running applications in lightweight environments. However, with growing container density and workload variety comes increased pressure to control container performance. Thankfully, Linux offers powerful tools, including namespaces and control groups (cgroups), that enable fine-grained resource allocation and guarantee the optimal performance of each container.

In this article, you’ll learn more about namespaces and cgroups and how to use them to control Docker performance.

Understanding Linux Namespaces

Linux namespaces provide a mechanism for isolating system resources, enabling processes within a namespace to have their own view of the system, such as process IDs, network interfaces, and file systems. Docker uses namespaces to create isolated containers, each with its own set of resources. This ensures application separation and security.

The following are a few different types of namespaces:

Namespaces in Linux provide a way to isolate and virtualize system resources, thus enhancing security by preventing processes in one namespace from directly interacting with processes in another namespace.

Namespaces increase security by providing a level of isolation that prevents unintended interactions between processes. This isolation is particularly valuable in containerization and virtualization scenarios, where multiple applications or services share the same host system but must be kept separate for security reasons.

Understanding cgroups

cgroups are a Linux kernel feature that enable the management and partitioning of system resources by controlling the resources for a collection of processes. Administrators can use cgroups to allocate resources, set limits, and prioritize processes. Docker utilizes cgroups to control and limit the resources available to containers.

Different types of available cgroups include CPU cgroup, memory cgroup, block I/O cgroup, and device cgroup.

While cgroups are not explicitly designed for security, they play a crucial role in controlling and monitoring the resource usage of processes.

Although namespaces and cgroups may appear similar in definition, they are fundamentally different and serve different purposes. Namespaces perform isolation by creating separate environments for processes that prevent one process from accessing or affecting other processes and/or the system. In contrast, cgroups distribute and limit resources like CPU, memory, and I/O among groups of processes. Often, namespaces and cgroups are used together for process isolation and resource management.

Now that you know more about namespaces and cgroups, it’s time to learn how to use them to control Docker performance.

All the code for this article is available in this GitHub repo.

How to Use Namespaces to Control Docker Performance

To decrease a container’s attack surface, Docker offers the --user option. The default user inside the containers is the root user. With the --user flag, you can specify a non-root user and group to run the first process in containers. This lets you limit the potential impact of any security vulnerabilities.

Using the --user option may hinder application functionality since some applications require elevated privileges to operate efficiently. When this occurs, your container configuration or app configuration may need to be altered accordingly to provide those permissions and grant the required privileges.

This is where a user namespace can help. By default, Docker runs containers with identical user and group IDs as their host system, which means that if an attacker gains entry through any one container in the system, they could potentially escalate their privileges. User namespaces can help mitigate such attacks by remapping the user IDs (UIDs) and group IDs (GIDs) used in the inside container with the outside container.

To better understand this concept, let’s use the user namespace to isolate the containers for security purposes.

How to Use namespaces With Docker

In this scenario, you’ll learn about some of the advantages of namespaces in Docker. Run the following command to create a file in the host machine and make it readable to only the root user:

echo "This is a super sensitive file" | sudo tee /secret
sudo chmod 600 /secret

Let’s pretend that the file /secret is a very sensitive file that should never be accessed by anyone other than the root. But, what happens when you mount the host filesystem in a Docker container? Let’s find out. Run the following command to start a busybox container and mount the host filesystem to it:

sudo docker run -it --rm -v /:/host busybox /bin/sh

Then, try to access the sensitive file:

$ cat /host/secret
This is a super sensitive file

As you can see, the root user in the Docker container has UID 0. When this user (with UID 0) tries to access the file in the host filesystem owned by the root user (who also has UID 0) the system happily obliges. But, this is a huge security risk as the Docker container can now freely read or modify any file on the host machine.

To prevent this, you need to make use of namespaces.

First, make sure that your Linux kernel supports user namespaces. To do so, find the configuration file for your kernel and use grep to search for CONFIG_USER_NS. The following command is an example of the Ubuntu kernel:

grep -E '^CONFIG_USER_NS=' /boot/config-$(uname -r)

The following output confirms that your kernel supports user namespaces:

CONFIG_USER_NS=y

Next, open or create the Docker daemon configuration file (usually located at /etc/docker/daemon.json) and add the following configuration that sets up user namespaces with default remapping:

{
  "userns-remap": "default"
}

Restart the Docker daemon to apply the changes:

sudo systemctl restart docker

Run the container again:

sudo docker run -it --rm -v /:/host busybox /bin/sh

Try to read the file:

$ cat /host/secret

This time, you’ll be faced with a Permission denied error.

To better understand what is happening behind the scenes, execute the following command in interactive mode to run a Docker container:

docker run -it nginx sleep 300

Your output will look like this:

latest: Pulling from library/nginx
af107e978371: Pull complete 
336ba1f05c3e: Pull complete 
8c37d2ff6efa: Pull complete 
51d6357098de: Pull complete 
782f1ecce57d: Pull complete 
5e99d351b073: Pull complete 
7b73345df136: Pull complete 
Digest: sha256:2bdc49f2f8ae8d8dc50ed00f2ee56d00385c6f8bc8a8b320d0a294d9e3b49026
Status: Downloaded newer image for nginx:latest

After running the container, check to see if the container is up and running:

docker ps

In this scenario, you’re running an Nginx container and executing the sleep command inside it.

Run the following command to list all currently running processes and filter the results to only show lines containing the word “sleep”:

ps -aux | grep sleep

The ps command is used to provide information about processes.

Your output will look like this:


osboxes     7216  0.0  0.4 1329172 24576 pts/0   Sl+  13:39   0:00 docker run -it nginx sleep 300
231072      7279  0.0  0.0   2484  1280 pts/0    Ss+  13:39   0:00 sleep 300
osboxes     7329  0.0  0.0  17732  2560 pts/1    S+   13:40   0:00 grep --color=auto sleep

Observe that one sleep 300 process shows up in the list of processes. It is owned by the user with the UID 231072 (this UID may be different for you). Where is this user coming from? Use the following command to look at the file /etc/subuid:

cat /etc/subuid

Your output should look like this:

osboxes:100000:65536
ansible:165536:65536
dockremap:231072:65536

What this tells us is that Docker creates a default user named dockremap with host UID 231072. This user is mapped to the UID 0 inside Docker containers. Any process started by the root user in the container is owned by the UID 231072 on the host, thus protecting it from privilege escalation.

Next, use the docker info command to verify that user namespace support is enabled correctly:

sudo docker info

Your output should look like this:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 1
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  userns
  cgroupns
 Kernel Version: 6.2.0-26-generic
 Operating System: Ubuntu 22.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 5
 Total Memory: 5.744GiB
 Name: osboxes
 ID: 80a2b682-9225-423a-bd2e-a0a3c61e8cf0
 Docker Root Dir: /var/lib/docker/231072.231072
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The numbers at the end of the Docker Root Dir line indicate that the daemon runs inside a user namespace. The numbers should match the subordinate user ID of the dockremap user as defined in the /etc/subuid file.

How to Use cgroups to Control Docker Container Resources

Now that you know how you can use namespaces to increase security, let’s use cgroups to configure resource limitations. In this example, you’ll run a Docker container with CPU limits.

Let’s take a look at what cgroups are set up when you run a container. But, before that, you need to know if your system is using cgroup v1 or v2. The easiest way to find that out is to look for the file /sys/fs/cgroup/cgroup.controllers. If it exists, you’re using cgroup v2, otherwise, you’re using cgroup v1.

Run the nginx container and note the container ID from the output:

$ docker run -d nginx

To find the cgroups for this container, you’ll need to look into the following locations, based on the cgroup version and the cgroup driver:

  • /sys/fs/cgroup/memory/docker/<container_id>/ on cgroup v1, cgroupfs driver (default)
  • /sys/fs/cgroup/memory/system.slice/docker-<container_id>.scope/ on cgroup v1, systemd driver
  • /sys/fs/cgroup/docker/<container_id>/ on cgroup v2, cgroupfs driver
  • /sys/fs/cgroup/system.slice/docker-<container_id>.scope/ on cgroup v2, systemd driver (default)

Here, you can find out different metrics for the container. For example, you can read the max CPU allocated to this container by reading the cpu.max file in this directory:

max 100000

This implies that this container is allowed to consume the maximum available CPU on the host. Let’s limit it to half of one CPU. First, kill the container and recreate it with the --cpus option:

docker run --cpus 0.5 -d nginx

The cpu.max file should show the following:

50000 100000

This shows that the container is only allowed 0.5 CPUs. You can adjust the --cpus value according to your desired CPU utilization.

If you want to run a Docker container with memory limits using cgroups, you can use the --memory option with the docker run command. The following example uses the official Nginx image from Docker Hub:

docker run -d --name new-container --memory=256M ubuntu sleep infinity

In this command, -d makes the container run in the background, and --name new-container assigns a name to the container. --memory 256M limits the container to use a maximum of 256 mebibytes of memory, and ubuntu is the image that’s being used. sleep infinity tells the Docker command to run.

Now, run docker ps. You should see that a container named “new-container” is up and running:


CONTAINER ID   IMAGE     COMMAND            CREATED          STATUS          PORTS     NAMES
d67613c74bd6   ubuntu    "sleep infinity"   26 seconds ago   Up 25 seconds             new-container

To verify that the memory limits are applied, run the following docker stats command:

docker stats d67613c74bd6

Your output will look like this:


CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT   MEM %     NET I/O       BLOCK I/O   PIDS
d67613c74bd6   new-container   0.00%     388KiB / 256MiB     0.15%     3.42kB / 0B   0B / 0B     1

You can see the memory limit for this container is now set to 256 MiB.

To get real-time statistics for the running container, including CPU and memory usage, run the following command:

docker stats new-container --no-stream --format "  {{ json . }}" \
| python3 -m json.tool

Your output will look like this:

{
    "BlockIO": "0B / 0B",
    "CPUPerc": "0.00%",
    "Container": "new-container",
    "ID": "d67613c74bd6",
    "MemPerc": "0.15%",
    "MemUsage": "388KiB / 256MiB",
    "Name": "new-container",
    "NetIO": "3.6kB / 0B",
    "PIDs": "1"
}

In this output, you can see that Docker has applied a memory limit of 256 MiB in the "MemUsage" field.

Conclusion

Mastering Linux namespaces and cgroups is essential for optimizing Docker performance. With the help of these features, administrators can fine-tune resource allocation, enhance security, and ensure the smooth operation of containerized applications.

As the landscape of containerization continues to evolve, a solid grasp of Linux namespaces and cgroups empowers users to harness the full potential of Docker and deliver high-performance, scalable applications. In this article, you learned how namespaces and cgroups are used by Docker, and how you can utilize namespaces for container isolation and cgroups for limiting and monitoring resource usage.

Earthly Cloud: Consistent, Fast Builds, Any CI
Consistent, repeatable builds across all environments. Advanced caching for faster builds. Easy integration with any CI. 6,000 build minutes per month included.

Get Started Free

Avi Singh %
Avi Singh
Avi Singh is a Cloud, DevOps, Cybersecurity, and Big Data Expert with nine years of Industry experience. He has expertise in cloud AWS, Azure, and GCP, as well as DevOps tools and cybersecurity techniques, including ethical hacking and defense against various cyber threats. He also has significant experience in Big Data technologies such as Hadoop and Apache Kafka. Avi creates tutorials and articles focusing on trending technologies.

Published:

Get notified about new articles!
We won't send you spam. Unsubscribe at any time.