Using Kubernetes Persistent Volumes
Kubernetes persistent volumes provide data storage for stateful applications. They abstract a storage system’s implementation from how it’s consumed by your pods. A persistent volume could store data locally, on a network share, or in a block storage volume provided by a cloud vendor.
Persistent volumes solve the challenges of storing persistent data, such as databases and logs, in Kubernetes. Containers running inside pods are stateless and have ephemeral filesystems. Although your applications can read and write files within their containers, any changes will be lost when the pod is restarted or terminated.
In this article, you’ll learn what persistent volumes are, why they’re important, and how you can get started using them in your cluster. You’ll also see some common management commands for interacting with persistent volumes using kubectl.
Why Persistent Volumes Matter
The ephemeral nature of container filesystems means you can’t simply spin up a database server in a Kubernetes pod. Without special configuration, you’d lose your data as soon as the pod restarted. You also wouldn’t be able to scale the database deployment, as all the files would be stored within a specific container.
Persistent volumes solve this issue. They’re built atop the simpler volume system, which provides a shared unit of storage that can be accessed by all the containers in a pod. Kubernetes can restore volumes after an individual container crashes and restarts.
Persistent volumes are a higher abstraction that completely decouples storage from the pods that use it. A persistent volume has its own lifecycle, exists as a cluster-level resource (the claims that pods use to request it are namespaced), and can be shared between multiple pods when its access mode allows it. Although persistent volumes are used by pods, they never belong to pods. The volume and its data will remain available in the cluster even after all pods that reference it are gone, allowing it to be reattached to new, future pods.
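For contrast, the simpler pod-level volume described above can be sketched as follows. This is a minimal illustration rather than part of the tutorial’s setup: the pod name, container names, and the busybox image are assumptions. Both containers see the same files under /data, and the data survives a container restart, but it is removed together with the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-pod   # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox:latest
      # Appends a timestamp to a file in the shared volume every 5 seconds.
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox:latest
      # Follows the same file through its own mount of the shared volume.
      command: ["sh", "-c", "touch /data/out.log; tail -f /data/out.log"]
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}   # tied to the pod's lifetime, not to any one container
```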
When To Use Persistent Volumes
You should use a persistent volume whenever you have data that needs to outlive individual pods. Unless the data is transitory or specific to a single container, it’s usually best stored in a persistent volume.
Here are some common use cases:
- Database storage: Data in a database should always be stored in a persistent volume so it persists beyond the containers that run the server. You don’t want to wipe your users’ data each time the pod restarts.
- Log storage: Writing container log files to a persistent volume ensures they’ll be available after a crash or termination. If they’re not written to a persistent volume, the crash will destroy the logs that could have helped you debug the issue.
- Protection of important data: Persistent volumes let you avoid accidental data deletion. They include safeguards that prohibit the removal of volumes that are actively used by pods.
- Data independent of pods: Persistent volumes make sense whenever your data is of primary importance in your cluster. They give you the tools to manage data independently of application containers, making it easier to handle backups, performance, and storage capacity allocations.
Creating a Persistent Volume
Persistent volumes may be created either statically or dynamically. With static creation, the volume is manually added to your cluster before it’s used. With dynamic creation, a claim requests storage from a storage class and, if no existing volume satisfies the request, a new volume is created automatically. You’ll now use kubectl to create a volume with the static method.
To start, you need a YAML file for your persistent volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: standard
  volumeMode: Filesystem
This defines a simple persistent volume with a 1 Gi capacity. A few other configuration options are used to define how the volume is provisioned and accessed.
Access Modes
The accessModes field defines which nodes and pods can access the volume:
- ReadWriteOnce means all the pods on a single node are able to read and write data.
- ReadOnlyMany and ReadWriteMany allow read-only or read-write access by all the pods across multiple nodes.
- ReadWriteOncePod is an option introduced in Kubernetes v1.22 that permits read-write access by a single pod on a single node.
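A volume can list every mode its backing storage supports, although it is only ever mounted using one mode at a time. The fragment below is a minimal sketch of that, assuming storage that supports both modes; many block storage backends only offer ReadWriteOnce, so check your provider before relying on it.

```yaml
# Fragment of a PersistentVolume spec, not a complete manifest.
spec:
  accessModes:
    - ReadWriteOnce   # read-write from a single node
    - ReadOnlyMany    # read-only from many nodes
```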
Volume Mode
A volumeMode of Filesystem is the default, and usually desired, behavior. It means the volume will be mounted into pods as a directory in each pod’s filesystem. The alternative value of Block presents the volume as a raw block storage device without a pre-configured filesystem.
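To make the Block alternative concrete, here is a hedged sketch of a claim that requests a raw device and a pod that consumes it. The names block-pvc and pod-with-block-device and the device path are hypothetical, and the example assumes the standard storage class in your cluster supports block mode. Note that the container uses volumeDevices with a devicePath instead of volumeMounts with a mountPath.

```yaml
# Claim requesting a raw block device instead of a mounted filesystem.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard   # assumes this class supports block mode
---
# Pod that receives the device at a path rather than as a directory.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-device   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:latest
      volumeDevices:   # used instead of volumeMounts for Block volumes
        - name: data
          devicePath: /dev/xvda   # hypothetical device path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
```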
Storage Classes
The storageClassName is the most important part of the persistent volume’s configuration. The storage classes you can use depend on your cluster’s hosting environment.
The standard class shown here is available when you’re running your cluster on Google Kubernetes Engine (GKE). You can use azurefile-csi for clusters on Microsoft Azure Kubernetes Service (AKS) or do-block-storage with DigitalOcean Managed Kubernetes. If you’re running your own cluster, you can set up a storage class that uses your local disks when you provision your installation. You can list the classes your cluster offers with kubectl get storageclass. Referencing a storage class that’s not available in your environment prevents storage from being provisioned, leaving the request stuck in a Pending state.
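For a self-managed cluster, a storage class backed by local disks might look like the sketch below. The class name local-disks is hypothetical. The kubernetes.io/no-provisioner value means nothing is provisioned automatically, so volumes of this class have to be created statically, and WaitForFirstConsumer delays binding until a pod that uses the claim has been scheduled.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-disks   # hypothetical name
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer     # bind once a consuming pod is scheduled
```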
Adding Your Volume to Your Cluster
Use kubectl to add your new persistent volume to your cluster:
$ kubectl apply -f pv.yaml
When running this command, you might see the following error message:
The PersistentVolume "example-pv" is invalid: spec: Required value: must specify a volume type
This error means the manifest doesn’t name a backing volume source, such as hostPath, nfs, or csi, so Kubernetes has no storage to attach to the volume. A statically created volume has to point at storage that already exists. On managed clusters, where the storage class uses a provisioner to create storage on demand, it’s usually simpler to use dynamic volume creation, which automatically creates a persistent volume at the time it’s used. This is covered in the next section.
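If you do want to stay with static creation, the manifest needs a volume source. The following is a minimal sketch that adds a hostPath source to the earlier example. The /mnt/example-pv directory is a hypothetical path on the node, and hostPath is generally only appropriate for single-node test clusters, so treat this as an illustration rather than a production setup.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: standard
  volumeMode: Filesystem
  hostPath:
    path: /mnt/example-pv   # hypothetical directory on the node
```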
Linking Persistent Volumes to Pods
Persistent volumes are linked to pods by means of a persistent volume claim. A claim represents a pod’s request to read and write files within a particular volume.
Persistent volume claims are stand-alone objects. Here’s what it looks like to claim the example volume created earlier:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
  volumeName: example-pv
The volumeName field references the previously created persistent volume. When you link this claim to a pod, the pod will receive access to the example-pv volume. The accessModes and resources.requests.storage fields are required on every claim. The storageClassName is set to match the class in the persistent volume’s definition, because a claim only binds to a volume that has the same class, a compatible access mode, and enough capacity.
Persistent volume claims may implicitly create new volumes instead of referencing existing ones. To do this, omit volumeName and supply the volume’s details as part of the claim’s spec. Following is the dynamic volume creation method mentioned earlier:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
Here, the accessModes and storageClassName fields configure the volume that’ll be created. The volume’s capacity is defined via the resources.requests.storage field. Please note that this is a slightly different format from a stand-alone persistent volume, which uses capacity.storage.
Apply the claim to your cluster using kubectl:
$ kubectl apply -f pvc.yaml
persistentvolumeclaim/example-pvc created
Provided that you’ve specified a storage class that’s available in your cluster, the claim creation should succeed, even if the stand-alone volume creation failed with an error. The storage class will dynamically provision a new persistent volume that satisfies the claim.
Finally, you can link the claim to your pods using the volumes and volumeMounts fields in the pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-pvc
spec:
  containers:
    - name: pvc-container
      image: nginx:latest
      volumeMounts:
        - mountPath: /pv-mount
          name: pv
  volumes:
    - name: pv
      persistentVolumeClaim:
        claimName: example-pvc
Then add the pod to your cluster:
$ kubectl apply -f pvc-pod.yaml
pod/pod-with-pvc created
The persistent volume claim is referenced by the pod’s spec.volumes field. This sets up a pod volume called pv, which can be included in the containers section of the manifest and is mounted to /pv-mount. Files written to this directory in the container will be stored in the persistent volume, letting them outlive the individual container instances.
Demonstrating Persistence
You can verify this behavior with a quick example.
Get a shell to the pod created earlier:
$ kubectl exec --stdin --tty pod-with-pvc -- sh
Now write a file to the /pv-mount directory, which the persistent volume was mounted to:
$ echo "This file is persisted" > /pv-mount/demo
Then detach from the container:
$ exit
Use kubectl to delete the pod:
$ kubectl delete pods/pod-with-pvc
pod "pod-with-pvc" deleted
Recreate the pod by applying its YAML manifest again:
$ kubectl apply -f pvc-pod.yaml
pod/pod-with-pvc created
Get a shell to the container in the new pod and read the file from /pv-mount/demo:
$ kubectl exec --stdin --tty pod-with-pvc -- sh
$ cat /pv-mount/demo
This file is persisted
The content of the persistent volume was not affected by the first pod’s deletion. It can be remounted into new pods at any time, preserving everything that’s been previously written.
Managing Persistent Volumes With kubectl
You can retrieve a list of your persistent volumes using kubectl:
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS       REASON   AGE
pvc-f90a46bd-fac0-4cb5-b020-18b3e74dd3b6   1Gi        RWO            Delete           Bound    pv-demo/example-pvc   do-block-storage            7m52s
Similarly, you can view all your persistent volume claims:
$ kubectl get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
example-pvc   Bound    pvc-f90a46bd-fac0-4cb5-b020-18b3e74dd3b6   1Gi        RWO            do-block-storage   9m
If a volume or claim shows a Pending status, it’s usually because the storage class is still provisioning storage for the volume. You can check what’s holding up the process by using the describe command to view the object’s event history:
$ kubectl describe pvc example-pvc
...
Events:
  Type    Reason                 Age    From                                                                   Message
  ----    ------                 ----   ----                                                                   -------
  Normal  Provisioning           9m30s  dobs.csi.digitalocean.com_master_68ea6d30-36fe-4f9f-9161-0db299cb0a9c  External provisioner is provisioning volume for claim "pv-demo/example-pvc"
  Normal  ProvisioningSucceeded  9m24s  dobs.csi.digitalocean.com_master_68ea6d30-36fe-4f9f-9161-0db299cb0a9c  Successfully provisioned volume pvc-f90a46bd-fac0-4cb5-b020-18b3e74dd3b6
To edit your volumes and claims, it’s usually best to modify your YAML file and reapply it to your cluster with kubectl:
$ kubectl apply -f changed-file.yaml
This uses the Kubernetes declarative API model to automatically detect and apply the changes you made. If you’d prefer to use imperative commands, run the edit command to open the object’s YAML in your editor. Changes will be applied when you save and close the file:
$ kubectl edit pvc example-pvc
It’s not possible to change volume properties, such as access mode and storage class. Other fields, like the volume’s capacity, are implementation-dependent: most major storage classes support dynamic resizes, but this isn’t universal. You should consult your Kubernetes provider’s documentation if in doubt.
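Where resizing is supported, it is typically enabled on the storage class and then requested through the claim. The sketch below assumes a CSI driver that supports expansion. The class name expandable-block-storage and the claim name resizable-pvc are hypothetical, and the provisioner shown is only an example taken from the DigitalOcean output above, so substitute your provider’s driver.

```yaml
# Storage class that permits its claims to be enlarged after creation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-block-storage   # hypothetical name
provisioner: dobs.csi.digitalocean.com   # example driver; use your provider's
allowVolumeExpansion: true
---
# A resize is requested by raising the storage request and reapplying the claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: resizable-pvc   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi   # raised from an initial 1Gi; requests cannot be shrunk
  storageClassName: expandable-block-storage
```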
Don’t manually edit persistent volume objects that were dynamically created for a persistent volume claim. Edit the properties on the claim instead.
To remove a volume or a claim, use the delete command:
$ kubectl delete pvc example-pvc
persistentvolumeclaim "example-pvc" deleted
When the volume’s reclaim policy is Delete, as in the kubectl get pv output shown earlier, this will empty and remove the storage that was provisioned by your provider. The data inside the volume will be non-recoverable unless separate backups have been made. Don’t delete volumes that were dynamically provisioned by a storage class: as with edits and creations, you should interact with the claim they were created for. The storage class will handle the persistent volume object for you.
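If you would rather keep the underlying storage when a claim is deleted, a storage class can set a different reclaim policy. This is a minimal sketch under stated assumptions: the class name retained-block-storage is hypothetical and the provisioner is only an example. With Retain, the persistent volume moves to a Released state after its claim is deleted, and cleaning up or reusing the storage becomes a manual task.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-block-storage   # hypothetical name
provisioner: dobs.csi.digitalocean.com   # example driver; use your provider's
reclaimPolicy: Retain   # dynamically provisioned volumes default to Delete
```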
Conclusion
Persistent volumes in Kubernetes enable data storage independent of pods, interfacing with various types of storage through storage classes. This tutorial covered persistent volume use cases, how they differ from regular volumes, and how to set them up inside a Kubernetes cluster. You also learned the kubectl commands for interacting with volumes, so you can run stateful applications in Kubernetes without losing data when containers restart.