2.4 Kubernetes Storage Demystified: Volumes and Persistent Storage

Welcome back to our Kubernetes journey! Today, we’re diving into a crucial aspect of running applications in Kubernetes: storage. Understanding how Kubernetes handles storage is essential for building stateful applications that need to retain data even when containers restart or pods are rescheduled.

This post will break down the core concepts of Volumes and Persistent Volumes in Kubernetes, using simple language and practical examples. Whether you’re a beginner just starting with Kubernetes or an intermediate user looking to solidify your understanding, this guide will clarify how your applications can effectively manage data within your clusters.

Why is Storage Different in Kubernetes?

Before we dive into the specifics, it’s important to understand why traditional storage methods aren’t a perfect fit for Kubernetes. Kubernetes is designed to be dynamic and ephemeral. Pods, which are the smallest deployable units in Kubernetes, can be created, destroyed, and moved across nodes as needed.

If your application writes data to a container’s own filesystem, that data lives only as long as the container does. It is lost when the container restarts, and again when the pod is deleted or rescheduled to another node. This is where Kubernetes Volumes come into play.

Kubernetes Volumes: Giving Pods Storage

Think of a Volume as a directory that’s accessible to all the containers within a pod. It provides a way to share data between containers in the same pod and allows data to persist across container restarts within that pod.

Here’s what you need to know about Volumes:

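Lifecycle Tied to the Pod: Ephemeral volume types such as emptyDir are created when the pod is created and exist as long as the pod exists. When the pod is deleted, the volume and its data are deleted too. Volume types backed by external storage (for example, an NFS share) are only unmounted; the data on the external system remains.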
Multiple Types: Kubernetes supports various types of volumes, each with different characteristics and underlying storage mechanisms. Some common types include:
- emptyDir: A temporary directory that lives as long as the pod. It’s useful for scratch space or sharing data during a pod’s lifecycle.
- hostPath: Mounts a file or directory from the host node’s filesystem into the pod. This can be useful for accessing node-specific resources but can also introduce portability challenges as the storage is tied to a specific node.
- configMap and secret: These allow you to inject configuration data and sensitive information into your pods as files in a volume.
- Network-based storage (e.g., NFS): These volume types allow you to mount storage from network file systems. Note that the built-in GlusterFS volume type was removed in Kubernetes 1.26.
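
To make the configMap case from the list above concrete, here is a minimal sketch. The names app-config, config-demo, and app.properties are placeholders invented for this illustration, and busybox is used only because it is small. Each key in the ConfigMap shows up as a file with that name under the mount path.

```yaml
# Hypothetical ConfigMap; the name and key are placeholders for this sketch.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    log.level=info
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Print the mounted file, then stay alive so the pod can be inspected.
      command: ["sh", "-c", "cat /etc/app/app.properties && sleep 3600"]
      volumeMounts:
        - name: config
          mountPath: /etc/app   # app.properties appears as /etc/app/app.properties
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: app-config        # must match the ConfigMap's metadata.name
```

A secret volume follows the same pattern, with a secret entry (and secretName) in place of configMap.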

Example: Using an emptyDir Volume

Let’s say you have a pod with two containers: one writes log files, and the other processes them. You can use an emptyDir volume to share these log files:

apiVersion: v1
kind: Pod
metadata:
  name: log-processing
spec:
  containers:
    - name: log-writer
      image: your-log-writer-image
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log
    - name: log-processor
      image: your-log-processor-image
      volumeMounts:
        - name: shared-logs
          mountPath: /data/logs
  volumes:
    - name: shared-logs
      emptyDir: {}

In this example, the same emptyDir volume, shared-logs, is mounted at /var/log in log-writer and at /data/logs in log-processor. Files that one container writes to its mount path are immediately visible to the other container at its own mount path.
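
For the scratch-space use case, emptyDir also accepts two optional fields, medium and sizeLimit. The sketch below is one possible configuration, not a recommendation: the pod name, mount path, and the 64Mi figure are arbitrary values chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo            # hypothetical name
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory          # back the volume with RAM (tmpfs) instead of node disk
        sizeLimit: 64Mi         # upper bound on the volume's size (arbitrary value)
```

Keep in mind that files in a memory-backed emptyDir count against the container’s memory usage, so this suits small, short-lived data.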

Persistent Volumes (PVs): Storage That Outlives Pods

While Volumes are great for sharing data within a pod, they don’t solve the problem of data persistence across pod deletions or rescheduling. This is where Persistent Volumes (PVs) come into the picture.

Think of a Persistent Volume as a piece of storage in your cluster that has been provisioned by an administrator (or dynamically provisioned). It has a lifecycle independent of any individual pod.

Here are the key points:

Independent Lifecycle: PVs exist regardless of the pods that are using them. When a pod using a PV is deleted, the PV and its data remain.
PersistentVolumeClaim (PVC): Pods don’t reference PVs directly. Instead, a user creates a PersistentVolumeClaim (PVC), which is a request for a specific amount of storage with certain access modes (e.g., ReadWriteOnce, ReadOnlyMany), and the pod refers to that claim by name. Kubernetes then finds a matching PV (or dynamically provisions one) and binds it to the PVC.
Abstraction: PVs abstract the underlying storage infrastructure from the application. Developers don’t need to know the specifics of the storage hardware; they just request the storage they need through a PVC.

How Persistent Volumes Work (Simplified):

Provisioning: An administrator (or a dynamic provisioner) creates PVs with specific sizes and access modes.
Claiming: A user creates a PVC requesting a certain amount of storage and desired access modes.
Binding: Kubernetes finds a PV that matches the PVC’s requirements and binds them together.
Mounting: A pod can then mount the PVC as a volume, gaining access to the underlying persistent storage.
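
When a dynamic provisioner handles the first step, the user only writes the claim. Here is a minimal sketch of such a claim. It assumes a StorageClass named standard exists in your cluster; that name is an assumption, so check `kubectl get storageclass` for the classes you actually have. The claim name is also a placeholder.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc             # hypothetical name
spec:
  storageClassName: standard    # assumption: a StorageClass with this name exists
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

With a claim like this, no PV has to be created by hand: the provisioner behind the StorageClass creates one that fits the request, and Kubernetes binds it to the claim.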

Example: Using a Persistent Volume

Let’s say you want to deploy a database that needs persistent storage.

1. Create a Persistent Volume (This would typically be done by an administrator):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  # hostPath is for demonstration purposes only. In a real setup, this
  # would be a cloud provider volume, NFS share, etc.
  hostPath:
    path: /data/my-pv

2. Create a PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

3. Create a Pod that uses the PVC:

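apiVersion: v1
kind: Pod
metadata:
  name: my-database
spec:
  containers:
    - name: postgres
      # Pin a version rather than using latest, for reproducible deployments.
      image: postgres:16
      env:
        # The postgres image refuses to start without a password.
        # Placeholder value for this demo; use a Secret in a real deployment.
        - name: POSTGRES_PASSWORD
          value: change-me
      volumeMounts:
        - name: data-volume
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: my-pvc   # must match the PVC's metadata.name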

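In this example, my-pvc will bind to my-pv as long as the PV satisfies the claim: it offers at least the requested 5Gi and supports ReadWriteOnce. Binding is one-to-one, so the claim gets the whole 10Gi volume. The my-database pod then mounts that storage at /var/lib/postgresql/data, so the database’s data survives even if the pod is deleted and recreated. Because this demo PV uses hostPath, the data lives on a single node; a rescheduled pod only sees it again if it lands on that same node.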
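
One caveat about binding: on clusters that have a default StorageClass, a claim that omits storageClassName may be dynamically provisioned a fresh volume instead of binding to my-pv. A common way to pin a hand-made PV and its claim together is to give both the same class name. The sketch below uses manual, which is an arbitrary label chosen for this illustration rather than a built-in class.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: manual      # arbitrary label; must match the claim below
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/my-pv           # demo only, as in the example above
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: manual      # same label as the PV, so only that class is considered
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

The pod definition stays the same, since it refers only to the claim name.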

Key Takeaways

Volumes provide storage within the lifecycle of a pod, allowing data sharing between containers in the same pod and persistence across container restarts within that pod.
Persistent Volumes (PVs) provide durable storage in your Kubernetes cluster with a lifecycle independent of pods.
PersistentVolumeClaims (PVCs) are requests for storage that pods reference by name. Kubernetes binds each PVC to a suitable PV.
Understanding Volumes and PVs is crucial for deploying stateful applications in Kubernetes that require data persistence.

Further Exploration

This post provides a foundational understanding of Kubernetes storage. To delve deeper, consider exploring:

Storage Classes: Learn how to dynamically provision Persistent Volumes based on defined classes.
Access Modes: Understand the different access modes for PVs (ReadWriteOnce, ReadOnlyMany, ReadWriteMany) and when to use them.
Volume Plugins: Explore the various volume plugins available in Kubernetes for different storage providers.

Mastering Kubernetes storage is a significant step towards running robust and reliable applications in your clusters. Keep experimenting and exploring!