Disclaimer: while CSI is a vendor-agnostic interface, this article focuses mainly on the AWS and GCP implementations, since these are the platforms on which we’ve had to migrate to CSI.
Introduction
For volume management, Kubernetes originally shipped with in-tree drivers for various cloud providers, interfacing with the underlying platform on which the cluster runs. Those drivers made it possible to abstract away disk operations on the cloud provider (create, resize, destroy, etc.) via Kubernetes’ PersistentVolume API.
Incorporating drivers into the Kubernetes tree brought complications for vendors, who had to align with the orchestrator’s release schedule, and for Kubernetes maintainers, who had to integrate, maintain, and test the vendor code. In an effort to give vendors more autonomy, the community created the CSI specification (short for Container Storage Interface). The specification describes a common API that a driver must implement so that Kubernetes can communicate with it directly. This allows vendors to write and publish drivers for their own platform without having to merge them into Kubernetes itself.
CSI migration first appeared as an alpha feature in Kubernetes 1.14 (CSI itself had reached general availability in 1.13), and the pace picked up from Kubernetes 1.17 on, with the graduation of the migration to beta for the first drivers (AWS EBS and GCE PD) and the deprecation of the in-tree drivers. The feature, however, remained disabled by default behind a feature gate until Kubernetes 1.23, and is enabled by default from that release onward.
We maintain our customers’ clusters and strive to keep up with Kubernetes releases. With the upgrade to Kubernetes 1.23 around the corner, we needed to plan the migration of volumes on the clusters we manage from the in-tree volume drivers to the CSI drivers.
Getting (most of) the job done: using Kubernetes’ CSI migration feature
Since the CSI specification radically changes how volume management works on Kubernetes, its implementation was accompanied by the introduction of a PersistentVolume specification addition, representing disks provisioned by a CSI driver. However, PVs are largely immutable objects, to ensure data integrity and limit the risk factor of undesirable changes. As a result, this new CSI PV spec is only available to newly provisioned disks. Existing PV objects will remain under the in-tree, vendor-specific driver spec. To ensure backwards compatibility with those existing PVs, Kubernetes’ storage SIG came up with a feature gate called CSIMigration, which mainly replaces in-tree drivers with shims, rerouting most disk operations logic to the CSI driver installed in the cluster.
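To make the difference between the two PV flavours concrete, here is a minimal Python sketch of the kind of translation the shims perform conceptually. It is not the actual Kubernetes implementation: the helper name `to_csi_source` is hypothetical, the `ext4` fallback is an assumption, and the volume handle formats reflect our understanding of what the AWS EBS and GCE PD CSI drivers expect.

```python
# Hypothetical helper illustrating the idea behind the CSIMigration shims.
# This is NOT the upstream translation code, only a sketch.
AWS_CSI_DRIVER = "ebs.csi.aws.com"
GCE_CSI_DRIVER = "pd.csi.storage.gke.io"


def to_csi_source(pv_spec, project=None, zone=None):
    """Return a `csi` volume source equivalent to an in-tree PV spec."""
    if "csi" in pv_spec:
        # Already a true CSI volume, nothing to translate.
        return pv_spec["csi"]
    if "awsElasticBlockStore" in pv_spec:
        src = pv_spec["awsElasticBlockStore"]
        # In-tree IDs may look like "aws://eu-west-1a/vol-0abc";
        # the CSI driver only wants the trailing "vol-..." part.
        handle = src["volumeID"].rsplit("/", 1)[-1]
        return {
            "driver": AWS_CSI_DRIVER,
            "volumeHandle": handle,
            "fsType": src.get("fsType", "ext4"),  # assumed default
        }
    if "gcePersistentDisk" in pv_spec:
        src = pv_spec["gcePersistentDisk"]
        # Assumed zonal-disk handle format for the GCE PD CSI driver.
        handle = f"projects/{project}/zones/{zone}/disks/{src['pdName']}"
        return {
            "driver": GCE_CSI_DRIVER,
            "volumeHandle": handle,
            "fsType": src.get("fsType", "ext4"),  # assumed default
        }
    raise ValueError("unsupported in-tree volume source")


in_tree = {"awsElasticBlockStore": {"volumeID": "aws://eu-west-1a/vol-0abc",
                                    "fsType": "ext4"}}
print(to_csi_source(in_tree))
```

With the feature gate, this rerouting happens at runtime only: the stored PV object keeps its in-tree spec.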
With the in-tree drivers deprecated and scheduled for removal, we needed a plan to migrate to CSI. The CSIMigration feature gate was handed to us on a silver platter, so we decided to opt in while doing routine Kubernetes upgrades on the clusters we manage, making the switch as transparent as possible.
Our initial attempt to use the feature failed, due to an incorrect combination of feature flags on Kubernetes components which ended up disabling the deprecated in-tree drivers without enabling the CSI reroute shims; our understanding at first was that the feature flag responsible for disabling in-tree drivers would also enable the reroute logic. This turned out not to be the case, as a separate feature flag was required for the latter.
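A small pre-flight check could have caught this kind of mistake. The sketch below parses a `--feature-gates` style string and reports any gate that unregisters an in-tree plugin while the matching migration gate is off. The gate pairs are an assumption based on upstream gate names (the exact flags we misconfigured are not listed here), and the function names are hypothetical.

```python
# Assumed pairs: (gate unregistering the in-tree plugin,
#                 gate enabling the CSI reroute shim).
GATE_PAIRS = [
    ("InTreePluginAWSUnregister", "CSIMigrationAWS"),
    ("InTreePluginGCEUnregister", "CSIMigrationGCE"),
]


def parse_feature_gates(value):
    """Parse a string such as "A=true,B=false" into a dict of booleans."""
    gates = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, enabled = item.partition("=")
        gates[name.strip()] = enabled.strip().lower() == "true"
    return gates


def find_unsafe_combinations(value, defaults=None):
    """Return pairs where the in-tree plugin is disabled but the shim is not enabled."""
    # `defaults` lets the caller supply the release's default gate values;
    # gates absent from both are treated as disabled (an assumption).
    gates = {**(defaults or {}), **parse_feature_gates(value)}
    return [
        (unregister, migrate)
        for unregister, migrate in GATE_PAIRS
        if gates.get(unregister, False) and not gates.get(migrate, False)
    ]


print(find_unsafe_combinations("InTreePluginAWSUnregister=true"))
print(find_unsafe_combinations("InTreePluginAWSUnregister=true,CSIMigrationAWS=true"))
```

Since the gates have to be consistent across Kubernetes components, such a check would need to run against the flags of each component.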
This mishap aside, using the CSI migration feature was pretty straightforward, and we integrated this change within a routine Kubernetes upgrade just as we had originally planned. Once enabled, the provisioning, attaching and resizing operations were effectively handled by the CSI driver.
Going the extra mile: moving to true CSI volumes
With the feature gate enabled, we had achieved CSI functionality for most volume operations: attach/detach, provision and resize. However, the specification also brings support for volume snapshots. That functionality fell outside of the scope of the compatibility layer, meaning that pre-CSI PVs cannot be snapshotted by the CSI driver. We would have needed to retain the third-party tool we previously used next to the native feature.
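For reference, requesting a CSI snapshot boils down to creating a VolumeSnapshot object that points at a PVC. The sketch below builds such a manifest in Python; it assumes the snapshot CRDs and snapshot controller are installed, and the object names and snapshot class are placeholders.

```python
import json


def volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Build a VolumeSnapshot manifest for the given PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Placeholder names; the JSON output can be fed to `kubectl apply -f -`.
manifest = volume_snapshot("db-snap-1", "prod", "db-data", "csi-snapclass")
print(json.dumps(manifest, indent=2))
```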
We thought this compromise was not satisfactory, because moving away from third-party tooling for snapshots was one of our goals for this migration. The only way to fully utilize CSI features was to have all PVs be defined with the CSI spec, eliminating the need to resort to the lackluster compatibility layer. This raised a new problem: as previously mentioned, PVs are largely immutable objects, and it is not possible to simply replace the in-tree driver spec with the CSI one without any modification to the underlying disk on the cloud provider. There weren’t many possible solutions aside from recreating PVs and underlying disks, so we began working on a plan to do that, although without any experience with such operations in this particular context.
We initially organized our solution around pv-migrate, wrapping it with some homebrew shell scripting, to implement the following:
- scale down the deployment/statefulset (if applicable)
- set the PV reclaim policy to Retain
- remove the PV’s claimRef
- rename the PVC by deleting it and recreating it with a different name to indicate a pre-migration object
- have the cluster administrator create a PVC clone, which causes Kubernetes and the CSI provisioner to create a new PV with a true CSI specification and a new disk on the cloud provider, respectively
- invoke pv-migrate to copy data from the old PV to the new one
- scale up the deployment/statefulset (if applicable)
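The first three steps of this list can be sketched as plain kubectl invocations. The Python snippet below only builds the command lines (it does not run them) and is a rough illustration rather than our actual shell scripts; the function name and the example names are hypothetical. The PVC recreation and the pv-migrate call are left out, since they depend on the PVC manifests and on pv-migrate’s own CLI.

```python
import json


def migration_prep_commands(namespace, workload, pv_name):
    """Return kubectl commands that prepare a PV for migration.

    `workload` is a kubectl resource reference such as "statefulset/db".
    """
    retain = json.dumps({"spec": {"persistentVolumeReclaimPolicy": "Retain"}})
    # In a JSON merge patch, a null value removes the field.
    unbind = json.dumps({"spec": {"claimRef": None}})
    return [
        # 1. Stop the workload so nothing writes to the volume.
        ["kubectl", "-n", namespace, "scale", workload, "--replicas=0"],
        # 2. Make sure deleting the PVC does not delete the disk.
        ["kubectl", "patch", "pv", pv_name, "--type=merge", "-p", retain],
        # 3. Detach the PV from its current claim.
        ["kubectl", "patch", "pv", pv_name, "--type=merge", "-p", unbind],
    ]


for cmd in migration_prep_commands("prod", "statefulset/db", "pvc-1234"):
    print(" ".join(cmd))
```

Building the commands as lists makes it easy to review them before handing each one to `subprocess.run`.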
While this solution got the job done for most volumes, it has a few drawbacks:
- the application must be able to work with plain file and directory storage; any binary or application-specific storage ends up in undefined behaviour
- the process of copying individual files (pv-migrate uses rsync under the hood) can be painfully slow if there are numerous small files
During conversion of existing PVs to CSI, we reached a point where we had to work with applications not using plain file storage, such as database engines. Copying that data using pv-migrate/rsync would often result in data loss, so we needed another method to transfer data to new volumes. Instead of relying on the provisioning of volumes on the cloud platform triggered by the creation of Kubernetes PVs, we began cloning cloud volumes directly and crafting PVs with CSI specification that would attach to cloned volumes, as well as corresponding PVCs. This method offered better data integrity guarantees, as it copies data at block level, and tends to be faster overall, a double-win! Eventually, we ended up using the disk cloning method for every remaining volume, even those using plain file storage.
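A hand-crafted PV/PVC pair for an already cloned disk could look roughly like the sketch below. It is an illustration under assumptions, not our exact manifests: the function name, the object names and the `ext4` default are hypothetical, and a real PV will likely also need a nodeAffinity section matching the zone of the cloned disk, which is omitted here for brevity.

```python
def static_csi_pv_and_pvc(name, namespace, driver, volume_handle,
                          size, storage_class, fs_type="ext4"):
    """Build a pre-bound PV/PVC pair pointing at an existing cloud disk."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # Retain, so that deleting the PVC never deletes the cloned disk.
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            # True CSI volume source, referencing the cloned disk.
            "csi": {"driver": driver, "volumeHandle": volume_handle,
                    "fsType": fs_type},
            # Reserve the PV for the PVC defined below.
            "claimRef": {"namespace": namespace, "name": name},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            # Bind to the PV above instead of provisioning a new disk.
            "volumeName": name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


# Placeholder values for an EBS volume cloned outside of Kubernetes.
pv, pvc = static_csi_pv_and_pvc("db-data", "prod", "ebs.csi.aws.com",
                                "vol-0abc", "20Gi", "gp3")
print(pv["spec"]["csi"], pvc["spec"]["volumeName"])
```

Setting both claimRef on the PV and volumeName on the PVC pins the pair together, so the CSI provisioner has no reason to create another disk for the claim.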
Wrapping up
At this point, we reached the final stage of our migration plan. After correcting our route twice, the only remaining task was actually running the operation on every environment we manage. At the end of the day, we managed to convert all volumes to the CSI spec, giving us access to all CSI features, including volume snapshots. This is a good step towards future-proofing our clusters: we were indeed able to move away from the unmaintained k8s-snapshots to tooling that leverages the CSI snapshot feature.
Tags: AWS, cloud, containers, csi, disks, gcp, kubernetes, migration, snapshots, volumes