Preventing ClickHouse Storage Deletion with the Altinity Kubernetes Operator reclaimPolicy

The first rule of managing data is simple: don’t lose it. Kubernetes can spin up complex configurations in minutes from a single resource file. Kubernetes can also take data away just as quickly. That could be bad if you do it by mistake. How can we prevent such a calamity?

This blog introduces an important feature of the Altinity Kubernetes Operator for ClickHouse: the reclaimPolicy: Retain property. It prevents persistent volume claims (requests for storage) from being deleted when you delete the ClickHouse cluster. You add it to cluster resource definitions as in the following example.

    volumeClaimTemplates:
      - name: storage-vc-template
        reclaimPolicy: Retain
        spec:
          accessModes:
    . . .

The rest of this article provides the full story. The examples use Kubernetes 1.26 on AWS EKS with Altinity Kubernetes Operator for ClickHouse version 0.20.3. They should work on all recent Kubernetes distributions and operator versions. You can find out more about installing the operator here.
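
The examples also run in a Kubernetes namespace named test. If you are starting from scratch, a minimal setup looks roughly like the following; the manifest URL is the operator’s standard install bundle on GitHub at the time of writing, so check the installation docs for the currently recommended method and version.

# Create the namespace used throughout this article.
kubectl create namespace test

# Install the Altinity Kubernetes Operator for ClickHouse from its install bundle.
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml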

Introducing The Storage Deletion Problem

Let’s show how storage deletion can occur accidentally. The following resource definition creates a one-node ClickHouse cluster with a single storage volume.

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch"
spec:
  configuration:
    clusters:
      - name: "simple"
        layout:
          shardsCount: 1
          replicasCount: 1
        templates:
          podTemplate: altinity-stable
          volumeClaimTemplate: storage-vc-template
  templates:
    podTemplates:
      - name: altinity-stable
        spec:
          containers:
          - name: clickhouse
            image: altinity/clickhouse-server:22.8.15.25.altinitystable
    volumeClaimTemplates:
      - name: storage-vc-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

Put the resource definition in file retain-demo.yaml. We can then load the cluster definition using kubectl apply. 

kubectl -n test apply -f retain-demo.yaml

To implement our cluster, Kubernetes creates a stateful set that runs the ClickHouse pod, a persistent volume claim (PVC) that requests storage for it, and a persistent volume (PV) that maps to the underlying disk. 
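
If you want to see how those resources map to running processes yourself, you can list the stateful set and pod the operator created (output omitted here):

# The operator creates one stateful set, and therefore one pod, per replica.
kubectl -n test get statefulsets,pods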

So far so good. Let’s have a quick look at the PVC and PV using kubectl. We will use a customized form of the kubectl get pvc command to look at the most important properties of the persistent volume claim.

$ kubectl -n test get pvc -o custom-columns="NAME:.metadata.name,\
 POLICY:.metadata.labels.clickhouse\.altinity\.com\/reclaimPolicy,\
 SIZE:.spec.resources.requests.storage,\
 PV:.spec.volumeName"
NAME                                      POLICY    SIZE    PV
storage-vc-template-chi-ch-simple-0-0-0   <none>    10Gi    pvc-90f16e13-9e83-4a78-9c10-06cdd0b1b9db

The POLICY column shows <none>, which means the operator is not protecting this storage from accidental deletion. We’ll return to that topic shortly. 

Next, let’s show the persistent volume. We’ll again customize the output columns to see the most interesting parts for this example. 

$ kubectl -n test get pv/pvc-90f16e13-9e83-4a78-9c10-06cdd0b1b9db \
 -o custom-columns="NAME:.metadata.name,\
 STATUS:.status.phase,SIZE:.spec.capacity.storage,\
 CLASS:.spec.storageClassName"
NAME                                       STATUS   SIZE   CLASS
pvc-90f16e13-9e83-4a78-9c10-06cdd0b1b9db   Bound    10Gi   gp2

OK, storage is there and looks correct, including allocated size and AWS EBS gp2 storage type. Now let’s delete the cluster. There are a couple of ways to do this, but the easiest is the following. 

$ kubectl -n test delete chi/ch
clickhouseinstallation.clickhouse.altinity.com "ch" deleted

Let’s check storage again. 

$ kubectl -n test get pvc
No resources found in test namespace.
$ kubectl get pv/pvc-90f16e13-9e83-4a78-9c10-06cdd0b1b9db
Error from server (NotFound): persistentvolumes "pvc-90f16e13-9e83-4a78-9c10-06cdd0b1b9db" not found

Data gone! It would not matter if the cluster had contained many shards and replicas. Upon deletion, all PVCs and PVs vaporize. 

Introducing reclaimPolicy Properties For ClickHouse Storage

The designers of Kubernetes anticipated accidents and introduced the notion of a reclaimPolicy property for persistent volumes. When set to Retain, Kubernetes will not drop the volume even if the PVC that requested it is deleted. Unfortunately, this approach turns out to be complex to manage. See here and here for gory details of recovering volumes.
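
For reference, the native Kubernetes property is persistentVolumeReclaimPolicy on the PV itself. It is normally inherited from the StorageClass and can be changed on an existing volume with a standard kubectl patch like the one below, after which the volume survives PVC deletion but must be rebound by hand.

# Tell Kubernetes to keep this PV even after its PVC is deleted.
# Replace <pv-name> with the name of the persistent volume.
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'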

We like the reclaimPolicy concept but not the complexity. The Altinity Operator therefore introduced the same property but raised it to the PVC level. To protect ClickHouse storage, just add reclaimPolicy: Retain to the volume claim template section of the cluster resource definition, as in the example below.

    volumeClaimTemplates:
      - name: storage-vc-template
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

We apply the resource definition as before. 

kubectl -n test apply -f retain-demo.yaml

Once the cluster starts, we can access it using the kubectl exec command. Let’s add a table so there is some data to “lose” if things go wrong. 

kubectl -n test exec -it chi-ch-simple-0-0-0 -- clickhouse-client
. . .
chi-ch-simple-0-0-0.chi-ch-simple-0-0.test.svc.cluster.local :) CREATE TABLE foo (id UInt32) engine=Log

CREATE TABLE foo
(
    `id` UInt32
)
ENGINE = Log

chi-ch-simple-0-0-0.chi-ch-simple-0-0.test.svc.cluster.local :) exit

Finally, let’s check the PVC definition using the special columns we used previously. Note that the POLICY column now has a value. 

$ kubectl -n test get pvc -o custom-columns="NAME:.metadata.name,\
 POLICY:.metadata.labels.clickhouse\.altinity\.com\/reclaimPolicy,\
 SIZE:.spec.resources.requests.storage,\
 PV:.spec.volumeName"
NAME                                      POLICY    SIZE    PV
storage-vc-template-chi-ch-simple-0-0-0   Retain    10Gi    pvc-b8ea609f-edb9-4712-ace1-7b3564ca10d1

You can see from this example that the Altinity Operator implements the reclaimPolicy setting by adding a custom label to the PVC. It looks for the label when deleting clusters. If the label is set, it will not delete the PVC. 
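
If you want to inspect the label directly rather than through custom columns, --show-labels makes it visible:

# Protected claims carry the label clickhouse.altinity.com/reclaimPolicy=Retain.
kubectl -n test get pvc storage-vc-template-chi-ch-simple-0-0-0 --show-labels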

Important safety tip! If you do not see “Retain” under the POLICY heading, stop now and figure out why. The reclaimPolicy: Retain property must be at the same level as the spec: section in the storage template. If you put it inside the spec: section or some other random location, the operator will ignore it and your PVC will not be protected. The same thing applies if you somehow remove the label on the PVC itself. 
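
Fixing the placement of reclaimPolicy in the CHI definition and re-applying it is the proper repair. If the label itself has gone missing from a PVC you want to keep, you could in principle restore it by hand with something like the command below, but treat that as a stopgap rather than standard practice.

# Re-apply the protection label directly to the PVC (stopgap only).
kubectl -n test label pvc storage-vc-template-chi-ch-simple-0-0-0 \
 clickhouse.altinity.com/reclaimPolicy=Retain --overwrite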

We can now delete the cluster as before. We’ll confirm the cluster resource definition is gone, then check for the PVC. 

$ kubectl -n test delete chi/ch
clickhouseinstallation.clickhouse.altinity.com "ch" deleted
$ kubectl -n test get chi
No resources found in test namespace.
$ kubectl -n test get pvc -o custom-columns="NAME:.metadata.name,\
 POLICY:.metadata.labels.clickhouse\.altinity\.com\/reclaimPolicy,\
 SIZE:.spec.resources.requests.storage,\
 PV:.spec.volumeName"
NAME                                      POLICY    SIZE    PV
storage-vc-template-chi-ch-simple-0-0-0   Retain    10Gi    pvc-b8ea609f-edb9-4712-ace1-7b3564ca10d1

This time, the persistent volume claim is still there. This proves that the Altinity operator does not delete any PVC if the claim is marked with reclaimPolicy: Retain in the ClickHouse storage definition. 
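
You can also run the customized kubectl get pv command from earlier against the new volume name to confirm that the backing volume survived:

# The PV behind the retained claim should still exist and be Bound.
kubectl get pv/pvc-b8ea609f-edb9-4712-ace1-7b3564ca10d1 \
 -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,SIZE:.spec.capacity.storage,CLASS:.spec.storageClassName"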

Recovering Storage After Accidental Deletion

Recovering storage is simple. Apply the original definition again as follows. 

kubectl -n test apply -f retain-demo.yaml

The Altinity Operator will use the existing storage claims. This works because the operator creates stateful set resources with the same names and properties for volume claims. The stateful set controller will automatically connect pods to the existing PVCs rather than create new ones.  We can connect to our cluster and confirm that the test table is still there. 

kubectl -n test exec -it chi-ch-simple-0-0-0 -- clickhouse-client
. . .
chi-ch-simple-0-0-0.chi-ch-simple-0-0.test.svc.cluster.local :) show tables

SHOW TABLES

┌─name─┐
│ foo  │
└──────┘
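
If you want further confirmation that the pod reattached to the original claim rather than a new one, describe the PVC and look for the Used By field:

# The retained claim should now show the recreated pod under "Used By:".
kubectl -n test describe pvc storage-vc-template-chi-ch-simple-0-0-0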

As we see, the reclaimPolicy property protected the storage on accidental cluster deletion. It also allowed us to reattach storage by simply reapplying the original cluster definition. 

What If We Really Wanted to Delete the Cluster?

Just delete the PVCs as an extra step after deleting the cluster. So the full deletion procedure looks like: 

$ kubectl -n test delete chi/ch
clickhouseinstallation.clickhouse.altinity.com "ch" deleted
$ kubectl -n test delete pvc/storage-vc-template-chi-ch-simple-0-0-0
persistentvolumeclaim "storage-vc-template-chi-ch-simple-0-0-0" deleted
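
A cluster with many shards and replicas leaves behind one retained PVC per replica. If it is the only cluster in the namespace, a label selector on the reclaimPolicy label shown earlier can remove them all at once; preview the selection with get before deleting.

# Check which claims the selector matches, then delete them for good.
kubectl -n test get pvc -l clickhouse.altinity.com/reclaimPolicy=Retain
kubectl -n test delete pvc -l clickhouse.altinity.com/reclaimPolicy=Retain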

There is another way to do this. You can use kubectl patch to remove the reclaimPolicy setting from your CHI definition, then delete the cluster, as sketched below.
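
Here is a sketch of what that could look like, assuming the storage template is the first (and only) entry under volumeClaimTemplates, as in our example; adjust the index if you use several templates.

# Remove the reclaimPolicy field from the first volume claim template,
# then delete the cluster. The operator will now delete the PVCs as well.
kubectl -n test patch chi ch --type=json \
 -p='[{"op":"remove","path":"/spec/templates/volumeClaimTemplates/0/reclaimPolicy"}]'
kubectl -n test delete chi/ch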

Conclusion

Guardrails against accidental deletion are a necessity for any system that manages production data. The reclaimPolicy: Retain property serves this purpose for cloud native ClickHouse clusters. 

We use the property ourselves to protect storage from accidental deletion in Altinity.Cloud clusters. It’s just one of many features in the Altinity Kubernetes operator for ClickHouse to help keep data available and secure. If you would like to find out more, check out our GitHub project, join the Altinity Slack Workspace, or contact us directly. We would be glad to answer your questions. 

Until then, be safe out there! Use reclaimPolicy: Retain in all your cluster resource definitions.
