What’s new in the Altinity Kubernetes Operator for ClickHouse®?

The Altinity Operator is the most popular way to run ClickHouse in Kubernetes, whether on-prem or in the cloud. It quietly celebrated its 5th birthday in 2024. So it is time to go to school and learn something new. As you probably know, in the past ClickHouse required ZooKeeper to coordinate replication. And though the operator could easily spin up ClickHouse clusters and configure replication, it relied on an external ZooKeeper installation.
Two years ago, when the ClickHouse team introduced the replacement for ZooKeeper – ClickHouse Keeper – we planned to add native operator support for it. The first version was contributed by a community member a year ago (see the PR from Frank Wong), and after some refactoring we released it as an experimental feature in version 0.23 of the operator. However, it lacked many things that operator users expect, like flexible volume management, service templates, and others. That resulted in a lot of confusion and numerous bug reports. It took us a few months to rewrite Keeper support from scratch, unifying its functionality with the rest of the ClickHouse operator. We are happy to announce that Keeper is finally supported in the Altinity Operator for ClickHouse as of version 0.24.0!
Note: To keep things clear, throughout this post the word “Keeper” by itself refers to ClickHouse Keeper.
How to manage ClickHouse Keeper with the Altinity Operator
Managing ClickHouse Keeper with the Altinity Operator is very simple and will look familiar to operator users. Just make sure you have installed operator version 0.24.0 or above, and then apply this manifest:
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
name: simple-1
spec:
configuration:
clusters:
- name: cluster1
Once applied, the operator will deploy the Kubernetes resources needed to make Keeper work properly. To reference it from a ClickHouseInstallation, the service/keeper-simple-1 service can be used as follows:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: simple-with-keeper
spec:
configuration:
zookeeper:
nodes:
- host: keeper-simple-1 # This is a service name of chk/simple-1
port: 2181
clusters:
- name: default
replicasCount: 2
That’s it. A ClickHouse replicated cluster with Keeper is ready to go!
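To verify replication end to end, you can create a replicated table and check that data shows up on both replicas. Here is a minimal check, assuming the simple-with-keeper installation above; the test_repl table name is illustrative, and we rely on the {shard} and {replica} macros the operator configures:

-- Create a replicated table on all replicas of cluster 'default'
CREATE TABLE test_repl ON CLUSTER 'default' (id UInt64)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_repl', '{replica}')
ORDER BY id;

-- Insert on one replica...
INSERT INTO test_repl VALUES (1), (2), (3);

-- ...and read from the other replica to confirm the rows arrived
SELECT count() FROM test_repl;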
Of course, the examples above do not have persistence and can only be used for demonstration purposes. The ClickHouseKeeperInstallation (or CHK) resource supports all the features needed for production operation, such as volume claims, pod and service templates, replica counts, and configuration changes. Here is an example that shows more features in action:
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
name: clickhouse-keeper
spec:
defaults:
templates:
podTemplate: default
volumeClaimTemplate: default
templates:
podTemplates:
- name: default
spec:
containers:
- name: clickhouse-keeper
image: "clickhouse/clickhouse-keeper:24.3.5.46"
volumeClaimTemplates:
- name: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
configuration:
clusters:
- name: default
layout:
replicasCount: 3
settings:
logger/level: trace
As you can see, a ClickHouseKeeperInstallation is very similar to a ClickHouseInstallation, so if you’re already familiar with the Altinity Operator, there is nothing new here.
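If you apply this manifest, you can watch the operator do its work with the usual kubectl tooling. For example (chk is the short resource name for ClickHouseKeeperInstallation; this assumes the default namespace):

# Check the status of the Keeper installation
kubectl get chk clickhouse-keeper

# Inspect the pods, persistent volume claims, and services the operator created
kubectl get pods,pvc,svc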
Upgrade from 0.23.x
Altinity Operator 0.23.x used a different implementation of the Keeper resource. For example, if we used the 0.23.x operator with the simple-1 ClickHouseKeeperInstallation above, the default service name would be simple-1 instead of keeper-simple-1. But there is a bigger problem: the storage mapping is completely different. If one upgrades the operator from 0.23.x to 0.24.0 and reconciles Keeper resources, those will lose their storage. What should users do? There are two possibilities.
Recover Keeper metadata
The first approach is to recreate Keeper metadata from ClickHouse. There is a SYSTEM RESTORE REPLICA command for that purpose. It needs to be run for every ReplicatedMergeTree table on every replica. That works for small clusters. However, it does not recover other data that may be stored in Keeper. For example, it does not recover users or user-defined functions.
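Here is a sketch of that procedure on one replica; my_db.my_table stands in for each of your replicated tables:

-- Enumerate the tables that need to be restored on this replica
SELECT database, name FROM system.tables WHERE engine LIKE 'Replicated%';

-- Recreate the Keeper metadata for each of them
SYSTEM RESTORE REPLICA my_db.my_table;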
Remap Persistent Volume
If recovering Keeper is not an option, it is possible to do a migration and remap the persistent volume created by operator 0.23.x. Before doing this, let’s take a look at the differences between objects created by 0.23.x and 0.24+.
Let’s consider a single-node CHK installation named test. The following objects would be created:
| | 0.23.x | 0.24+ |
|---|---|---|
| Pod name | test-0 | chk-test-simple-0-0-0 |
| Service name | test | keeper-test |
| PVC name | both-paths-test-0 | default-chk-test-0-0-0 |
| Volume mount path | /var/lib/clickhouse_keeper | /var/lib/clickhouse-keeper |
So, in order to remap the volume, the following steps need to be done:
- Find the Persistent Volume (PV) of the old CHK installation
- Patch it, setting the persistentVolumeReclaimPolicy to Retain:
kubectl patch pv $PV -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
- Delete the old CHK installation
- Delete the old PVC, since it is not deleted automatically
- Patch the PV one more time, removing the claimRef. That will make the volume available for remounting:
kubectl patch pv $PV -p '{"spec":{"claimRef": null}}'
- Upgrade the operator to 0.24.x
- Deploy the new CHK with the following changes (see the sketch after this list):
  - Add volumeName to the volumeClaimTemplate, referencing the old volume
  - Add settings to point logs and raft coordination to the folders used by the old operator:
    keeper_server/log_storage_path: /var/lib/clickhouse-keeper/logs
    keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/snapshots
  - Add a serviceTemplate to match the old service name
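Putting the pieces together, a migrated CHK might look roughly like the sketch below. It is not a drop-in manifest: the volumeName value is illustrative and must reference your retained PV, and the serviceTemplate fields follow the same conventions as in a ClickHouseInstallation:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: test
spec:
  defaults:
    templates:
      volumeClaimTemplate: default
      serviceTemplate: old-name
  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          volumeName: pvc-0123abcd # the PV retained from the 0.23.x installation
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
    serviceTemplates:
      - name: old-name
        generateName: test # reproduce the 0.23.x service name
        spec:
          ports:
            - name: zk
              port: 2181
  configuration:
    settings:
      keeper_server/log_storage_path: /var/lib/clickhouse-keeper/logs
      keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/snapshots
    clusters:
      - name: default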
Please refer to the migration procedure for more detail.
Once the CHK is up, ClickHouse will resume working.
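To confirm that everything is back, a couple of quick checks from any ClickHouse replica:

-- This query fails if ClickHouse cannot reach Keeper
SELECT * FROM system.zookeeper WHERE path = '/';

-- No replicated table should be stuck in read-only mode
SELECT database, table, is_readonly FROM system.replicas;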
Limitations and Future Work
The current implementation is not ideal. (There is nothing ideal in the world.) We can see at least two things that can be improved.
Reference CHK from CHI
As you can see in the examples above, we have to reference the CHK from the CHI in “the old style”. This is fully compatible with ZooKeeper, but it requires knowing service names, and it does not look pretty. Instead, we could reference the CHK directly by name, as follows:
configuration:
  keeper: # instead of the 'zookeeper' section
    - name: my-keeper
      namespace: my-namespace
      serviceType: loadBalancer | replicas
So instead of the service name we would use the CHK name, and also specify how ClickHouse should be configured to access it: via a load balancer service or via individual replica services. The latter may unlock some interesting capabilities to optimize network costs in public clouds.
Dynamic Reconfiguration
Sometimes it is necessary to change the Keeper cluster, e.g. to add more replicas. That requires more than just spinning up extra pods and services: the Keeper cluster needs to be notified of the change. Of course, this can be done via a full restart, but that may affect ClickHouse users. There is a better way – dynamic reconfiguration. Once implemented in the operator, it will make such changes go smoothly.
Both features are already in development for the next operator release.
Get started!
We encourage you to get the latest version of the operator to take advantage of the ClickHouse Keeper support. Please subscribe to the Altinity clickhouse-operator project on GitHub to be notified about new releases. And of course, we’ll post all the details here as we continue to add more features to the operator.
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.