Altinity ClickHouse Operator for Kubernetes

When I was setting up my first ClickHouse clusters 3 years ago, it felt like a journey into an unknown world full of caveats. ClickHouse is very simple and easy to use, but not THAT simple. Sometimes I dreamed that setting up a cluster would be as easy as making a cup of coffee. It took us a while to find the right approach, but finally our dreams came true. Today, we are happy to introduce the ClickHouse operator for Kubernetes!

Why Kubernetes

Kubernetes is an increasingly popular open source platform for managing resources and applications. Originally developed for orchestrating stateless services, it eventually opened its doors to stateful ones, including databases. Many companies are moving their infrastructure into Kubernetes because it is simpler to manage. One of the most important things about Kubernetes is portability: clusters can be installed on bare metal servers, on public cloud providers like Amazon, Azure and Google Cloud, and in private clouds. Thus applications developed for Kubernetes can run virtually everywhere. Indeed, Kubernetes is sometimes called the new Linux. So once we have ClickHouse running inside Kubernetes, it opens the way to many environments.

What is an Operator

An operator in Kubernetes is a relatively new concept. In a nutshell, it is a Kubernetes extension that simplifies configuration, management, monitoring and more for certain application types. Human operators run real-life systems: they know the domain very well, handle standard maintenance tasks, and react properly to incidents. Operators in Kubernetes do pretty much the same: they operate applications in an efficient and application-specific way. Our friends from the Kubernetes consulting company Flant like to say that Kubernetes operators are codified operational knowledge.

There are operators available for many applications inside Kubernetes already, including MySQL, PostgreSQL, MongoDB, Kafka and so on. Typically an operator is responsible for:

  • Setting up the application in a proper way
  • Applying application changes, e.g. configuration changes, upgrades and so on
  • Monitoring application status and exporting metrics to a proper monitoring solution
  • Performing maintenance tasks

The Altinity ClickHouse operator does all of the above, and much more, for ClickHouse in Kubernetes. A single ClickHouse server is very easy to install, but if you have ever tried creating a ClickHouse cluster, you know that this part is not so obvious. Our tutorial may help you understand the concepts, but there is still quite a lot of manual work required. With the ClickHouse operator there is not much difference between setting up a single node and a cluster. This means you can start a ClickHouse cluster of any size in seconds! A good example is worth a thousand words, so let's go.

Creating ClickHouse clusters

We assume that you are familiar with Kubernetes already. If that is not the case, please go through a quick Kubernetes introduction first. The examples below were run on a local minikube installation on my laptop, but there is no difference in real production Kubernetes environments. Proper storage provisioning and networking in Kubernetes have some specifics that are out of scope for this article. The ClickHouse operator needs to be installed into your Kubernetes system first; this is a very simple step, but we will skip it for now. Instead, we will jump into an example right away.

hello-kubernetes.yaml describes the Kubernetes object specification for a ClickHouse installation with a 3-shard cluster as follows:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "hello-kubernetes"
spec:
  configuration:
    clusters:
      - name: "sharded"
        layout:
          type: Standard
          shardsCount: 3
$ kubectl -n test apply -f hello-kubernetes.yaml
clickhouseinstallation.clickhouse.altinity.com/hello-kubernetes created
$ kubectl -n test get services
NAME                         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                        AGE
chi-a9cffb-347e-0-0          ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     7s
chi-a9cffb-347e-1-0          ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     7s
chi-a9cffb-347e-2-0          ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     7s
clickhouse-hello-kubernetes  LoadBalancer                 8123:30703/TCP,9000:30348/TCP   7s
$ docker run -it yandex/clickhouse-client clickhouse-client -h
ClickHouse client version 19.1.14.
Connecting to
Connected to ClickHouse server version 19.1.14 revision 54413.
chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) create table test_distr as system.one Engine = Distributed('sharded', system, one);
CREATE TABLE test_distr AS system.one
ENGINE = Distributed('sharded', system, one)
0 rows in set. Elapsed: 0.016 sec. 
chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) select hostName() from test_distr;
SELECT hostName()
FROM test_distr 
┌─hostName()────────────┐
│ chi-a9cffb-347e-0-0-0 │
│ chi-a9cffb-347e-1-0-0 │
│ chi-a9cffb-347e-2-0-0 │
└───────────────────────┘
3 rows in set. Elapsed: 0.054 sec. 

Behind the scenes, the operator has also generated the remote_servers configuration that defines how ClickHouse hosts comprise the cluster. That is what allowed us to create distributed tables.
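As an illustration, the generated cluster definition looks roughly like the remote_servers section below. This is a sketch assuming default ports and single-replica shards; the host names match the per-shard services listed above, and the actual file is produced and maintained by the operator:

```xml
<!-- Sketch of the remote_servers section the operator generates
     for the "sharded" cluster (assumed default native port 9000) -->
<yandex>
    <remote_servers>
        <sharded>
            <shard>
                <replica>
                    <host>chi-a9cffb-347e-0-0</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>chi-a9cffb-347e-1-0</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>chi-a9cffb-347e-2-0</host>
                    <port>9000</port>
                </replica>
            </shard>
        </sharded>
    </remote_servers>
</yandex>
```

The first argument of the Distributed engine above, 'sharded', refers to this cluster name.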

Managing ClickHouse clusters

We have just demonstrated how easily ClickHouse clusters can be started with the operator. But what if we want to add one more shard? We can do it by editing the YAML file with the object specification:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "hello-kubernetes"
spec:
  configuration:
    clusters:
      - name: "sharded"
        layout:
          type: Standard
          shardsCount: 4 # Added one shard

Instead of 3 shards, now we requested 4. Let’s apply the modified object specification to Kubernetes and check the result:

$ kubectl -n test apply -f hello-kubernetes.yaml
clickhouseinstallation.clickhouse.altinity.com/hello-kubernetes configured
$ kubectl -n test get services
NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                        AGE
chi-a9cffb-347e-0-0           ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     9m
chi-a9cffb-347e-1-0           ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     9m
chi-a9cffb-347e-2-0           ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     9m
chi-a9cffb-347e-3-0           ClusterIP      None                         8123/TCP,9000/TCP,9009/TCP                     27s
clickhouse-hello-kubernetes   LoadBalancer                 8123:32701/TCP,9000:32464/TCP   9m

In this case Kubernetes notified our operator about the change. The operator detected that the object specification had changed and that the change required an extra shard. It provisioned a new pod, configured the extra shard, and updated the configuration of the previously created shards so that they are all aware of each other. With a few keystrokes we have scaled up the ClickHouse cluster!
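The core of this behavior is a reconcile loop: compare the desired state in the specification with the observed state of the cluster, and compute the difference to apply. The toy sketch below is NOT the operator's actual code (which is written in Go); it only illustrates the idea, with hypothetical shard names:

```python
# Toy illustration of the reconcile idea behind a Kubernetes operator:
# compare desired vs. observed shard counts and compute which (hypothetical)
# shard names need to be provisioned. Not the real operator's logic.

def reconcile_shards(observed_count: int, desired_count: int) -> list:
    """Return names of shards to add, or an empty list if none are needed."""
    if desired_count <= observed_count:
        return []
    # shards are numbered from 0, so new ones continue the sequence
    return ["shard-%d" % i for i in range(observed_count, desired_count)]

# Changing shardsCount from 3 to 4 yields exactly one new shard to provision:
print(reconcile_shards(3, 4))
```

This is why editing a single field in the YAML file is enough: the operator derives all the provisioning work from the spec difference.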

What else is the operator capable of? The operator helps with everything that you typically need in order to manage ClickHouse clusters:

  • Managing persistent volumes to be used for ClickHouse data
  • Configuring pod deployment (pod templates, affinity rules and so on)
  • Creating replicated clusters
  • Managing users/profiles configuration
  • Exporting ClickHouse metrics to Prometheus
  • Handling ClickHouse version upgrades
  • …and more
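For instance, a replicated cluster with persistent storage can be requested declaratively in the same object specification. The sketch below is modeled on the operator's example manifests; field names such as replicasCount and volumeClaimTemplates follow those examples, but exact keys may differ between operator versions:

```yaml
# Sketch of a ClickHouseInstallation with replication and persistent
# volumes (field names assumed from operator examples; verify against
# your operator version's documentation)
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "replicated"
spec:
  configuration:
    clusters:
      - name: "replicated"
        layout:
          type: Standard
          shardsCount: 2
          replicasCount: 2   # each shard gets two replicas
  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
```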

We are going to discuss ClickHouse operator features and configuration details in the next few articles. For those who cannot stand the suspense, you are welcome to try it on your own. Documentation and examples should help you get started with the operator quickly.

Also, do not miss the “ClickHouse on Kubernetes!” webinar on April 16th. Altinity CEO Robert Hodges will demonstrate how to set up ClickHouse on Kubernetes with the Altinity ClickHouse operator and answer your questions. Stay tuned!


