When I set up my first ClickHouse clusters three years ago, it felt like a journey into an unknown world full of caveats. ClickHouse is simple and easy to use, but not THAT simple. Sometimes I dreamed that setting up a cluster would be as easy as making a cup of coffee. It took us a while to find the right approach, but finally our dreams came true. Today, we are happy to introduce the ClickHouse operator for Kubernetes!
Kubernetes is an increasingly popular open source platform for managing resources and applications. Originally developed for orchestrating stateless services, it eventually opened its doors to stateful ones, including databases. Many companies are moving their infrastructure into Kubernetes because it is simpler to manage. One of the most important things about Kubernetes is portability. Kubernetes clusters can be installed on bare metal servers, on public cloud providers like Amazon, Azure and Google Cloud, and in private clouds. Thus applications developed for Kubernetes can run virtually everywhere. Indeed, Kubernetes is sometimes called the new Linux. So once we have ClickHouse running inside Kubernetes, it opens up a way to many environments.
What is an Operator
An operator in Kubernetes is a somewhat new concept. In a nutshell, it is a Kubernetes extension that simplifies configuration, management, monitoring and more for a certain type of application. Human operators run real-life systems: they know the domain very well, know how to handle standard maintenance tasks, and react properly to incidents. Operators in Kubernetes do pretty much the same: they operate applications in an efficient and application-specific way. Our friends from the Kubernetes consulting company Flant like to say that Kubernetes operators are codified operational knowledge.
There are operators available for many applications inside Kubernetes already, including MySQL, PostgreSQL, MongoDB, Kafka and so on. Typically an operator is responsible for:
- Setting up the application in a proper way
- Applying application changes, e.g. configuration changes, upgrades and so on
- Monitoring application status and exporting metrics to a proper monitoring solution
- Performing maintenance tasks
The Altinity ClickHouse operator does all of the above, and much more, for ClickHouse in Kubernetes. A single ClickHouse server is very easy to install. But if you have ever tried creating a ClickHouse cluster, you know that this part is not so obvious. Our tutorial may help you understand the concepts, but there is still quite a lot of manual work required. With the ClickHouse operator there is not much difference between setting up a single node and setting up a cluster. This means that you can start a ClickHouse cluster of any size in seconds! A good example is worth a thousand words, so let’s go.
Creating ClickHouse clusters
We assume that you are already familiar with Kubernetes. If that is not the case, please refer to https://kubernetes.io for a quick Kubernetes introduction. The examples below were run using a local minikube installation on my laptop, but the steps are the same in real production Kubernetes environments. Proper storage provisioning and networking in Kubernetes have some specifics that are outside the scope of this article. The ClickHouse operator needs to be installed in your Kubernetes system first; this is a very simple step, but we will skip it for now. Instead, we will jump into an example right away.
The file ‘hello-kubernetes.yaml’ contains the Kubernetes object specification for a ClickHouse installation with a 3-shard cluster:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "hello-kubernetes"
spec:
  configuration:
    clusters:
      - name: "sharded"
        layout:
          type: Standard
          shardsCount: 3
$ kubectl -n test apply -f hello-kubernetes.yaml
clickhouseinstallation.clickhouse.altinity.com/hello-kubernetes created
$ kubectl -n test get services
NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
chi-a9cffb-347e-0-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      7s
chi-a9cffb-347e-1-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      7s
chi-a9cffb-347e-2-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      7s
clickhouse-hello-kubernetes   LoadBalancer   10.98.156.78   <pending>     8123:30703/TCP,9000:30348/TCP   7s
$ docker run -it yandex/clickhouse-client clickhouse-client -h 10.98.156.78
ClickHouse client version 19.1.14.
Connecting to 10.98.156.78:9000.
Connected to ClickHouse server version 19.1.14 revision 54413.
chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) create table test_distr as system.one Engine = Distributed('sharded', system, one);
CREATE TABLE test_distr AS system.one ENGINE = Distributed('sharded', system, one)
0 rows in set. Elapsed: 0.016 sec.
chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) select hostName() from test_distr;
SELECT hostName() FROM test_distr
┌─hostName()────────────┐
│ chi-a9cffb-347e-0-0-0 │
└───────────────────────┘
┌─hostName()────────────┐
│ chi-a9cffb-347e-1-0-0 │
└───────────────────────┘
┌─hostName()────────────┐
│ chi-a9cffb-347e-2-0-0 │
└───────────────────────┘
3 rows in set. Elapsed: 0.054 sec.
Behind the scenes, the operator also generated the cluster configuration that defines which ClickHouse hosts make up the cluster. That is what allowed us to create a distributed table.
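To make the idea of a cluster configuration more concrete: ClickHouse learns which hosts belong to a cluster from a `remote_servers` section in its server configuration, with one `shard` entry per shard. The Python sketch below builds such a section for the three services listed earlier. It only illustrates the general shape of that configuration; the exact files the operator writes are not shown in this article, and the helper name `remote_servers_xml`, the use of service names as hosts, and the single-replica layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def remote_servers_xml(cluster, hosts, port=9000):
    """Build a remote_servers section with one single-replica shard per host.

    Hypothetical helper for illustration; not part of the operator.
    """
    root = ET.Element("yandex")  # root tag used by ClickHouse 19.x config files
    servers = ET.SubElement(root, "remote_servers")
    cluster_el = ET.SubElement(servers, cluster)
    for host in hosts:
        shard = ET.SubElement(cluster_el, "shard")
        replica = ET.SubElement(shard, "replica")
        ET.SubElement(replica, "host").text = host
        ET.SubElement(replica, "port").text = str(port)  # native TCP port
    return ET.tostring(root, encoding="unicode")


# Service names follow the pattern seen in the `kubectl get services` output.
hosts = [f"chi-a9cffb-347e-{i}-0" for i in range(3)]
print(remote_servers_xml("sharded", hosts))
```

The cluster name passed here, 'sharded', is the same name used as the first argument of the Distributed engine in the session above, which is how the table finds its shards.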
Managing ClickHouse clusters
We have just demonstrated how easily ClickHouse clusters can be started with an operator. But what if we want to add one more shard? We can do it by altering the yaml file with the object specification:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "hello-kubernetes"
spec:
  configuration:
    clusters:
      - name: "sharded"
        layout:
          type: Standard
          shardsCount: 4 # Added one node
Instead of 3 shards, we now request 4. Let’s apply the modified object specification to Kubernetes and check the result:
$ kubectl -n test apply -f hello-kubernetes.yaml
clickhouseinstallation.clickhouse.altinity.com/hello-kubernetes configured
$ kubectl -n test get services
NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
chi-a9cffb-347e-0-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      9m
chi-a9cffb-347e-1-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      9m
chi-a9cffb-347e-2-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      9m
chi-a9cffb-347e-3-0           ClusterIP      None           <none>        8123/TCP,9000/TCP,9009/TCP      27s
clickhouse-hello-kubernetes   LoadBalancer   10.108.57.65   <pending>     8123:32701/TCP,9000:32464/TCP   9m
In this case Kubernetes notified our operator about the change. The operator detected that the object specification had changed and that the change required an extra shard. It provisioned a new pod, configured the extra shard, and updated the configuration of the previously created shards so that they are all aware of each other. With a few keystrokes we have scaled up the ClickHouse cluster!
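The "compare the desired state with what exists, then act on the difference" step can be sketched in a few lines of Python. This is a simplified illustration of the general reconciliation idea, not the operator's actual code: the functions `service_name` and `reconcile` are hypothetical, and the naming pattern is simply the one visible in the service listings above.

```python
def service_name(prefix, shard, replica=0):
    """Name pattern observed in the listings above, e.g. chi-a9cffb-347e-3-0."""
    return f"{prefix}-{shard}-{replica}"


def reconcile(prefix, desired_shards, existing):
    """Return (to_create, to_delete) needed to match the desired shard count.

    Hypothetical sketch: a real operator would also create pods and
    rewrite the ClickHouse configuration on every host.
    """
    desired = {service_name(prefix, i) for i in range(desired_shards)}
    current = set(existing)
    to_create = sorted(desired - current)
    to_delete = sorted(current - desired)
    return to_create, to_delete


# Three shards exist; the updated specification asks for four.
existing = [service_name("chi-a9cffb-347e", i) for i in range(3)]
to_create, to_delete = reconcile("chi-a9cffb-347e", 4, existing)
print(to_create)  # ['chi-a9cffb-347e-3-0']
print(to_delete)  # []
```

Because the function only looks at the difference between the two states, applying the same specification twice produces nothing to create or delete.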
What else is the operator capable of? The operator helps with everything that you typically need in order to manage ClickHouse clusters:
- Managing persistent volumes to be used for ClickHouse data
- Configuring pod deployment (pod templates, affinity rules and so on)
- Creating replicated clusters
- Managing users/profiles configuration
- Exporting ClickHouse metrics to Prometheus
- Handling ClickHouse version upgrades
- …and more
We are going to discuss ClickHouse operator features and configuration details in the next few articles. If you cannot stand the suspense, you are welcome to try it on your own. The documentation and examples should help you start using the operator quickly.
Also, do not miss the “ClickHouse on Kubernetes!” webinar on April 16th. Altinity CEO Robert Hodges will be there to demonstrate how to set up ClickHouse on Kubernetes with the Altinity ClickHouse operator and to answer questions. Stay tuned!