What’s New in Altinity clickhouse-operator?

The Altinity Kubernetes Operator for ClickHouse, aka clickhouse-operator, is celebrating 4 years of production use. Four years ago we presented it at the Cyprus ClickHouse Meetup, and we deployed it for the first customer in June 2019. Since then we have kept working on making it the most flexible, robust and secure operator for databases. This is our most popular open source project so far with 1300+ stars, which puts it ahead of every other database operator for Kubernetes except those for PostgreSQL. Altinity Operator is used by many companies including eBay, Cisco and Twilio (Segment.io). It also powers Altinity.Cloud.

Let’s give an overview of some of the new features that we have added recently. One of the biggest improvements involves restarts, which are now unnecessary in the following cases:

  • Extending the size of storage for storage types that allow automatic extension. 
  • Many (though not all) configuration changes. 

Reducing restarts has been a long-term goal for the Operator. Large ClickHouse servers may take a while to drain current queries, and even longer to come fully back online. 

Operator Managed PVC Provisioning

The standard approach to provisioning persistent volumes in Kubernetes is to rely on StatefulSet. A VolumeClaimTemplate can be defined in the StatefulSet, and it is used to create the PersistentVolumeClaim (PVC) and PersistentVolume. This is convenient, since both PodTemplate and VolumeClaimTemplate are defined in one resource.

The main problem is that the StatefulSet does not allow modifications to templates. So if a change is needed to the PVC, such as extending the volume size, the Operator has to re-create the StatefulSet, which results in a ClickHouse restart. Restarts may be expensive and undesirable on big clusters.

In clickhouse-operator 0.20 and above, we added a storage management setting that defines whether PVCs are provisioned by the StatefulSet or by the Operator. Operator-managed persistence can be turned on like this:

spec:
  defaults:
    storageManagement:
      provisioner: Operator

When enabled, there is no PVC template in the StatefulSet anymore; the Operator creates and modifies PVCs directly. The big benefit for users: we can rescale volumes without a ClickHouse restart!

Note, however, that the default storage management provisioner is still StatefulSet, as it was before. It can be switched to Operator on an existing ClickHouseInstallation, and the Operator will then start managing the PVCs that the StatefulSet created earlier. That change requires a restart, which is why it is not enabled by default.

Operator Security Hardening Guide

There were a lot of security improvements over the last 6 months. Some of those changes were enabled by default, others are configurable. To make configuration easier for users, we have documented all security features and best practices in the Security Hardening Guide. Let me highlight some of the most interesting ones.

Secure distributed queries

Distributed queries in ClickHouse run under the ‘default’ user. This raises several security concerns, so ClickHouse provides ways to make such queries more secure. The easiest one is to protect inter-cluster queries with a special ‘secret’ in the cluster configuration. When it is enabled, each shard validates that an incoming distributed query carries the same secret, and runs it as the initial query user instead of ‘default’. This configuration lives in the “remote_servers” section that the Operator generates. Since version 0.20, it can be defined directly in the cluster definition as follows:

spec:
  configuration:
    clusters:
      - name: my-cluster
        secret:
          auto: "true"

That will generate the secret token automatically and store it in a secret. It is also possible to define the secret reference explicitly if needed. We plan to make this behavior the default in future releases.

Secure network

Out of the box, ClickHouse comes with insecure HTTP and TCP ports. It was always possible to harden security by configuring pod templates and changing ClickHouse settings. It is much easier with the Operator now, thanks to several helpful new switches. Check out, for example, the new ‘secure’ and ‘insecure’ settings on clusters.

spec:
  configuration:
    clusters:
    - name: my-cluster
      secure: "yes"
      insecure: "no"

This will enable the default secure ports (9440 and 8443) and disable the corresponding insecure ones. It will also route distributed queries through the secure ports.

Note that for SSL to work, certificates need to be configured for ClickHouse. Please refer to the Security Hardening Guide for more detail.
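As a quick illustration of what changes for clients, here is a minimal Python sketch that sends a query to the secure HTTP port (8443) using only the standard library. The host name, credentials and CA file path are placeholders rather than values the Operator defines, and the sketch assumes the server certificate is signed by a CA the client can verify.

```python
import ssl
import urllib.parse
import urllib.request


def build_request(host, query, user="default", password="", port=8443):
    """Build an HTTPS request for the ClickHouse HTTP interface."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    url = f"https://{host}:{port}/?{params}"
    # Credentials are passed in ClickHouse's HTTP authentication headers.
    headers = {"X-ClickHouse-User": user, "X-ClickHouse-Key": password}
    return urllib.request.Request(url, headers=headers)


def run_query(host, query, ca_file=None, **kwargs):
    """Send the query over TLS and return the response body as text."""
    # ca_file is a hypothetical path to the CA certificate that signed the
    # server certificate; with None, the system trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    request = build_request(host, query, **kwargs)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, `run_query("clickhouse.example.internal", "SELECT 1", ca_file="ca.crt")` would be a simple connectivity check once certificates are in place; the host name and file name here are made up.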

Control of ClickHouse Restart When Applying Configuration Changes

Since the very first releases, the Altinity Operator has restarted the ClickHouse server when applying configuration changes. Old Kubernetes versions and old ClickHouse versions required a restart for the changes to take effect, so the Operator went along. Unfortunately, such restarts can make configuration changes quite cumbersome and disruptive for production clusters, especially if you need to do them multiple times.

However, both Kubernetes and ClickHouse have evolved rapidly, and it is now possible to adjust most server settings dynamically without a restart. The rules are not rigid, though. Some of them are described in the Altinity Knowledge Base, and they also change between releases.

To make this flexible, version 0.21 of the Operator adds a configurable restart policy. It is defined in the operator configuration and looks like this:

  configurationRestartPolicy:
    rules:
      - version: "*"
        rules:
          - settings/*: "yes"
          - settings/dictionaries_config: "no"
          - settings/logger: "no"
          - settings/macros/*: "no"
          - settings/max_server_memory_*: "no"
          - settings/max_*_to_drop: "no"
          - settings/max_concurrent_queries: "no"
          - settings/models_config: "no"
          - settings/user_defined_executable_functions_config: "no"
          - zookeeper/*: "yes"
          - files/config.d/*.xml: "yes"
          - files/config.d/*dict*.xml: "no"
          - profiles/default/background_*_pool_size: "yes"
          - profiles/default/max_*_for_server: "yes"

      - version: "21.*"
        rules:
          - settings/logger: "yes"

When the Operator observes a setting change, it traverses the list of rules from top to bottom, matching each expression against the configuration path. The last match decides whether a restart is needed. If at least one changed setting requires a restart, the Operator performs a graceful restart.
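The "last match wins" evaluation can be sketched in a few lines of Python. This is an illustration of the logic described above, not the Operator's actual code. It assumes shell-style wildcard matching (Python's `fnmatchcase`, where `*` also matches `/`), assumes that every version section whose pattern matches the ClickHouse version is applied in document order, and returns `None` when no rule matches, since the fallback behavior is not described here. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Abbreviated copy of the rules shown above, kept in document order.
POLICY = [
    ("*", [
        ("settings/*", "yes"),
        ("settings/logger", "no"),
        ("settings/macros/*", "no"),
        ("settings/max_concurrent_queries", "no"),
        ("zookeeper/*", "yes"),
        ("files/config.d/*.xml", "yes"),
        ("files/config.d/*dict*.xml", "no"),
    ]),
    ("21.*", [
        ("settings/logger", "yes"),
    ]),
]


def restart_required(version, path, policy=POLICY):
    """Return "yes" or "no" from the last matching rule, or None if none match."""
    decision = None
    for version_pattern, rules in policy:
        if not fnmatchcase(version, version_pattern):
            continue  # this section does not apply to the running version
        for path_pattern, answer in rules:
            if fnmatchcase(path, path_pattern):
                decision = answer  # later matches override earlier ones
    return decision


def needs_restart(version, changed_paths, policy=POLICY):
    """A restart is needed if at least one changed path resolves to "yes"."""
    return any(restart_required(version, p, policy) == "yes" for p in changed_paths)
```

Under these assumptions, a change to `settings/logger` on a 23.x server resolves to "no" (the specific rule overrides `settings/*`), while the same change on a 21.x server resolves to "yes" because the version-specific section comes last.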

Other Interesting Features and Future Plans

We have briefly described some of the new features of Altinity ClickHouse Operator for Kubernetes. There are many more. In particular, one of the secret weapons is templates. It is possible to use ClickHouseInstallation templates as building blocks, and even to inject settings, labels, annotations or ClickHouse versions into existing ClickHouse installations. We are going to describe them in one of the next articles on this topic.

Over the next few months we will add more. In particular, we plan to add user-defined metrics. That will make the metrics-exporter sidecar even more useful, since users will be able to define application-specific metrics and have them exported together with the ClickHouse ones. Another feature on the roadmap is the ability for users to control the readiness of ClickHouse nodes. For example, it should be possible to specify that a new replica is not added to the load balancer until its data is fully replicated.

Please subscribe to the Altinity clickhouse-operator GitHub project to be notified about new releases. We will also describe them in this blog.  
