Eureka! 8 developer tricks for running ClickHouse® on Kubernetes

Recorded: February 27, 2024, 7:00 am PST
Presenter: Robert Hodges

In this webinar, Robert Hodges, CEO of Altinity, presents eight battle-tested tricks for running ClickHouse® efficiently on Kubernetes using the Altinity Kubernetes Operator for ClickHouse®. The session is grounded in years of operational experience managing ClickHouse clusters at scale, from embedded systems to clusters with hundreds of nodes in public clouds.

Robert opens with a level-set on ClickHouse, Kubernetes, and the Kubernetes operator pattern, explaining how the ClickHouseInstallation custom resource definition allows you to define an entire cluster as a YAML manifest and have the operator reconcile reality to match it. He then works through all eight tricks systematically.

Trick 1 covers using Argo CD for GitOps-style management of ClickHouse manifests, with a GitHub example project showing commands to bring up a full stack including Prometheus, Grafana, and the operator. Trick 2 introduces the Altinity EKS Terraform blueprint as a compact way to provision production-ready EKS clusters for ClickHouse. Trick 3 explains how to use node selectors and Kubernetes cluster autoscaling to pin ClickHouse pods to specific VM types, enabling simple vertical scaling by changing the instance type in the node selector. Trick 4 covers availability zone affinity and anti-affinity using the operator’s compact zone syntax, ensuring replicas land in different data centers rather than on the same node. Trick 5 shows the stop property for turning off compute during idle periods, preserving storage while dropping VM costs. Trick 6 covers the taskId property for forcing operator reconciliation after external changes like password rotations. Trick 7 demonstrates rolling upgrades by changing a container image name, with the operator handling shard-parallel upgrades without downtime. Trick 8 covers the new ClickHouseKeeperInstallation custom resource, which brings the same operator-managed convenience to ClickHouse Keeper that ClickHouseInstallation provides for ClickHouse itself.

The Q&A section provides detailed answers on topics including when to use Kubernetes versus bare metal, the best way to learn Kubernetes, S3 storage policies, password reconciliation behavior, tiered storage management, Terraform module availability for other clouds, the recommendation to use ClickHouse Keeper over ZooKeeper for new deployments, backup via clickhouse-backup as a sidecar, migrating from ZooKeeper to ClickHouse Keeper, and how to migrate from EC2 to Kubernetes using replica-based migration.

Here are the slides:

Eureka-8-developer-tricks-for-running-ClickHouse-on-Kubernetes-2024-02-27 Download

Key Moments (Timestamps)

Key moments generated with AI assistance.

0:10 – Welcome, introductions, and housekeeping
1:24 – About Altinity: products and open source projects
3:17 – ClickHouse, Kubernetes, and operators: a quick level-set
6:12 – Trick 1: Use Argo CD to manage ClickHouse manifests
11:29 – Trick 2: Use Terraform to set up managed Kubernetes
15:18 – Trick 3: Scale compute using VM autoscaling and node selectors
21:43 – Trick 4: Spread ClickHouse servers across availability zones using affinity
26:41 – Trick 5: Turn off compute using the stop property
29:37 – Trick 6: Force operator reconciliation with task ID
34:27 – Trick 7: Upgrade ClickHouse automatically with a rolling upgrade
36:44 – Trick 8: Run ClickHouse Keeper with the Altinity Operator
39:28 – Getting help: Altinity.Cloud® and enterprise support
43:00 – Q&A

Webinar Transcript

[0:10] – Welcome, Introductions, and Housekeeping

Robert: Welcome, everyone, to “Eureka! 8 Developer Tricks for Running ClickHouse® on Kubernetes.” My name is Robert Hodges and I’ll be doing the presentation today. Before we get started, a little housekeeping. This presentation is being recorded, as you probably just heard. We’ll send out a link to the presentation and slides within a few hours, a day at most. Everything will be released publicly and the video will go up on YouTube as well. Questions can go into the Q&A box on the Zoom menu bar, or into the chat. If they’re relevant to the current slide, I’ll dive in and answer them; otherwise we’ll have time at the end.

[1:24] – About Altinity

Robert: Once again I’m Robert Hodges. I’ve been working on databases for about 40 years, on Kubernetes since 2018, and I do a lot of presentations like these as well as participating in the engineering work we do on Kubernetes at Altinity. The Altinity engineering team includes a large number of database geeks like me, spread over about 16 countries, with deep experience in both analytic database technology and the applications that run on top of it.

Altinity is a service provider for ClickHouse. We help people run ClickHouse and build applications on it. Our offerings range from Altinity.Cloud®, our fully managed ClickHouse service, all the way to enterprise support for those who want to run it their own way. We are the authors of the Altinity Kubernetes Operator for ClickHouse®, which will be the main subject of this talk. We’ve done a huge amount of work both in software and operationally to run ClickHouse at scale in Kubernetes, and a lot of what I’ll be sharing today is based on those learnings.

[3:17] – ClickHouse, Kubernetes, and Operators: A Quick Level-Set

Robert: Just in case you’re not familiar with all three pieces, let me do a quick three-slide intro to level-set.

ClickHouse® is a real-time analytic database. It can ingest data extremely quickly, scan it very quickly, and give you answers in less than a second in many cases, scaling to trillions of rows. You can think of it as having the flexibility and open source properties of MySQL: runs everywhere, speaks SQL, convenient licensing, plus the features of an analytic database: column storage, parallel vectorized execution, and scalability to many petabytes.

Kubernetes is an orchestrator for container-based applications. You define your application as resources and those resources are mapped to physical infrastructure.

Operators are one of the really great innovations in the Kubernetes ecosystem. They allow you to create new kinds of resources. With the Altinity Operator we’ve created a custom resource called ClickHouseInstallation. You define these in YAML files and load them into Kubernetes with kubectl apply. They’re stored safely in etcd and then handed to the ClickHouse operator. The operator looks at the new definition, compares it with the currently allocated resources, and adjusts them so that they match your specification. That is how we bring up a ClickHouse cluster.

[6:12] – Trick 1: Use Argo CD to Manage ClickHouse Manifests

Robert: Tip number one: Use Argo CD to manage your ClickHouse manifests. Argo CD is a system for implementing GitOps, which means storing your operational configuration in Git and systematically applying it to your environments. You put definitions in GitHub, install Argo CD itself into your Kubernetes cluster, point it at your applications defined in GitHub, and it reads them, finds the manifests or Helm charts that compose those applications, and applies them to the cluster.

The majority of ClickHouse users on Kubernetes are now using Argo CD, though Flux and plain Helm also work perfectly well.

Here’s what a ClickHouseInstallation resource looks like as a reminder. We have the cluster layout: one shard, two replicas. There’s a clause telling us where ClickHouse Keeper or ZooKeeper lives, followed by pod definitions and storage. The pod template just tells us the container image to run. In this case, we’re running Altinity Stable® Builds for ClickHouse®, but you can run any build, including official upstream builds. Storage is a volume claim template that turns into a PersistentVolumeClaim. The reclaimPolicy: Retain option is particularly important: it means if you fat-finger a cluster deletion, your storage doesn’t go away with it.
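
A manifest along these lines might look like the sketch below. It is not copied from the slides: the resource name, Keeper host, image tag, and storage size are illustrative assumptions, and the layout follows the operator's ClickHouseInstallation schema as commonly documented.

```yaml
# Sketch of a ClickHouseInstallation: one shard, two replicas.
# Names, the Keeper host, the image tag, and the storage size are assumptions.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  configuration:
    # Where ClickHouse Keeper or ZooKeeper lives (hypothetical service name).
    zookeeper:
      nodes:
        - host: keeper.default.svc.cluster.local
          port: 2181
    clusters:
      - name: "cl"
        layout:
          shardsCount: 1
          replicasCount: 2
  defaults:
    templates:
      podTemplate: clickhouse
      dataVolumeClaimTemplate: data
  templates:
    podTemplates:
      - name: clickhouse
        spec:
          containers:
            - name: clickhouse
              # Any build works; this tag is only an example.
              image: altinity/clickhouse-server:23.3.8.21.altinitystable
    volumeClaimTemplates:
      - name: data
        # Keep the volume even if the installation is deleted by mistake.
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
```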

Normally, you would have this YAML file sitting on a laptop and run kubectl apply -f filename.yaml to load it. With Argo CD, you run two commands: argocd app create pointing at the GitHub repo containing your manifest, and argocd app sync to deploy it. That’s it. Go to the GitHub repo shown in the code samples section of this talk for a full set of examples that install everything from Prometheus to Grafana to the ClickHouse operator.
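
For teams that prefer to keep the Argo CD application itself in Git, the same registration can be sketched as an Argo CD Application manifest. This is a declarative alternative to the two commands mentioned above, not something shown in the talk; the repository URL, path, and namespaces are hypothetical.

```yaml
# Declarative counterpart of `argocd app create` (sketch).
# The repo URL, path, and namespaces are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clickhouse-demo
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/clickhouse-manifests.git
    targetRevision: HEAD
    path: clickhouse
  destination:
    server: https://kubernetes.default.svc
    namespace: clickhouse
  syncPolicy:
    # Optional: let Argo CD sync on its own instead of running `argocd app sync`.
    automated: {}
    syncOptions:
      - CreateNamespace=true
# Roughly equivalent CLI calls:
#   argocd app create clickhouse-demo --repo <repo-url> --path clickhouse \
#     --dest-server https://kubernetes.default.svc --dest-namespace clickhouse
#   argocd app sync clickhouse-demo
```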

[11:29] – Trick 2: Use Terraform to Set Up Managed Kubernetes

Robert: Tip number two: use Terraform to set up managed Kubernetes. Managed Kubernetes services from every major cloud provider, from Amazon EKS to Google GKE to specialty clouds like Civo and Linode, are an excellent deal. In the case of Amazon EKS, running the cluster control plane costs only 10 cents per hour and the VMs you spin up are costs you would have paid anyway.

We’ve been doing a lot of work on Terraform and now have a compact EKS blueprint you can use today. This is the actual code used to set up the cluster for all the samples in this talk. It uses the terraform-aws-eks-clickhouse module from GitHub. Beyond that the configuration is simple: region, network masks, and node pool definitions. The m6i.large and m6i.xlarge instance types in the node pools are specifically chosen because ClickHouse loves memory, and the m instance family is well-suited for it. This is currently in beta and will be official very shortly. You can find it at the GitHub URL referenced in the demo resources.

Terraform modules for other clouds (GKE, Azure) are also in progress. GKE will likely come first.

[15:18] – Trick 3: Scale Compute Using VM Autoscaling and Node Selectors

Robert: Tip number three: scale compute using a VM autoscaler. This is one of the reasons why Kubernetes, despite its learning curve, is becoming dominant.

The Kubernetes cluster autoscaler watches pods and their resource requests. When a pod can’t be scheduled due to insufficient resources, the autoscaler provisions more VMs. When those pods go away and VMs become underutilized, the autoscaler empties and removes them. From your application’s perspective this is completely transparent: you just submit a pod definition with your resource requirements and Kubernetes fulfills them, regardless of the underlying cloud.

In practice, when you define a pod template in the ClickHouse operator manifest, you can add a nodeSelector using a well-known Kubernetes label called node.kubernetes.io/instance-type. Set it to the VM type you want, say m6i.large, and that pod will only run on a machine of that type. If one doesn’t exist, the autoscaler creates one.

The more profound benefit is transparent vertical scaling. If you decide you need double the CPU and RAM, just change the node selector value to m6i.xlarge. Kubernetes will allocate an m6i.xlarge, move ClickHouse to it, and scale down the old VM. This lets you develop on low-powered machines and scale to larger ones as your production workload grows. Always verify the result with a kubectl get nodes command to confirm the actual VM types running.
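
As a rough illustration, the node selector sits inside the pod template's spec. The manifest below is a minimal sketch with assumed names and an example image tag; only the nodeSelector label and the two instance types come from the talk.

```yaml
# Sketch: pin ClickHouse pods to a VM type through a node selector.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  configuration:
    clusters:
      - name: "cl"
        layout:
          shardsCount: 1
          replicasCount: 1
  defaults:
    templates:
      podTemplate: clickhouse
  templates:
    podTemplates:
      - name: clickhouse
        spec:
          nodeSelector:
            # Change to m6i.xlarge and re-apply to scale up vertically.
            node.kubernetes.io/instance-type: m6i.large
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:23.3.8.21.altinitystable
# Check which instance types are actually running:
#   kubectl get nodes -L node.kubernetes.io/instance-type
```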

[21:43] – Trick 4: Spread ClickHouse Servers Across Availability Zones Using Affinity

Robert: Tip number four: spread ClickHouse servers over availability zones using affinity and anti-affinity.

Affinity and anti-affinity are powerful Kubernetes concepts for controlling where pods run. Anti-affinity means “don’t run me near certain other pods.” Affinity means “run me in this specific location.” Without these settings, Kubernetes might schedule both replicas of your ClickHouse cluster on the same VM in the same data center. If that VM goes down, your entire cluster goes down with it. Replicas are only valuable if they’re in genuinely separate places.

The operator provides a compact zone syntax for this. You define multiple pod templates, one for each availability zone you want to use. Each template specifies the zone it must run in, for example us-west-2a or us-west-2b, and includes a distribution: OnePerHost rule so no two ClickHouse servers from the same cluster can share a VM. The cluster definition then assigns specific shards and replicas to specific pod templates.

Always verify this worked. When I was preparing this example, I made a simple typing mistake that left the pods incorrectly configured. A kubectl get nodes command will show you which AZ each worker node is in, and a subsequent command to show which pods are on which nodes confirms they’re in the right places. You can also write the full Kubernetes affinity and anti-affinity rule blocks directly in the operator manifests if you need custom conditions beyond what the zone syntax provides.
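
The zone syntax described above can be sketched as follows. Template names and the image tag are assumptions, the Keeper and storage sections are omitted for brevity, and the exact field names should be checked against the operator version in use (newer releases also offer a podDistribution block for the same purpose).

```yaml
# Sketch: one pod template per availability zone, one replica per template.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  configuration:
    clusters:
      - name: "cl"
        layout:
          shards:
            - replicas:
                # Each replica is bound to a zone-specific pod template.
                - templates:
                    podTemplate: clickhouse-zone-2a
                - templates:
                    podTemplate: clickhouse-zone-2b
  templates:
    podTemplates:
      - name: clickhouse-zone-2a
        zone:
          key: "topology.kubernetes.io/zone"
          values:
            - "us-west-2a"
        # No two ClickHouse servers from this installation share a VM.
        distribution: "OnePerHost"
        spec:
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:23.3.8.21.altinitystable
      - name: clickhouse-zone-2b
        zone:
          key: "topology.kubernetes.io/zone"
          values:
            - "us-west-2b"
        distribution: "OnePerHost"
        spec:
          containers:
            - name: clickhouse
              image: altinity/clickhouse-server:23.3.8.21.altinitystable
# Verify placement after applying:
#   kubectl get nodes -L topology.kubernetes.io/zone
#   kubectl get pods -o wide
```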

[26:41] – Trick 5: Turn Off Compute Using the Stop Property

Robert: Tip number five: turn off compute using the stop property. When you’re developing, you sleep, you eat, you’re not always working. It’s convenient to turn off as much of your cluster as possible when not in use so you’re not paying for idle VMs.

The operator has a simple stop: yes property. Apply it and the operator shuts down all pods in the cluster. Under the covers it sets the replica count on all StatefulSets to zero, a well-known Kubernetes trick for turning off compute. The key thing is that the PersistentVolumeClaims and PersistentVolumes remain allocated. Your storage survives untouched.

To bring things back, remove the stop property or change it to no and apply again. The operator will kick off autoscaling if configured and bring the cluster back up, typically in a few minutes. This is a cheap and convenient technique for development environments.
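
In manifest form the change is a single line under spec. The sketch below uses an assumed installation name; the value is quoted because many YAML parsers would otherwise read a bare yes as a boolean.

```yaml
# Sketch: stop all ClickHouse pods while keeping storage allocated.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  # "yes" scales the StatefulSets to zero; remove the line or set "no" to resume.
  stop: "yes"
  configuration:
    clusters:
      - name: "cl"
        layout:
          shardsCount: 1
          replicasCount: 2
```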

[29:37] – Trick 6: Force Operator Reconciliation with Task ID

Robert: Tip number six: force the operator to do reconciliation even when it thinks it might not need to.

Reconciliation is fundamental to how Kubernetes works. When you apply a ClickHouseInstallation manifest that has changed, the operator gets a change event, looks at the new definition versus the currently running resources, and adjusts them to match. But there are cases where the underlying resources have changed externally without the manifest itself changing, and you’d like the operator to re-examine everything.

Password rotation is a good example. With the 0.23 release of the operator, there’s a clean syntax for pulling passwords from Kubernetes Secrets, so you don’t have to hard-code them. When you rotate a password, you regenerate the Secret with a new value. The operator may not notice that the Secret changed. To force it to re-examine everything, you use a property called taskId. Set it to any value. When the manifest is re-applied with a changed taskId, the operator does a full reconciliation and the password update propagates to the pod’s configuration files. Importantly, this does not cause ClickHouse to restart: the operator is smart enough to know a password change doesn’t require a restart.

You can also use a kubectl patch command to change just the taskId without modifying the rest of the manifest, which is useful when you want to re-trigger reconciliation from a script. This is also the right tool when an upgrade or configuration change has timed out and you want the operator to finish what it started.
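
A sketch of how the pieces might fit together is shown below. The user name, Secret name, and key are hypothetical. The transcript writes the property as taskId; to the best of my knowledge the operator's resource definition spells the field taskID, so check the spelling against the CRD installed in your cluster.

```yaml
# Sketch: password taken from a Secret, plus a task ID to force reconciliation.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  # Any new value forces a full reconciliation (field spelling is an assumption).
  taskID: "rotate-password-2"
  configuration:
    users:
      # Password pulled from a Kubernetes Secret (hypothetical name and key).
      demo_user/password:
        valueFrom:
          secretKeyRef:
            name: clickhouse-credentials
            key: demo-user-password
    clusters:
      - name: "cl"
        layout:
          shardsCount: 1
          replicasCount: 2
# Script-friendly alternative that changes only the task ID:
#   kubectl patch chi demo --type merge -p '{"spec":{"taskID":"rotate-password-3"}}'
```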

[34:27] – Trick 7: Upgrade ClickHouse Automatically with a Rolling Upgrade

Robert: Tip number seven: upgrade ClickHouse automatically. This is actually the easiest trick of all.

Pod templates in the ClickHouseInstallation manifest contain a pod specification with a container image name. To upgrade ClickHouse, simply change that image name to a new version, for example from 23.3 to the next release, and apply the manifest. The operator reconciles and performs a rolling upgrade of all servers in your cluster.

Starting from operator version 0.22, and even better in 0.23, the upgrade process is intelligent. The operator first upgrades a single server to verify the new version starts correctly. Once that succeeds, it upgrades shards in parallel: for a six-shard, three-replica cluster you’ll see multiple replicas upgrading simultaneously. The operator will never upgrade more than one replica per shard at once, preserving availability. There is a timeout period during which existing queries are allowed to drain before the upgrade proceeds. As long as you’re not running extremely long queries, upgrades in a properly replicated system will complete without visible downtime for your applications.
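
In practice the upgrade is a one-line change in the pod template. The sketch below uses example image tags that are assumptions, not versions named in the talk; confirm the tags exist in your registry before applying.

```yaml
# Sketch: trigger a rolling upgrade by changing the container image.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  configuration:
    clusters:
      - name: "cl"
        layout:
          shardsCount: 2
          replicasCount: 2
  defaults:
    templates:
      podTemplate: clickhouse
  templates:
    podTemplates:
      - name: clickhouse
        spec:
          containers:
            - name: clickhouse
              # Previously: altinity/clickhouse-server:23.3.8.21.altinitystable
              image: altinity/clickhouse-server:23.8.8.21.altinitystable
# Apply the edited file and watch the pods restart one replica per shard:
#   kubectl apply -f demo.yaml
#   kubectl get pods -w
```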

[36:44] – Trick 8: Run ClickHouse Keeper with the Altinity Operator

Robert: Tip number eight: run ClickHouse Keeper with the Altinity Operator. This is a big one.

Historically, if you needed ClickHouse Keeper or ZooKeeper for replication and distributed DDL, you would manage those separately. We provided example deployment manifests you could copy and run, but there was nothing special about how we helped you manage them. That’s changed.

In the 0.23 release of the operator, we introduced a new custom resource type: ClickHouseKeeperInstallation. This is a dedicated resource for managing ClickHouse Keeper clusters, with the same operator convenience that ClickHouseInstallation provides for ClickHouse itself.

Why a separate resource type? Because for production, you want Keeper on its own dedicated VMs, not sharing resources with ClickHouse. Keeper requires fast disk, low latency, and consistent CPU availability. Sharing with ClickHouse degrades its performance.

The configuration is simple. Specify three replicas (three is the standard production ensemble: one can go down and the cluster stays available). You can leave all settings at defaults and they’ll be sane, or tune them explicitly. The operator will then manage those Keeper pods, their storage, and their lifecycle just like it does for ClickHouse. You can find examples in the ClickHouse operator GitHub project.
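
A minimal Keeper resource might look like the sketch below, assuming the layout used around the 0.23 release; later operator releases may structure the spec differently, so treat the field names as assumptions and compare with the examples shipped for your version. Settings, pod templates, and storage are left at their defaults here.

```yaml
# Sketch for operator 0.23; the resource name is an assumption.
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: "keeper"
spec:
  # Three-node ensemble: the cluster stays available if one node goes down.
  replicas: 3
# A ClickHouseInstallation then points at the resulting Keeper service
# under spec.configuration.zookeeper.nodes (host and port).
```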

[39:28] – Getting Help: Altinity.Cloud® and Enterprise Support

Robert: All eight tricks I’ve shown you are built into Altinity.Cloud®. We manage everything: the Kubernetes infrastructure, the ClickHouse clusters, upgrades, scaling, monitoring, and backup. We can run it in our own SaaS in our accounts on Amazon, Google, or Azure, or we can manage clusters running inside your own Kubernetes environment via a simple plugin installation. Either way you get built-in enterprise support from engineers who have managed ClickHouse clusters for years.

Some things you get in Altinity.Cloud that are particularly valuable: monitoring with built-in dashboards and alerting, proactive response to issues, and increasingly powerful automation including the ability to spin up an EKS, GKE, or AKS cluster from scratch, install all the software, and connect it to Altinity.Cloud through a largely automated process.

[43:00] – Q&A

Leonard asks: I work on a small team with no Kubernetes experience and have three large servers. Should I set up Kubernetes or manage ClickHouse installations manually?

Robert: Great question, and it really depends on your situation. If you have a relatively stable workload and the machines are beefy, the simplest thing is to set them up with packages on Ubuntu or Red Hat and run ClickHouse directly on the operating system. No reason to introduce Kubernetes. If you need to change the configuration later, like add more machines, Kubernetes makes that easier, but since you’re going to be laying hands on the machines anyway to set them up initially, there’s nothing wrong with skipping Kubernetes. If you do choose Kubernetes, use managed Kubernetes. 90% of the people we work with do. Managing Kubernetes yourself is significantly harder than running apps on it.

What’s the best way to learn Kubernetes?

Robert: Three options: check the tutorials at kubernetes.io; look at Kelsey Hightower’s classic walkthrough of building Kubernetes from scratch, which is the best way to truly understand how it works; or look at our tutorials. For hands-on practice, use minikube. It runs on any Linux machine, installs in a few minutes, and virtually everything I’ve shown today will run on it. You don’t need to pay for anything in the cloud to try it out.

Can PersistentVolumes be backed by S3 or Azure Blob storage?

Robert: In general, no. Kubernetes manages block storage through PersistentVolumeClaims, not object storage. However, ClickHouse has its own storage policy configuration that allows you to use S3 as a storage tier for MergeTree tables. Look up storage policies in the ClickHouse documentation. You configure this in the ClickHouse server config, not at the Kubernetes level.

Joe asks: Does using the taskId trick mean passwords get changed every time a CRD gets pushed?

Robert: No. If the Secret values haven’t changed, the operator will see that nothing is different and won’t re-apply the password. The taskId forces a full reconciliation, but the operator is smart enough to only make changes where the actual values have changed. If you changed the Secret and then push a new taskId, yes, the new password will be applied.

Leonard asks: Can the operator manage tiered storage where older parts live on S3 and newer parts stay close to compute?

Robert: Right now the operator doesn’t have dedicated tiered storage management, but this is the next frontier for us and we’re working on it. In the meantime, you can already configure tiered storage by including the relevant ClickHouse configuration files directly in your ClickHouseInstallation manifest. Look at the operator samples for examples of how to include custom configuration files.
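
One way to include such a configuration file is through the files section of the manifest, sketched below. The bucket endpoint, policy name, and volume names are hypothetical, and credentials are assumed to come from the pod's environment.

```yaml
# Sketch: tiered storage defined as a ClickHouse config file inside the manifest.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"
spec:
  configuration:
    files:
      config.d/storage.xml: |
        <clickhouse>
          <storage_configuration>
            <disks>
              <s3_cold>
                <type>s3</type>
                <!-- Hypothetical bucket and prefix -->
                <endpoint>https://example-bucket.s3.us-west-2.amazonaws.com/clickhouse/</endpoint>
                <use_environment_credentials>true</use_environment_credentials>
              </s3_cold>
            </disks>
            <policies>
              <tiered>
                <volumes>
                  <!-- New parts land on the local block volume -->
                  <hot><disk>default</disk></hot>
                  <!-- Older parts can be moved to S3 -->
                  <cold><disk>s3_cold</disk></cold>
                </volumes>
              </tiered>
            </policies>
          </storage_configuration>
        </clickhouse>
    clusters:
      - name: "cl"
        layout:
          shardsCount: 1
          replicasCount: 1
```

A MergeTree table would then opt in with SETTINGS storage_policy = 'tiered', typically combined with a TTL clause that moves older parts TO VOLUME 'cold'.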

Michael asks: When do you anticipate Terraform modules for other major cloud providers like GCP and Azure?

Robert: We have proprietary Altinity Cloud provider Terraform support for these already, and we’re rolling out public modules. GKE will likely be the next one given our experience with it, and Azure will follow. If you have specific timing needs, ping us directly and we can give you a better date.

On ClickHouse Keeper versus ZooKeeper: what’s the recommendation?

Robert: If you’re building a new system today, go with ClickHouse Keeper. Keeper is now very stable and widely tested. ZooKeeper itself is no longer actively evolving, and the ClickHouse core team has indicated that over time, as ClickHouse Keeper adds features not available in ZooKeeper, newer versions of ClickHouse will eventually require Keeper. Start with Keeper to avoid a migration later.

On backup with the operator: is there a CRD for seamless backup and restore?

Robert: We don’t have a backup CRD in the operator itself, but we do maintain Altinity Backup for ClickHouse®, which is the most popular third-party ClickHouse backup tool and is used by hundreds of installations. The way to use it with Kubernetes is to run it as a sidecar container alongside ClickHouse. There’s an example in the clickhouse-backup project showing exactly how to add it to your ClickHouseInstallation manifest. One note: ClickHouse doesn’t support point-in-time recovery (PITR) the way traditional relational databases do. To achieve PITR, load your data through Kafka, back up the Kafka consumer group offsets alongside your data backup, and reset those offsets when you restore.

On migrating from ZooKeeper to ClickHouse Keeper:

Robert: There is an in-place migration, but it requires downtime. Many people have done it and it’s fairly stable. If you’re starting today, definitely just start with Keeper to avoid having to do this migration later.

On migrating ClickHouse from EC2 to Kubernetes:

Robert: The cleanest approach is replication-based migration. Set up a replica cluster inside Kubernetes, join it to your existing cluster, switch new inserts to the Kubernetes cluster, let replication catch up data from EC2, verify everything matches, and then decommission the EC2 instances. We can also help with this directly if you want support.

Thank you all for attending and for such great questions. Please come find us at altinity.com, join our Slack, connect with us on LinkedIn, or ping me directly if you’re going to be at KubeCon EU in Paris.

FAQ Section

What is the ClickHouseInstallation custom resource and why is it important? The ClickHouseInstallation (CHI) custom resource is the core abstraction provided by the Altinity Kubernetes Operator for ClickHouse®. It allows you to define an entire ClickHouse cluster, including its topology, pod specifications, container images, storage configuration, and affinity rules, as a single YAML manifest. Once applied to Kubernetes with kubectl, the operator reconciles the actual cluster state to match your definition. This approach eliminates the need to manually manage the StatefulSets, PersistentVolumeClaims, Services, and ConfigMaps that a ClickHouse cluster requires.

What is the stop property and how does it help with development costs? The stop: yes property in a ClickHouseInstallation manifest instructs the Altinity Operator to shut down all pods in the cluster by setting StatefulSet replica counts to zero. PersistentVolumeClaims and PersistentVolumes remain allocated, so your data is preserved. When you’re ready to resume, set stop: no and apply the manifest again. The cluster comes back in a few minutes with all data intact. This is particularly useful for development clusters that you only need during working hours, as it eliminates VM costs during nights and weekends without losing any data.

How does the Altinity Operator implement rolling upgrades? A rolling upgrade is triggered simply by changing the container image name in the pod template of your ClickHouseInstallation manifest. The operator first upgrades one server to validate the new version. If that succeeds, it upgrades the remaining shards in parallel, never taking more than one replica per shard offline at once to preserve availability. The operator waits for in-flight queries to drain before restarting each pod. In a properly replicated cluster with reasonable query durations, this process completes without visible downtime for connected applications.

Why should I use ClickHouse Keeper instead of ZooKeeper for new deployments? ClickHouse Keeper is a C++ reimplementation of the ZooKeeper coordination protocol, purpose-built for ClickHouse. It is now stable, widely tested, and the recommended choice for all new deployments. ZooKeeper is no longer actively evolving, and the ClickHouse team has indicated that future ClickHouse versions may add Keeper-specific features that ZooKeeper cannot support. Starting with Keeper today avoids the need to perform an in-place migration later, which requires downtime.

What is the taskId property and when should I use it? The taskId property in a ClickHouseInstallation manifest forces the operator to perform a full reconciliation, even if no other fields in the manifest have changed. This is useful in two main scenarios: when you have rotated a password by updating a Kubernetes Secret (the operator may not notice the Secret changed unless you also change taskId), and when an upgrade or configuration change has timed out and you want to restart the reconciliation process. Changing taskId via a kubectl patch command is a clean, scriptable way to trigger reconciliation without modifying any operational settings.

How does availability zone spreading work with the Altinity Operator? The operator provides a compact zone syntax inside pod templates that generates the appropriate Kubernetes nodeAffinity and podAntiAffinity rules. By defining separate pod templates for each availability zone (for example us-west-2a and us-west-2b) and assigning specific replicas to specific templates, you ensure that ClickHouse replicas land in geographically separate data centers. The distribution: OnePerHost setting ensures no two ClickHouse pods from the same cluster share a VM. Always verify the result with kubectl get nodes and pod placement commands after applying the configuration.

© Altinity, Inc. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
