The Analytics Easy Button, or: How to Deploy ClickHouse® Services with Terraform, Helm, or Argo CD

Recorded: July 23 @ 08:00 am PDT
Presenters: Josh Lee, Developer Advocate @Altinity, and Robert Hodges, CEO @Altinity

In this webinar, Robert Hodges, CEO of Altinity, and Josh Lee, Developer Advocate at Altinity, present a practical guide to deploying ClickHouse® on Kubernetes using the three most common tools in the modern platform engineering stack: Terraform, Helm, and Argo CD. They frame the session around a key insight: a data platform is really just a software-as-a-service layer for your own internal teams, and building it well requires matching the right tool to the right layer of the stack.

Josh opens with a layered model of platform architecture, arguing that Terraform is best suited for semi-permanent infrastructure such as clusters, operators, and managed services; Helm is most valuable as a templating language rather than as a deployment API; and Argo CD is the right tool for deploying applications and platform services that change frequently or require multi-team access. He explains how these tools complement each other rather than compete, with Helm inflating templates and Argo applying them without creating Helm releases in the cluster.

The first demo section shows a nearly production-ready open source Terraform module that provisions an EKS cluster with two node groups, installs the Altinity Kubernetes Operator for ClickHouse®, and optionally deploys a ClickHouse cluster and ClickHouse Keeper. On top of this infrastructure, Josh deploys a toy analytics application using Argo CD: an OpenTelemetry Collector running as a DaemonSet, a ClickHouse database receiving all telemetry data via the OpenTelemetry ClickHouse exporter, and Grafana for visualization. He demonstrates live queries against OpenTelemetry metric data stored in ClickHouse, showing how SQL and map-typed resource attributes allow rich service discovery and filtering without additional tooling.

Robert then presents Altinity.Cloud® and its two deployment models: Bring Your Own Cloud (BYOC), where Altinity provisions a Kubernetes cluster in the customer’s cloud account, and Bring Your Own Kubernetes (BYOK), where Altinity installs management software into an existing Kubernetes cluster via a lightweight connector. The second demo walks through the BYOK flow: creating a new environment in Altinity.Cloud, running the altinitycloud-connect utility, piping the generated setup script through kubectl, selecting instance types and storage, and launching a new ClickHouse cluster. Robert finishes by exploring the management capabilities available in Altinity.Cloud, including cluster rescaling, version upgrades, built-in Grafana monitoring dashboards, and access control gates that let customers precisely control how much visibility Altinity support staff can have into their data.


Key Moments (Timestamps)

Key moments generated with AI assistance.

0:07 – Welcome, introductions, and housekeeping
2:39 – What is a platform? The layered architecture model
4:22 – How Terraform, Helm, and Argo CD fit together
8:45 – ClickHouse in a platform: where it fits and why
13:57 – Demo Part 1: The Altinity EKS Terraform module and the OpenTelemetry observability stack
27:44 – Why managed services? The case for Altinity.Cloud®
32:17 – Altinity.Cloud® BYOC and BYOK deployment models
39:37 – Demo Part 2: Connecting a Kubernetes cluster to Altinity.Cloud®
49:11 – Demo: Managing clusters, scaling, upgrades, and monitoring dashboards
52:52 – Q&A: Shard rebalancing, memory sizing, GPU support, and data access security

Webinar Transcript

[0:07] – Welcome, Introductions, and Housekeeping

Robert: All right, let’s get this show on the road. Welcome to our webinar on the Analytics Easy Button: how to deploy ClickHouse® services with Terraform, Helm, or Argo CD. My name is Robert Hodges. I’m CEO of Altinity and it is my delight to be doing this with Josh Lee, our Developer Advocate, who will be taking the show on the road. A couple of housekeeping items before we dive in. This is being recorded and we will send you a link to the recording and the slides within 24 hours of the webinar completing. We have plenty of time for questions. You can pop them into the chat or the Q&A box. If they’re relevant as we go, we may stop and check them out right then; otherwise we’ll batch them up and answer them at the end.

I’m Robert. I’ve been working on databases for over 40 years, on Kubernetes since 2018, and my day job is running the ClickHouse business at Altinity. We are an enterprise provider for ClickHouse: helping people build high-speed analytic systems. Our products include Altinity.Cloud®, which we’ll be showing you a bit of today, Altinity Stable® Builds for ClickHouse®, and we’re also the authors of the Altinity Kubernetes Operator for ClickHouse®, the most popular and really the only widely used operator for running ClickHouse clusters on Kubernetes. Josh, take it away.

Josh: Sure, absolutely. Thanks, Robert. I’m Josh Lee. My background is as an application developer. I’ve been doing that for over a decade, most recently working in observability, and I’m hoping to bring a bit of an application developer perspective to this platform engineering discussion.

[2:39] – What Is a Platform? The Layered Architecture Model

Josh: What’s in a platform? As I just mentioned, it’s kind of like a SaaS for our internal customers. I’m starting to think of a platform like a neighborhood: the infrastructure and utilities below the surface, and the nice neat user-facing applications above it. In reality, it’s a little more complicated. There are multiple layers beneath our applications. We have the applications that make up our platform running underneath the end-user applications, and all of that is supported by the infrastructure.

We can also think about the devops tools: Argo CD for deploying the applications, Helm for our platform application definitions, and Terraform for the base infrastructure layer. But even that’s not quite right, and we’ll dig into how we actually use these tools together in our real example.

[4:22] – How Terraform, Helm, and Argo CD Fit Together

Josh: Here’s how we actually use these tools. They don’t strictly map to those layers I just described.

Terraform is great for managing infrastructure that is semi-permanent: things that are going to be around for a while and that other things depend on. That includes VMs, clusters, the operators that run in those clusters, managed services running alongside the clusters, and observability and security services. Terraform is excellent for this because of the way it manages state. It understands how changes will affect the entire system before making them. However, that strict state management can be a pain when multiple teams are trying to use it to deploy applications. It’s not great for the application layer where things are more ephemeral and there are more people involved.

Helm is everybody’s starting point when working with Kubernetes. Helm templates are an excellent way to define applications and their components. Personally, I think the Helm API itself is somewhat redundant with the controls already built into Kubernetes and with tools like Argo CD. So in the example we’re going to show you, we’re using Helm heavily as a templating language, but we’re not using it to interact with the Kubernetes API directly and we’re not deploying Helm releases to the cluster. We’re using Argo CD and Terraform to manage all of that.

Argo CD sits at the top of our stack and is part of the API we might expose to application teams so they can deploy their own applications. We also use it for deploying platform applications like our observability stack. Argo can inflate the Helm templates for us so we can see all of the manifests that will get applied to the cluster before they actually get applied, and we can see the complete diff before running the release. Using Helm for templates and Argo for inflating and applying them is a really safe and efficient way to deploy applications to Kubernetes clusters.
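The "see the complete diff before applying" idea can be illustrated with a minimal sketch. Argo CD computes its own diff internally; the snippet below only mimics the concept with Python's standard difflib, and both manifests are made-up examples rather than anything from the demo.

```python
import difflib

def manifest_diff(live: str, rendered: str) -> str:
    """Return a unified diff between the live manifest and the freshly rendered one."""
    return "".join(
        difflib.unified_diff(
            live.splitlines(keepends=True),
            rendered.splitlines(keepends=True),
            fromfile="live",
            tofile="rendered",
        )
    )

# Hypothetical manifests: what is in the cluster now vs. what the template would produce.
live_manifest = "kind: Deployment\nspec:\n  replicas: 1\n"
rendered_manifest = "kind: Deployment\nspec:\n  replicas: 3\n"

print(manifest_diff(live_manifest, rendered_manifest))
```

Reviewing a diff like this before syncing is what makes the template-then-apply split safe: nothing reaches the cluster until the rendered output has been inspected.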

The app-of-apps pattern should be familiar to most of you, but briefly: an Argo CD application is just a custom resource in Kubernetes. The parent app is really just a Helm chart with a templates folder and a chart definition. If each of those templates is itself an Argo app, Argo will follow the instructions in each one and deploy it from its source. You point a child app at a third-party Helm chart with some custom values; Argo pulls down the chart and the values, inflates the chart, and applies it to the cluster. You can drop as many apps as you want in there, or even include an entire Helm chart directly, and Argo will dutifully deploy it.
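As a rough sketch of what the child applications in an app-of-apps chart might look like, the Python below builds Argo CD Application manifests as dictionaries and prints them as JSON (which is also valid YAML). The app names, the "monitoring" namespace, the "argocd" namespace, the sync policy, and the "x.y.z" chart versions are assumptions for illustration, not values taken from the demo repository.

```python
import json

def argo_application(name, repo_url, chart, version, namespace, values=None):
    """Build an Argo CD Application manifest (argoproj.io/v1alpha1) as a dict."""
    source = {"repoURL": repo_url, "chart": chart, "targetRevision": version}
    if values:
        # Inline Helm values are passed as a YAML string; JSON is a subset of YAML.
        source["helm"] = {"values": json.dumps(values, indent=2)}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},  # assumed install namespace
        "spec": {
            "project": "default",
            "source": source,
            "destination": {
                "server": "https://kubernetes.default.svc",  # the local cluster
                "namespace": namespace,
            },
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }

# Hypothetical child apps; "x.y.z" is a placeholder for a pinned chart version.
children = [
    argo_application(
        "grafana", "https://grafana.github.io/helm-charts",
        "grafana", "x.y.z", "monitoring", {"replicas": 1},
    ),
    argo_application(
        "otel-collector",
        "https://open-telemetry.github.io/opentelemetry-helm-charts",
        "opentelemetry-collector", "x.y.z", "monitoring", {"mode": "daemonset"},
    ),
]

for app in children:
    print(json.dumps(app, indent=2))
```

Each printed document would live as one file in the parent chart's templates folder; adding a service then means adding one more entry.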

[8:45] – ClickHouse in a Platform: Where It Fits and Why

Robert: Let me just do a short detour to talk about ClickHouse, and how it would fit into a platform we might be building.

ClickHouse is basically the core of any system that is processing a large amount of data and wants to do it quickly. If you haven’t heard of ClickHouse before, it’s a real-time analytic database. I like to think of it as the offspring of the marriage of a database like MySQL, which many of you are familiar with, and a database like Vertica, which is one of the pioneering data warehouses. From the MySQL side: it understands SQL, runs practically anywhere, and is open source under Apache 2.0. From the data warehouse side: it was originally designed with a shared-nothing architecture, stores data in columns with high compression, uses vectorized execution with SIMD CPU instructions, can parallelize queries across many nodes, and scales into the petabyte range. There are clusters out there crossing 10 petabytes. It’s one of the most popular data warehouses on GitHub with 30,000 to 40,000 stars and hundreds of contributors. It’s a great core service for observability, security management, and network analytics stacks.

Josh: I really love how you can run it on anything from a Raspberry Pi to an entire cloud.

So where does ClickHouse fit in our platform? Like a good parfait, our platform has layers, and just like a real parfait, as we dig into it, those layers start to blend together. You might find ClickHouse in the top layer, the middle layer, and the bottom layer depending on who is using it and what for. You might be running it as part of your observability infrastructure, as a shared platform service for application teams, or as a core component of a product you’re building.

The key architectural insight is that different layers of the stack require different tools and different kinds of expertise. The lower down you go, the more things standardize around Kubernetes and its toolchain. The higher up you go, the more you’re dealing with custom business logic. And in the middle, where the platform and application teams meet, that’s where ClickHouse often lives: close enough to the infrastructure that Terraform is handy, but also close enough to the application that Argo might be appropriate. The choice depends on how tightly coupled ClickHouse is to the shape of your cluster and how it’s being consumed.

[13:57] – Demo Part 1: The Altinity EKS Terraform Module and the OpenTelemetry Observability Stack

Josh: Let’s get into the demo. There are two parts. The first part is nearly production-ready and is available in a public GitHub repo. This is a Terraform module for EKS and ClickHouse that several of us and our customers are using and testing. It deploys an EKS cluster with two node groups and optionally deploys a ClickHouse cluster using the Altinity Kubernetes Operator for ClickHouse® along with ClickHouse Keeper on one node group, while the other node group is available for any applications you want to run alongside ClickHouse.

The prerequisites are straightforward: Terraform, kubectl, and the AWS CLI. You don’t even need kubectl pre-configured because the module will give you the configuration after provisioning the cluster. You do need your AWS credentials. The usage is simple: specify a region, declare that you want the ClickHouse operator and a cluster, choose your networking and availability zones, pick an instance type. The m family is a great shape for ClickHouse because ClickHouse loves memory. Then set up your node pools and you’re ready to apply.
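The inputs listed above could be captured in a variables file along these lines. This is only a sketch: every key name below is hypothetical and will not match the real module's variable names, so check the module's README before using it. Terraform does automatically load a file named terraform.tfvars.json from the working directory.

```python
import json

def build_tfvars(region, zones, instance_type, install_cluster=True):
    """Assemble module inputs. Every key name here is hypothetical."""
    if len(zones) < 2:
        # EKS needs subnets in at least two availability zones.
        raise ValueError("provide at least two availability zones")
    return {
        "region": region,
        "availability_zones": list(zones),
        "install_clickhouse_operator": True,
        "install_clickhouse_cluster": install_cluster,
        "node_pools": [
            # One pool for ClickHouse and Keeper, one for everything else.
            {"name": "clickhouse", "instance_type": instance_type, "zones": list(zones)},
            {"name": "apps", "instance_type": instance_type, "zones": list(zones)},
        ],
    }

tfvars = build_tfvars("us-east-1", ["us-east-1a", "us-east-1b"], "m6i.large")
# Write this output to terraform.tfvars.json next to the module call.
print(json.dumps(tfvars, indent=2))
```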

On top of that I’ve added Argo CD and used it to deploy a toy analytics application. Right now it consists of an OpenTelemetry Collector gathering metrics from the Kubernetes nodes themselves, and the ClickHouse exporter from the OpenTelemetry project to export all of that data to our ClickHouse database. We can then query it using SQL. For some people, SQL will be much more familiar than PromQL for this kind of analysis, and for things like distributed traces and logs, ClickHouse is excellent: the column-oriented storage with time-based partitioning is just perfect for trace data and log aggregates. It’s really a great replacement for Elasticsearch for log storage, and a strong choice for time series as well. One database for everything, which means you can start combining this data across signal types later on.

Looking at the cluster in k9s, we can see the OpenTelemetry Collectors running as a DaemonSet, one on each node, the ClickHouse operator, three ClickHouse pods using a significant amount of memory, ClickHouse Keeper, Grafana, the Argo CD application, AWS agents and autoscalers, and all the standard Kubernetes system services.

Drilling into the ClickHouse data in Grafana, we can see node metrics: all the pods and their RSS memory usage. We can see CPU usage for the various workloads. The OpenTelemetry resource attributes are stored in ClickHouse as a map type, which is where the power of SQL really comes in. We can pull out fields like deployment names and DaemonSet names from those maps and use them to build lists of what’s running in the cluster, essentially doing service discovery just by examining the data.

Looking at the actual table structure created by the OpenTelemetry exporter, we have separate tables for each metric type because values are stored differently, but the resource attributes column is consistent across all of them. A simple SQL query against the metric_name column immediately gives you all the different metrics available to query. From there you can build any Grafana panel you need.
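The two kinds of query described here, listing available metrics and doing service discovery from the resource attributes map, might look like the sketch below. The table name otel_metrics_gauge and the columns MetricName and ResourceAttributes are assumptions based on the exporter's default schema as I understand it (the transcript refers to the column as metric_name); run DESCRIBE TABLE against your own database to confirm. The attribute keys follow the OpenTelemetry Kubernetes semantic conventions.

```python
import re

def distinct_metrics_sql(table="otel_metrics_gauge"):
    """List every metric name present in one metric-type table."""
    return f"SELECT DISTINCT MetricName FROM {table} ORDER BY MetricName"

def discover_sql(attribute, table="otel_metrics_gauge"):
    """List the distinct values of one resource attribute, e.g. deployment names."""
    if not re.fullmatch(r"[A-Za-z0-9_.]+", attribute):
        raise ValueError(f"unexpected attribute key: {attribute!r}")
    # A missing map key yields an empty string in ClickHouse, so filter those out.
    return (
        f"SELECT DISTINCT ResourceAttributes['{attribute}'] AS value "
        f"FROM {table} WHERE value != '' ORDER BY value"
    )

print(distinct_metrics_sql())
print(discover_sql("k8s.deployment.name"))
print(discover_sql("k8s.daemonset.name"))
```

The printed statements can be pasted into clickhouse-client or used as the query behind a Grafana variable.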

The Terraform file for this is pretty much the same as the module example, with outputs configured to give you the cluster URL, the kubectl configuration, and the ClickHouse admin password so you can create additional users like a read-only user for Grafana.

Argo CD itself is deployed using the Terraform Helm provider. The credentials created by the EKS ClickHouse module are passed through so you don’t need to re-provide them. The argo-apps folder defines the individual Argo applications, each one a simple Helm chart with a templates directory. This is how Grafana and the OpenTelemetry Collector end up on the cluster.

[27:44] – Why Managed Services? The Case for Altinity.Cloud®

Robert: Many people build the entire stack themselves using these open source tools and write a glowing blog post about it. But there are substantial reasons why you might want parts of this stack managed for you. Learning curves are one. Another is the time spent developing infrastructure to manage databases: there is specialized knowledge required to recover from ClickHouse failures. ClickHouse users will recognize detached parts or read-only replicas as examples: when they occur, you want them fixed quickly and safely. A well-run managed service can help with all of that.

Having experts available means spending less time on the details of the technology and more time on the application that makes your business successful. Having the automation already built in reduces operational burden. And when you have big failures, having it be somebody else’s problem to fix, automatically or with expert backup, gets you back up and running quickly without doing further damage.

Altinity.Cloud® was launched in 2020 and runs on Amazon, GCP, Azure, and Hetzner. It’s designed for real-time applications with a lot of performance optimization built in. Enterprise support is baked in automatically: when you have an account, the support people you’re talking to have managed ClickHouse clusters for years and know all the ins and outs. The cloud runs on Kubernetes using the open-source Altinity Kubernetes Operator for ClickHouse® and supports virtually all versions and features of ClickHouse.

[32:17] – Altinity.Cloud® BYOC and BYOK Deployment Models

Robert: The question is: how do we get Altinity.Cloud® into a stack you’ve already built? There are two models.

The first is Bring Your Own Cloud (BYOC). In this model, you grant Altinity minimum privileges to come in and set up a managed Kubernetes cluster in your cloud account. If you’re on Amazon, that cluster comes up on EKS; on Google, GKE; on Azure, AKS. We then install everything necessary to manage and run ClickHouse clusters. You don’t need to be a deep Kubernetes expert; we take care of that part.

The second is Bring Your Own Kubernetes (BYOK). This is very common in large organizations that already have a customized, platform-team-managed Kubernetes distribution. Since the Kubernetes cluster already exists, we can’t create a new one. Instead, we install software directly into your existing cluster using a lightweight connector. The Altinity Cloud Connector establishes a secure outbound connection back to our management plane using QUIC. Everything stays inside your VPC.

In both models, Altinity needs two namespaces inside the Kubernetes cluster. We assign ourselves privileges only within those namespaces: an altinity-cloud-system namespace for our internal services like Prometheus, Grafana, and the connector, and a managed-clickhouse namespace where all your ClickHouse clusters live. Everything else in the cluster is yours. We also optionally set up an NLB-backed edge proxy for routing traffic into specific ClickHouse clusters.

[39:37] – Demo Part 2: Connecting a Kubernetes Cluster to Altinity.Cloud®

Robert: Let me show you this in action. I’ll connect an unmodified Kubernetes cluster to Altinity.Cloud®. Here’s the cluster right now in k9s: just a few default namespaces, nothing installed.

In Altinity.Cloud, I’ll go to Environments and create a new one. I’ll call this a bring-your-own-kubernetes environment, give it a name, and press OK. Altinity.Cloud is now ready to receive a connection from this new environment.

The altinitycloud-connect utility is what establishes the connection. I’ve already downloaded it. I’ll copy the connection command from the UI and run it from within my account. I can confirm we’re connected to the right cluster, then run the setup script that Altinity.Cloud generates, piped directly through kubectl. This creates the namespaces and the foundational services.
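One way to confirm that the setup script did its job is to check for the two namespaces afterwards. The sketch below parses the output of `kubectl get namespaces -o json`; the namespace names are taken literally from the description earlier in this session and may differ in a real installation, and the sample input is fabricated for illustration.

```python
import json

# Names as described in the talk; treat them as assumptions.
EXPECTED = {"altinity-cloud-system", "managed-clickhouse"}

def missing_namespaces(kubectl_json, expected=EXPECTED):
    """Return the expected namespaces absent from `kubectl get namespaces -o json` output."""
    items = json.loads(kubectl_json)["items"]
    present = {item["metadata"]["name"] for item in items}
    return sorted(set(expected) - present)

# Fabricated sample standing in for real kubectl output.
sample = json.dumps(
    {"items": [{"metadata": {"name": n}}
               for n in ["default", "kube-system", "altinity-cloud-system"]]}
)
print(missing_namespaces(sample))  # prints ['managed-clickhouse']
```

In practice the input would come from something like `subprocess.run(["kubectl", "get", "namespaces", "-o", "json"], capture_output=True, text=True).stdout`.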

Now I can see two namespaces have appeared in the cluster. At this point the connector is there but the rest of the services need to be installed as well, so I can proceed through the setup wizard in Altinity.Cloud. I’ll configure storage (dismissing the default gp2 class, which I don’t want to use), select T3 Large for ClickHouse Keeper and system services, and M6i Large for the ClickHouse nodes. Then I kick off the provisioning. The edge proxy and other system services start coming up in the altinity-cloud-system namespace. Some CrashLoopBackOff states at first are expected: the edge proxy is waiting for a certificate before it can route traffic. Once that resolves, the environment will be fully operational.

That takes about 15 minutes to complete, so let me go back to our already-connected environment to show you what you can do once it’s fully set up.

[49:11] – Demo: Managing Clusters, Scaling, Upgrades, and Monitoring Dashboards

Robert: In my connected environment I can see existing clusters and launch new ones. Let me create a new one: I’ll give it a name and an admin password. I’ll accept most of the defaults; I can control the volume throughput and backup schedule. There’s a backup running at 5 a.m. every day. One fun feature: I’m going to set this cluster to automatically stop after two hours of inactivity. That means VMs stop and I don’t get billed for idle compute while this demo cluster isn’t being used.

I can now review and see the estimated cluster cost before launching. Off it goes. The ClickHouse Keeper nodes start coming up first in the cluster, and the ClickHouse pods follow shortly after.

For the cluster that’s already running, here’s the range of management actions available. Rescaling is excellent: I can change instance types, increase the number of shards, or increase storage, and just press OK. To add more shards I just change the number and they come up. For upgrades, I can pick from different builds including both upstream ClickHouse builds and our Altinity Stable® Builds for ClickHouse®, and upgrade with a click.

The monitoring section is something you really shouldn’t underestimate. Built-in Grafana dashboards show you exactly what’s going on inside your cluster: select queries, bytes read and written, all the basic stats. There are more detailed dashboards we use for troubleshooting. The one I come back to most frequently is load average. When someone says their server is getting unresponsive, I come here first to see whether it’s a CPU issue, an IO wait issue, or just general overload, and when exactly it started. That’s usually the first diagnostic step.
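The first-pass triage described here can be written down as a small rule of thumb. This is a sketch only: the thresholds (load of 1.0 per core, 20% IO wait) are illustrative assumptions, not values used by the dashboards, and real diagnosis also depends on when the change started.

```python
def classify_load(load_avg, cpu_count, iowait_fraction):
    """Rough triage of an unresponsive server from load average and IO wait.

    load_avg:        1-minute load average
    cpu_count:       number of CPU cores on the node
    iowait_fraction: share of CPU time spent waiting on IO (0.0 to 1.0)
    """
    per_core = load_avg / cpu_count
    if per_core < 1.0:
        # Fewer runnable tasks than cores: the host itself is not saturated.
        return "ok"
    if iowait_fraction >= 0.2:
        # High load with lots of IO wait points at storage, not CPU.
        return "io-wait"
    return "cpu-or-general-overload"

print(classify_load(load_avg=12.0, cpu_count=8, iowait_fraction=0.35))
```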

On access control: one of the nice features is the Altinity Access gate. You can control exactly how much access Altinity support staff can have to your cluster. Full access means we can read and change data in every database. Read-only restricts us to reading. Read-only on system databases only narrows it further. You can cut off access completely, or also restrict our ability to make configuration changes. For support cases where we need to look inside, we’ll ask you to temporarily open the gate. You’re fully in control.

[52:52] – Q&A: Shard Rebalancing, Memory Sizing, GPU Support, and Data Access Security

Question from Rodrigo: Why use Terraform to deploy ClickHouse versus Argo CD?

Josh: You could absolutely do it either way and the choice depends on how you’re using ClickHouse. In this demo, I’m using ClickHouse as part of the core infrastructure for monitoring the cluster itself. It’s also tightly coupled to the shape of the cluster: the specific availability zones and node groups I want ClickHouse running on. Working within Terraform makes it easier to have that interplay between AWS resource configuration and the ClickHouse cluster configuration. But it’s totally reasonable to use Argo CD for ClickHouse if it’s part of your application stack, for example if you’re building a multi-tenant observability solution where ClickHouse is just one of the services deployed per tenant.

Robert: The general principle is: the lower down the stack you are, the better Terraform is. The higher up the stack, the better Helm and Argo CD are.

Question from the audience: If you change the number of shards, does data automatically rebalance?

Robert: No, it does not. ClickHouse does not automatically rebalance data across shards. For time series data, this often works out naturally: you design the partitioning so new data lands on new shards, and after a retention window the old shard data ages out. For other data types you can manually move data between shards. Automatic resharding is a known feature request that’s being worked on but isn’t available yet.
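The way time series data evens out after adding a shard can be seen in a toy simulation. The model below is an assumption made for illustration: one unit of data arrives per day, writes are spread evenly across whatever shards exist that day, and data older than the retention window is dropped. It does not model ClickHouse itself.

```python
from collections import deque

def simulate(days, retention_days, initial_shards, add_shard_on_day):
    """Return the per-shard stored volume at the end of each simulated day."""
    shards = initial_shards
    window = deque()  # one entry per retained day: volume written to each shard
    history = []
    for day in range(days):
        if day == add_shard_on_day:
            shards += 1  # the new shard starts empty; nothing is moved to it
        window.append([1.0 / shards] * shards)
        if len(window) > retention_days:
            window.popleft()  # the oldest day ages out of retention
        totals = [0.0] * shards
        for written in window:
            for i, volume in enumerate(written):
                totals[i] += volume
        history.append(totals)
    return history

history = simulate(days=20, retention_days=7, initial_shards=2, add_shard_on_day=5)
for day in (4, 5, 8, 11):
    print(day, [round(v, 2) for v in history[day]])
```

Right after the shard is added it holds far less than the others; once a full retention window has passed, every retained day was written across all three shards and the volumes match.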

Question on memory: What are the memory implications for continuous deployment with Altinity.Cloud®?

Robert: Altinity.Cloud® is designed for real-time applications. VMs have a fixed allocation of RAM. If that allocation is insufficient or too much, you can rescale. The monitoring dashboards I showed make it easy to see whether you’re underprovisioned or overprovisioned for memory, and then you can just rescale and Altinity.Cloud® will handle it.

Question from the audience: Is GPU compute on the ClickHouse roadmap?

Robert: There was prototype work on this at Nvidia a few years ago, but it wasn’t integrated. GPU compute comes and goes as a request, but it’s not a high priority in the ClickHouse roadmap right now. The reason is that ClickHouse workloads are typically more IO and memory intensive than compute intensive. When you’re doing joins or aggregates, you mostly need bandwidth to load data into memory and process it. GPUs help with very compute-intensive operations, but that’s not the primary bottleneck for most ClickHouse queries.

Question from Chuck: Does Altinity have access to data stored in customer clusters?

Robert: In theory, yes, because we can see into Kubernetes. But we build in a gate. The access control setting in Altinity.Cloud® lets you define the data access level. Full access allows us to read and modify data in any database. Read-only restricts us to reading. Read-only on system databases only narrows it to just the system tables. You can cut off all data access, and you can also cut off our ability to make configuration changes. Anyone on our support team must pass through this gate, and for certain support cases we may have to ask you to temporarily open it. You’re fully in control and you can keep it shut unless you specifically need our help inside the data.

Robert: Thank you very much everybody. We hope this has been useful. Please join the Altinity Slack, come to our website, and reach out. We’ll be sending the recording and slide links to your email within the next few hours. Thank you Josh for this presentation.

Josh: Thanks everyone. We love ClickHouse and we love building on it. Come talk to us.

FAQ Section

When should I use Terraform versus Argo CD to deploy ClickHouse on Kubernetes? Terraform is the better choice when ClickHouse is part of your core infrastructure: when it’s tightly coupled to the shape of your cluster (specific availability zones, node types, autoscaling groups) or when you’re provisioning it alongside the cluster itself. Argo CD is the better choice when ClickHouse is part of your application layer: when multiple teams need access to deploy or configure it, when it changes frequently, or when you want GitOps-style reconciliation and visibility into the diff before applying changes. Many organizations use both, with Terraform managing the base cluster and operator, and Argo CD managing ClickHouse cluster definitions.

What is the app-of-apps pattern in Argo CD and why is it useful for platform deployments? The app-of-apps pattern means defining a parent Argo CD application whose templates are themselves additional Argo CD applications. When Argo reconciles the parent, it automatically discovers and deploys all child applications. This is useful for platform deployments because you can define the entire observability stack, ClickHouse clusters, and support services as a set of Argo apps in a single Git repository. Adding a new service is as simple as adding a new app definition to the directory, and Argo handles the rest.

What is the difference between Altinity.Cloud® BYOC and BYOK? BYOC (Bring Your Own Cloud) means Altinity provisions and manages a Kubernetes cluster inside your cloud account. You give Altinity specific, limited permissions, and Altinity creates the EKS, GKE, or AKS cluster and installs all the ClickHouse management software. BYOK (Bring Your Own Kubernetes) means you already have a Kubernetes cluster, so Altinity does not create one. Instead, Altinity installs a lightweight connector and management software directly into your existing cluster. In both cases your data stays in your VPC and you control access. BYOK is particularly common in large organizations with platform teams that manage their own customized Kubernetes distributions.

How does Altinity.Cloud® connect to an existing Kubernetes cluster? You create a new environment in the Altinity Cloud Manager, then run the altinitycloud-connect utility locally with a connection token generated by the ACM. This establishes a secure outbound connection from your cluster to the Altinity management plane using QUIC. You then run a setup script generated by the ACM, piped through kubectl, which creates two namespaces and installs the Altinity system services. The entire process takes about 15 minutes and Altinity only needs access to those two namespaces, not to the rest of your cluster.

Why is ClickHouse a good fit for storing OpenTelemetry data? ClickHouse handles all three OpenTelemetry signals (logs, metrics, and traces) efficiently in a single database. Its columnar storage with time-based partitioning is well-suited to trace data because spans are naturally queried by time range and grouped by service. Map-typed columns in ClickHouse can store OpenTelemetry resource attributes directly, and SQL queries can extract and filter on any attribute without schema changes. ClickHouse’s compression and vectorized execution also make it significantly more cost-effective than Elasticsearch for log storage at scale.

What monitoring is built into Altinity.Cloud®? Altinity.Cloud® includes built-in Grafana dashboards that provide visibility into ClickHouse cluster health and performance including select query rates, read and write bytes, CPU usage, IO wait, load average, and memory utilization. These dashboards update automatically and are used both by customers for day-to-day monitoring and by Altinity support engineers for troubleshooting. The load average dashboard in particular is a first diagnostic step when a cluster becomes unresponsive.

© Altinity, Inc. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
