
Fast Track to Real-time Analytics: Introducing the Terraform AWS EKS Blueprint for ClickHouse®

Amazon Elastic Kubernetes Service (EKS) is one of the most popular managed Kubernetes platforms for running ClickHouse® applications in the cloud. Combining Terraform with ClickHouse on Amazon EKS offers a powerful and automated way to run high-performance analytics. EKS handles the housekeeping to run, upgrade, and operate the Kubernetes control plane, which schedules containers, manages application availability, and stores cluster metadata. Users can focus instead on building valuable analytic applications.

Despite the benefits of AWS EKS, users still need to configure their clusters and install add-ons before they can run applications on it. To help with that, we are introducing a new AWS EKS blueprint based on the Altinity-developed Terraform AWS EKS ClickHouse module. Designed in collaboration with the AWS EKS team, the module provides a simple, automated deployment process that brings up a batteries-included EKS cluster with a sample ClickHouse installation based on best practices. It reduces the process to a few simple edits in a control file followed by two Terraform commands.

The remainder of this article describes how the module works, how to get started, and where to go next as you dig into running ClickHouse on Kubernetes. Future articles will touch on how to use the EKS blueprint to set up Kubernetes for use by Altinity.Cloud.

Architecture

The EKS module starts by creating a VPC, then fills in IAM roles, storage resources, the Altinity Kubernetes Operator for ClickHouse, and other resources required to set up an AWS EKS cluster with a sample ClickHouse deployment application. The fully deployed architecture is shown below.

The module employs sensible defaults to provision the EKS cluster and optimize its node groups specifically for ClickHouse databases. These groups feature EBS CSI drivers for encrypted storage and multi-zone autoscaling, underpinned by the necessary IAM roles and policies. The entire setup is designed for a dedicated VPC, ensuring network isolation and internet access through an optional public load balancer, public and private subnets, an Internet gateway, and a NAT gateway.

Components

  • EKS Cluster: Uses Amazon EKS to manage Kubernetes clusters. It creates a set of node groups based on combining your instance configuration with the given availability zones.
  • VPC and Networking: Sets up a VPC with private and public subnets, an internet gateway, a NAT gateway (that can be disabled), and route tables for network isolation and internet access. The public subnets and NAT gateway exist so that cluster nodes can reach the container image registry.
  • IAM Roles and Policies: Defines roles and policies for the Amazon EKS cluster, node groups, and service accounts, facilitating a secure interaction with AWS services with the minimum required permissions.
  • ClickHouse: This ClickHouse cluster is designed for flexibility and high availability in multiple availability zones. It integrates with ClickHouse Keeper for cluster management and coordination, allowing external access with enhanced security. The cluster’s architecture supports high availability with a replica structure across multiple zones, ensuring fault tolerance. Storage is secure and performant, utilizing an encrypted gp3 class. The entire ClickHouse setup is done using separate Helm charts:
    • ClickHouse Operator: The operator manages the lifecycle of ClickHouse clusters, including scaling, backup, and recovery.
    • ClickHouse Cluster: Created by the Altinity Kubernetes Operator for ClickHouse, with configurations for namespace, user, and password.
    • ClickHouse Keeper: Sets up a ClickHouse Keeper for cluster coordination and configuration management, ensuring consistency.
  • Storage:
    • EBS CSI Driver: Implements the Container Storage Interface (CSI) for EBS, enabling dynamic provisioning of block storage for stateful applications.
    • Storage Classes: Defines storage classes for gp3 encrypted EBS volumes, supporting dynamic volume provisioning.
  • Cluster Autoscaler: Implements autoscaling for EKS node groups based on workload demands, ensuring efficient resource utilization in distributed availability zones.
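
To make the node group description above concrete, here is a minimal sketch of how combining an instance configuration with a list of availability zones could yield one node group per pair. The function name, the instance types, and the `<instance-type>-<zone>` naming scheme are all hypothetical and chosen only for illustration; the module's real naming and grouping logic may differ.

```shell
#!/bin/sh
# Hypothetical illustration: one node group per (instance type, zone) pair.
# The naming scheme is an assumption, not the module's actual output.
node_groups() {
  instance_types=$1   # space-separated, e.g. "m6i.large m6i.xlarge"
  zones=$2            # space-separated, e.g. "us-east-1a us-east-1b"
  for itype in $instance_types; do
    for zone in $zones; do
      printf '%s-%s\n' "$itype" "$zone"
    done
  done
}

# Two instance types across three zones produce six node groups.
node_groups "m6i.large m6i.xlarge" "us-east-1a us-east-1b us-east-1c"
```

Spreading groups across zones in this way is what lets the autoscaler add capacity in the zone where a ClickHouse replica and its EBS volume live.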

Benefits of the EKS Blueprint

  • Streamlined deployment
    Deploying a ClickHouse cluster on Amazon EKS can seem daunting, considering the configuration required to build an application-ready cluster. The module simplifies the process to a single Terraform file, enabling the deployment of an optimized EKS cluster and node groups ready for ClickHouse, including all necessary tooling.
  • Customization with sensible defaults
    The module was created with opinionated defaults that ease the initial setup process, making it accessible for newcomers. However, it offers extensive customization options for the Amazon EKS cluster and node groups, such as scaling configurations, disk sizes, and instance types, catering to the diverse needs of advanced users.
  • Base for further automation
    Users can fork the module code and tailor it as needed to meet specific automation or security goals. There is no need to develop a new module from scratch.

Getting Started

Setting up your ClickHouse cluster on EKS is straightforward. You’ll need to install these tools: kubectl, the AWS CLI (and configure your credentials), Terraform, git, and the ClickHouse client.
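
Before going further, it can help to confirm that the tools are actually on your PATH. The following is a small sketch rather than part of the module; the helper name `missing_tools` is made up for this example, and the binary names (in particular `clickhouse-client`) are assumptions that may differ depending on how you installed each tool.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Binary names are assumptions; adjust them to match your installation.
missing=$(missing_tools kubectl aws terraform git clickhouse-client)
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All required tools found."
fi
```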

After you have all these tools, you can deploy the cluster with these simple steps:

  1. Create a new directory for your project:
    $ mkdir clickhouse-cluster && cd clickhouse-cluster
  2. Set up the terraform-aws-eks-clickhouse module in a main.tf file:
    $ curl https://raw.githubusercontent.com/Altinity/terraform-aws-eks-clickhouse/master/examples/default/main.tf > main.tf
  3. Adjust the properties in the main.tf file, then run terraform init to initialize Terraform and the module:
    $ terraform init
  4. Run terraform apply, check that the proposed plan looks OK, and confirm your changes:
    $ terraform apply

That’s all that you need. Spinning up the cluster will take a few minutes to complete. Once it’s done, you can start playing with it using kubectl:

Use this command to set up the kubeconfig for the EKS cluster:
aws eks update-kubeconfig --name clickhouse-cluster --region us-east-1

Connect to your ClickHouse cluster with the clickhouse-client:
kubectl exec -it chi-eks-dev-0-0-0 -n clickhouse -- clickhouse-client
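
If you want to reach a different shard or replica, or run a one-off query without an interactive session, something like the sketch below may help. It assumes the pod name follows the pattern `chi-<installation>-<cluster>-<shard>-<replica>-0`, which is inferred from the pod name in the command above; the helper `chi_pod_name` is hypothetical, so confirm the real names with `kubectl get pods -n clickhouse`.

```shell
#!/bin/sh
# Build an operator-style pod name. The pattern
# chi-<installation>-<cluster>-<shard>-<replica>-0 is an assumption inferred
# from the pod name used in this article.
chi_pod_name() {
  printf 'chi-%s-%s-%s-%s-0\n' "$1" "$2" "$3" "$4"
}

pod=$(chi_pod_name eks dev 0 0)
echo "Target pod: $pod"

# Run a non-interactive query, but only when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec "$pod" -n clickhouse -- clickhouse-client --query "SELECT version()" \
    || echo "Query failed; check that the pod exists and is ready." >&2
fi
```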

The current “Getting Started” instructions are stored in the top-level project README.md on GitHub. For the most up-to-date information, check there. You will also find different working examples in the repository.

Production Considerations

The VPC configuration balances simplicity for newcomers against security practices required for production systems. In particular, be aware that the sample does not have a full complement of security protections. Most importantly:

  • The Kubernetes API is exposed on the public Internet. 
  • Connections to ClickHouse are not encrypted. 
  • If you choose to have a load-balancer ingress, it will likewise expose ELB ports to the public Internet.
  • If you disable the NAT gateway, your EKS cluster will run on public subnets.
  • The deployment includes a replicated Keeper cluster, but without ensuring availability across multiple Availability Zones, which may impact fault tolerance.
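
As one way to check the first point on a running cluster, the sketch below asks the AWS CLI whether the Kubernetes API endpoint is publicly reachable. It assumes the cluster name and region from the update-kubeconfig example (`clickhouse-cluster`, `us-east-1`); the helper `describe_api_exposure` is hypothetical and only turns the reported flag into a readable message.

```shell
#!/bin/sh
# Turn the endpointPublicAccess flag reported by the AWS CLI into a message.
describe_api_exposure() {
  flag=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$flag" in
    true)  echo "Kubernetes API endpoint is public" ;;
    false) echo "Kubernetes API endpoint is private" ;;
    *)     echo "Unknown endpoint state: $1" ;;
  esac
}

# Cluster name and region are assumptions taken from the kubeconfig example.
if command -v aws >/dev/null 2>&1; then
  access=$(aws eks describe-cluster --name clickhouse-cluster --region us-east-1 \
    --query 'cluster.resourcesVpcConfig.endpointPublicAccess' \
    --output text 2>/dev/null) || access="unavailable"
  describe_api_exposure "$access"
fi
```

A public result is expected with the sample configuration; restricting or disabling public access is one of the changes to consider in your own fork.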

For those planning to use this module in a production environment, we recommend the following.

  1. Make your own fork of the Terraform module code. 
  2. Review the Production Guidelines doc to learn about issues that you’ll need to take care of when running a production cluster. 
  3. Edit the Terraform resources to meet your organization’s security and compliance standards.

The EKS Blueprint for ClickHouse includes detailed design docs in GitHub that will help you tailor the EKS Blueprint scripts.

Where to go next

Once you have the EKS Blueprint cluster up and running you can dig more deeply into running ClickHouse on Kubernetes. Here are some important resources to help you get started.

Wrap-up and future attractions

The Terraform EKS Blueprint for ClickHouse offers a great starting place to help users jump into deploying ClickHouse clusters on AWS EKS. We have a lot more work on tap to make it even easier to deploy real-time analytics on Kubernetes. Here’s a taste of topics we’ll be blogging about soon:

  • Curated Kubernetes examples showing how to use manifests, Helm, and/or Argo CD to deploy ClickHouse-based apps that solve common use cases. 
  • Terraform APIs for Altinity.Cloud that enable users to deploy managed ClickHouse quickly and automatically, including in your own VPCs. 

If you have further questions about the EKS Blueprint, or any aspect of our work related to ClickHouse, please don’t hesitate to contact us. You can set up an appointment for a free consultation or join our Slack channel. We look forward to hearing from you!


ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.
