Reducing Clickhouse Storage Cost with the LowCardinality Type – Lessons from an Instana Engineer

Reducing Clickhouse Storage Cost with the LowCardinality Type – Lessons from an Instana Engineer

At Instana we are operating our Clickhouse clusters on m5.12xlarge AWS EC2 instances and similar sized custom build Google Cloud instances. The biggest cost driver however is not the size of the instance, but the EBS volumes / persistent SSDs that are attached to the instances. Every single one of our 12TB volumes costs $1320 per month in AWS and $2,040 per month in Google Cloud.

Handling Variable Time Series Efficiently in ClickHouse

Handling Variable Time Series Efficiently in ClickHouse

May 23, 2019

ClickHouse offers incredible flexibility to solve almost any business problem in a multiple of ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best for ClickHouse from a performance perspective, as it perfectly utilizes column store and type specialization.

Sometimes, however, schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may be not very convenient, hence a different approach is required. In this article we discuss multiple ways to design schema for time series, and do some benchmarking to validate each approach.

A Magical Mystery Tour of the LowCardinality Data Type

A Magical Mystery Tour of the LowCardinality Data Type

Mar 27, 2019

Many ClickHouse features like LowCardinality data type seem mysterious to new users.  ClickHouse often deviates from standard SQL and many data types and operations do not even exist in other data warehouses. The key to understanding is that the ClickHouse engineering team values speed more than almost any other property. Mysterious SQL expressions often turn out to be ‘secret weapons’ to achieve unmatched speed.

In fact, the LowCardinality data type is an example of just such a feature. It has been available since Q4 2018 and was marked as production ready in Feb 2019, but still is not documented, magically appearing in some documentation examples. In this article we will fill the gap  by explaining how LowCardinality works, and when it should be used.