May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best option for ClickHouse from a performance perspective, as it fully utilizes the column store and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, hence a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.
May 21, 2019
One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they were expecting. The core of the problem was with loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, like a compact footprint and ease of comparing values. But they also have shortcomings that prompted us to seek a better solution.
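To make the traditional representation concrete, here is a minimal Python sketch of the conversions involved when addresses are kept as a 32-bit integer (IPv4) or 16 raw bytes (IPv6). It uses only the standard `ipaddress` module; the helper names are hypothetical and chosen for illustration, and the sketch does not show the solution the post goes on to describe.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Convert dotted-quad IPv4 text to the integer a UInt32 column would hold."""
    return int(ipaddress.IPv4Address(text))


def uint32_to_ipv4(value: int) -> str:
    """Convert a UInt32 value back to dotted-quad text for presentation."""
    return str(ipaddress.IPv4Address(value))


def ipv6_to_fixed16(text: str) -> bytes:
    """Convert IPv6 text to the 16 raw bytes a FixedString(16) column would hold."""
    return ipaddress.IPv6Address(text).packed


def fixed16_to_ipv6(raw: bytes) -> str:
    """Convert 16 raw bytes back to compressed IPv6 text for presentation."""
    return str(ipaddress.IPv6Address(raw))


if __name__ == "__main__":
    n = ipv4_to_uint32("192.168.0.1")
    print(n, uint32_to_ipv4(n))

    raw = ipv6_to_fixed16("2001:db8::1")
    print(len(raw), fixed16_to_ipv6(raw))
```

The compact form is cheap to store and compare, but every load and every presentation step needs a conversion like the ones above, which is the kind of friction the load-analyze-present workflow runs into.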
May 3, 2019
The previous post surveyed connectivity benchmarks for ClickHouse to estimate the general performance of the server under concurrency. In this next post we will take on real-life examples and explore concurrency performance when actual data is involved.
May 2, 2019
ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests, from several per hour to many dozens or even low hundreds per second, each affecting huge ranges of data (gigabytes/millions of rows).
But how will it behave in other scenarios? Let's try to use a steam-hammer to crack nuts, and check how ClickHouse deals with thousands of small requests per second. This will help us better understand the range of possible use cases and limitations.
This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.