Jan 1, 2020
Cost-efficiency and performance are critical for big data analytics. For this reason, a recent blog post from the ScyllaDB team caught our attention. They collected over 500 billion data points and were able to scan them at 1B rows/sec. The test rig was a beefy and expensive packet.com cluster: 83 n2.xlarge.x86 instances with 28 cores and 384 GB of RAM each. This is a nice demo of ScyllaDB cluster management. But looking at the numbers, we realized it’s not very impressive as an example of efficient analytics. We can prove that using ClickHouse.
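To put the quoted figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers from the teaser above (83 instances, 28 cores each, 1B rows/sec) and assumes the scan load is spread evenly across all cores, which is a simplification rather than a measured fact. The function name is ours and purely illustrative.

```python
# Figures quoted in the teaser above.
INSTANCES = 83
CORES_PER_INSTANCE = 28
SCAN_ROWS_PER_SEC = 1_000_000_000  # "1B rows/sec"


def rows_per_sec_per_core(rows_per_sec, instances, cores_per_instance):
    """Cluster-wide scan rate divided evenly across all cores (an assumption)."""
    return rows_per_sec / (instances * cores_per_instance)


total_cores = INSTANCES * CORES_PER_INSTANCE
per_core = rows_per_sec_per_core(SCAN_ROWS_PER_SEC, INSTANCES, CORES_PER_INSTANCE)
print(f"{total_cores} cores, about {per_core:,.0f} rows/sec per core")
```

Under that even-spread assumption the cluster has 2,324 cores, so the headline rate works out to roughly 430 thousand rows per second per core.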
July 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its hardware counterpart. We have already written on our blog about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the Low Cardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.
This article presents an early preview of new encoding functionality for ClickHouse release 19.11. As of the time of writing, release 19.11 is not yet available. To test the new encodings, ClickHouse can be built from source, or a testing build can be installed. We expect release 19.11 to be publicly available in a few weeks.
May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best option for ClickHouse from a performance perspective, as it takes full advantage of column storage and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, so a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.
Dec 4, 2018
Our previous take on time series benchmarks attracted a lot of interest, so we decided to dig deeper. We conducted 3 different ClickHouse scalability tests using the same TSBS dataset and benchmarking infrastructure. In this article we present the results, which turned out to be quite interesting.
Nov 15, 2018
Once upon a time we spotted TSBS (https://github.com/timescale/tsbs), the Time Series Benchmark Suite, started by InfluxDB engineers and polished to perfection by the TimescaleDB team. The suite allows apples-to-apples comparisons when testing different databases: it is a framework to generate test data, load it into different databases, run test queries, and collect statistics for analysis. We could not resist adding ClickHouse to the list of supported databases. It turned out that ClickHouse, being a general-purpose analytical DBMS, stands up very well against proven time series databases. Those benchmarks highlighted strengths and weaknesses of different technologies. Interested? Let’s dig into details.
Jan 4, 2018
It's been a while since Altinity announced a partnership with Kodiak Data, a cloud-infrastructure company. Despite that, we have never written about Kodiak Data and how they help with ClickHouse deployments. Several companies are now using ClickHouse on Kodiak Data MemCloud(TM), so it's time to explain why. In this article, we test ClickHouse performance on various AWS and Kodiak Data cloud instances, and add RedShift to complete the picture.
Jul 3, 2017
We continue to benchmark ClickHouse against other analytic DBMSs. We were inspired by the benchmark with the star2002 experiment dataset described here, and decided to replicate it using ClickHouse. That gives another interesting comparison with Amazon RedShift.
Apr 26, 2017
There are already a few ClickHouse benchmarks on the web. Most of them use a denormalized database schema. However, denormalization is not always possible or desirable. In this article we will compare query performance between denormalized and normalized schemas, where normalization is modelled using ClickHouse's unique external dictionaries feature.