Rick Bilodeau from Imply published an excellent article in June 2020 comparing Druid cost-efficiency to Google BigQuery for what they called ‘hot analytics’: sub-second response times with high query concurrency. We appreciate this effort and the strong results of an open source technology. In that article Rick used the popular Star Schema Benchmark (SSB). […]
ClickHouse SQL extensions, arrays in particular, allow it to solve the business use case up to 100 times more efficiently than Redshift at 1/6 of the cost. We know that ClickHouse is fast, but we were a bit surprised by these results.
Since it has been roughly three years since we last compared ClickHouse to Amazon Redshift, we thought it was time for an update. We used the same popular benchmark dataset of NYC taxi trips spanning multiple years, which currently contains 1.3 billion rows.
Jan 1, 2020
Cost-efficiency and performance are critical for big data analytics. That is why a recent blog post from the ScyllaDB team caught our attention. They collected over 500 billion data points and were able to query them at a scan rate of 1 billion rows per second. The test rig was a beefy and expensive packet.com cluster: 83 n2.xlarge.x86 instances, each with 28 cores and 384 GB of RAM. It is a nice demonstration of ScyllaDB cluster management, but looking at the numbers, we realized it is not a very impressive example of efficient analytics. We can prove that using ClickHouse.
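To put the rig in perspective, the figures quoted above can be normalized per core with some back-of-envelope arithmetic. The sketch below uses only the numbers from the ScyllaDB post as summarized here, and assumes that "384 GB" is the RAM per node; it derives totals and a per-core scan rate and is not a benchmark result for either database.

```python
# Figures quoted from the ScyllaDB test summarized above.
# Assumption: 384 GB of RAM per node, 28 cores per node.
NODES = 83
CORES_PER_NODE = 28
RAM_GB_PER_NODE = 384
DATA_POINTS = 500e9          # "over 500 billion data points"
SCAN_ROWS_PER_SEC = 1e9      # "1 billion rows per second"


def cluster_summary(nodes, cores_per_node, ram_gb_per_node, points, rows_per_sec):
    """Derive cluster totals and a per-core scan rate from the quoted figures."""
    total_cores = nodes * cores_per_node
    return {
        "total_cores": total_cores,
        "total_ram_gb": nodes * ram_gb_per_node,
        # Aggregate scan rate divided evenly across every core in the cluster.
        "rows_per_sec_per_core": rows_per_sec / total_cores,
        # Time to scan the whole dataset once at the quoted rate.
        "full_scan_seconds": points / rows_per_sec,
    }


summary = cluster_summary(NODES, CORES_PER_NODE, RAM_GB_PER_NODE,
                          DATA_POINTS, SCAN_ROWS_PER_SEC)
for key, value in summary.items():
    print(f"{key}: {value:,.0f}")
```

With these inputs the cluster has 2,324 cores and 31,872 GB of RAM, which works out to roughly 430,000 rows per second per core; that per-core rate is the figure worth comparing when judging analytical efficiency.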
Jul 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage keeps getting cheaper and faster, but data sizes typically grow even faster: Moore’s Law for big data outpaces its hardware counterpart. We have already written about ClickHouse compression (https://altinity.com/blog/2017/11/21/compression-in-clickhouse) and the LowCardinality data type wrapper (https://altinity.com/blog/2019/3/27/low-cardinality). In this article we describe and test the most advanced ClickHouse encodings, which shine especially for time series data. We are proud that Altinity contributed some of these encodings to ClickHouse.
This article presents an early preview of new encoding functionality in ClickHouse release 19.11. At the time of writing, release 19.11 is not yet available. To test the new encodings, build ClickHouse from source or install a testing build. We expect release 19.11 to become publicly available in a few weeks.
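To illustrate why such encodings suit time series, here is a minimal sketch of delta-of-delta encoding, the idea behind ClickHouse's DoubleDelta codec. It is plain Python under simplifying assumptions: integer inputs, no bit packing, and no claim to match ClickHouse's on-disk format. It only shows how regularly spaced timestamps collapse into small numbers that a later stage can store compactly.

```python
from typing import List


def double_delta_encode(values: List[int]) -> List[int]:
    """Keep the first value and first delta, then store deltas of deltas."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    return [values[0], deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]


def double_delta_decode(encoded: List[int]) -> List[int]:
    """Invert double_delta_encode by accumulating deltas, then values."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd                      # rebuild the current delta
        values.append(values[-1] + delta)
    return values


# Hypothetical timestamps sampled every 10 seconds, with a little jitter.
timestamps = [1_600_000_000, 1_600_000_010, 1_600_000_020,
              1_600_000_030, 1_600_000_041, 1_600_000_050]
encoded = double_delta_encode(timestamps)
print(encoded)  # [1600000000, 10, 0, 0, 1, -2]
assert double_delta_decode(encoded) == timestamps
```

After the first two entries, every value is zero or close to it, so a real codec can spend only a few bits per row instead of the full integer width.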
May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways, and schema design plays a major role in this. For our recent benchmarking with the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema to keep the comparison fair. In that design every metric is stored in a separate column. From a performance perspective this is the best design for ClickHouse, since it makes full use of columnar storage and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. A separate column per metric may not be convenient, so a different approach is required. In this article we discuss multiple ways to design a time series schema and run some benchmarks to validate each approach.
Dec 4, 2018
Our previous take on time series benchmarks attracted a lot of interest, so we decided to dig into the details. We conducted three different ClickHouse scalability tests using the same TSBS dataset and benchmarking infrastructure. In this article we present the results, which turned out to be quite interesting.
Nov 15, 2018
Once upon a time we spotted TSBS (https://github.com/timescale/tsbs), the Time Series Benchmark Suite, started by InfluxDB engineers and polished to perfection by the TimescaleDB team. The suite allows apples-to-apples comparisons of different databases: it is a framework to generate test data, load it into different databases, run test queries, and collect statistics for analysis. We could not resist adding ClickHouse to the list of supported databases. It turned out that ClickHouse, although it is a general-purpose analytical DBMS, holds up very well against proven time series databases. The benchmarks highlighted the strengths and weaknesses of the different technologies. Interested? Let’s dig into the details.
Jul 3, 2017
We continue to benchmark ClickHouse against other analytic DBMSs. We were inspired by the benchmark with the star2002 experiment dataset described here, and decided to replicate it using ClickHouse. This gives another interesting comparison against Amazon Redshift.
Jun 26, 2017
In this blog post, we’ll look at how ClickHouse performs in a general analytical workload using the Star Schema Benchmark.
Jun 19, 2017
We continue benchmarking ClickHouse. In this article we discuss a benchmark against Amazon Redshift.
Apr 26, 2017
There are already a few ClickHouse benchmarks on the web. Most of them use a denormalized database schema. However, denormalization is not always possible or desirable. In this article we compare query performance between a denormalized and a normalized schema, where normalization is modelled using ClickHouse’s unique external dictionaries feature.