Dive into our latest blog to explore a rigorous ClickHouse benchmark tournament, featuring AWS’s 7th-gen Intel m7i, AMD m7a, and Graviton m7g instances. With Altinity.Cloud’s easy configuration, discover which instance offers the best price-performance ratio for your cloud operations. Find your optimal AWS instance for ClickHouse deployments through our concise analysis.
Key-value pairs are widely used to organize data but can become challenging when you want to analyze data of different formats. Learn how you can normalize and extract from different formats with a single approach.
Supported in ClickHouse, Apache Parquet has use cases beyond serving as a storage format in the Hadoop ecosystem. See what results we got when we tested it in Altinity.Cloud to query Parquet files in S3 with the same efficiency as with MergeTree tables.
AWS introduced new instance type families powered by Graviton3 ARM processors: m7g and r7g. We tested m7g’s performance using the ClickHouse SSB workload and found it’s 35% faster than its older brother m6g, and 15% faster than the Intel m6i instance! Learn more.
Rick Bilodeau from Imply published an excellent article in June 2020 comparing Druid cost-efficiency to Google BigQuery for what they called ‘hot analytics’: sub-second response time with high query concurrency. We appreciate this effort and the great results shown by an open source technology. In that article Rick used the popular Star Schema Benchmark (SSB). […]
ClickHouse SQL extensions, arrays in particular, allow it to solve the business use case up to 100 times more efficiently than Redshift at 1/6th the cost. We know that ClickHouse is fast, but we were a bit surprised by these research results.
It has been roughly three years since we last compared ClickHouse to Amazon Redshift, so we thought it was time for an update. We use the same popular benchmarking dataset of NYC taxi trips spanning multiple years, which currently stands at 1.3 billion rows.
Jan 1, 2020
Cost-efficiency and performance are critical for big data analytics. For this reason a recent blog post from the ScyllaDB guys caught our attention. They collected over 500 billion data points and were able to query them at a scan rate of 1B rows/sec. The test rig was a beefy and expensive packet.com cluster: 83 n2.xlarge.x86 instances, with 28 cores and 384 GB of RAM each. This is a nice demo of ScyllaDB cluster management. But looking at the numbers we realized it’s not very impressive as an example of efficient analytics. We can prove that using ClickHouse.
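As a rough back-of-the-envelope check, the sketch below divides the reported scan rate by the total core count of the cluster. It uses only the figures quoted above; the variable names are ours, and the per-core figure is a simple average, not a measured value.

```python
# Figures quoted in the excerpt above.
instances = 83
cores_per_instance = 28
scan_rate_rows_per_sec = 1_000_000_000  # "1B rows/sec"

# Total cores in the cluster and the average scan rate each core delivers.
total_cores = instances * cores_per_instance
rows_per_core_per_sec = scan_rate_rows_per_sec / total_cores

print(f"total cores: {total_cores}")
print(f"average rows/sec per core: {rows_per_core_per_sec:,.0f}")
```

Spread over that many cores, the headline scan rate works out to well under half a million rows per second per core.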
July 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its counterpart in hardware. In our blog we already wrote about ClickHouse compression (https://altinity.com/blog/2017/11/21/compression-in-clickhouse) and the Low Cardinality data type wrapper (https://altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.
This article presents an early preview of new encoding functionality for ClickHouse release 19.11. As of the time of writing, release 19.11 is not yet available. In order to test new encodings ClickHouse can be built from source, or a testing build can be installed. We expect that ClickHouse release 19.11 should be available in public releases in a few weeks.
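To give a feel for why specialized encodings suit time series, here is a minimal sketch of the general idea behind delta-style encoding. It is an illustration only, not ClickHouse's implementation: the function names and the sample timestamps are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by taking a running sum of the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical timestamps sampled every 10 seconds.
timestamps = [1563000000, 1563000010, 1563000020, 1563000030, 1563000040]

deltas = delta_encode(timestamps)
# Delta-of-delta: keep the first value and the first delta, then encode the remaining deltas again.
double_deltas = deltas[:1] + delta_encode(deltas[1:])

print(deltas)
print(double_deltas)
```

For regularly sampled data, the encoded sequences consist mostly of small repeated numbers, which a general-purpose compressor can then shrink much further than the raw timestamps.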
May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best for ClickHouse from a performance perspective, as it perfectly utilizes column store and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, hence a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.
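To make the trade-off concrete, the sketch below contrasts a column-per-metric layout with one possible alternative: one row per metric, stored as a name/value pair. This alternative layout, the sample data, and the helper `to_narrow` are our assumptions for illustration; they are not necessarily the designs evaluated in the article.

```python
# Column-per-metric ("wide") rows: each device type carries its own set of columns.
wide_rows = [
    {"time": "2019-05-23 00:00:00", "device": "host_1", "cpu": 42.0, "mem": 63.5},
    {"time": "2019-05-23 00:00:00", "device": "sensor_7", "temperature": 21.4},
]


def to_narrow(rows, key_fields=("time", "device")):
    """Turn each metric column into its own (metric, value) row."""
    narrow = []
    for row in rows:
        keys = {k: row[k] for k in key_fields}
        for name, value in row.items():
            if name not in key_fields:
                narrow.append({**keys, "metric": name, "value": value})
    return narrow


narrow_rows = to_narrow(wide_rows)
for r in narrow_rows:
    print(r)
```

The name/value layout accepts any device type without schema changes, at the price of more rows and the loss of per-metric typed columns.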
Dec 4, 2018
Our previous take on time series benchmarks attracted a lot of interest so we decided to dig into more details. We conducted 3 different ClickHouse scalability tests using the same TSBS dataset and benchmarking infrastructure. In this article we present results that happen to be quite interesting.
Nov 15, 2018
Once upon a time we spotted TSBS (https://github.com/timescale/tsbs), the Time Series Benchmark Suite, started by InfluxDB engineers and polished to perfection by the TimescaleDB team. The suite allows apples-to-apples comparisons when testing different databases: it is a framework to generate test data, load it into different databases, run test queries, and collect statistics to analyse. We could not resist adding ClickHouse to the list of supported databases. It turned out that ClickHouse, being a general purpose analytical DBMS, stands up very well against proven time series databases. Those benchmarks highlighted strengths and weaknesses of different technologies. Interested? Let’s dig into details.