This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we introduced why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity, a great virtue, but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.
The latest San Francisco Bay Area ClickHouse Meetup was in Silicon Valley on August 13th. We had between 25 and 30 attendees at H2O.ai, who kindly hosted the event at their offices in Mountain View. The crowd was enthusiastic, leading to a lot of back-and-forth questions during the presentations. We had a total of three talks.
July 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its analog in hardware. We have already written on our blog about ClickHouse compression (https://altinity.com/blog/2017/11/21/compression-in-clickhouse) and the Low Cardinality data type wrapper (https://altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.
This article presents an early preview of new encoding functionality for ClickHouse release 19.11. As of the time of writing, release 19.11 is not yet available. To test the new encodings, you can build ClickHouse from source or install a testing build. We expect release 19.11 to become publicly available in a few weeks.
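To give a feel for why specialized encodings help with time series, here is a minimal Python sketch of plain delta encoding. It is an illustration of the general idea only, not ClickHouse's implementation; the function names and the sample timestamps are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by taking a running sum of the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical timestamps sampled roughly every 10 seconds.
timestamps = [1562716800, 1562716810, 1562716820, 1562716830, 1562716841]
encoded = delta_encode(timestamps)
decoded = delta_decode(encoded)
```

After encoding, every value except the first is a small number (10 or 11 here) instead of a ten-digit timestamp. Small, repetitive values like these are much easier for a general-purpose compressor to shrink, which is why regularly sampled time series tend to benefit the most.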
July 1, 2019
Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.
June 11, 2019
The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?
We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.
May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best design for ClickHouse from a performance perspective, as it takes full advantage of columnar storage and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, so a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.
Apr 9, 2019
When I was setting up my first ClickHouse clusters 3 years ago, it was like a journey to an unknown world full of caveats. ClickHouse is very simple and easy to use, but not THAT simple. Sometimes I dreamed that setting up the cluster would be as easy as making a cup of coffee. It took us a while to find the right approach, but finally our dreams came true. Today, we are happy to introduce the ClickHouse operator for Kubernetes!
Mar 27, 2019
Many ClickHouse features, like the LowCardinality data type, seem mysterious to new users. ClickHouse often deviates from standard SQL, and many data types and operations do not even exist in other data warehouses. The key to understanding is that the ClickHouse engineering team values speed more than almost any other property. Mysterious SQL expressions often turn out to be ‘secret weapons’ to achieve unmatched speed.
In fact, the LowCardinality data type is an example of just such a feature. It has been available since Q4 2018 and was marked as production ready in Feb 2019, but it is still not documented, though it magically appears in some documentation examples. In this article we will fill the gap by explaining how LowCardinality works and when it should be used.
Oct 16, 2018
It has been two years since the ClickHouse development team published an excellent blog post, “How to update data in ClickHouse”. In those days ClickHouse supported only monthly partitions, and for mutable data they suggested using pretty exotic data structures. We were all waiting for a more convenient approach, and finally it is here: ClickHouse now supports updates and deletes! In this article, we will see how it works.
20 Sept 2018
This article shows different ways to use ClickHouse together with other data sources, so that queries can take advantage of ClickHouse optimizations and return results faster. This approach is also useful when some of your infrastructure is already linked to other data sources or tools that support ODBC.
Clickhouse-copier is a tool designed to copy data from one ClickHouse environment to another. The tool is part of the standard ClickHouse server distribution. It can work in a fully parallel mode and distribute the data in the most efficient way. In this article, we review a few typical examples of when clickhouse-copier can be used.