Kafka is a popular way to stream data into ClickHouse. ClickHouse has a built-in connector for this purpose -- the Kafka engine. Our friends from Cloudfare originally contributed this engine to ClickHouse. The Kafka engine has been reworked quite a lot since then and is now maintained by Altinity developers. It is not always evident how to use it in the most efficient way, though. We tried to fill the gap with a Kafka webinar, which was a success. In this article we collected typical questions that we get in our support cases regarding the Kafka engine usage. We hope that our recommendations will help to avoid common problems.Read More
This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we introduced why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity--a great virtue--but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.
June 11, 2019
The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?
We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utilityRead More
May 3, 2019
The previous post surveyed connectivity benchmarks for ClickHouse to estimate general performance of server concurrency. In this next post we will take on real-life examples and explore concurrency performance when actual data are involved.
May 2, 2019
ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests -- from several per hour to many dozens or even low hundreds per second --affecting huge ranges of data (gigabytes/millions of rows).
But how it will behave in other scenarios? Let's try to use a steam-hammer to crack nuts, and check how ClickHouse will deal with thousands of small requests per second. This will help us to understand the range of possible use cases and limitations better.
This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.Read More
Many applications have very different requirements for acceptable latencies / processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. Also a lot of background processing actions happen on the ‘hot’ data--inserts, merges, replications, and so on. Such operations should likewise be processed with the highest possible speed and without significant latencies.