ClickHouse Kafka Engine FAQ

Kafka is a popular way to stream data into ClickHouse. ClickHouse has a built-in connector for this purpose -- the Kafka engine. Our friends from Cloudfare originally contributed this engine to ClickHouse. The Kafka engine has been reworked quite a lot since then and is now maintained by Altinity developers. It is not always evident how to use it in the most efficient way, though. We tried to fill the gap with a Kafka webinar, which was a success. In this article we collected typical questions that we get in our support cases regarding the Kafka engine usage. We hope that our recommendations will help to avoid common problems.

Read More
Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 2)

This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we introduced why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions. 

Read More
Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 1)

As longtime users know well, ClickHouse has traditionally had a basic storage model.  Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity--a great virtue--but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters. 

Read More
clickhouse-local: The power of ClickHouse SQL in a single command

June 11, 2019

The most interesting innovations in databases come from asking simple questions.  For example: what if you could run ClickHouse queries without a server or attached storage?  It would just be SQL queries and the rich ClickHouse function library. What would that look like?  What problems could we solve with it?

We can answer the first question easily.  It would look like ‘clickhouse-local’!  You may not know about this handy tool, as not a lot has been written about it.  A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility

Read More
ClickHouse In the Storm. Part 2: Maximum QPS for key-value lookups

May 3, 2019

The previous post surveyed connectivity benchmarks for ClickHouse to estimate general performance of server concurrency. In this next post we will take on real-life examples and explore concurrency performance when actual data are involved.

Read More
ClickHouse In the Storm. Part 1: Maximum QPS estimation

May 2, 2019

ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests -- from several per hour to many dozens or even low hundreds per second --affecting huge ranges of data (gigabytes/millions of rows).

But how it will behave in other scenarios? Let's try to use a steam-hammer to crack nuts, and check how ClickHouse will deal with thousands of small requests per second. This will help us to understand the range of possible use cases and limitations better.

This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.

Read More
Do-It-Yourself Multi-Volume Storage in ClickHouse

Many applications have very different requirements for acceptable latencies / processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. Also a lot of background processing actions happen on the ‘hot’ data--inserts, merges, replications, and so on. Such operations should likewise be processed with the highest possible speed and without significant latencies.

Read More