ClickHouse Black Magic, Part 2: Bloom Filters

Bloom filters are an important ClickHouse index type with mysterious parameters. Take a closer look at the theory behind bloom filters, parameter selection using queries on a test dataset, and effective tuning.
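To make the parameters concrete, here is a minimal sketch of declaring a bloom filter skip index in ClickHouse; the table and column names are hypothetical, and choosing the false positive rate and granularity well is exactly what the article explores.

    -- Hypothetical table; bloom_filter(0.025) sets the target false positive
    -- rate, and GRANULARITY 4 builds one filter per four index granules.
    CREATE TABLE test_bf
    (
        id  UInt64,
        url String,
        INDEX url_bf url TYPE bloom_filter(0.025) GRANULARITY 4
    )
    ENGINE = MergeTree
    ORDER BY id;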

Altinity ClickHouse Knowledge Base

We are pleased to announce a new tool for ClickHouse users: the Altinity Knowledge Base.

The ClickHouse Knowledge Base is maintained by our fantastic team of engineers here at Altinity. Here you’ll find quick answers to common questions involving ClickHouse and Altinity.Cloud.

ClickHouse Kafka Engine FAQ

Kafka is a popular way to stream data into ClickHouse. ClickHouse has a built-in connector for this purpose: the Kafka engine. This article collects typical questions that we get in our support cases regarding Kafka engine usage. We hope that our recommendations will help you avoid common problems.
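As a taste of what the FAQ covers, the usual ingestion pattern pairs a Kafka engine table with a materialized view that moves rows into a MergeTree table. This is a sketch only; the broker address, topic, consumer group, and table names are made up.

    CREATE TABLE kafka_queue
    (
        ts      DateTime,
        message String
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',   -- hypothetical broker
             kafka_topic_list  = 'events',       -- hypothetical topic
             kafka_group_name  = 'ch_consumer',
             kafka_format      = 'JSONEachRow';

    CREATE TABLE events (ts DateTime, message String)
    ENGINE = MergeTree ORDER BY ts;

    -- The materialized view continuously drains the queue into storage.
    CREATE MATERIALIZED VIEW events_mv TO events AS
    SELECT ts, message FROM kafka_queue;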

Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 2)

This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we explained why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
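On the SQL side, once disks and a policy are defined in the server configuration (the subject of this series), using them is a one-line setting; the policy name ‘tiered’ below is an assumption for illustration.

    CREATE TABLE sample (d Date, v UInt64)
    ENGINE = MergeTree
    ORDER BY d
    SETTINGS storage_policy = 'tiered';  -- hypothetical policy name

    -- Inspect what the server actually configured:
    SELECT policy_name, volume_name, disks FROM system.storage_policies;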

Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 1)

As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity, a great virtue, but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.

clickhouse-local: The power of ClickHouse SQL in a single command

June 11, 2019

The most interesting innovations in databases come from asking simple questions.  For example: what if you could run ClickHouse queries without a server or attached storage?  It would just be SQL queries and the rich ClickHouse function library. What would that look like?  What problems could we solve with it?

We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.
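For a quick taste, here is a sketch (the CSV content and column layout are invented for illustration): feed data to ‘clickhouse-local’ on stdin and query it with plain SQL, no server required.

    # The default table name for stdin data is 'table'.
    echo -e "1,alice\n2,bob" | clickhouse-local \
        --structure "id UInt32, name String" \
        --input-format CSV \
        --query "SELECT count(), groupArray(name) FROM table"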

ClickHouse In the Storm. Part 2: Maximum QPS for key-value lookups

May 3, 2019

The previous post surveyed connectivity benchmarks for ClickHouse to estimate the general performance of server concurrency. In this post we turn to real-life examples and explore concurrency performance when actual data is involved.
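To anchor the discussion, this is the shape of workload the post measures, sketched with hypothetical names: a MergeTree table ordered by key and a point lookup against it.

    CREATE TABLE kv (key UInt64, value String)
    ENGINE = MergeTree
    ORDER BY key;

    -- The kind of single-key lookup whose maximum QPS is under test.
    SELECT value FROM kv WHERE key = 42;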

ClickHouse In the Storm. Part 1: Maximum QPS estimation

May 2, 2019

ClickHouse is an OLAP database for analytics, so the typical usage scenario is processing a relatively small number of requests, from several per hour to many dozens or even low hundreds per second, each affecting huge ranges of data (gigabytes/millions of rows).

But how will it behave in other scenarios? Let’s try to use a steam-hammer to crack nuts and check how ClickHouse deals with thousands of small requests per second. This will help us better understand the range of possible use cases and limitations.

This topic spans two posts. The first covers connectivity benchmarks and the test setup; the next covers maximum QPS in scenarios involving actual data.
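For readers who want to reproduce the idea, one way to generate such a flood of tiny requests is the bundled clickhouse-benchmark tool; the concurrency and iteration counts below are arbitrary placeholders, not the article’s settings.

    # 100 concurrent connections, each repeatedly sending a trivial query;
    # clickhouse-benchmark reads queries from stdin and reports QPS percentiles.
    echo "SELECT 1" | clickhouse-benchmark -c 100 -i 100000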

Do-It-Yourself Multi-Volume Storage in ClickHouse

Many applications have very different requirements for acceptable latency and processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. A lot of background processing also happens on the ‘hot’ data: inserts, merges, replication, and so on. Such operations should likewise be processed with the highest possible speed and without significant latencies.
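Before ClickHouse gained native multi-volume support, one do-it-yourself pattern (a sketch under assumed names, not necessarily the article’s exact recipe) was to keep ‘hot’ and ‘cold’ tables with identical schemas, place their data directories on different storage devices at the filesystem level, and query them together through the Merge engine.

    CREATE TABLE events_hot  (d Date, v UInt64) ENGINE = MergeTree ORDER BY d;
    CREATE TABLE events_cold (d Date, v UInt64) ENGINE = MergeTree ORDER BY d;

    -- A read-only umbrella table over both tiers.
    CREATE TABLE events_all AS events_hot
    ENGINE = Merge(currentDatabase(), '^events_(hot|cold)$');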