Bloom filters are an important ClickHouse index type with mysterious parameters. Take a closer look at the theory behind bloom filters, parameter selection using queries on a test dataset, and effective tuning.
We are pleased to announce a new tool for ClickHouse users: the Altinity Knowledge Base.
The ClickHouse Knowledge Base is maintained by our fantastic team of engineers here at Altinity. Here you’ll find quick answers to common questions involving ClickHouse and Altinity.Cloud.
Kafka is a popular way to stream data into ClickHouse. ClickHouse has a built-in connector for this purpose: the Kafka engine. This article collects typical questions about Kafka engine usage that come up in our support cases. We hope our recommendations will help you avoid common problems.
This article continues the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we explained why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity, a great virtue, but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.
June 11, 2019
The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?
We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not much has been written about it. Put simply, ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command-line utility.
May 3, 2019
The previous post surveyed connectivity benchmarks for ClickHouse to estimate how well the server handles concurrency in general. In this post we take on real-life examples and explore concurrency performance when actual data are involved.
May 2, 2019
ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests, from several per hour to many dozens or even low hundreds per second, each affecting huge ranges of data (gigabytes/millions of rows).
But how will it behave in other scenarios? Let’s try to use a steam-hammer to crack nuts and check how ClickHouse deals with thousands of small requests per second. This will help us better understand the range of possible use cases and limitations.
This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.
Many applications have very different requirements for acceptable latency and processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. Also, a lot of background processing happens on the ‘hot’ data: inserts, merges, replication, and so on. Such operations should likewise be processed at the highest possible speed and without significant latency.