Our colleague Mikhail Filimonov just published an excellent ClickHouse Kafka Engine FAQ. It provides users with answers to common questions about using stable versions, configuration parameters, standard SQL definitions, and many other topics. Even experienced users are likely to learn something new.
But what if you are getting started and need help setting up Kafka and ClickHouse for the first time? Good news! This article is for you.
Kafka is a popular way to stream data into ClickHouse. ClickHouse has a built-in connector for this purpose -- the Kafka engine. Our friends from Cloudflare originally contributed this engine to ClickHouse. The Kafka engine has been reworked quite a lot since then and is now maintained by Altinity developers. It is not always evident how to use it in the most efficient way, though. We tried to fill the gap with a Kafka webinar, which was a success. In this article we have collected typical questions that we get in support cases regarding Kafka engine usage. We hope that our recommendations will help you avoid common problems.
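To make the setup concrete, here is a minimal sketch of the usual three-part pipeline: a Kafka engine table that reads from a topic, a MergeTree table for durable storage, and a materialized view that moves rows between them. The broker address, topic, and table names below are hypothetical.

```sql
-- Kafka engine table that consumes from a topic (the broker
-- address, topic, and names are hypothetical).
CREATE TABLE readings_queue (
    readings_id Int32,
    time DateTime,
    temperature Decimal(5,2)
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka-host:9092',
         kafka_topic_list = 'readings',
         kafka_group_name = 'readings_consumer_group',
         kafka_format = 'CSV';

-- Durable storage for the consumed rows.
CREATE TABLE readings (
    readings_id Int32,
    time DateTime,
    temperature Decimal(5,2)
) ENGINE = MergeTree
ORDER BY (time, readings_id);

-- The materialized view moves rows from the queue table to
-- storage as they arrive.
CREATE MATERIALIZED VIEW readings_mv TO readings AS
SELECT * FROM readings_queue;
```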
Mutable data is generally unwelcome in OLAP databases. ClickHouse is no exception to the rule. Like some other OLAP products, ClickHouse did not even support updates originally. Later on, update support was added, but like many other things it was done in a “ClickHouse way.”
Even now, ClickHouse updates are asynchronous, which makes them difficult to use in interactive applications. Still, in many use cases users need to apply modifications to existing data and expect to see the effect immediately. Can ClickHouse do that? Sure it can.
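As a quick illustration of the asynchronous flavor, here is a sketch using a hypothetical visits table: the ALTER statement returns immediately, while the mutation itself runs in the background and can be tracked in system.mutations.

```sql
-- Updates are mutations: the statement returns at once, and the
-- rewrite happens in the background (table and values here are
-- hypothetical).
ALTER TABLE visits UPDATE visits_count = visits_count + 1
WHERE user_id = 42;

-- Check whether the mutation has finished.
SELECT command, is_done
FROM system.mutations
WHERE table = 'visits';
```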
A common use case in time series applications is to get the measurement value at a given point in time. For example, if there is a stream of measurements, one often needs to query the measurement as of the current time, or as of the same time yesterday, and so on. Financial market data analysis and all sorts of monitoring applications are typical examples.
Databases have different ways to achieve this task, and ClickHouse is no exception. In fact, ClickHouse offers at least 5 different approaches. In this article, we will review and compare them.
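As a taste of what the comparison covers, one simple approach uses the argMax aggregate function, which returns the value attached to the latest timestamp at or before the requested point. The table and column names here are a hypothetical sketch.

```sql
-- Last measurement per sensor as of a given point in time
-- (hypothetical schema).
SELECT sensor_id,
       argMax(value, time) AS value_asof
FROM readings
WHERE time <= toDateTime('2020-01-01 00:00:00')
GROUP BY sensor_id;
```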
Multi-volume storage is crucial in many use cases. It helps reduce storage costs and improves query performance by allowing placement of the most critical application data on the fastest storage devices. Monitoring data is a classic use case: the value of data degrades rapidly over time. Data from the last day, last week, last month, and previous year has very different access patterns, which in turn correspond to different storage needs.
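The classic pattern looks roughly like the sketch below: fresh parts stay on the fast default volume, and a TTL move rule shifts aging parts to cheaper storage. It assumes a storage policy named 'tiered' with a volume called 'cold' has already been defined in the server configuration; all names are hypothetical.

```sql
-- Keep the last week on the fast default volume, then move
-- parts to the 'cold' volume. Assumes a storage policy named
-- 'tiered' exists in the server configuration.
CREATE TABLE metrics (
    time DateTime,
    name String,
    value Float64
) ENGINE = MergeTree
ORDER BY (name, time)
TTL time + INTERVAL 7 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';
```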
December 28, 2019
Grafana is a popular tool to create dashboards of time series data. It features outstanding graphics, interactive displays that zoom in on data, and support for a wide range of data sources. It turns out that one of those data sources is ClickHouse, and Grafana is a great way to visualize ClickHouse data.
This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we explained why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity--a great virtue--but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.
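For readers who want to see what a disk definition looks like, here is a minimal configuration sketch; the paths, disk names, and policy name are hypothetical, and in older releases the root element is <yandex> rather than <clickhouse>.

```xml
<!-- /etc/clickhouse-server/config.d/storage.xml (sketch only;
     paths and names are hypothetical) -->
<clickhouse>
  <storage_configuration>
    <disks>
      <fast_ssd>
        <path>/mnt/fast_ssd/clickhouse/</path>
      </fast_ssd>
      <big_hdd>
        <path>/mnt/big_hdd/clickhouse/</path>
      </big_hdd>
    </disks>
    <policies>
      <tiered>
        <volumes>
          <hot>
            <disk>fast_ssd</disk>
          </hot>
          <cold>
            <disk>big_hdd</disk>
          </cold>
        </volumes>
      </tiered>
    </policies>
  </storage_configuration>
</clickhouse>
```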
The latest San Francisco Bay Area ClickHouse Meetup took place in Silicon Valley on August 13th. We had between 25 and 30 attendees at H2O.ai, which kindly hosted the event at its offices in Mountain View. The crowd was enthusiastic, leading to a lot of back-and-forth questions during the presentations. We had a total of three talks.
July 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its hardware counterpart. In our blog we have already written about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the LowCardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.
This article presents an early preview of new encoding functionality for ClickHouse release 19.11. At the time of writing, release 19.11 is not yet available. To test the new encodings, ClickHouse can be built from source, or a testing build can be installed. We expect release 19.11 to reach public releases in a few weeks.
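As a preview of the syntax, the new encodings are declared per column with the CODEC clause and can be chained with a general-purpose compressor. The schema below is a hypothetical sketch.

```sql
-- Per-column encodings (hypothetical table). DoubleDelta works
-- well for monotonic sequences such as timestamps; Gorilla is
-- designed for gauge-like floating point values.
CREATE TABLE sensor_data (
    time DateTime CODEC(DoubleDelta, LZ4),
    sensor_id UInt32 CODEC(Delta, LZ4),
    value Float64 CODEC(Gorilla)
) ENGINE = MergeTree
ORDER BY (sensor_id, time);
```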
July 1, 2019
Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.
June 11, 2019
The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?
We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.
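Here is a minimal sketch of what that means in practice: piping a CSV file through ‘clickhouse-local’ and querying it as the implicit table named ‘table’. The file name and column layout are hypothetical.

```bash
# Query a local CSV file with full ClickHouse SQL; no server or
# attached storage involved. File and columns are hypothetical.
cat trips.csv | clickhouse-local \
  --input-format CSV \
  --structure "pickup_time DateTime, fare Float64" \
  --query "SELECT count(*) AS trips, round(avg(fare), 2) AS avg_fare FROM table"
```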