ClickHouse and S3 Compatible Object Storage

ClickHouse is a polyglot database that can talk to many external systems using dedicated engines or table functions. In modern cloud systems, the most important external system is object storage. It can hold raw data to import from or export to other systems (in other words, act as a data lake), and it offers cheap, highly durable storage for table data. ClickHouse now supports both of these uses for S3-compatible object storage.
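
As a minimal sketch of the first use, the s3 table function can read from and write to a bucket directly, assuming a recent ClickHouse version; the bucket URL, credentials, and table name below are placeholders, not taken from the article.

    -- Import: read a CSV file straight from S3-compatible storage.
    SELECT count()
    FROM s3('https://my-bucket.s3.amazonaws.com/data/events.csv',
            'AWS_KEY_ID', 'AWS_SECRET_KEY', 'CSVWithNames');

    -- Export: write query results back to object storage as Parquet.
    INSERT INTO FUNCTION s3('https://my-bucket.s3.amazonaws.com/export/events.parquet',
                            'AWS_KEY_ID', 'AWS_SECRET_KEY', 'Parquet')
    SELECT * FROM events;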

Putting Things Where They Belong Using New TTL Moves

Multi-volume storage is crucial in many use cases. It helps reduce storage costs and improves query performance by placing the most critical application data on the fastest storage devices. Monitoring data is a classic use case: the value of the data degrades rapidly over time. Data from the last day, last week, last month, and previous year have very different access patterns, which in turn correspond to different storage needs.
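
As a hedged illustration of a TTL move, a MergeTree table can shift aging parts to a slower volume and eventually delete them; the schema, intervals, and the 'tiered' policy and 'cold' volume names below are placeholders, not from the article.

    CREATE TABLE metrics
    (
        event_date Date,
        sensor_id  UInt32,
        value      Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_date)
    ORDER BY (sensor_id, event_date)
    TTL event_date + INTERVAL 1 WEEK TO VOLUME 'cold',
        event_date + INTERVAL 1 YEAR DELETE
    SETTINGS storage_policy = 'tiered';  -- 'tiered' must define a 'cold' volume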

Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 2)

This article is a continuation of our series on multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we explained why tiered storage is important, described how multi-volume storage is organized in ClickHouse, and worked through a concrete example of setting up disk definitions.
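
For readers following along, the configured disks and the policies that group them into volumes can be inspected from SQL; this is a minimal sketch and assumes nothing about the article's specific configuration.

    -- Disks the server knows about, with their capacity.
    SELECT name, path,
           formatReadableSize(free_space)  AS free,
           formatReadableSize(total_space) AS total
    FROM system.disks;

    -- Storage policies that combine disks into volumes.
    SELECT policy_name, volume_name, disks, move_factor
    FROM system.storage_policies;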

Amplifying ClickHouse Capacity with Multi-Volume Storage (Part 1)

As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity, a great virtue, but restricts users to a single class of storage for all data. The downside is difficult cost/performance trade-offs, especially for large clusters.

Do-It-Yourself Multi-Volume Storage in ClickHouse

Many applications have very different requirements for acceptable latency and processing speed on different parts of the database. In time-series use cases, most requests touch only the last day of data ('hot' data), and those queries should run very fast. A lot of background processing also happens on the hot data: inserts, merges, replication, and so on. Such operations should likewise run at the highest possible speed and without significant latency.
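
One possible do-it-yourself layout, offered here only as an illustrative sketch and not necessarily the approach the article takes, is to keep hot and cold data in separate tables whose data directories live on different devices, queried together through a Merge engine table; all names below are placeholders.

    -- Hot data on fast storage, cold data on cheap storage. Placing each
    -- table's data directory on a different device happens at the filesystem
    -- level (e.g. separate mounts or symlinks), which is the DIY part.
    CREATE TABLE events_hot
    (
        event_date Date,
        user_id    UInt64,
        payload    String
    )
    ENGINE = MergeTree
    PARTITION BY event_date
    ORDER BY (user_id, event_date);

    CREATE TABLE events_cold AS events_hot;

    -- A Merge engine table exposes both tables as a single queryable view.
    CREATE TABLE events AS events_hot
    ENGINE = Merge(currentDatabase(), '^events_(hot|cold)$');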