Multi-volume storage is crucial in many use cases. It helps to reduce storage costs as well as improves query performance by allowing placement of the most critical application data on the fastest storage devices. Monitoring data is a classic use case. The value of data degrades rapidly over time. The last day, last week, last month, and previous year data have very different access patterns, which in turn correspond to various storage needs.
This article is a continuation of the series describing multi-volume storage, which greatly increases ClickHouse server capacity using tiered storage. In the previous article we introduced why tiered storage is important, described multi-volume organization in ClickHouse, and worked through a concrete example of setting up disk definitions.
As longtime users know well, ClickHouse has traditionally had a basic storage model. Each ClickHouse server is a single process that accesses data located on a single storage device. The design offers operational simplicity–a great virtue–but restricts users to a single class of storage for all data. The downside is difficult cost/performance choices, especially for large clusters.
Many applications have very different requirements for acceptable latencies / processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. Also a lot of background processing actions happen on the ‘hot’ data–inserts, merges, replications, and so on. Such operations should likewise be processed with the highest possible speed and without significant latencies.