Delivering Better S3 Support in ClickHouse

Object storage is a game changer for analytic databases. At its best, it is cheap yet offers performant access to large datasets when running in public clouds. Such capabilities are, of course, very desirable for ClickHouse users. We would therefore like to draw your attention to a new community proposal for improving ClickHouse object storage support and encourage you to join the discussion.

ClickHouse introduced object storage for MergeTree tables in 2020, and its capabilities have evolved since then. The current implementation is best described in the Double.Cloud article “How S3-based ClickHouse hybrid storage works under the hood”. We often use S3 as a synonym for object storage, but everything here also applies to GCS and Azure Blob Storage.

While S3 support has improved substantially in recent years, there are still a number of problems with the current implementation:

  • Data is stored in two places: local metadata files and S3 objects.
  • Data stored in S3 is not self-contained, i.e. it is not possible to attach a table stored in S3 without the local metadata files.
  • Every modification requires synchronization between two different non-transactional media: metadata files on a local disk and the data itself in object storage. That leads to consistency problems.
  • Because of the above, zero-copy replication is also unreliable and has a history of bugs.
  • Backups are not trivial, since two different sources need to be backed up separately.

ClickHouse Inc. developed its own solution to these problems: the SharedMergeTree storage engine. Unfortunately, it is not going to be released as open source. This means the ClickHouse community needs another solution that is available under an Apache 2.0 license.

Over the last few weeks we reached out to many ClickHouse contributors on this topic. Drawing on their ideas and suggestions, we see a realistic way to deliver robust object storage that builds on existing features like tiered storage and zero-copy replication. It will make object storage support for MergeTree much better, while providing an upgrade path for existing ClickHouse installations. It can be implemented by the open source ClickHouse community, so every ClickHouse user will be able to use object storage safely and efficiently.

For more information, see this issue on GitHub, and do not hesitate to comment. We invite all ClickHouse community members to join the conversation. We also want to thank everyone who has already shared ideas. It is a pleasure to see so many people interested in making ClickHouse better.