Boosting ClickHouse Data Lake Access: Better S3 and URL Function Proxy Support
Learn how new proxy support in ClickHouse improves efficient and secure access to data lakes using S3 storage and URL functions.
S3-compatible object storage support is critical for ClickHouse applications. There is a new community proposal to make it much better.
ClickHouse often runs in a cluster, and cluster operation raises some interesting questions about S3 usage. These include how to parallelize data loading across nodes, whether to scale horizontally or vertically, and how to avoid unnecessary replication. In this article, we discuss how ClickHouse clusters can use S3 efficiently thanks to two important new features: the ‘s3Cluster’ table function and zero-copy replication.
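The idea of parallelizing a load across nodes can be illustrated with a small simulation. ClickHouse's documentation describes s3Cluster as follows: the initiator expands the wildcard path into concrete files, and each worker node asks the initiator for the next file when it is ready. The Python sketch below models only that pull-based dispatch. It is not ClickHouse code. The bucket keys, node names, and per-file read costs are hypothetical, and `fnmatchcase` stands in for ClickHouse's own glob syntax.

```python
from collections import deque
from fnmatch import fnmatchcase


def expand_glob(keys, pattern):
    """Initiator step: turn a wildcard path into a queue of concrete object keys.

    fnmatchcase is only a stand-in for ClickHouse's own glob syntax ({a,b}, {N..M}, ...).
    """
    return deque(sorted(k for k in keys if fnmatchcase(k, pattern)))


def distribute(tasks, nodes, cost):
    """Simulate pull-based dispatch: whichever node is idle first takes the next file.

    `cost` maps each key to a hypothetical read time. Ties go to the node listed
    first. Returns {node: [keys it read]}; every key is read by exactly one node.
    """
    busy_until = {node: 0 for node in nodes}
    assignment = {node: [] for node in nodes}
    while tasks:
        # The node that finishes its current file earliest asks for the next task.
        node = min(nodes, key=lambda n: busy_until[n])
        key = tasks.popleft()
        assignment[node].append(key)
        busy_until[node] += cost[key]
    return assignment


# Hypothetical bucket listing and per-file read costs.
keys = [
    "data/part-1.parquet",
    "data/part-2.parquet",
    "data/part-3.parquet",
    "data/part-4.parquet",
    "logs/app.txt",
]
cost = {
    "data/part-1.parquet": 4,
    "data/part-2.parquet": 1,
    "data/part-3.parquet": 1,
    "data/part-4.parquet": 1,
}

plan = distribute(expand_glob(keys, "data/*.parquet"), ["node1", "node2"], cost)
print(plan)
```

Because nodes pull work rather than receiving a fixed share up front, one slow file keeps only one node busy. In this example, node1 spends its time on the large part-1 file while node2 reads the three small ones, and no file is read twice.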
ClickHouse now fully supports both AWS S3 and MinIO as S3-compatible object storage services. In this comparison, we test the performance of AWS S3 and MinIO when they store table data from two of our standard datasets: the OnTime dataset and the New York Taxi dataset.
In November 2020, Alexander Zaitsev introduced ClickHouse's support for S3-compatible object storage. In his article ClickHouse and S3 Compatible Object Storage, he walked through the steps to use AWS S3 with ClickHouse's disk storage system and the S3 table function. Now, we are excited to announce full support for integrating with MinIO, ClickHouse’s second fully supported S3-compatible[…]
ClickHouse is a polyglot database that can talk to many external systems using dedicated engines or table functions. In modern cloud systems, the most important external system is object storage. It can hold raw data for import from or export to other systems (a data lake), and it can provide cheap, highly durable storage for table data. ClickHouse now supports both of these uses for S3-compatible object storage.