In November 2020, Alexander Zaitsev introduced ClickHouse's support for S3-compatible object storage. In his article ClickHouse and S3 Compatible Object Storage, he provided steps to use AWS S3 with ClickHouse's disk storage system and the S3 table function. Now, we are excited to announce full support for integrating with MinIO, ClickHouse's second fully supported S3-compatible object storage service. MinIO is an extremely high-performance, Kubernetes-native object storage service that you can now access through the S3 table function. You can also use it as one of ClickHouse's storage disks, with a configuration similar to the one used for AWS S3.
MinIO support was originally added to ClickHouse in January 2020. In this article, we will explain how to integrate MinIO with ClickHouse.
MinIO Using Docker
The easiest way to familiarize yourself with MinIO storage is to run MinIO in a Docker container, as we will do in our examples. We will use a docker-compose cluster of ClickHouse instances, a Docker container running Apache ZooKeeper to manage our ClickHouse instances, and a Docker container running MinIO. To use this environment, you will need git, Docker, and docker-compose installed on your system. Then you can clone the repository that contains the test environment to your local system.
git clone https://gitlab.com/altinity-public/blogs/minio-integration-with-clickhouse.git
Next, you will need to check if you can bring up the docker-compose cluster.
Note that you must run all docker-compose commands in the docker-compose directory.
cd minio-integration-with-clickhouse
cd docker-compose
docker-compose up -d
Creating network "docker-compose_default" with the default driver
Creating docker-compose_zookeeper_1 ... done
Creating docker-compose_minio_1 ... done
Creating docker-compose_minio-client_1 ... done
Creating docker-compose_clickhouse1_1 ... done
Creating docker-compose_clickhouse3_1 ... done
Creating docker-compose_clickhouse2_1 ... done
Creating docker-compose_all_services_ready_1 ... done
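If the docker-compose environment starts correctly, you will see messages indicating that the ZooKeeper, MinIO, and ClickHouse services have been created. You can confirm that they are running and healthy with docker-compose ps: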
Name                                 Command                          State          Ports
----------------------------------------------------------------------------------------------------------------------
docker-compose_all_services_ready_1  /hello                           Exit 0
docker-compose_clickhouse1_1         bash -c clickhouse server ...    Up (healthy)   8123/tcp, 9000/tcp, 9009/tcp
docker-compose_clickhouse2_1         bash -c clickhouse server ...    Up (healthy)   8123/tcp, 9000/tcp, 9009/tcp
docker-compose_clickhouse3_1         bash -c clickhouse server ...    Up (healthy)   8123/tcp, 9000/tcp, 9009/tcp
docker-compose_minio-client_1        /bin/sh -c /usr/bin/mc co ...    Up (healthy)
docker-compose_minio_1               /usr/bin/docker-entrypoint ...   Up (healthy)   9000/tcp, 0.0.0.0:9001->9001/tcp
docker-compose_zookeeper_1           /docker-entrypoint.sh zkSe ...   Up (healthy)   2181/tcp, 2888/tcp, 3888/tcp
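If you want to script this check rather than read the table by eye, the following Python sketch flags every service whose State column is not Up (healthy). It assumes the docker-compose v1 ps layout shown above, and it treats all_services_ready, which shows Exit 0 above and appears to be a one-shot readiness container, as the only expected exit. Both are assumptions about this environment, not something the repository provides.

```python
"""Flag docker-compose services whose State is not "Up (healthy)".

Sketch only: assumes the docker-compose v1 `ps` table layout shown above.
"""
from __future__ import annotations

# Assumption: one-shot services that are expected to show "Exit 0".
EXPECTED_EXITS = ("all_services_ready",)


def unhealthy_services(ps_output: str) -> list[str]:
    """Return the container names that need attention."""
    problems = []
    for raw in ps_output.splitlines():
        line = raw.strip()
        # Skip blank lines, the header row, and the dashed separator.
        if not line or line.startswith("Name") or set(line) == {"-"}:
            continue
        name = line.split()[0]
        if "Up (healthy)" in line:
            continue
        # A one-shot service that exited successfully is not a problem.
        if "Exit 0" in line and any(svc in name for svc in EXPECTED_EXITS):
            continue
        problems.append(name)
    return problems
```

You could feed it the stdout of docker-compose ps (for example, captured with subprocess.run) and treat a non-empty result as a failed sanity check.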
Before we proceed, we will perform some sanity checks to ensure that MinIO is running and accessible.
Again, note that you must execute all docker-compose commands from the docker-compose directory.
First, we will check that we can use the minio-client service.
docker-compose exec minio-client mc -v
mc version RELEASE.2021-05-12T03-10-11Z
Next, we will use minio-client to access the MinIO bucket. In the minio-client.yml file, you may notice that the entrypoint definition connects the client to the minio service and creates the bucket root. You can find this bucket by listing all buckets.
docker-compose exec minio-client mc ls
Then, we will check that the three ClickHouse services are running and ready for queries.
docker-compose exec clickhouse1 bash -c 'clickhouse-client -q "SELECT version()"'
docker-compose exec clickhouse2 bash -c 'clickhouse-client -q "SELECT version()"'
docker-compose exec clickhouse3 bash -c 'clickhouse-client -q "SELECT version()"'
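The three version checks above can also be run from a script. This Python sketch sends the same clickhouse-client query to each node through docker-compose exec. It assumes docker-compose v1 is on your PATH and that you run it from the docker-compose directory; the function names are hypothetical. The -T flag disables the pseudo-TTY that docker-compose exec allocates by default, which is what you want when no terminal is attached.

```python
"""Run `SELECT version()` on each ClickHouse node via docker-compose exec.

Sketch only: assumes docker-compose v1 on PATH and the current working
directory set to the repository's docker-compose directory.
"""
from __future__ import annotations

import subprocess

NODES = ("clickhouse1", "clickhouse2", "clickhouse3")


def version_command(node: str) -> list[str]:
    # -T: no pseudo-TTY, so the command works from scripts and CI.
    return ["docker-compose", "exec", "-T", node,
            "clickhouse-client", "-q", "SELECT version()"]


def check_nodes(nodes=NODES) -> dict[str, str]:
    """Return {node: version}; raise CalledProcessError if a node fails."""
    versions = {}
    for node in nodes:
        result = subprocess.run(version_command(node), capture_output=True,
                                text=True, check=True)
        versions[node] = result.stdout.strip()
    return versions
```

Calling check_nodes() returns a dictionary that maps each node name to the version it reports. The sketch runs clickhouse-client directly instead of wrapping it in bash -c, which makes no difference for this query.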
Configuring MinIO Disk Storage
To set up a MinIO storage disk, you will first need a MinIO bucket endpoint, either remote or provided through a MinIO Docker container. You also need an access_key_id and a secret_access_key that correspond to the bucket. Here is an example configuration file using the local MinIO endpoint we created using Docker.
<yandex>
    <storage_configuration>
        <disks>
            <minio>
                <type>s3</type>
                <endpoint>http://minio:9001/root/data/</endpoint>
                <access_key_id>minio</access_key_id>
                <secret_access_key>minio123</secret_access_key>
            </minio>
        </disks>
        <policies>
            <external>
                <volumes>
                    <s3>
                        <disk>minio</disk>
                    </s3>
                </volumes>
            </external>
        </policies>
    </storage_configuration>
    ...
</yandex>
In this configuration file, we have one policy that includes a single volume with a single disk configured to use a MinIO bucket endpoint. In general, each policy can define multiple volumes, which is especially useful when moving data between volumes with TTL statements. You can also configure multiple disks and policies in their respective sections. However, to keep our example simple, it contains only the minimal structure required to use your MinIO bucket. For a complete guide to S3-compatible storage configuration, you may refer back to our earlier article: ClickHouse and S3 Compatible Object Storage. We have included this storage configuration file in the configs directory, and it will be ready to use when you start the docker-compose environment.
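If you maintain several disks or policies, generating the file can be less error-prone than editing XML by hand. This Python sketch (standard library only) reproduces the structure of the example: a disks section with s3 disks, and a policies section in which each policy lists its volumes and each volume lists its disks. The function and parameter names are hypothetical, and the yandex root element matches the example above.

```python
"""Generate a ClickHouse storage_configuration like the example above.

Sketch only: every disk passed in is written with <type>s3</type>.
"""
from __future__ import annotations

import xml.etree.ElementTree as ET


def storage_config(disks: dict[str, dict[str, str]],
                   policies: dict[str, dict[str, list[str]]],
                   root_tag: str = "yandex") -> str:
    """disks:    {disk_name: {"endpoint": ..., "access_key_id": ..., ...}}
    policies: {policy_name: {volume_name: [disk_name, ...]}}
    Volumes are emitted in the order given.
    """
    root = ET.Element(root_tag)
    storage = ET.SubElement(root, "storage_configuration")

    disks_el = ET.SubElement(storage, "disks")
    for name, settings in disks.items():
        disk = ET.SubElement(disks_el, name)
        ET.SubElement(disk, "type").text = "s3"
        for key, value in settings.items():
            ET.SubElement(disk, key).text = value

    policies_el = ET.SubElement(storage, "policies")
    for policy_name, volumes in policies.items():
        policy = ET.SubElement(policies_el, policy_name)
        volumes_el = ET.SubElement(policy, "volumes")
        for volume_name, disk_names in volumes.items():
            volume = ET.SubElement(volumes_el, volume_name)
            for disk_name in disk_names:
                ET.SubElement(volume, "disk").text = disk_name

    return ET.tostring(root, encoding="unicode")
```

For the multi-volume case mentioned above, a hypothetical policy such as {"tiered": {"hot": ["default"], "cold": ["minio"]}} would list ClickHouse's built-in default disk ahead of the MinIO disk. The default disk is not declared under disks, so it is left out of the disks argument. The order of volumes within a policy matters: once a volume fills up, ClickHouse moves data to the next one.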
As you can see in the repository we have provided, each local configuration file is mounted into the /etc/clickhouse-server/config.d directory of each ClickHouse container. If you want to add or modify configuration files, you can change them in the local config.d directory and add or remove them by changing the volumes mounted in the clickhouse-service.yml file. If you are not using ClickHouse in docker-compose, you can add this storage configuration file, and all other configuration files, to your /etc/clickhouse-server/config.d directory. You can also use the MinIO container from our docker-compose environment with your local ClickHouse instance by using the same credentials as in our configuration file. Because the minio hostname only resolves inside the docker-compose network, point the endpoint at the published port instead (for example, http://localhost:9001/root/data/). If you are using a remote MinIO bucket endpoint, make sure to replace the provided bucket endpoint and credentials with your own.
The storage configuration is now ready to be used to store table data. Now you can connect to one of the ClickHouse nodes or your local ClickHouse instance. Instructions to connect to the docker-compose node are provided below.
docker-compose exec clickhouse1 bash
Then, connect to the ClickHouse client.
clickhouse client
ClickHouse client version 18.104.22.168 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.4.6 revision 54447.

clickhouse1 :)
Now that you have connected to the ClickHouse client, the following steps are the same whether you are using a ClickHouse node in the docker-compose cluster or ClickHouse running on your local machine. To start storing data on the S3-backed disk, specify the storage policy in the CREATE TABLE statement.
CREATE TABLE minio
(
    d UInt64
)
ENGINE = MergeTree()
ORDER BY d
SETTINGS storage_policy = 'external'
Now you are ready to insert data into the table just like any other table.
INSERT INTO minio VALUES (1),(2),(3)

Query id: 4ac85ec5-5e67-4164-9fba-15ec28a28b78

Ok.

3 rows in set. Elapsed: 0.080 sec.
Once you have stored data in the table, you can confirm that it was stored on the correct disk by checking the system.parts table.
SELECT disk_name FROM system.parts WHERE table='minio'

Query id: 1d49d414-9dda-4f2b-9d47-f91d0b0bc9ea

┌─disk_name─┐
│ minio     │
└───────────┘

1 rows in set. Elapsed: 0.003 sec.
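The same check can be made outside clickhouse-client through ClickHouse's HTTP interface, which accepts the query as the body of a POST request and returns TabSeparated text by default. This Python sketch assumes a server reachable at http://localhost:8123 with the default user and no password. In the docker-compose cluster, port 8123 is not published to the host, so the sketch fits a local ClickHouse instance. The added active filter, an addition to the query above, ignores parts that have been merged away but not yet removed. The table name is interpolated into the SQL as-is, so pass only trusted names.

```python
"""Query system.parts over ClickHouse's HTTP interface (standard library only).

Sketch only: assumes http://localhost:8123 and the passwordless default user.
"""
from __future__ import annotations

import urllib.request

CLICKHOUSE_URL = "http://localhost:8123/"  # assumption: adjust for your server


def build_query_request(sql: str, url: str = CLICKHOUSE_URL) -> urllib.request.Request:
    # The query text travels as the POST body.
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")


def disks_for_table(table: str, url: str = CLICKHOUSE_URL) -> set[str]:
    """Return the set of disks that hold active parts of `table`."""
    sql = ("SELECT DISTINCT disk_name FROM system.parts "
           f"WHERE table = '{table}' AND active")
    with urllib.request.urlopen(build_query_request(sql, url)) as resp:
        # TabSeparated output: one disk name per line.
        return set(resp.read().decode("utf-8").split())
```

Against a server where the table above exists, you would expect disks_for_table("minio") to return {"minio"}.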
Note that two tables using the same storage policy will not share data. To transfer data directly from a MinIO bucket to a table, or vice versa, you can use the S3 table function.
MinIO can also be accessed directly using ClickHouse’s S3 table function with the following syntax.
s3(path, [aws_access_key_id, aws_secret_access_key,] format, structure, [compression])
To use the table function with MinIO, you will need to specify your endpoint and access credentials. Note that this time you must omit the trailing / from your endpoint path. Once again, if you are using a remote MinIO bucket endpoint, make sure to replace the bucket endpoint and credentials with your own.
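When you script exports and imports, it is easy to reuse the disk endpoint, which ends in /, by mistake. This Python sketch builds the s3() calls used in this section, strips any trailing / from the path as the note above requires, and quotes each argument as a ClickHouse string literal. The helper names are hypothetical, and the sketch does not check that the format or structure is valid.

```python
"""Build s3() table-function queries for MinIO (sketch, no validation)."""
from __future__ import annotations


def sql_quote(value: str) -> str:
    # Escape backslashes first, then single quotes, for a string literal.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def s3_call(path: str, key_id: str, secret: str,
            fmt: str, structure: str) -> str:
    # The table function path must not end with '/'.
    args = [path.rstrip("/"), key_id, secret, fmt, structure]
    return "s3(" + ", ".join(sql_quote(a) for a in args) + ")"


def export_query(table: str, **s3_args: str) -> str:
    """Upload every row of `table` to the bucket path."""
    return f"INSERT INTO FUNCTION {s3_call(**s3_args)} SELECT * FROM {table}"


def import_query(table: str, **s3_args: str) -> str:
    """Download the bucket path into `table`."""
    return f"INSERT INTO {table} SELECT * FROM {s3_call(**s3_args)}"
```

With path="http://minio:9001/root/data2", key_id="minio", secret="minio123", fmt="CSVWithNames", and structure="d UInt64", export_query("minio", ...) produces the upload query below, and import_query("minio2", ...) with the same arguments produces the download query.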
This query will upload data to MinIO from the table we created earlier.
INSERT INTO FUNCTION s3('http://minio:9001/root/data2', 'minio', 'minio123', 'CSVWithNames', 'd UInt64') SELECT * FROM minio
Now, let’s create a new table and download the data from MinIO. Notice that we can still take advantage of the S3 table function without using the storage policy we created earlier.
CREATE TABLE minio2 ( d UInt64 ) ENGINE = MergeTree() ORDER BY d
This query will download data from MinIO into the new table.
INSERT INTO minio2 SELECT * FROM s3('http://minio:9001/root/data2', 'minio', 'minio123', 'CSVWithNames', 'd UInt64')
Let’s confirm that the data was transferred correctly by checking the contents of each table to make sure they match.
SELECT * FROM minio

Query id: fd26acc7-f105-4388-84b5-80786c61f07b

┌─d─┐
│ 1 │
│ 2 │
│ 3 │
└───┘

3 rows in set. Elapsed: 0.008 sec.

SELECT * FROM minio2

Query id: e41145de-a4b4-41ba-a002-e8dd8dc9a9e1

┌─d─┐
│ 1 │
│ 2 │
│ 3 │
└───┘

3 rows in set. Elapsed: 0.001 sec.
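Comparing the two results by eye works for three rows, but the check is easy to automate. Without ORDER BY, ClickHouse does not guarantee row order, so this Python sketch compares the rows as multisets (order ignored, duplicates counted). It assumes TabSeparated input, which clickhouse-client prints by default when run non-interactively with -q; the function name is hypothetical.

```python
"""Compare two TabSeparated query results, ignoring row order."""
from collections import Counter


def same_rows(tsv_a: str, tsv_b: str) -> bool:
    # Counter treats each line as a row and counts duplicates, so this is
    # multiset equality: same rows, same multiplicities, any order.
    return Counter(tsv_a.splitlines()) == Counter(tsv_b.splitlines())
```

For example, you could pass the stdout of docker-compose exec -T clickhouse1 clickhouse-client -q "SELECT * FROM minio" and the stdout of the same command for minio2.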
Even though this is a small example, you may notice above that the query on minio (0.008 sec) is slower than the query on minio2 (0.001 sec). Tables that use S3-compatible storage experience higher latency than tables on local disk because their data is read over the network from the object store (here, the MinIO container) rather than from a local disk.
Even so, you may have noticed that MinIO storage in a local Docker container is extremely fast. Although storage in a local Docker container will generally be faster than remote cloud storage, MinIO also outperforms AWS S3 when used as a cloud storage bucket. For example, in a performance benchmark that loaded a dataset of almost 200 million rows (142 GB), the MinIO bucket showed a performance improvement of nearly 40% over the AWS bucket!
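A percentage improvement can mean either that a job took that much less time or that throughput rose by that much, and the two numbers differ for the same measurements. This Python sketch computes both for a fixed workload. The function names are hypothetical, and the timings in the usage note are made up for illustration; they are not the benchmark's results.

```python
"""Two readings of a benchmark's "percentage improvement" (fixed workload)."""


def time_reduction(baseline_s: float, candidate_s: float) -> float:
    """Share of the baseline run time saved: (baseline - candidate) / baseline."""
    return (baseline_s - candidate_s) / baseline_s


def throughput_gain(baseline_s: float, candidate_s: float) -> float:
    """Relative increase in work per second: baseline / candidate - 1."""
    return baseline_s / candidate_s - 1
```

With hypothetical load times of 100 s for one bucket and 60 s for the other, time_reduction gives 0.40 (40% less time) while throughput_gain gives about 0.67 (about 67% more throughput).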
In this article, we have introduced MinIO integration with ClickHouse. We reviewed how to use MinIO and ClickHouse together in a docker-compose cluster to store table data in MinIO, as well as how to import and export data directly to and from MinIO using the S3 table function. We also briefly discussed the performance advantages of using MinIO, especially in a Docker container.