Squeezing JuiceFS with ClickHouse® (Part 1): Setting Things Up in Kubernetes

JuiceFS is a POSIX-compatible filesystem that can be used over S3 object storage. It is a distributed, cloud-native file system with many features, including consistency, encryption in transit and at rest, BSD and POSIX file locks, and data compression. Every application that works with S3 directly needs to adapt to the object model, and for more complex applications such as ClickHouse, storing table data on S3 is hard; native S3 support still has issues. If you want to avoid making your applications aware of how to work with S3 storage directly, JuiceFS can be a tempting choice. And if you are already using JuiceFS and ClickHouse as part of your infrastructure, combining the two is a natural step. In this two-part blog series, we’ll walk through the configuration of JuiceFS over S3 inside Kubernetes and how to use it as a disk to store data for ClickHouse MergeTree tables. Once everything is set up, in Part 2 we’ll look at the performance you can expect and the potential problems with combining JuiceFS and ClickHouse.
The idea of using JuiceFS with ClickHouse is not new. Back in 2021, JuiceFS published the blog article Exploring storage and computing separation for ClickHouse, which explored using JuiceFS as storage for ClickHouse MergeTree tables, and more recently Low-Cost Read/Write Separation: Jerry Builds a Primary-Replica ClickHouse Architecture. We’ll try it out for ourselves. However, directly managing servers to run ClickHouse and JuiceFS is complex, so we’ll use Kubernetes. Specifically, we’ll combine clickhouse-operator and the JuiceFS CSI (Container Storage Interface) driver, which simplifies using JuiceFS as the storage class for our cluster’s persistent volume claims (PVC).
I’ve chosen an economical and developer-friendly setup for our testing environment. We’ll employ self-managed Kubernetes on Hetzner Cloud with Wasabi S3 object storage.
Deploying Kubernetes cluster
I’ll use the hetzner-k3s utility to set up a low-cost Kubernetes cluster quickly. The setup is the same as described in the Low-Cost ClickHouse clusters using Hetzner Cloud with Altinity.Cloud Anywhere article.
First, download and install the hetzner-k3s utility.
curl -fsSLO "https://github.com/janosmiko/hetzner-k3s/releases/latest/download/hetzner-k3s_`uname -s`_`uname -m`.deb"
sudo dpkg -i "hetzner-k3s_`uname -s`_`uname -m`.deb"
hetzner-k3s -v
hetzner-k3s version v0.1.9
My Kubernetes cluster configuration will be as follows: I’ll use CPX31 servers that provide 4 vCPU with 8GB RAM in the us-east location (Ashburn, VA). For S3, I’ll use a Wasabi S3 bucket in the us-east-1 region.
Here is my k3s_cluster.yaml. Use your Hetzner project API token and your public and private SSH keys.
---
hetzner_token: <YOUR HETZNER PROJECT API TOKEN>
cluster_name: clickhouse-cloud
kubeconfig_path: "kubeconfig"
k3s_version: v1.29.4+k3s1
public_ssh_key_path: "/home/user/.ssh/<YOUR PUBLIC SSH KEY>.pub"
private_ssh_key_path: "/home/user/.ssh/<YOUR PRIVATE SSH KEY>"
image: "ubuntu-22.04"
verify_host_key: false
location: ash
schedule_workloads_on_masters: false
masters:
  instance_type: cpx31
  instance_count: 1
worker_node_pools:
- name: clickhouse
  instance_type: cpx31
  instance_count: 3
Now, I’ll create my Kubernetes cluster using the following command:
hetzner-k3s create-cluster -c k3s_cluster.yaml
After the cluster setup, check that all nodes are up and working. Remember to point kubectl to the cluster’s kubeconfig file, which was created in the same directory as k3s_cluster.yaml.
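If you prefer, you can export the KUBECONFIG environment variable instead of passing --kubeconfig to every command. I’ll keep the explicit flag in the examples below so they are copy-paste safe.
export KUBECONFIG="$PWD/kubeconfig"
kubectl get nodes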
kubectl --kubeconfig ./kubeconfig get nodes
NAME                                             STATUS   ROLES                       AGE     VERSION
clickhouse-cloud-cpx31-master1                   Ready    control-plane,etcd,master   2m32s   v1.29.4+k3s1
clickhouse-cloud-cpx31-pool-clickhouse-worker1   Ready    <none>                      2m20s   v1.29.4+k3s1
clickhouse-cloud-cpx31-pool-clickhouse-worker2   Ready    <none>                      2m21s   v1.29.4+k3s1
clickhouse-cloud-cpx31-pool-clickhouse-worker3   Ready    <none>                      2m22s   v1.29.4+k3s1
Installing JuiceFS CSI driver
The best way to use JuiceFS in Kubernetes is through the JuiceFS CSI driver. The CSI driver lets us create persistent volumes (PV) by defining persistent volume claims (PVC) that ClickHouse will use to store data, and it provides a standard way of exposing storage to applications running on Kubernetes. I’ll install the driver using the kubectl method.
kubectl --kubeconfig kubeconfig apply -f https://raw.githubusercontent.com/juicedata/juicefs-csi-driver/master/deploy/k8s.yaml
serviceaccount/juicefs-csi-controller-sa created
serviceaccount/juicefs-csi-dashboard-sa created
serviceaccount/juicefs-csi-node-sa created
clusterrole.rbac.authorization.k8s.io/juicefs-csi-dashboard-role created
clusterrole.rbac.authorization.k8s.io/juicefs-csi-external-node-service-role created
clusterrole.rbac.authorization.k8s.io/juicefs-external-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/juicefs-csi-dashboard-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/juicefs-csi-node-service-binding created
clusterrolebinding.rbac.authorization.k8s.io/juicefs-csi-provisioner-binding created
service/juicefs-csi-dashboard created
deployment.apps/juicefs-csi-dashboard created
statefulset.apps/juicefs-csi-controller created
daemonset.apps/juicefs-csi-node created
csidriver.storage.k8s.io/csi.juicefs.com created
Verify installation.
kubectl --kubeconfig kubeconfig -n kube-system get pods -l app.kubernetes.io/name=juicefs-csi-driver
NAME READY STATUS RESTARTS AGE
juicefs-csi-controller-0 4/4 Running 0 45s
juicefs-csi-controller-1 4/4 Running 0 34s
juicefs-csi-dashboard-58d9c54877-jrphl 1/1 Running 0 45s
juicefs-csi-node-7tsr8 3/3 Running 0 44s
juicefs-csi-node-mmxbk 3/3 Running 0 44s
juicefs-csi-node-njftm 3/3 Running 0 44s
juicefs-csi-node-rnkmz 3/3 Running 0 44s
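The installation also deploys a JuiceFS CSI dashboard, which you can optionally browse by port-forwarding its service. The dashboard typically listens on port 8088; check the service output first if yours differs. This step is not required for the rest of the setup.
kubectl --kubeconfig kubeconfig -n kube-system get svc juicefs-csi-dashboard
kubectl --kubeconfig kubeconfig -n kube-system port-forward svc/juicefs-csi-dashboard 8088:8088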
Creating Redis cluster for JuiceFS metadata store
Having installed the JuiceFS CSI driver, we need to set up a metadata store. JuiceFS needs a metadata store because it uses a data-metadata separated architecture, with data being the file contents presented to the users and the metadata being the information describing the files and the filesystem itself, such as file attributes, filesystem structure, mapping of file contents to S3 objects, etc. The metadata store can be implemented using different databases. I’ll create a simple Redis cluster. For more information on choosing a metadata store, see Guidance on selecting metadata engine in JuiceFS.
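For reference, what the CSI driver will do for us under the hood is roughly equivalent to formatting a JuiceFS filesystem manually against this metadata store and the S3 bucket. Here is a sketch of that manual command, using the same placeholder values as the secret we create later (you don’t need to run it yourself, and the Redis URL only resolves from inside the cluster):
juicefs format \
  --storage s3 \
  --bucket https://<BUCKET>.s3.<REGION>.wasabisys.com \
  --access-key <ACCESS_KEY> \
  --secret-key <SECRET_KEY> \
  redis://redis-service.redis:6379/1 \
  ten-pb-fs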
First, let’s create a service.
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: redis
  labels:
    app: redis
spec:
  ports:
  - port: 6379
  clusterIP: None
  selector:
    app: redis
Then, we need to create a ConfigMap with simple configuration files for both the Redis master and secondary nodes.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: redis
  labels:
    app: redis
data:
  master.conf: |
    maxmemory 1024mb
    maxmemory-policy allkeys-lru
    maxclients 20000
    timeout 300
    appendonly no
    dbfilename dump.rdb
    dir /data
  secondary.conf: |
    slaveof redis-0.redis-service.redis 6379
    maxmemory 1024mb
    maxmemory-policy allkeys-lru
    maxclients 20000
    timeout 300
    dir /data
I will use a StatefulSet to define the Redis cluster. Note that master and secondary nodes get different configurations, and the pod’s ordinal index determines whether a node is a master or a secondary. However, for testing, we can get away with a single replica and 10Gi volumes for the /data and /etc folders.
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: redis
spec:
  serviceName: "redis-service"
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: init-redis
        image: redis:7.2.4
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate redis server-id from pod ordinal index.
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          # Copy the appropriate redis config file from the config-map.
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/master.conf /etc/redis-config.conf
          else
            cp /mnt/secondary.conf /etc/redis-config.conf
          fi
        volumeMounts:
        - name: redis-claim
          mountPath: /etc
        - name: config-map
          mountPath: /mnt/
      containers:
      - name: redis
        image: redis:7.2.4
        ports:
        - containerPort: 6379
          name: redis
        command:
        - redis-server
        - "/etc/redis-config.conf"
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: redis-claim
          mountPath: /etc
      volumes:
      - name: config-map
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: redis-claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
I’ll also create a dedicated namespace for it.
kubectl --kubeconfig kubeconfig create namespace redis
namespace/redis created
Now, we can create our Redis cluster.
kubectl --kubeconfig kubeconfig apply -n redis -f redis_cluster.yaml
service/redis-service created
configmap/redis-config created
statefulset.apps/redis created
Let’s check the created resources.
kubectl --kubeconfig kubeconfig get configmap -n redis
NAME               DATA   AGE
kube-root-ca.crt   1      3m22s
redis-config       2      57s
kubectl --kubeconfig kubeconfig get svc -n redis
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
redis-service   ClusterIP   None         <none>        6379/TCP   88s
kubectl --kubeconfig kubeconfig get statefulset -n redis
NAME    READY   AGE
redis   1/1     96s
kubectl --kubeconfig kubeconfig get pods -n redis
NAME      READY   STATUS    RESTARTS   AGE
redis-0   1/1     Running   0          115s
kubectl --kubeconfig kubeconfig get pvc -n redis
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     VOLUMEATTRIBUTESCLASS   AGE
redis-claim-redis-0   Bound    pvc-0d4c5dba-67f0-43df-a8e4-d44575e0bc1b   10Gi       RWO            hcloud-volumes   <unset>                 3m57s
redis-data-redis-0    Bound    pvc-fc763f30-397f-455f-bd7e-c49db7a8d36e   10Gi       RWO            hcloud-volumes   <unset>                 3m57s
We can also check the master node by accessing it directly.
kubectl --kubeconfig kubeconfig exec -it redis-0 -c redis -n redis -- /bin/bash
bash-5.2# redis-cli
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:5c09f7cf01eadafe8120d2c0862cb6c69b43c438
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
Everything looks good. Our metadata store is ready!
Creating JuiceFS storage class
The last thing we need to do is create a storage class for JuiceFS. For that, we first define a secret with the credentials for our S3 bucket and then the storage class itself.
Here is my juicefs-sc.yaml. You’ll need to specify your bucket, access key, and secret key.
---
apiVersion: v1
kind: Secret
metadata:
  name: juicefs-secret
  namespace: default
  labels:
    juicefs.com/validate-secret: "true"
type: Opaque
stringData:
  name: ten-pb-fs
  metaurl: redis://redis-service.redis:6379/1
  storage: s3
  bucket: https://<BUCKET>.s3.<REGION>.wasabisys.com
  access-key: <ACCESS_KEY>
  secret-key: <SECRET_KEY>
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
Create the storage class as follows.
kubectl --kubeconfig kubeconfig apply -f juicefs-sc.yaml
secret/juicefs-secret created
storageclass.storage.k8s.io/juicefs created
Now, check that the storage class is available.
kubectl --kubeconfig kubeconfig get sc
NAME      PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
juicefs   csi.juicefs.com   Delete          Immediate           false                  5m43s
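As a side note, if you later need to tune how JuiceFS is mounted (for example, the local cache size or writeback mode), the StorageClass accepts standard mountOptions that the CSI driver passes through to the JuiceFS mount. The sketch below is illustrative only; the values are assumptions and not used in this setup.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-tuned
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
mountOptions:
- cache-size=10240   # local cache size in MiB (assumed value)
- writeback          # upload asynchronously; trades durability for write speed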
Testing JuiceFS
Having defined a storage class for JuiceFS, we can now test it by defining a persistent volume claim that we can then use in a test application pod.
Here is my juicefs-pvc-sc.yaml that uses the new juicefs storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: juicefs-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: juicefs
  resources:
    requests:
      storage: 10P
Let’s apply it.
kubectl --kubeconfig kubeconfig apply -f juicefs-pvc-sc.yaml
persistentvolumeclaim/juicefs-pvc created
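Optionally, confirm that the claim is bound and that a persistent volume was dynamically provisioned (the exact output will vary).
kubectl --kubeconfig kubeconfig get pvc juicefs-pvc -n default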
Test it by creating a simple pod using the following juicefs-app.yaml manifest.
apiVersion: v1
kind: Pod
metadata:
  name: juicefs-app
  namespace: default
spec:
  containers:
  - name: juicefs-app
    image: centos
    command:
    - sleep
    - "infinity"
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: juicefs-pvc
Start the pod and check that /data was successfully mounted.
kubectl --kubeconfig kubeconfig apply -f juicefs-app.yaml
pod/juicefs-app created
kubectl --kubeconfig kubeconfig exec -it -n default juicefs-app -- /bin/bash
[root@juicefs-app /]# ls /data
[root@juicefs-app /]#
We have /data mounted in our pod. JuiceFS is ready for use!
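As a quick sanity check before the benchmarks, you can write a small file over the mount and read it back from inside the pod (the file name here is just an example).
[root@juicefs-app /]# echo "hello from juicefs" > /data/hello.txt
[root@juicefs-app /]# cat /data/hello.txt
hello from juicefs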
Running benchmarks provided by JuiceFS
Install the juicefs client into the pod, and let’s run the benchmarks to check that the filesystem is working as expected.
curl -sSL https://d.juicefs.com/install | sh -
Run the benchmark that uses only 1 thread.
juicefs bench -p 1 /data
Write big blocks: 1024/1024 [==============================================================] 96.9/s used: 10.566310616s
Read big blocks: 1024/1024 [==============================================================] 39.9/s used: 25.661374375s
Write small blocks: 100/100 [==============================================================] 11.0/s used: 9.059618651s
Read small blocks: 100/100 [==============================================================] 755.4/s used: 132.409568ms
Stat small files: 100/100 [==============================================================] 2566.2/s used: 39.04321ms
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 1
+------------------+----------------+---------------+
|       ITEM       |     VALUE      |     COST      |
+------------------+----------------+---------------+
|   Write big file |    96.91 MiB/s |  10.57 s/file |
|    Read big file |    39.91 MiB/s |  25.66 s/file |
| Write small file |   11.0 files/s | 90.59 ms/file |
|  Read small file |  758.5 files/s |  1.32 ms/file |
|        Stat file | 2604.5 files/s |  0.38 ms/file |
+------------------+----------------+---------------+
Run the benchmark that uses 4 parallel threads.
juicefs bench -p 4 /data
Write big blocks: 4096/4096 [==============================================================] 267.1/s used: 15.333303679s
Read big blocks: 4096/4096 [==============================================================] 112.3/s used: 36.469782933s
Write small blocks: 400/400 [==============================================================] 16.5/s used: 24.257067969s
Read small blocks: 400/400 [==============================================================] 2392.4/s used: 167.231047ms
Stat small files: 400/400 [==============================================================] 10742.0/s used: 37.281491ms
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 4
+------------------+-----------------+----------------+
|       ITEM       |      VALUE      |      COST      |
+------------------+-----------------+----------------+
|   Write big file |    267.14 MiB/s |   15.33 s/file |
|    Read big file |    112.32 MiB/s |   36.47 s/file |
| Write small file |    16.5 files/s | 242.56 ms/file |
|  Read small file |  2401.8 files/s |   1.67 ms/file |
|        Stat file | 10901.5 files/s |   0.37 ms/file |
+------------------+-----------------+----------------+
Let’s increase parallelism by a factor of 4 again and re-run the benchmark with 16 threads.
juicefs bench -p 16 /data
Write big blocks: 16384/16384 [==============================================================] 307.2/s used: 53.335911256s
Read big blocks: 16384/16384 [==============================================================] 331.4/s used: 49.440177355s
Write small blocks: 1600/1600 [==============================================================] 48.9/s used: 32.723927882s
Read small blocks: 1600/1600 [==============================================================] 3181.2/s used: 503.016108ms
Stat small files: 1600/1600 [==============================================================] 9822.4/s used: 162.940025ms
Benchmark finished!
BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 16
+------------------+----------------+----------------+
|       ITEM       |     VALUE      |      COST      |
+------------------+----------------+----------------+
|   Write big file |   307.19 MiB/s |   53.34 s/file |
|    Read big file |   331.39 MiB/s |   49.44 s/file |
| Write small file |   48.9 files/s | 327.23 ms/file |
|  Read small file | 3184.9 files/s |   5.02 ms/file |
|        Stat file | 9852.3 files/s |   1.62 ms/file |
+------------------+----------------+----------------+
The benchmarks show that the JuiceFS filesystem is working and that write and read throughput increases as we add worker threads. We are ready to combine JuiceFS with ClickHouse.
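If you don’t plan to poke at the test volume any further, you can optionally remove the test pod and PVC before moving on.
kubectl --kubeconfig kubeconfig delete -f juicefs-app.yaml
kubectl --kubeconfig kubeconfig delete -f juicefs-pvc-sc.yaml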
Conclusion
We have successfully set up the JuiceFS CSI driver in our Kubernetes environment and checked that the JuiceFS storage class is working by running the benchmarks provided by the juicefs client. Now, stay tuned. In Part 2, we will create a ClickHouse cluster and examine the performance you can expect when combining JuiceFS with ClickHouse using clickhouse-operator.
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.