Migrating your ClickHouse® clusters to Altinity.Cloud

Altinity.Cloud is the most mature and flexible platform for managing ClickHouse clusters. First launched in 2020, it is now available on AWS, GCP, Azure, and on-prem Kubernetes. It may be deployed as a classic SaaS service or inside users' own accounts. The data layer is built from 100% open-source components, which eliminates vendor lock-in. That makes it very attractive, and recently we’ve seen many users migrating to Altinity.Cloud from other ClickHouse vendors.

That raises the question: how do you migrate data from another ClickHouse installation? In fact, there are multiple ways to do so. Here is a short overview.

Prerequisites

In order to perform a successful data migration, the destination cluster should be properly prepared first. In particular, the following is highly recommended:

  • The ClickHouse version should be the same on the source cluster and the destination cluster.
  • If Named Collections are used in table engines that access external data, create them on the destination first (a minimal example follows this list).
  • Profile settings should be the same; a missing setting may prevent some tables from being created. Here is an easy way to list all modified settings so they can be compared between clusters:

    SELECT name, value, default FROM system.settings WHERE changed

  • Ensure that the disk configuration (e.g., storage policies) is identical on both clusters, as differences can cause errors when creating tables.
  • Finally, if there are distributed tables that use hardcoded cluster names instead of macros, this may require some extra steps. Here is how to check:

    SELECT database, name, extract(engine_full, 'Distributed\(\'([^\']+)') cluster, engine_full
    FROM system.tables WHERE engine = 'Distributed'


    If you see {cluster} in the cluster column – you are good. Otherwise, try to create a destination cluster with the same name or define a cluster in remote_servers.xml explicitly.
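
As a reference, here is a minimal sketch of creating a Named Collection for a MySQL source. The collection name and all connection parameters below are placeholders, and should match whatever the source cluster's table definitions or dictionaries expect:

-- Hypothetical named collection used by MySQL engine tables or dictionaries;
-- names and values are placeholders, adjust them to the source cluster's DDL.
CREATE NAMED COLLECTION mysql_source AS
    host = 'mysql.example.internal',
    port = 3306,
    user = 'reader',
    password = '<password>',
    database = 'analytics';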

Once the destination cluster is prepared, it’s time to start moving the data.

Embedded BACKUP/RESTORE

One of the easiest ways to migrate data is to use the built-in BACKUP command. It allows the user to back up a table, a database, or a full cluster to object storage, typically S3. The BACKUP command is extremely simple:

BACKUP ALL TO S3('https://<bucket_name>.<region>.amazonaws.com/<backup_name>', '<access_key>','<secret_key>') ASYNC;

The process can be monitored in the system.backups table. Once the backup is complete, it can be restored on the Altinity.Cloud ClickHouse cluster using a single RESTORE command:

RESTORE ALL FROM S3('https://<bucket_name>.<region>.amazonaws.com/<backup_name>', '<access_key>','<secret_key>') ASYNC;
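
Since both commands run with ASYNC, progress and errors for the backup and the restore can be checked in the system.backups table on the respective cluster. A minimal query might look like this (exact column names may vary slightly between ClickHouse versions):

-- Most recent backup/restore operations first; the status and error columns
-- show whether the operation is still running or has failed.
SELECT id, name, status, error, start_time, end_time
FROM system.backups
ORDER BY start_time DESC;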

Altinity Backup for ClickHouse®

Altinity Backup for ClickHouse® (aka clickhouse-backup, after the binary name) is an open-source backup management solution that we also use in Altinity.Cloud. It supports multiple storage back-ends and a lot of configuration options. In addition to data and schema, it can also back up and restore users.

The clickhouse-backup binary needs to be installed on the same host as ClickHouse itself in order to use it. That means it’s not possible to use it for backing up ClickHouse managed by service providers, but it works great for on-prem installations. Here is an example of how it can be used:

S3_BUCKET=<bucket_name> \
S3_REGION=<region> \
S3_ACCESS_KEY=<access_key> \
S3_SECRET_KEY=<secret_key>  \
/bin/clickhouse-backup create_remote --rbac <backup_name>

Of course, you can also select specific databases and tables to back up instead of just the full server.

Alternatively, you can create a local backup and ship it to an S3 bucket with a separate process. If a ClickHouse cluster is running in a highly secure environment without Internet access, this could be the only option available. 

In order to restore, you can point the Altinity.Cloud cluster restore wizard at the external bucket location where the backup was uploaded.

See other options to restore data in the Altinity.Cloud documentation.

Altinity.Cloud Data Copy Wizard

Another way to migrate data is to use the Data Copy wizard available in Altinity.Cloud. It is built on top of the clickhouse-copier tool. We explained how to use clickhouse-copier manually a long time ago, and since then we have built it into Altinity.Cloud as well.

Using the Data Copy wizard is simple: you specify the source cluster address and credentials, and select the databases or tables to migrate.

Note that this requires the source cluster to accept inbound connections from the Internet.

Manual INSERT/SELECT from remote()

Finally, every ClickHouse user should be familiar with the remote() and remoteSecure() table functions. Those functions allow you to copy tables between ClickHouse hosts in any direction. For example, on a source cluster one may run:

INSERT INTO FUNCTION remoteSecure('my-cluster.my-environment.altinity.cloud:9440', my_database, my_table, '<user>', '<password>')
SELECT * FROM my_database.my_table;

Alternatively, it can be run on the Altinity.Cloud destination cluster:

INSERT INTO my_database.my_table
SELECT * FROM remoteSecure('my-clickhouse.my-service-provider.com:9440', my_database, my_table, '<user>', '<password>');


Using remoteSecure() requires a lot of manual effort. If the connection is interrupted, the query does not retry automatically, so in practice this approach only works for a small number of reasonably small tables. A good practice is to move data partition by partition, but that makes the process even more tedious and may require some scripting, as sketched below.
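
For illustration only, here is a rough sketch of copying one partition at a time from the destination side. The table name, the event_date column, and the toYYYYMM partitioning scheme are assumptions and have to be adapted to the actual table definition:

-- Copy a single month, assuming the table is partitioned by toYYYYMM(event_date).
-- Repeat for each partition, verifying row counts after every insert.
INSERT INTO my_database.my_table
SELECT * FROM remoteSecure('my-clickhouse.my-service-provider.com:9440',
                           my_database, my_table, '<user>', '<password>')
WHERE toYYYYMM(event_date) = 202406;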

Bumps on the Road

Multiple things can go wrong during data migration. For example, if you use MySQL or PostgreSQL engine tables, or external dictionaries, their definitions typically contain sensitive data such as passwords. By default, ClickHouse masks those secrets, so schema-level migration tools cannot pick them up properly. The only tool that can reliably back up and restore such objects is clickhouse-backup, since it copies files directly at the OS level, bypassing ClickHouse controls. For the others, there are knobs to turn, e.g. format_display_secrets_in_show_and_select.
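
For example, a schema dump that needs to preserve credentials might enable the setting for the session. This is only a sketch: recent ClickHouse versions also require the server-side display_secrets_in_show_and_select option and the corresponding grant before it takes effect, and my_mysql_table is a placeholder:

-- Show real credentials instead of the masked value in the generated DDL,
-- assuming the server configuration and grants allow it.
SET format_display_secrets_in_show_and_select = 1;
SHOW CREATE TABLE my_database.my_mysql_table;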

While restoring a single host may not be too difficult, it gets harder for bigger clusters. For example, coordination across multiple shards may be a challenge. clickhouse-copier can do that, though.

Finally, restoring a replicated cluster may take roughly twice as long: first one replica is restored, and then ClickHouse replicates the data to the other node(s).
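
One way to see whether the other replicas have caught up is to watch the replication queues, for example:

-- Non-empty queues mean the replica is still fetching or merging parts.
SELECT database, table, queue_size, absolute_delay
FROM system.replicas
WHERE queue_size > 0;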

Wrap-up

There are multiple options to migrate ClickHouse data from one host or cluster to another one. Different use cases may require different approaches. Altinity.Cloud has built-in automation to make it simpler. More importantly, our engineers have helped users migrate dozens of clusters to their new homes and can suggest the proper solution for a particular use case. Please reach out to support@altinity.com or sign up for a free Altinity.Cloud trial and we will be happy to help you!

ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.
