Introduction to ClickHouse Backups and clickhouse-backup

ClickHouse Backups

Data backups are an inglorious but vital part of IT operations. They are most challenging in “big data” deployments, such as analytical databases. This article will explore the plumbing involved in backing up ClickHouse and introduce the clickhouse-backup tool for automating the process.

The native replication support built into ClickHouse provides high availability and resilience against individual node failures. However, rare disaster scenarios may require recovering data from backups. These include data corruption and the failure of all replicas in a shard or cluster.

A critical component of any ClickHouse backup scheme is “freezing” tables. As with all databases, consistent backups depend on ClickHouse being in a “quiesced” state. Instead of having to halt the database entirely, ClickHouse has native support for “freezing” tables for backup or migration. This is a no-downtime operation.

Manual Backups in Four Easy Steps

ClickHouse includes native support for instantaneous point-in-time backups through its ‘ALTER TABLE ... FREEZE’ feature.

  1. Confirm your shadow directory is empty:
    ls /var/lib/clickhouse/shadow/
  2. Ask ClickHouse to freeze your table:
    echo -n 'alter table events freeze' | clickhouse-client
  3. Save your backup in case of disaster:
    cd /var/lib/clickhouse/
    sudo mkdir backup
    sudo cp -r shadow/ backup/my-backup-name
  4. Finally, clean up the backup source for next time:
    sudo rm -rf /var/lib/clickhouse/shadow/*

ClickHouse uses filesystem hard links to achieve instantaneous backups with no downtime (or locking) for ClickHouse services. These hard links can be further leveraged for efficient backup storage. On filesystems that support hard links, such as local filesystems or NFS, use cp with the -l flag (or rsync with the --hard-links and --numeric-ids flags) to avoid copying data.
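
For example, step 3 of the manual procedure above can create hard links instead of full copies (a minimal sketch, using the same paths as before):

cd /var/lib/clickhouse/
sudo mkdir backup
sudo cp -rl shadow/ backup/my-backup-name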

When hard links are utilized, storage on disk is much more efficient. Because they rely on hard links, each backup is effectively a “full” backup, even though duplicate use of disk space is avoided.

Test Your Backup

It is rightly said that a backup is worthless if the restoration process hasn’t been tested. Perform regular test restores to ensure your data will be there when you need it.

Here are the steps for manual recovery:

  1. Drop your test table, or find another server for testing.
  2. Create your test table for recovery:
    cat events.sql | clickhouse-client
  3. Copy your backup to the table’s `detached` directory:
    cd /var/lib/clickhouse
    sudo cp -rl backup/my-backup-name/* data/default/events/detached/
  4. Attach the detached parts:
    echo 'alter table events attach partition 202006' | clickhouse-client
  5. Confirm your data has been restored:
    echo 'select count() from events' | clickhouse-client
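
If you are unsure which partition IDs to attach in step 4, you can list the contents of the detached directory or query the system.detached_parts table (a hedged example; column names per recent ClickHouse versions):

echo "select distinct partition_id from system.detached_parts where table = 'events'" | clickhouse-client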

Automate the Backup Process with clickhouse-backup

The clickhouse-backup tool, created by Alex Akulov, helps to automate the manual steps above: https://github.com/AlexAkulov/clickhouse-backup. We like clickhouse-backup and have implemented several new features, which are described here for the first time.

To get started you’ll need to install clickhouse-backup. Full instructions are in the README.md file. Here’s an example of installation from a tarball. RPMs, Debian packages, and Docker images are also available.

wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.5.2/clickhouse-backup.tar.gz
tar -xf clickhouse-backup.tar.gz
cd clickhouse-backup/
sudo cp clickhouse-backup /usr/local/bin
clickhouse-backup -v

The API features and new storage options like ‘remote_storage’ described in this blog article are not yet available in an official build. You either need to build from source or run the latest docker image. Here’s an example of the latter.

docker run --rm -it --network host \
  -v "/var/lib/clickhouse:/var/lib/clickhouse" \
  -e CLICKHOUSE_PASSWORD="password" \
  -e S3_BUCKET="clickhouse-backup" \
  -e S3_ACCESS_KEY="access_key" \
  -e S3_SECRET_KEY="secret" \
  alexakulov/clickhouse-backup:master --help

For the rest of the article we assume you have a build that has the new features. When used on the command line, clickhouse-backup requires a configuration file. Here is a minimal example.

$ cat /etc/clickhouse-backup/config.yml
general:
  remote_storage: none

You will need to add additional configuration options for non-default ClickHouse installations or authentication. A full config example can be created by running clickhouse-backup default-config. This is a great starting point for your use, showing all available settings.
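
For instance, a configuration that adds ClickHouse connection settings might look like the following (a sketch; key names as in the default config, values adjusted to your installation):

general:
  remote_storage: none
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000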

Once configured, clickhouse-backup provides a variety of subcommands for managing backups.

$ clickhouse-backup help
NAME:
clickhouse-backup - Tool for easy backup of ClickHouse with cloud support
...
COMMANDS:
   tables          Print list of tables
   create          Create new backup
   upload          Upload backup to remote storage
   list            Print list of backups
   download        Download backup from remote storage
   restore         Create schema and restore data from backup
   delete          Delete specific backup
   default-config  Print default config
   freeze          Freeze tables
   clean           Remove data in 'shadow' folder
   server          Run API server
   help, h         Shows a list of commands or help for one command

Just like the manual backup example above, you will need to use sudo or run clickhouse-backup as the clickhouse user.

The configuration file allows for certain databases or tables to be ignored. The tables subcommand will show you which tables will be backed up:

$ clickhouse-backup tables
default.events
system.metric_log   (ignored)
system.query_log    (ignored)
system.query_thread_log (ignored)
system.trace_log    (ignored)
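
The ignore list is driven by the skip_tables setting. Here is a sketch of the relevant config fragment (the key lives under the clickhouse section; the default skips system tables):

clickhouse:
  skip_tables:
    - system.*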

Creating a backup is as easy as:

$ sudo clickhouse-backup create
2020/07/06 20:13:02 Create backup '2020-07-06T20-13-02'
2020/07/06 20:13:02 Freeze `default`.`events`
2020/07/06 20:13:02 Skip `system`.`metric_log`
2020/07/06 20:13:02 Skip `system`.`query_log`
2020/07/06 20:13:02 Skip `system`.`query_thread_log`
2020/07/06 20:13:02 Skip `system`.`trace_log`
2020/07/06 20:13:02 Copy metadata
2020/07/06 20:13:02   Done.
2020/07/06 20:13:02 Move shadow
2020/07/06 20:13:02   Done.

As you can see in the example above, the backup completed within the same second.

You can review existing local backups:

$ sudo clickhouse-backup list
Local backups:
- '2020-07-06T20-13-02' (created at 06-07-2020 20:13:02)

Note that `Size` is not computed for local backups for performance reasons.

Internally, clickhouse-backup utilizes hard links when possible, as described above. The backup is stored in /var/lib/clickhouse/backup/BACKUPNAME. The backup name defaults to a timestamp, but you can optionally specify the backup name with the --name flag. The backup contains two directories: a `metadata` directory, with the DDL SQL statements necessary to recreate the schema, and a `shadow` directory with the data as a result of the ALTER TABLE ... FREEZE operation.
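
For example, listing a freshly created backup directory should show something like this (illustrative output):

$ sudo ls /var/lib/clickhouse/backup/2020-07-06T20-13-02
metadata  shadow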

Restoring from a backup is also easy. For example:

$ echo 'drop table events' | clickhouse-client

$ sudo clickhouse-backup restore 2020-07-06T20-13-02
2020/07/06 20:14:46 Create table `default`.`events`
2020/07/06 20:14:46 Prepare data for restoring `default`.`events`
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_1_1_4'
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_2_2_2'
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_3_3_3'
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_4_4_3'
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_5_5_2'
2020/07/06 20:14:46 ALTER TABLE `default`.`events` ATTACH PART '202006_6_6_1'

The restore subcommand automates both schema and data restoration. In case you only want to restore the schema, use the optional --schema flag. Or if you only want to restore the data (assuming the schema already exists), you can use the --data flag. The latter case is especially useful in restoring to a server that already has existing data.
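
For example (flag names as shown in the tool’s help output):

$ sudo clickhouse-backup restore --schema 2020-07-06T20-13-02
$ sudo clickhouse-backup restore --data 2020-07-06T20-13-02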

Another useful feature is support for specifying a table pattern with most commands, such as create and restore. The --table argument allows you to backup (or restore) a specific table. You can also use a regex to, for example, target a specific database: --table=dbname.*.
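
For example, to back up a single table and then restore everything in one database (a sketch; the backup name events-only is just an illustration, and the pattern is quoted to keep the shell from expanding it):

$ sudo clickhouse-backup create --table=default.events events-only
$ sudo clickhouse-backup restore --table='dbname.*' 2020-07-06T20-13-02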

Remote Backup Destinations

Of course, you could rsync your backup to a remote destination, save it to an object store like S3, or archive it using an existing backup solution. Local storage is usually insufficient to meet data durability requirements.

The clickhouse-backup tool supports uploading and downloading backups from a remote object store, such as S3, GCS, or IBM’s COS. A minimal AWS S3 configuration looks like:

s3:
  access_key: <YOUR AWS ACCESS KEY>
  secret_key: <YOUR AWS SECRET KEY>
  bucket: <YOUR BUCKET NAME>
  region: us-east-1
  path: "/some/path/in/bucket"

Once you have configured your credentials and destination bucket, clickhouse-backup can take care of the rest:

$ clickhouse-backup upload 2020-07-06T20-13-02
2020/07/07 15:22:32 Upload backup '2020-07-06T20-13-02'
2020/07/07 15:22:49   Done.

The remote backup can be downloaded to local storage before restoration:

$ sudo clickhouse-backup download 2020-07-06T20-13-02
2020/07/07 15:27:16   Done.

The clickhouse-backup config file supports backups_to_keep_local and backups_to_keep_remote settings; tune them to meet your data retention requirements. For example, set backups_to_keep_local: 7 and backups_to_keep_remote: 31 to retain a week’s worth of nightly backups locally and a month’s worth remotely. Set both to 0 to disable backup pruning.
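
Expressed in config.yml, that retention policy looks like this (remote_storage: s3 assumes the S3 settings shown above):

general:
  remote_storage: s3
  backups_to_keep_local: 7
  backups_to_keep_remote: 31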

There is also a --diff-from option to the upload subcommand. This feature compares files to a previous local backup and only uploads new or changed files. It is essential that you retain the previous backup in order to restore from the new one.

Data transfer time and cost are critical aspects of remote storage. How long will it take to restore a large table to a new server? This will be largely dependent on network and storage bandwidth. It’s critical to test various recovery scenarios to have a good understanding of what recovery times you can achieve in a failure. Cost management will be important if you are using a public cloud.

Using the clickhouse-backup API

Finally, clickhouse-backup can run as a service that provides a REST API. This is a new feature. The API mirrors the command-line commands and options, and may be a more convenient way to integrate with an existing scheduling or CI/CD system.

$ sudo clickhouse-backup server &
$ curl localhost:7171/backup/list
{"Type":"local","Name":"2020-07-06T20-13-02","Size":0,"Date": "2020-07-06T20:13:02.328066165Z"}

Documentation of the API endpoints can be found here: https://github.com/AlexAkulov/clickhouse-backup#api

Using clickhouse-backup in Production

It is important to take note of the known limitations of clickhouse-backup, which are documented here: https://github.com/AlexAkulov/clickhouse-backup#limitations

In addition, the documentation contains this important warning:

Never change files permissions in /var/lib/clickhouse/backup. This path contains hard links. Permissions on all hard links to the same data on disk are always identical. That means that if you change the permissions/owner/attributes on a hard link in backup path, permissions on files with which ClickHouse works will be changed too. That might lead to data corruption.

Recovery Scenarios

Failed Replica

Failure of individual servers or nodes is by far the most common disaster scenario seen in production. In almost all cases, the failed replica should be replaced and the schema recreated. ClickHouse’s native replication will take over and ensure the replacement server is consistent. This failure scenario is worth testing in advance to understand the network and compute impact of rebuilding the new replica.

Failed Shard

In a clustered environment, at least one replica from each shard should be backed up. The clickhouse-backup API is one approach for orchestrating backup naming and execution across a cluster.

If all replicas in a shard were to fail or, more commonly, data were corrupted, the entire shard must be restored from a backup as described above. Ideally, restore the backup to one replica, restore the schema to the others, and allow ClickHouse’s native replication to take over.

Failed Cluster

A completely failed cluster, whether due to infrastructure failure or data corruption, can be restored in the same manner as a failed shard. One replica in each individual shard must be restored via the process above.

Alternate Backup Strategies

Offline Replica with Filesystem Snapshots

A common alternative is to use an “offline replica” for backups. A replica is configured (often in another region) that is not used for queries as part of any Distributed Tables. The offline replica should not perform any merges itself, which can be enforced with the ‘always_fetch_merged_part’ and ‘replica_can_become_leader’ ClickHouse MergeTree settings. While production replicas are best served by the ext4 filesystem, the backup replica uses ZFS (or another filesystem that supports snapshots). This approach provides a quick restoration process. Note that backups, in this case, are still local to the server/node and do not necessarily provide sufficient data durability. ZFS provides directory-based filesystem access to individual snapshots, so it would be possible to automate the storage of these snapshots on a remote system or object store.
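
A sketch of the server configuration fragment for such a backup replica follows (the settings are the MergeTree settings named above; verify the section layout against your ClickHouse version):

<yandex>
  <merge_tree>
    <!-- never merge locally; always fetch merged parts from other replicas -->
    <always_fetch_merged_part>1</always_fetch_merged_part>
    <!-- never let this replica assign merges as the leader -->
    <replica_can_become_leader>0</replica_can_become_leader>
  </merge_tree>
</yandex>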

Storage-as-a-Service with Snapshots

It is common for Cloud deployments to use network-based block storage (such as AWS EBS or GCP persistent disks). Some on-prem deployments use Ceph or OpenEBS for this purpose. Each of these “storage-as-a-service” technologies supports transparent volume snapshots. By first freezing tables for backup and then creating a snapshot, you can achieve nearly instantaneous backups.

Internally, snapshots only store blocks on disk that have changed since a previous snapshot. While not a true “incremental” backup, these systems provide highly efficient use of disk space. Beware that network-based block storage is rarely as performant as local disk, and be sure to monitor snapshot retention and cost.  

Using Kafka to Improve Backups

So far we have discussed specific point-in-time backups that are created on a nightly or hourly basis (for example). Some organizations require the ability to restore to any arbitrary point in time. Because ClickHouse doesn’t have a native binary log (such as the Postgres WAL), some other mechanism is needed to “replay” the data since the last specific point-in-time backup.

Many organizations use Kafka to meet this requirement. Streaming data through Kafka into ClickHouse has many advantages for availability and fault tolerance. Another advantage is the ability to reset ingestion to any offset in the Kafka partition. When performing point-in-time backups, the Kafka offset must be stored. During recovery, the ClickHouse Kafka Engine configuration is set to the Kafka offset at the time the backup was created, and data after that point in time will be ingested.

A simple way to store Kafka offsets is to INSERT them into a table in ClickHouse that is included in the backup.  This can be done in a wrapper script that pauses ingest from Kafka, writes the current topic partition offsets, starts backup, and enables ingestion again. When you restore data, you can reset offsets for the consumer group, then re-enable ingest.  See the ClickHouse Kafka engine tutorial on this blog for an example of resetting offsets.
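
A hypothetical offsets table for this purpose might look like the following (table and column names are illustrative, not part of any standard schema); the wrapper script INSERTs one row per topic partition just before calling clickhouse-backup create, so the offsets travel with the backup:

CREATE TABLE backup_kafka_offsets
(
    backup_name String,
    topic String,
    partition UInt64,
    offset UInt64,
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY (backup_name, topic, partition);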

Next Steps

Before diving in and implementing a backup solution, take a moment to reflect on the requirements of your business and end users. Get a good understanding of Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for your environment. Then, consider the approaches outlined above to determine which is the best fit.

One of ClickHouse’s greatest strengths is the support and contributions of a diverse and invested community. The automation provided by clickhouse-backup is no exception: great thanks to Alex Akulov!

At Altinity we care as much about your data as you do. Contact us for help with backups, or any other ClickHouse challenge!

3 Comments

  1. I would like a clarification on something, if possible.
    First of all, this is a test instance “ClickHouse server version 20.10.1.1” which lives entirely in my home folder, at ~/clickhouse, and therefore data is in ~/clickhouse/data/{data/, shadow/, metadata/, etc.}
    My goal is to load a local instance with data each day into several tables, then copy the new partitions over a WAN and attach them to the existing data in the main database. So it’s not exactly a backup; rather, it is a way to compress the raw data into the most efficient format possible at the collection point, each day, and transport it (a long way) to its final destination, where it joins previous entries.

    I started simple, with a single table. I followed the manual steps as listed above to freeze a single table (mydatabase.mytable) and then I created a tar copy of the data/shadow folder:
    [radu@ams clickhouse]$ tar -cvf shadow.tar data/shadow/
    which I uploaded to the server and extracted with tar -xf into an identically set up instance in the folder ~/clickhouse/data/data/mydatabase/mytable/detached/
    Now, this resulted in a folder “1” and an “increment.txt” file, and basically a structure like so: ./1/store/5a2/5a2078fb-9ffb-4b24-8737-671cebe56c3c/20201002_12_165_4
    where this is actually a folder containing the partition’s files (column.bin/mrk2 etc.).
    The problem is ALTER TABLE mydatabase.mytable ATTACH PARTITION 20201002 did not result in the partition being attached at all.
    I did get the partitions re-attached, but only after delving into the folders in detached/ and moving the folder 20201002_12_165_4 into it, so I had:
    ~/clickhouse/data/data/mydatabase/mytable/detached/20201002_12_165_4

    What am I missing? The folder name “./1/store/5a2/5a2078fb-9ffb-4b24-8737-671cebe56c3c” seems impossible to predict, which is probably the idea, so creating an archive of just the partition folders doesn’t seem straightforward.
    Am I doing something wrong? Thank you for your reply.

    1. Hi radu,

      The issue you face is related to the “Atomic” Database enabled by default on 20.10.

      You can find which table corresponds to which uuid by checking metadata, just like that:
      cat /var/lib/clickhouse/metadata/dbname/tablename.sql
      ATTACH TABLE _ UUID '94fc5fac-ad13-4911-b6cc-a9ae7bf8fb89'
      (
          `number` UInt64
      )
      ENGINE = MergeTree
      ORDER BY tuple()
      SETTINGS index_granularity = 8192
      Alternatively, you can use the old (‘Ordinary’) engine for the database, in which case all folder names will be readable. You can do that either by specifying the engine explicitly when creating the database:
      CREATE DATABASE dbname2 ENGINE=Ordinary;
      or by making Ordinary the default (as in versions before 20.10) using this config snippet: https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/users.d/database_ordinary.xml

  2. Hi Mikhail,

    My tar of the shadow contents had this folder structure:
    ./1/store/5a2/5a2078fb-9ffb-4b24-8737-671cebe56c3c/20201002_12_165_4

    So when extracting in the destination table’s detached folder, I simply used:
    $ tar -xf ~/incoming/backup.tar --strip-components=5
    which placed 20201002_12_165_4 directly into mytable/detached

    My only question is whether we can rely on the number of path components always being 5.
    I appreciate your answer by the way, thank you for the explanation! I’d rather keep the default configuration, instead of using an older db engine. I think finding the UUID, and then grabbing the first 3 (always?) letters and building the path will just complicate the backup/transfer/append script without any benefit in this case… Do you not agree?
