Introducing AES Encryption Functions in ClickHouse

Preserving privacy is a vital topic in computer science, especially for databases. Sometimes (or most of the time actually) you don’t want strangers to peek into your own or your customers’ secrets. At other times you may be absolutely forced to prevent that from happening with any means possible by government regulations (hello GDPR and friends), which is a good thing.

The most popular way to hide secrets in the modern age is to encrypt them, so even if some nasty hackers grab a hold of the data they won’t be able to make sense of it, unless there is a way to obtain the encryption keys.

Of course, there is full-disk encryption and other ways of encrypting data at-rest, and that works fine until you boot up your server and start reading data with ClickHouse. At that point data must be decrypted by the OS (or any other intermediate) and presented to an application as plain text. So anyone* logging into a ClickHouse instance is able to read all the secrets with a simple SELECT query.

Unacceptable, but what can we do about that? Fortunately ClickHouse now has an answer!

(* Slight exaggeration for a dramatic effect, actually anyone with permission to read the data).

How to encrypt sensitive data

Starting from ClickHouse version 20.11 there is a new family of AES encryption functions developed at Altinity by popular request and with support from one of our customers.

  • encrypt(mode, plaintext, key, [iv, aad])
  • decrypt(mode, ciphertext, key, [iv, aad])
  • aes_encrypt_mysql(mode, plaintext, key, [iv])
  • aes_decrypt_mysql(mode, ciphertext, key, [iv])

The latter two are compatible with MySQL–more on that later.

Arguments:

  • mode – const String // const FixedString – name of the cipher mode
  • plaintext/ciphertext – any with data to be encrypted/decrypted
  • key – String or FixedString encryption/decryption key
  • iv – String or FixedString encryption/decryption initialization vector (for all modes except ECB)
  • AAD – String or FixedString additional authenticated data (for GCM only)

Result:

  • String a binary string – ciphertext or plaintext.

ClickHouse relies on the underlying SSL library (OpenSSL/BoringSSL) for actually doing encryption and decryption, so it is top-notch in terms of quality (no badly crafted home-made cryptography) and performance. It is worth mentioning that some versions of these SSL libraries are FIPS-180 certified.

Available modes

There are multiple ways to encrypt/decrypt the data and it is controlled by the encryption/decryption modes. For more on this check out Wikipedia.

As of now, ClickHouse supports the following modes:

  • aes-<keylen>-ecb
  • aes-<keylen>-cbc
  • aes-<keylen>-cfb128
  • aes-<keylen>-ofb
  • aes-<keylen>-ctr
  • aes-<keylen>-gcm

Where <keylen> is one of: 128192, or 256.

And it basically works just like this to encrypt secrets.

First, let’s create a secure key and iv:

$ openssl rand -hex 32
41204a63d38dcb7432c9265ba03e62a872e395057b32d4a11833311a394aaea8

$ openssl rand -hex 16
c5827b8d4e4cea195ee12d62c18d379c

Then, apply those to secure some top-secret data:

:) SELECT hex(encrypt('aes-256-gcm', 'My dirty little secret: I ate all the cookies from the jar!', unhex('41204a63d38dcb7432c9265ba03e62a872e395057b32d4a11833311a394aaea8'), 
unhex('c5827b8d4e4cea195ee12d62c18d379c'))) AS ciphertext;

┌─ciphertext─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 74BC0316613B94EED810586A792D95FA3D813DF17CCA01AFD23EFB6D4340355FD3B80BB24D7BC235A336FC96977AACC210CF6C16E38396B038A817058F0AD9E61D9845E6453BBB969993BB │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Any attempt to decrypt the ciphertext using different modekey, or IV would result in gibberish like the following:

:) SELECT decrypt('aes-256-cfb128', unhex('74BC0316613B94EED810586A792D95FA3D813DF17CCA01AFD23EFB6D4340355FD3B80BB24D7BC235A336FC96977AACC210CF6C16E38396B038A817058F0AD9E61D9845E6453BBB969993BB'), unhex('41204a63d38dcb7432c9265ba03e62a872e395057b32d4a11833311a394aaea8'), unhex('c5827b8d4e4cea195ee12d62c18d379c')) AS plaintext

Query id: 8e1052b2-63d5-44c4-b886-6aedecf401ab

Row 1:
──────
plaintext: |���Nh▒G�ݻ%;��J<���;^L�����~��C��aG����л�Oj
                                                      g�r ��I��&�K"$���2ma�-

Or an error:

:) SELECT decrypt('aes-256-gcm', unhex('00000316613B94EED810586A792D95FA3D813DF17CCA01AFD23EFB6D4340355FD3B80BB24D7BC235A336FC96977AACC210CF6C16E38396B038A817058F0AD9E61D9845E6453BBB969993BB'), unhex('41204a63d38dcb7432c9265ba03e62a872e395057b32d4a11833311a394aaea8'), unhex('c5827b8d4e4cea195ee12d62c18d379c')) AS plaintext

Received exception from server (version 21.1.2):
Code: 454. DB::Exception: Received from localhost:9000. DB::Exception: Failed to decrypt. OpenSSL error code: 0: While processing decrypt('aes-256-gcm', unhex('00000316613B94EED810586A792D95FA3D813DF17CCA01AFD23EFB6D4340355FD3B80BB24D7BC235A336FC96977AACC210CF6C16E38396B038A817058F0AD9E61D9845E6453BBB969993BB'), unhex('41204a63d38dcb7432c9265ba03e62a872e395057b32d4a11833311a394aaea8'), unhex('c5827b8d4e4cea195ee12d62c18d379c')) AS plaintext.

Please note that we’ve turned binary ciphertext into a hex string only to simplify demonstration, and you probably shouldn’t store ciphertext as a hex-string in production. ClickHouse can handle binary data in String (or FixedString) columns pretty well.

A more realistic example

Imagine you have a web store and you save every customer with all their information to a DBMS, e.g. ClickHouse. So your schema may look something like this:

CREATE TABLE customers
(
    `id` UInt64,
    `name` String,
    `address` String,
    `card` String
)
ENGINE = MergeTree
ORDER BY id

Now let’s fill it with fake data using https://www.generatedata.com/ 

Load that data into ClickHouse and check what we’ve got:

:) SELECT * FROM customers LIMIT 3

┌─id─┬─name─────────────┬─address─────────────────┬─card───────────────────────────┐
│  1 │ Leah Hinton      │ 934-305 Sit St.         │ ;4903506520878360=3001921590?1 │
│  2 │ Rigel G. Cline   │ 334-8390 Orci, St.      │ ;5315779724329808=1311459558?3 │
│  3 │ Skyler E. Bailey │ 462-5033 Quisque Avenue │ ;5137475010092210=94045411?5   │
└────┴──────────────────┴─────────────────────────┴────────────────────────────────┘

Ok, that looks spooky. Anybody with read access can now see very sensitive customer information. Let’s fix this!

There are many ways how you can encrypt the data, but for simplicity let’s update the column in-place starting by generating the key and initialization values.

$ openssl rand -hex 32 # key
5eb83a2d768889a7984fc3aea79f1b75f274825c426641697b20c547f9765c35
$ openssl rand -hex 16 # iv
8716176c655163785fa0dc9e96b8eb06

Next we update the table asynchronously to encrypt values.

:) ALTER TABLE customers UPDATE name = encrypt('aes-256-gcm', name, unhex('5eb83a2d768889a7984fc3aea79f1b75f274825c426641697b20c547f9765c35'), unhex('8716176c655163785fa0dc9e96b8eb06')) WHERE 1;

Finally, we can select our data.

:) SELECT * FROM customers LIMIT 3 \G

Row 1:
──────
id:      1
name:    i��K��_��`,+D�9��$v�E�e�~
address: 934-305 Sit St.
card:    ;4903506520878360=3001921590?1

Row 2:
──────
id:      2
name:    ��ν��b�yy�Ɨ�gL3r���ʀ�p2
address: 334-8390 Orci, St.
card:    ;5315779724329808=1311459558?3

Row 3:
──────
id:      3
name:    v
����t��L�~p(��yE1L�˞�?)��M
address: 462-5033 Quisque Avenue
card:    ;5137475010092210=94045411?5

That is much better, and after updating rest of the customer data in the similar manner we’ll get this:

:) SELECT * FROM customers LIMIT 3 \G

Row 1:
──────
id:      1
name:    i��K��_��`,+D�9��$v�E�e�~
address: R��X����z�DhR�2ʀt�Y���'q�
card:    U��X����>� $OgR��><�{��Z���ɿ,iV���▒�A�

Row 2:
──────
id:      2
name:    ��ν��b�yy�Ɨ�gL3r���ʀ�p2
~�q���q�� ��S����|�~0\�
card:    T��^���:�%%DaZ��==�|��V������;w>m߿/�8��{

Row 3:
──────
id:      3
name:    v
����t��L�~p(��yE1L�˞�?)��M
address: W��^����{�dm   4BұkbS�킞��!�䊧:���&
card:    T��\����>�..N`R��:<�}��\�y��~qF>���n���w

Now it looks like all your data is corrupted, but fear not! You can still get it all back:

:) SELECT id, decrypt('aes-256-gcm', name, unhex('5eb83a2d768889a7984fc3aea79f1b75f274825c426641697b20c547f9765c35'), unhex('8716176c655163785fa0dc9e96b8eb06')) AS name FROM customers LIMIT 3

┌─id─┬─name─────────────┐
│  1 │ Leah Hinton      │
│  2 │ Rigel G. Cline   │
│  3 │ Skyler E. Bailey │
└────┴──────────────────┘

Here we’ve used the same key for each customer, but we actually can use a unique key for each row if required. But you had better make your keys unpredicatable (to prevent hackers from just guessing the key), and also make sure that you know the keys from your encrypted data. Otherwise it is next to impossible to decrypt it back.

Secrets and Logs

Ok, now we have all customer data encrypted, but what about leaking keys? If attacker has access to ClickHouse logs, is it perhaps possible to get the keys from the logs? Well, yes and no:

$ cat /var/log/clickhouse-server/clickhouse-server.log | grep encrypt
# I've redacted timestamps and other unrelated info out to declutter
<Debug> executeQuery: (from 127.0.0.1:54436) WITH 'My dirty little secret: I ate all the cookies from the jar!' as plaintext select hex(encrypt(???)) as ciphertext, length(plaintext) as l1, length(ciphertext) as l2;
<Debug> executeQuery: (from 127.0.0.1:55982) SELECT decrypt(???)
<Debug> executeQuery: (from 127.0.0.1:55982) SELECT hex(encrypt(???)) as ciphertext
<Debug> executeQuery: (from 127.0.0.1:55982) SELECT decrypt(???) as plaintext
<Error> executeQuery: Code: 36, e.displayText() = DB::Exception: Invalid key size: 33 expected 32: While processing decrypt(???) AS plaintext (version 20.11.6.6 (official build)) (from 127.0.0.1:55982) (in query: SELECT decrypt(???) as plaintext), Stack trace (when copying this message, always include the lines below):
<Error> TCPHandler: Code: 36, e.displayText() = DB::Exception: Invalid key size: 33 expected 32: While processing decrypt(???) AS plaintext, Stack trace:
<Debug> executeQuery: (from 127.0.0.1:55982) SELECT number, SHA256(toString(number)) as key, encrypt(???) from system.numbers LIMIT 5;

Same thing applies for system.query_log:

:) SELECT query FROM system.query_log WHERE query LIKE '%crypt%'

┌─query──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ WITH 'My dirty little secret: I ate all the cookies from the jar!' as plaintext select hex(encrypt(???)) as ciphertext, length(plaintext) as l1, length(ciphertext) as l2;   │
│ SELECT decrypt(???)                                                                                                                                                            │
│ SELECT hex(encrypt(???)) as ciphertext                                                                                                                                         │
│ SELECT decrypt(???) as plaintext                                                                                                                                               │
│ SELECT encrypt(???)                                                                                                                                                            │
│ SELECT number, hex(SHA256(toString(number))) as key, hex(encrypt(???)) as ciphertext from system.numbers LIMIT 5;                                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Currently ClickHouse uses the query_masking_rules mechanism, which has its limitations. Basically the query mask is a set of regular expressions used to overwrite the query before it goes into the logs. You can change what exactly is removed by tweaking the query_masking_rules section in /etc/clickhouse-server/config.xml.

As you can see, all arguments inside encrypt or decrypt functions are redacted out. If the key or a secret is placed outside of the function call, query_masking_rules have no chance of detecting and removing it automatically. The workaround for now is to add your own custom masking rules, which works fine for small applications but could be a bit like whack-a-mole for large ones that use AES functions frequently.

This is a known limitation, and we are working on a better way of managing secrets. Stay tuned…

Compatibility with MySQL

As you may already know, MySQL has its own encryption/decryption functions. They are a bit more relaxed about argument checking and also perform non-standard “key folding”, so there are two separate functions that behave exactly the way MySQL does, and hence produce identical output.

  • aes_encrypt_mysql(mode, plaintext, key, [iv])
  • aes_decrypt_mysql(mode, ciphertext, key, [iv])

Let’s verify this by spinning up a MySQL instance with similar schema, put the same data in, and encrypt it.

mysql> CREATE TABLE `customers` (
    ->   `id` mediumint,
    ->   `name` VARBINARY(255) default NULL,
    ->   `address` VARBINARY(255) default NULL,
    ->   `card` VARBINARY(255)
    -> );
mysql> INSERT INTO `customers` (`id`,`name`,`address`,`card`) VALUES (1,'Abdul S. Forbes','752-607 Cum Av.','%B6771438115157824^KeefeBishop^2101554167?6')...
mysql> SET block_encryption_mode = 'aes-128-ecb'
mysql> UPDATE customers SET name = AES_ENCRYPT(name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql');
mysql> SELECT id, name FROM customers LIMIT 3;
+------+--------------------------------------------------------------------+
| id   | name                                                               |
+------+--------------------------------------------------------------------+
|    1 | 0xE9F0A38662357C87F8D80090242D491E                                 |
|    2 | 0x3A56A44186392AB666C8A9C96D5FB35E                                 |
|    3 | 0x3C758231FE47E5B513D520F92E6FCC78                                 |
+------+--------------------------------------------------------------------+

mysql> SELECT id, cast(aes_decrypt(name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql') as char) as name FROM customers LIMIT 3; 
+------+-----------------+
| id   | name            |
+------+-----------------+
|    1 | Abdul S. Forbes |
|    2 | Veronica Doyle  |
|    3 | Len Jones       |
+------+-----------------+

Now let’s try to read that data from MySQL in ClickHouse.

:) SELECT id, hex(name) FROM mysql('172.17.0.3', 'online_store', 'customers', 'root', 'password') LIMIT 3;

┌─id─┬─hex(name)────────────────────────────────────────────────────────┐
│  1 │ E9F0A38662357C87F8D80090242D491E                                 │
│  2 │ 3A56A44186392AB666C8A9C96D5FB35E                                 │
│  3 │ 3C758231FE47E5B513D520F92E6FCC78                                 │
└────┴──────────────────────────────────────────────────────────────────┘

And decrypt it

:) SELECT id, aes_decrypt_mysql('aes-128-ecb', name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql') as name FROM mysql('172.17.0.3', 'online_store', 'customers', 'root', 'password') LIMIT 3;

┌─id─┬─name───────────────────┐
│  1 │ Abdul S. Forbes        │
│  2 │ Veronica Doyle         │
│  3 │ Len Jones              │
└────┴────────────────────────┘

Please note that decrypting with just decrypt() would fail due to more strict validation of input arguments performed by decrypt():

:) SELECT id, decrypt('aes-128-ecb', name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql') FROM mysql('172.17.0.3', 'online_store', 'customers', 'root', 'password') LIMIT 3;

Received exception from server (version 20.11.6):
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: aes-128-ecb does not support IV: while executing 'FUNCTION decrypt('aes-128-ecb', name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql') Nullable(String) = decrypt('aes-128-ecb', name, 'this is an encryption key used in mysql to encrypt name column', 'this is an iv used in mysql')'. 

Performance

Within AES encryption functions ClickHouse uses SSL API differently from how the original creators intended: we have a relatively small input stream (a single value of a current row) and then cipher context is fully reset. Prior to switching to BoringSSL, that made AES performance rather slow.

ClickHouse client version 20.11.6.6 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.11.6 revision 54442.

:) SELECT count() FROM numbers(10000000) WHERE NOT ignore(encrypt('aes-128-cbc', number, 'keykeykeykeykeyk', 'iviviviviviviviv')) LIMIT 1;

┌──count()─┐
│ 10000000 │
└──────────┘

1 rows in set. Elapsed: 14.651 sec. Processed 10.02 million rows, 80.18 MB (684.08 thousand rows/s., 5.47 MB/s.) 

684 thousands rows per second is not impressive by ClickHouse standards. But after switching to BoringSSL since ClickHouse 21.1 things have improved significantly:

ClickHouse client version 21.1.2.15 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.1.2 revision 54443.

:) SELECT count() FROM numbers(10000000) WHERE NOT ignore(encrypt('aes-128-cbc', number, 'keykeykeykeykeyk', 'iviviviviviviviv')) LIMIT 1;

┌──count()─┐
│ 10000000 │
└──────────┘

1 rows in set. Elapsed: 2.138 sec. Processed 10.02 million rows, 80.18 MB (4.69 million rows/s., 37.51 MB/s.)

4.69 million rows is much better, looks like BoringSSL does less work on context reset and might be more performant overall.

Conclusion

AES encryption functions give users a powerful new tool to protect sensitive data in applications. With this new feature you can encrypt critical data using various encryption modes. You can now even decrypt data encrypted on MySQL server. And starting from 21.1 performance of the encrypt/decrypt functions should not be a major concern.

AES encryption is just one of many ClickHouse security features we are working on at Altinity. If you want to see a complete view of ClickHouse security, have a look at our recent webinar ClickHouse Defense against the Dark Arts: Intro to Security and Privacy. We plan many more blog articles on security topics in the near future. If you have questions about security or need new security capabilities in ClickHouse, please contact us. We’ll be happy to help. Until then, enjoy the new AES capabilities!

Share