Protecting ClickHouse Data With User Level Network Access Rules

Arthur Passos is a software engineer at Altinity. His interests include C++ programming, networking, and security. His recent work includes improvements to reverse DNS lookups in network access rules. 

Security is a key consideration of modern software systems. That is particularly true for applications handling sensitive data, such as database software. ClickHouse offers a rich set of tools to improve security. These security features can be divided into a few groups: user, network, and storage. 

Increasing ClickHouse security at the user level means setting up restrictions and applying security best practices on a per-user basis. ClickHouse offers features such as quotas, LDAP remote authentication, resource restrictions (i.e., which tables are accessible by which users), and network restrictions. You can read more about all of these in Altinity’s User Hardening documentation.

User level network access restrictions in ClickHouse are powerful but not always well understood. This blog article is a deep dive into how they work and how to apply them to protect ClickHouse data.

What is User Level Network Security on ClickHouse?

The ClickHouse user level network security toolset allows connections to be restricted to certain hosts. This can be achieved by using the IP, host, like pattern or host_regexp directives. While the IP directive achieves this by simply matching the client IP address with the authorized IP, the other three approaches depend on reverse DNS resolution. The image below shows the four different ways ClickHouse offers to restrict access to specific hosts.

As already mentioned, the IP directive allows a list of authorized IP addresses to be specified for a given user. It is a simple approach that covers basic scenarios where the IP address of the client is fixed. The configuration below restricts connections to the 1.1.1.50 IP address.

<clickhouse>
    <users>
        <altinity>
            <password/>
            <networks>
                <ip>1.1.1.50</ip>
            </networks>
            <profile>default</profile>
        </altinity>
    </users>
</clickhouse>
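
To make the matching rule concrete, here is a minimal Python sketch of an IP allow-list check. It is an illustration only, not ClickHouse's implementation, and the function name ip_allowed is hypothetical. The ClickHouse documentation also describes subnet notation for the ip entry; the 10.0.0.0/8 value below is an assumed example of that and is not part of the configuration above.

```python
import ipaddress
from typing import List


def ip_allowed(client_ip: str, allowed: List[str]) -> bool:
    """Return True if client_ip equals an allowed address or falls inside an allowed subnet."""
    client = ipaddress.ip_address(client_ip)
    # A bare address such as "1.1.1.50" becomes a single-host network (/32).
    # strict=False also accepts entries whose host bits are set, e.g. "10.0.0.1/8".
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed]
    return any(client in net for net in networks)


print(ip_allowed("1.1.1.50", ["1.1.1.50"]))                # True
print(ip_allowed("1.1.1.51", ["1.1.1.50"]))                # False
print(ip_allowed("10.2.3.4", ["1.1.1.50", "10.0.0.0/8"]))  # True
```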

Note: Examples in this article use XML definition format. You can also define network access rules using SQL RBAC commands (aka role-based access control).

Unfortunately, not every ClickHouse client has a fixed IP address. Cloud applications can change IP addresses as they move across availability zones. ClickHouse clients on Kubernetes may change IP addresses when pods restart. In these cases, approaches based on hostnames and reverse DNS resolution are a better fit.

The host directive works like the IP directive, but for hostnames. A list of authorized hostnames may be specified for a given user, and ClickHouse will only allow connections from those hosts. The main benefit over the IP directive is that hostnames tend to stay the same while IP addresses change. To restrict access to connections coming from the admin1.example.br host, use the configuration below.

<clickhouse>
    <users>
        <altinity>
            <password/>
            <networks>
                <host>admin1.example.br</host>
            </networks>
            <profile>default</profile>
        </altinity>
    </users>
</clickhouse>

Both the IP and host directives are hard to maintain when the host list is big and not fixed. In such situations, pattern-matching approaches might be a better choice.

The ClickHouse ‘like’ pattern adds flexibility beyond the IP and host directives. It can match both IP addresses and hostnames using a pattern built from the % and _ characters: % matches any sequence of bytes, including an empty one, while _ matches exactly one byte. Some patterns are too complex to express with the like operator. That is where host_regexp comes in.
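
The semantics of % and _ are easy to try out with a small translation into a regular expression. The sketch below is a Python illustration under simplifying assumptions: it ignores backslash escapes inside the pattern, and the helper names like_to_regex and like_match are hypothetical. It is not how ClickHouse implements the check.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, including none
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the LIKE-style pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_match("%.example.br", "host50.example.br"))        # True
print(like_match("host5_.example.br", "host51.example.br"))   # True
print(like_match("host5_.example.br", "host510.example.br"))  # False
print(like_match("10.0.0.%", "10.0.0.17"))                    # True
```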

The host_regexp directive builds on the existing host directive and is more flexible than the like pattern because it uses regular expressions. Regular expressions allow fine-grained rules, such as limiting which characters can be used, how many there can be, and where they should appear.

Approaches based on reverse DNS resolution will inevitably introduce latency. The good news is that ClickHouse caches DNS results by default, which is likely to drastically decrease the number of DNS requests. If DNS caching is disabled and each tiny bit of latency matters, then this is something to keep an eye on. 

The host_regexp directive is extremely flexible. We’ll therefore focus on using it effectively in this article.

Blocking Unauthorized Requests With host_regexp

Consider a scenario where the number of hosts that should be granted access to ClickHouse may grow or shrink over time. As long as the host nomenclature follows a predictable pattern, it is possible to use host_regexp to dynamically determine if access should be granted or denied.

Suppose the following host nomenclature is used for a server fleet that should be granted access to ClickHouse: host50.example.br, host51.example.br, host52.example.br, and so on. It is possible to extract a pattern: hostNN.example.br, where NN is a number ranging from 50 to 99. To block incoming connections from hosts that do not follow this pattern, a regular expression like the following may be used: ^host[5-9][0-9]\.example\.br$. Each literal dot is escaped, because an unescaped dot would match any character.
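
A quick way to sanity-check such an expression before deploying it is to run it against a handful of names. The Python sketch below does that with the pattern written so that both dots are escaped, and contrasts it with a variant whose first dot is left unescaped. Python's re module stands in for the regular expression engine here, so treat the results as an approximation of what ClickHouse would do; the sample hostnames are made up.

```python
import re

# Both literal dots are escaped, so "." cannot stand in for another character.
STRICT = re.compile(r"^host[5-9][0-9]\.example\.br$")
# Same pattern with the first dot left unescaped, for comparison.
LOOSE = re.compile(r"^host[5-9][0-9].example\.br$")


def is_allowed(hostname: str, pattern: re.Pattern = STRICT) -> bool:
    """Return True if the hostname matches the anchored pattern."""
    return pattern.search(hostname) is not None


samples = [
    "host50.example.br",           # inside the 50-99 range
    "host99.example.br",           # inside the 50-99 range
    "host49.example.br",           # below the range
    "host100.example.br",          # three digits
    "host50.example.br.evil.com",  # allowed name used as a prefix
    "host50xexample.br",           # "x" where the dot should be
]
for name in samples:
    print(name, is_allowed(name), is_allowed(name, LOOSE))
```

Only the last sample differs between the two patterns: the unescaped dot lets host50xexample.br through, while the escaped version rejects it.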

The image below illustrates a scenario where two distinct clients perform the very same request with identical credentials. Even though both clients have valid credentials, the request coming from an unauthorized domain is denied.

As a general rule of thumb, users should be restricted to the networks they connect from when possible.

Configuration

Configuration on the ClickHouse side is simple: add a user configuration file containing the host_regexp setting for the desired user to the users.d directory. The configuration below allows the user altinity to connect only from hostnames that match ^host[5-9][0-9]\.example\.br$ (e.g., host50.example.br and host51.example.br).

<clickhouse>
    <users>
        <altinity>
            <password/>
            <networks>
                <host_regexp>^host[5-9][0-9]\.example\.br$</host_regexp>
            </networks>
            <profile>default</profile>
        </altinity>
    </users>
</clickhouse>

If a user needs more than one regular expression, add each one as a separate host_regexp entry within the same networks block. ClickHouse checks the client hostnames against every expression and grants access if any of them matches. Below is an example of two host_regexp entries for one user.

<clickhouse>
    <users>
        <altinity>
            <password/>
            <networks>
                <host_regexp>^host[5-9][0-9]\.example\.br$</host_regexp>
                <host_regexp>^altinity\d\d\.cloud\.br$</host_regexp>
            </networks>
            <profile>default</profile>
        </altinity>
    </users>
</clickhouse>

The equivalent of the above configuration in RBAC format is:

CREATE USER altinity HOST REGEXP '^host[5-9][0-9]\.example\.br$', REGEXP '^altinity\d\d\.cloud\.br$'
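
The “if any” behavior with several expressions can be illustrated in a few lines of Python. This is a sketch of the logic only, with dots escaped in the first pattern; the function name matches_any and the test hostnames are assumptions made for the example.

```python
import re

HOST_REGEXPS = [
    r"^host[5-9][0-9]\.example\.br$",
    r"^altinity\d\d\.cloud\.br$",
]


def matches_any(hostname: str) -> bool:
    """Grant access as soon as one of the configured expressions matches."""
    return any(re.search(pattern, hostname) for pattern in HOST_REGEXPS)


print(matches_any("host60.example.br"))    # True, first expression
print(matches_any("altinity07.cloud.br"))  # True, second expression
print(matches_any("altinity7.cloud.br"))   # False, only one digit
print(matches_any("host60.cloud.br"))      # False, mixes the two patterns
```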

Inner Workings

The mechanism behind this feature is straightforward: ClickHouse will test all hostnames associated with the client IP address against the configured host_regexp. If any matches, the request is authorized. Below is a high-level diagram of the decision-making process involved in authorizing a host/user based on the host_regexp feature.

Internally, a DNS PTR query is performed to get the client hostnames (Yes, you read that right. There can be more than one hostname associated with an IP. BTW, support for multiple hostnames was added by Altinity). Then, a regular DNS query is performed for each hostname to assert it resolves back to the client IP address. Once that is confirmed, each hostname is checked against the configured host_regexp.

Due to the limitations of glibc getnameinfo (on some systems, it returns only the first PTR record, ignoring all the other ones), the c-ares library is used by ClickHouse to perform PTR queries. You can find out more about the reverse resolution implementation in the CaresPTRResolver class and the authorization process in the AllowedClientHosts class.
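
The lookup-and-verify flow described in this section can be sketched in Python. To keep the example runnable without a network, the DNS lookups are passed in as plain functions backed by made-up data. Every name in it (is_client_allowed, fake_reverse, fake_forward, the addresses) is hypothetical. Skipping a hostname that fails the forward check, rather than rejecting the whole request, is also an assumption of this sketch and not a statement about the ClickHouse code.

```python
import re
from typing import Callable, Iterable, List


def is_client_allowed(
    client_ip: str,
    host_regexps: Iterable[str],
    reverse_lookup: Callable[[str], List[str]],
    forward_lookup: Callable[[str], List[str]],
) -> bool:
    """Allow the client if any verified hostname matches any expression."""
    patterns = [re.compile(p) for p in host_regexps]
    for hostname in reverse_lookup(client_ip):         # PTR query, may return several names
        if client_ip not in forward_lookup(hostname):  # name must resolve back to the client IP
            continue
        if any(p.search(hostname) for p in patterns):
            return True
    return False


# Made-up DNS data: one IP with two PTR records, one IP with a spoofed PTR record.
PTR = {
    "10.0.0.5": ["legacy.internal.br", "host50.example.br"],
    "10.0.0.9": ["host51.example.br"],
}
A = {
    "legacy.internal.br": ["10.0.0.5"],
    "host50.example.br": ["10.0.0.5"],
    "host51.example.br": ["10.0.0.77"],  # does not point back to 10.0.0.9
}


def fake_reverse(ip: str) -> List[str]:
    return PTR.get(ip, [])


def fake_forward(name: str) -> List[str]:
    return A.get(name, [])


RULES = [r"^host[5-9][0-9]\.example\.br$"]
print(is_client_allowed("10.0.0.5", RULES, fake_reverse, fake_forward))  # True
print(is_client_allowed("10.0.0.9", RULES, fake_reverse, fake_forward))  # False
print(is_client_allowed("10.0.0.1", RULES, fake_reverse, fake_forward))  # False
```

The first client is accepted even though its matching hostname is the second PTR record, which is exactly the case that single-hostname resolution would miss.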

Handling Multiple Host Names in Kubernetes

It is common practice to deploy applications and databases on Kubernetes clusters, and ClickHouse is no exception. Managing large, complex Kubernetes clusters introduces its own set of challenges. The Altinity ClickHouse operator helps deploy and manage ClickHouse Kubernetes clusters. It offers several ways to configure access protection, and host_regexp is one of them.

To adhere to security best practices, the Altinity ClickHouse Operator properly restricts access to an internal domain. This is done via the following host_regexp setting:

hostRegexpTemplate: "(chi-{chi}-[^.]+\\d+-\\d+|clickhouse\\-{chi})\\.{namespace}\\.svc\\.cluster\\.local$"

Here, {chi} is a template variable holding the name of the ClickHouse Installation and {namespace} refers to the Kubernetes namespace. Cluster pods are deployed with unique hostnames that match this regular expression. 
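
To see what the template looks like once the variables are filled in, the Python sketch below performs the substitution with plain string replacement. How the operator itself expands the template is not shown in this article, so the expand function is a hypothetical stand-in. The values simple-01 and default are assumptions chosen to line up with the hostnames in the lookup output shown in this section. The doubled backslashes in the YAML string are YAML escaping, so the regular expression itself contains single backslashes.

```python
# The template as a regular expression, after YAML unescaping (\\d becomes \d).
TEMPLATE = r"(chi-{chi}-[^.]+\d+-\d+|clickhouse\-{chi})\.{namespace}\.svc\.cluster\.local$"


def expand(template: str, chi: str, namespace: str) -> str:
    """Fill in the two template variables with plain string replacement."""
    return template.replace("{chi}", chi).replace("{namespace}", namespace)


print(expand(TEMPLATE, chi="simple-01", namespace="default"))
# (chi-simple-01-[^.]+\d+-\d+|clickhouse\-simple-01)\.default\.svc\.cluster\.local$
```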

When performing a reverse DNS query on a ClickHouse installation, two hostnames are observed.

root@chi-simple-01-simple-0-0-0:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.4  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:ac:11:00:04  txqueuelen 0  (Ethernet)
        RX packets 25762  bytes 37188413 (37.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12376  bytes 1063097 (1.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@chi-simple-01-simple-0-0-0:/# nslookup 172.17.0.4
4.0.17.172.in-addr.arpa	name = chi-simple-01-simple-0-0-0.chi-simple-01-simple-0-0.default.svc.cluster.local.
4.0.17.172.in-addr.arpa	name = 172-17-0-4.clickhouse-simple-01.default.svc.cluster.local

A single IP address maps to multiple hostnames here because of Kubernetes Services. By default, each Kubernetes Service assigns a hostname to the Pod. If a Pod is associated with N Kubernetes Services, a reverse lookup on the Pod IP will return N hostnames.

The regular expression used by the Altinity ClickHouse Operator will match chi-simple-01-simple-0-0-0.chi-simple-01-simple-0-0.default.svc.cluster.local, but will not work for 172-17-0-4.clickhouse-simple-01.default.svc.cluster.local. Without proper support for multiple hostnames, requests to the cluster instances could be incorrectly refused depending on the order returned by the DNS server. This was fixed by Altinity in #37827 and has been included in the 22.7 version of ClickHouse.

Final Remarks

There are multiple ways to improve ClickHouse security at the user level; restricting which hosts can establish connections is one of them. To achieve this, ClickHouse offers four different settings: IP, host, like pattern, and host_regexp. Each of these settings has its pros and cons and should be adopted depending on the scenario.

IP and host are simpler approaches and address the basic scenarios, but do not scale well if the IPs are not fixed or the authorized hostname list is likely to grow or shrink over time. In such scenarios, assuming the hostnames follow a predictable pattern, the like pattern and host_regexp are a better fit.

Both the like pattern and host_regexp offer a flexible approach to hostname matching. The former is simpler, while the latter allows more fine-grained filtering. As general advice, users should default to the like pattern whenever possible. If the like pattern becomes too complex or does not fully cover the use case, host_regexp should be adopted.

When adopting one of the approaches that depend on reverse DNS resolution, one must take into account a possible latency increase due to DNS queries. By default, ClickHouse will cache DNS queries, which minimizes the side effects.

Last but not least, the multiple hostnames support added by Altinity not only complies with the network layer standards, but it also extends the use cases of this feature for a variety of scenarios. The use of the Altinity ClickHouse Operator to manage ClickHouse Kubernetes clusters is one of them.
