The year 2020 has been a tough one for the world. It changed the way people work, communicate and travel. It taught us to care more about family, friends, and colleagues. It highlighted the fragility of our world. Altinity and ClickHouse changed as well, but fortunately most changes were good. Altinity doubled the size of our team and revenues, we launched Altinity.Cloud, and we delivered many features to ClickHouse itself that users asked for. ClickHouse reached 14K stars on GitHub and doubled its db-engines.com ranking. ClickHouse is looking over the shoulder of some big boys already.
It is hard, if even possible, to pick the most important new features of 2020. But I’ll give it a try.
2020 ClickHouse Recap
Probably one of the biggest focus areas of ClickHouse in 2020 was security. Industrial-grade Role Based Access Control (RBAC), LDAP authentication and user mapping, AES encryption functions, Kerberos: all of these make ClickHouse much stronger for enterprise use.
ClickHouse storage underwent massive refactoring. Tiered storage was first introduced in 2019, but it was productized in 2020 with TTL moves. TTL syntax was so convenient that it has been further extended with re-compression and rollup logic. Compact and in-memory MergeTree parts improved versatility and performance of small writes. And last but not least is Object Storage integration, which opens up new horizons in cloud environments.
The ClickHouse team is always focused on performance, and 2020 was no exception. The foundation was laid by releasing a new query execution pipeline, Processors. There were probably hundreds of optimizations of various kinds, including parallel parsing of input data, parallel SELECT FINAL, pushdown optimization for distributed queries, SIMD optimizations and new algorithms. The ClickHouse ODBC driver underwent massive refactoring for the sake of performance as well. ClickHouse continues to be the fastest analytical database on the planet.
ClickHouse continues to improve generic SQL compatibility. The most important features in this regard are multiple joins, which were completed in Q1 2020, and common table expressions, which were added recently. At the beginning of 2020 we set up an environment for TPC-DS testing, and we run the suite on every ClickHouse release. The number of passing tests doubled over the year. It is still not 100%, but we are looking forward to having full coverage in 2021. Needless to say, the performance results of TPC-DS are extremely good.
Being already a “Polyglot Database”, ClickHouse continues to learn new ways to integrate with external data. The Kafka engine reached production quality, the experimental MaterializeMySQL engine turns ClickHouse into a MySQL replica, and the PostgreSQL wire protocol is a welcome addition to the MySQL wire protocol, which has existed for some time. MongoDB, Redis, and Cassandra dictionaries, together with the direct and ssd_cache dictionary layouts, allow you to plug in external key-value stores for fast lookups.
I have just scratched the surface here. A lot of great work has been done by the ClickHouse developers and community. Now we can relax for a few days before the New Year, and get ready for 2021. So what can users expect to come in 2021?
2021 ClickHouse Upcoming Highlights
The lack of window functions has been one of the major concerns of enterprise users. While ClickHouse arrays can usually do the job, they require some time to get used to. If you follow GitHub issues, you will see that the prototype is already being committed, and we expect it to be ready by the end of Q2 2021.
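To make the concern concrete, here is a minimal sketch of the kind of calculation a window function expresses: a running total per user, which standard SQL writes as `SUM(amount) OVER (PARTITION BY user ORDER BY day)`. The sample rows and the `running_total` helper are hypothetical and written in plain Python for illustration only; they say nothing about how ClickHouse will implement the feature.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Hypothetical sample data: (user, day, amount)
rows = [
    ("alice", 1, 10),
    ("alice", 2, 5),
    ("bob", 1, 7),
    ("alice", 3, 1),
    ("bob", 2, 3),
]


def running_total(rows):
    """Return rows extended with a per-user running sum of amount."""
    # Sorting by (user, day) plays the role of PARTITION BY user ORDER BY day.
    ordered = sorted(rows, key=itemgetter(0, 1))
    result = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        # accumulate() yields the cumulative sum within one partition.
        totals = accumulate(amount for _, _, amount in group)
        result.extend((u, d, a, t) for (u, d, a), t in zip(group, totals))
    return result


totals = running_total(rows)
for row in totals:
    print(row)
```

Every input row is kept and gains one computed column, which is what distinguishes a window function from a GROUP BY aggregate.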
No, do not expect full ACID transactions in ClickHouse! But making an insert reliably atomic in various scenarios, including the extensive usage of materialized views, is the top priority.
Lightweight Deletes (and Updates)
ClickHouse already supports heavyweight deletes that require a full rewrite of the affected parts. While the rewrite is inevitable, it does not have to happen immediately. We plan to implement deletes that would not change parts but instead would add a delete mask to be applied at query time. The actual data eviction will happen during the merge process. It is not going to turn ClickHouse into an OLTP database, but it will help in many other scenarios, such as implementing privacy policies mandated by GDPR.
ClickHouse is greedy for available resources, and tries to use as much of them as possible to get the work done faster. In concurrent environments, however, this may lead to problems. Better management and allocation of CPU, RAM and I/O resources during query execution is required.
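The idea of a delete mask can be sketched in a few lines. This is a toy model under stated assumptions, not the planned ClickHouse design: the `Part` class, its `deleted` set, and the `merge` function are hypothetical names invented for illustration. It only shows the three steps described above: marking rows without rewriting the part, filtering at query time, and evicting data during a merge.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical data part: stored rows plus a separate delete mask."""
    rows: list
    deleted: set = field(default_factory=set)  # offsets of rows marked deleted

    def lightweight_delete(self, predicate):
        """Record matching row offsets in the mask; stored rows stay untouched."""
        for offset, row in enumerate(self.rows):
            if predicate(row):
                self.deleted.add(offset)

    def select(self):
        """Apply the mask at query time and return only the visible rows."""
        return [row for offset, row in enumerate(self.rows)
                if offset not in self.deleted]


def merge(parts):
    """Build a new part from the visible rows, physically dropping masked ones."""
    return Part(rows=[row for part in parts for row in part.select()])


part_a = Part(rows=[{"user": 1}, {"user": 2}])
part_b = Part(rows=[{"user": 2}, {"user": 3}])

# Delete user 2 everywhere: cheap, because no part is rewritten.
for part in (part_a, part_b):
    part.lightweight_delete(lambda row: row["user"] == 2)

visible = part_a.select() + part_b.select()  # mask applied at query time
merged = merge([part_a, part_b])             # eviction happens at merge time
```

In this model the delete costs one pass to build the mask, while the expensive rewrite is deferred to a merge that would have happened anyway.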
Cloud enabling features
ClickHouse is already used in Yandex.Cloud and Altinity.Cloud, as well as by many community users thanks to the operator for Kubernetes. However, it still cannot fully utilize the cloud computing model. Better integration with object storage and the ability to separate storage from compute in the cloud are game-changing features here.
We love ZooKeeper because it enables ClickHouse replication. We hate ZooKeeper because it enables ClickHouse replication. Almost every ClickHouse user has lost ZooKeeper at least once in their career, and we know how difficult it is to recover. While we are working on some tools to make ZooKeeper recovery an easier walk, there are ambitious plans to integrate the replication coordination protocol entirely into ClickHouse.
These are the most significant tasks. The full list is available at the community website.
New Year Release Updates
We cannot leave ClickHouse users without presents, and therefore we have prepared three small gifts:
- ClickHouse Altinity Stable Release update 22.214.171.124. See community release notes for the list of bug fixes. We recommend that all 20.8 users update to this release.
- ClickHouse ODBC driver has been updated to 126.96.36.19901226. See release notes.
- We released a new version of ClickHouse operator for Kubernetes.
The ClickHouse development team complemented the list with the new Recipes Dataset with strawberry cake inside.
In January 2021 we are going to certify ClickHouse 20.11 or 20.12 in order to productize the remaining 2020 features. Stay tuned and Happy New Year!