ClickHouse RoadMap 2019
Dec 17, 2018
The year 2018 is coming to an end. It has been a great year for ClickHouse and the ClickHouse community, with a lot of events, new features and interesting projects. Now it is time to see what is next. The ClickHouse development team, led by Alexey Milovidov, unveiled some of its plans and allowed us to share them with you.
There is still some time left before the New Year, and new features can still arrive. Rumor has it that the next release will be published on December 31st, though it may be ready earlier. The following features are planned for it:
- HDFS import/export via table functions
- Parquet file format support for importing/exporting data. That makes it easier to integrate ClickHouse with Hadoop ecosystem.
- Column level compression/encoding. The initial release will include lz4, zstd and delta encoding. Double delta, Gorilla and blosc algorithms are to be released later.
- Ability to add new columns to the MergeTree storage engine index. This is especially useful for Summing/Aggregating MergeTree tables, which require all non-aggregated columns to be in the index.
The first major releases of 2019 will bring the following integration extensions:
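To illustrate why delta encoding is worth pairing with a general-purpose compressor, here is a minimal Python sketch. It is not ClickHouse code: the function names are hypothetical, and `zlib` from the standard library stands in for lz4/zstd purely so the example runs without extra dependencies.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def compressed_size(values):
    """Pack as 64-bit signed integers and compress (zlib is a stand-in, an assumption)."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# A regularly sampled timestamp column, typical for time series data.
timestamps = [1545004800 + 10 * i for i in range(1000)]
encoded = delta_encode(timestamps)

assert delta_decode(encoded) == timestamps
print("plain:", compressed_size(timestamps), "delta:", compressed_size(encoded))
```

After encoding, the column is mostly one repeated small number, which is exactly the kind of input a general-purpose compressor handles well.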
- Amazon S3 import/export via table functions
- Dictionaries as first-class citizens defined with common ‘CREATE TABLE’ SQL syntax
Security and fine-grained access control are highly desired by many companies, and ClickHouse will properly support them in Q1/2019:
- Table, column and row level security
- RBAC access control model
- Pluggable external authentication (LDAP, Kerberos)
MergeTree is the core ClickHouse technology and it will be improved further for even better performance and usability. Q1-Q2/2019 plans include:
- Adaptive index granularity for MergeTree tables
- Secondary index structures (min/max, bloom filter)
- Using index for better ORDER BY / GROUP BY performance
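As a rough idea of what a min/max secondary index can buy, the sketch below keeps the minimum and maximum of a column per block of rows and skips blocks that cannot contain the searched value. This is only an illustration of the general technique; the function names, the block size and the data layout are assumptions and do not describe the actual ClickHouse implementation.

```python
def build_minmax_index(values, block_size):
    """Record (start offset, min, max) for every block of rows."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def find_equal(values, index, block_size, target):
    """Return matching row positions and the number of blocks actually scanned."""
    hits = []
    blocks_read = 0
    for start, lo, hi in index:
        if target < lo or target > hi:
            continue  # the block cannot contain the target, so skip it
        blocks_read += 1
        for offset, value in enumerate(values[start:start + block_size]):
            if value == target:
                hits.append(start + offset)
    return hits, blocks_read


# Hypothetical column whose values grow with row position.
column = list(range(100))
idx = build_minmax_index(column, block_size=10)
rows, scanned = find_equal(column, idx, 10, target=42)
```

The benefit depends on how well the column correlates with row order: for randomly ordered values nearly every block overlaps the target and little is skipped.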
A lot of work was already done this year on improving ClickHouse support for SQL joins. It will continue in Q2-Q3/2019, both in terms of SQL standard compliance and better performance. That includes:
- Multi-table joins
- Merge join for big tables
- Bucket-shuffle algorithm for distributed joins
- ASOF joins for time series data
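For readers unfamiliar with ASOF joins, the following Python sketch shows the usual meaning of the term: each row on the left is matched with the most recent row on the right whose timestamp is not later. The exact semantics ClickHouse will adopt are not stated here, so the "at or before" rule, the function name and the sample data are all assumptions.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Match each (ts, value) in left with the latest right row where right.ts <= left.ts.

    Both inputs must be sorted by timestamp. Unmatched left rows get None.
    """
    right_ts = [ts for ts, _ in right]
    result = []
    for ts, value in left:
        pos = bisect_right(right_ts, ts) - 1  # last right row at or before ts
        match = right[pos][1] if pos >= 0 else None
        result.append((ts, value, match))
    return result


# Hypothetical time series: trades joined with the latest known quote.
trades = [(100, "buy"), (105, "sell"), (112, "buy")]
quotes = [(99, 10.0), (104, 10.5), (110, 11.0)]
joined = asof_join(trades, quotes)
```

An ordinary equality join would return nothing here, since no trade timestamp matches a quote timestamp exactly, which is why time series workloads need this kind of join.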
Resource pools and support for multiple storage volumes were planned for 2018 but delayed in favor of other features. Those are still in the plan for Q2-Q3/2019 with resource pools coming first:
- Resource pools (fine grained CPU, memory, network, RAM allocation)
- Layered storage HDD/SSD for cold/hot data
- JBOD storage support
ClickHouse has sometimes been criticized for limited support of geospatial data structures. We cannot expect it to be as feature-rich as PostGIS, but some extensions for geospatial applications are planned for Q3/2019, though priorities may change and they may appear earlier:
- Geohash support
- Polygonal dictionaries
Among the other things the ClickHouse development team plans to work on, we would like to highlight two in particular:
- Advanced algorithms for searching strings, making it more full-text-search-friendly
- Machine learning algorithms as aggregate functions. That opens up a lot of possibilities so we are eager to see how it works.
This is just a list of projects that the core development team is going to work on. There are many community contributors who add significant features to ClickHouse as well. Altinity is going to be active there too — we have several ClickHouse projects and code contributions planned for 2019 that will make ClickHouse easier and safer to use.
Stay tuned!
According to the roadmap published on the Yandex site (https://clickhouse.yandex/docs/en/roadmap/), it sounds as if RBAC and external authentication are slated for Q3 2019, rather than Q1… Any more word on this scheduling?
We recently checked with them; the target is still Q1 or early Q2.