The key trade-off is highly compressed, columnar storage vs. full-text search on documents.
ClickHouse is a SQL data warehouse designed for extremely fast queries on large datasets. It uses columnar storage, compression, parallel query execution, and materialized views, which can reduce response time by as much as a factor of 1000 compared with databases like MySQL or PostgreSQL. ClickHouse stores data in tabular format with a sparse “primary key” index and a predefined table order. It can define new columns and change column format instantly. Changing the primary key or order columns generally requires the table to be reloaded.
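As a rough illustration of the table layout described above, here is a minimal sketch in ClickHouse SQL. The table name `web_logs` and its columns are hypothetical and chosen only for this example.

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE web_logs
(
    event_time DateTime,
    host       LowCardinality(String),
    status     UInt16,
    bytes      UInt64
)
ENGINE = MergeTree
-- The sorting key fixes the on-disk row order. With no separate
-- PRIMARY KEY clause, it also defines the sparse primary key index.
ORDER BY (host, event_time);

-- Adding a column changes table metadata only; existing data is not rewritten.
ALTER TABLE web_logs ADD COLUMN user_agent String DEFAULT '';
```

Queries that filter on a prefix of the sorting key, such as `WHERE host = 'a.example.com'`, can use the sparse index to skip most of the table. A filter on `bytes` alone cannot, which is why the choice of order columns matters up front.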
Elasticsearch is a search engine designed to run queries on documents containing semi-structured data like JSON log records. It implements full-text search, which allows it to query document data without converting it to tables. Elasticsearch is based on Lucene, which provides indexed storage. To index data, users either define fixed field types up front or let Elasticsearch map data types dynamically. Either way, you must reindex to change data types once they are set.
ClickHouse is more open.
ClickHouse is released under the popular open source Apache 2.0 License. Users are free to add ClickHouse to proprietary products, use it to build SaaS offerings, and run managed ClickHouse services. There are no limitations to the type of business you can support.
Elasticsearch was originally released under Apache 2.0 but has since moved away from it. Users have a choice of the proprietary Elastic License v2 or the Server Side Public License (SSPL) 1.0. Both licenses ensure access to source code but place significant limitations on usage of Elasticsearch, especially for SaaS businesses.
ClickHouse outperforms Elasticsearch as data structures become more predictable and data size increases.
ClickHouse excels in many other use cases as well: rapid valuation of financial assets, network flowlog analysis, intrusion detection, real-time marketing, CDN management, and observability applications, to name a few. ClickHouse performance in these use cases meets or exceeds that of any other analytic database.
ClickHouse is the clear winner over Elasticsearch on performance for tabular or near-tabular data. Uber measurements show that ingest time into ClickHouse was capped at 1 minute and that multi-region queries returned in seconds without degradation under high load. ContentSquare found that queries were 4 times faster overall and 10 times faster at 99th percentile latencies. Alibaba performance tests showed that ClickHouse outperformed Elasticsearch, often substantially, across a range of use cases.
Uber reduced their cluster footprint on ClickHouse by over 50% while serving more queries than with Elasticsearch. ContentSquare reported that ClickHouse was 11 times cheaper than Elasticsearch. ClickHouse runs well even on very small devices, such as Intel NUCs, where it can handle datasets running to hundreds of billions of records.
ClickHouse is a C++ binary that runs anywhere Linux does, from Android phones (really) up to clusters with hundreds of nodes. Many ClickHouse installations use a single node only because ClickHouse requires so few resources.
Elasticsearch is based on Java, which means Java must also be installed. It requires substantial care to manage effectively, especially as cluster sizes increase. ContentSquare found that their ClickHouse cluster was far more fault tolerant than a similarly capable Elasticsearch cluster.
It’s definitely more competitive.
ClickHouse's Apache 2.0 licensing enables a worldwide market for managed ClickHouse in public clouds. There are many such services, including our own Altinity.Cloud in AWS and GCP. Users can easily move back and forth between managed ClickHouse and on-prem operation.
Elasticsearch licensing prevents vendors other than Elastic from running managed Elasticsearch for current software versions. Competitors must fork the older, Apache 2.0-licensed version, as Amazon has done. Managed Elasticsearch services may diverge in the future, and compatibility with on-prem versions cannot be guaranteed.
Both are secure, provided they are properly configured.
ClickHouse security has improved rapidly over the past couple of years. It now offers LDAP user management, role-based access control (RBAC), Kerberos authentication, column encryption functions, high-performance TLS connections, and many other features. Altinity offers FIPS-compatible ClickHouse builds and works with customers to enable deployment into FedRAMP environments.
ClickHouse has a number of popular options for loading log data, including Vector, Fluentd, and Kafka. You can also import log data directly using ClickHouse table functions, which can read from files, S3 object storage, HDFS, and other sources. Here’s an example of loading compressed file system data using the file table function.
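-- file() takes a path, an input format, and a column structure.
-- Compression is inferred from the .gz file extension.
INSERT INTO mytable
SELECT * FROM file('logfile.log.gz', 'Template', 'col1 …, colN …');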
Yes, though there are limitations.
ClickHouse generally gets the best performance on data stored in table columns with proper data types. Scans on unstructured data are more expensive.
One popular ClickHouse pattern is to store the original unstructured document in a table and extract enough attributes into columns to cover the majority of queries. PixelJets documented how ClickHouse makes it easy to extract data from common formats like JSON into a high-performance columnar format. ClickHouse also has a JSON datatype. It works efficiently for simple JSON documents whose schema does not vary.
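The pattern of keeping the raw document and extracting frequently queried attributes might look like the following sketch. The table name `raw_events` and the JSON keys (`timestamp`, `level`, `status`, `request_id`) are assumptions made for illustration and do not come from any particular deployment.

```sql
-- Hypothetical table: the raw JSON is kept, and hot attributes are
-- extracted into typed columns when each row is inserted.
CREATE TABLE raw_events
(
    message String,  -- original JSON document, stored as-is
    ts      DateTime MATERIALIZED parseDateTimeBestEffort(JSONExtractString(message, 'timestamp')),
    level   LowCardinality(String) MATERIALIZED JSONExtractString(message, 'level'),
    status  UInt16 MATERIALIZED toUInt16(JSONExtractUInt(message, 'status'))
)
ENGINE = MergeTree
ORDER BY (level, ts);

-- Only the raw document is inserted; the other columns are computed from it.
INSERT INTO raw_events (message) VALUES
    ('{"timestamp":"2023-01-01 00:00:00","level":"error","status":500,"request_id":"abc"}');

-- Common queries read the typed columns and never parse JSON.
SELECT level, count() FROM raw_events WHERE status >= 500 GROUP BY level;

-- Rarely used attributes can still be pulled from the raw document at query time.
SELECT JSONExtractString(message, 'request_id') FROM raw_events WHERE level = 'error';
```

The last query is the more expensive kind of scan. If an attribute such as `request_id` becomes popular, it can be promoted to its own column later.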
Yes. They are experimental, however, and should not be used in production systems yet.
More importantly, ClickHouse offers skipping indexes (ngrambf_v1, tokenbf_v1, bloom_filter, etc.), which can reduce I/O by 95% or more. In addition, it’s unbeatable in full scans thanks to features like compression, vectorized query processing, and efficient distributed query execution. In specific use cases like log search, these features outperform Elasticsearch full-text indexing and also use far less storage space.
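For readers unfamiliar with skipping indexes, here is a minimal sketch of a token bloom filter index on a log message column. The table `app_logs`, the index name, and the bloom filter parameters are illustrative assumptions, not tuned recommendations.

```sql
-- Hypothetical log table with a token-based bloom filter skipping index.
CREATE TABLE app_logs
(
    ts      DateTime,
    message String,
    -- tokenbf_v1(filter size in bytes, number of hash functions, seed);
    -- these values are placeholders and should be sized for real data.
    INDEX message_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY ts;

-- hasToken() can consult the index to skip blocks of rows whose
-- bloom filter shows that the token is definitely absent.
SELECT count() FROM app_logs WHERE hasToken(message, 'timeout');
```

A skipping index does not locate individual rows the way a Lucene inverted index does. It only rules out blocks of data, so the benefit depends on how rare the searched token is and how the data is ordered.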