CLICKHOUSE AND ELASTICSEARCH FAQs

ClickHouse is a great candidate to replace expensive Elasticsearch implementations.  Read on for more information about when you should consider ClickHouse instead.

The key trade-off is highly compressed, columnar storage vs. full-text search on documents.

ClickHouse is a SQL data warehouse designed for extremely fast queries on large datasets. It uses columnar storage, compression, and materialized views, which can reduce response times by a factor of up to 1000 compared with conventional databases like MySQL or PostgreSQL.
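
As a rough illustration of how these three features fit together, here is a minimal sketch. The table, view, and column names are hypothetical and not taken from any real deployment; the codec and partitioning choices are only one reasonable option.

```sql
-- Hypothetical schema, for illustration only.
-- Each column is stored and compressed separately on disk.
CREATE TABLE requests
(
    ts          DateTime CODEC(Delta, ZSTD),   -- delta-encode timestamps, then compress
    service     LowCardinality(String),        -- dictionary-encode repeated values
    status      UInt16,
    duration_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (service, ts);

-- Pre-aggregate per service and minute as rows are inserted into requests.
CREATE MATERIALIZED VIEW requests_per_minute
ENGINE = SummingMergeTree
ORDER BY (service, minute)
AS
SELECT
    service,
    toStartOfMinute(ts) AS minute,
    count()             AS requests,
    sum(duration_ms)    AS total_duration_ms
FROM requests
GROUP BY service, minute;

-- Read from the much smaller pre-aggregated view.
-- sum() is still needed because background merges happen asynchronously.
SELECT service, minute, sum(requests) AS requests
FROM requests_per_minute
GROUP BY service, minute
ORDER BY minute;
```

Dashboards that only need per-minute totals can query the view instead of scanning the raw table, which is where much of the response-time reduction tends to come from.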

Elasticsearch is a search engine designed to run queries on large volumes of semi-structured data like log records in JSON. It uses full-text search, which allows it to query document data without converting it to tables.

ClickHouse is more open. 

ClickHouse is released under the popular open source Apache 2.0 License. Users are free to add ClickHouse to proprietary products, use it to build SaaS offerings, and run managed ClickHouse services. There are no limitations to the type of business you can support. 

Elasticsearch was originally released under Apache 2.0 but has since moved away from it. Users have a choice of the proprietary Elastic License v2 or the Server Side Public License (SSPL) 1.0. Both licenses ensure access to source code but place significant limitations on usage of Elasticsearch, especially for SaaS businesses.

ClickHouse outperforms Elasticsearch as data structures become more predictable and data size increases.

Uber moved to ClickHouse from Elasticsearch to manage service logs at massive scale. ContentSquare successfully migrated web analytics processing from Elasticsearch to ClickHouse. 

ClickHouse excels in many other use cases as well: rapid valuation of financial assets, network flowlog analysis, intrusion detection, real-time marketing, CDN management, and observability applications, to name a few. ClickHouse performance in these use cases meets or exceeds that of any other analytic database.

Yes.

ClickHouse is a clear winner over Elasticsearch on performance for tabular or near-tabular data. Uber's measurements show that ingest time into ClickHouse was capped at one minute and that multi-region queries returned in seconds without degradation under high load. ContentSquare found that queries were 4 times faster overall and 10 times faster at 99th percentile latencies.

Yes.

Uber reduced their cluster footprint on ClickHouse by over 50% while serving more queries than with Elasticsearch. ContentSquare reported that ClickHouse was 11 times cheaper than Elasticsearch. ClickHouse runs well even on very small devices, such as Intel NUCs, where it can handle datasets running to hundreds of billions of records.

Yes.

ClickHouse is a C++ binary that runs anywhere Linux does, from Android phones (really) up to clusters with hundreds of nodes. Many ClickHouse installations use only a single node because ClickHouse requires so few resources.

Elasticsearch is based on Java, which means Java must also be installed. It requires substantial care to manage effectively, especially as cluster sizes increase. ContentSquare found that their ClickHouse cluster was far more fault tolerant than a similarly capable Elasticsearch cluster.

It’s definitely more competitive.

ClickHouse's Apache 2.0 licensing enables a worldwide market for managed ClickHouse in public clouds. There are at least 7 such services, including our own Altinity.Cloud in AWS. Users can easily move back and forth between managed ClickHouse and on-prem operation.

Elasticsearch licensing prevents vendors other than Elastic from running managed Elasticsearch for current software versions. Competitors must fork the older, Apache 2.0-licensed version, as Amazon has done. Managed Elasticsearch services may diverge in the future, and compatibility with on-prem versions cannot be guaranteed.

Both are secure.

ClickHouse security has improved rapidly over the past couple of years. It now offers LDAP user management, role-based access control (RBAC), Kerberos authentication, column encryption functions, high-performance TLS connections, and many other features. More are on the way, thanks to a large and active community of contributors.
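
To give a flavor of the RBAC feature, here is a minimal sketch using SQL-driven access control. The role, user, and database names are hypothetical, the password is a placeholder, and the sketch assumes that SQL access management has been enabled for the account running the statements.

```sql
-- Hypothetical names, for illustration only.
-- A role that can read, but not modify, everything in the logs database.
CREATE ROLE log_reader;
GRANT SELECT ON logs.* TO log_reader;

-- A user that receives its privileges only through the role.
CREATE USER analyst IDENTIFIED WITH sha256_password BY 'change-me';
GRANT log_reader TO analyst;

-- Inspect what the user is allowed to do.
SHOW GRANTS FOR analyst;
```

Granting privileges to roles rather than directly to users keeps permissions manageable as the number of users grows.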

Yes.

Grafana is a popular choice for building dashboards quickly on ClickHouse data. The community-supported plugin is stable and widely used. Superset is another excellent choice. ClickHouse client library support is excellent, which makes it easy to embed analytics in JavaScript, Python, Go, and Java applications.

Yes.

ClickHouse has a number of popular options for loading log data, including Vector, Fluentd, and Kafka. You can also import log data directly using ClickHouse table functions, which can read from files, S3 object storage, HDFS, and other sources. Here’s an example of loading a compressed log file from the local file system using the file table function.

INSERT INTO mytable 
  SELECT * FROM file('logfile.log.gz', 'Template', 'col1 …, colN …');
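
Loading from S3 object storage follows the same shape. The sketch below is only an assumption-laden illustration: the bucket URL and the column list are hypothetical, the objects are assumed to be gzip-compressed JSON with one record per line, and mytable is assumed to have matching columns.

```sql
-- Hypothetical bucket URL and column structure, for illustration only.
-- Compression is inferred from the .gz file extension.
INSERT INTO mytable
  SELECT *
  FROM s3(
      'https://example-bucket.s3.amazonaws.com/logs/*.json.gz',
      'JSONEachRow',
      'ts DateTime, level String, message String'
  );
```

A private bucket would also need credentials, which the s3 table function accepts as additional arguments.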

Yes, though there are limitations.

ClickHouse generally gets the best performance on data stored in table columns with proper data types.  Scans on unstructured data are more expensive. 

One popular ClickHouse pattern is to store the original unstructured doc in a table and extract enough attributes into columns to cover the majority of queries. PixelJets documented how ClickHouse makes it easy to extract data from common formats like JSON into high performance columnar format.
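
A minimal sketch of this pattern might look like the following. The table name, JSON field names, and sample record are hypothetical; the sketch is not PixelJets' actual schema.

```sql
-- Hypothetical schema, for illustration only.
-- The raw JSON document is kept, and frequently queried attributes
-- are extracted into typed columns automatically at insert time.
CREATE TABLE events
(
    raw     String CODEC(ZSTD),
    ts      DateTime               MATERIALIZED parseDateTimeBestEffort(JSONExtractString(raw, 'timestamp')),
    level   LowCardinality(String) MATERIALIZED JSONExtractString(raw, 'level'),
    user_id UInt64                 MATERIALIZED JSONExtractUInt(raw, 'user_id')
)
ENGINE = MergeTree
ORDER BY (level, ts);

-- Only the raw document is supplied; the other columns are computed from it.
INSERT INTO events (raw) VALUES
    ('{"timestamp":"2024-01-01 12:00:00","level":"error","user_id":42,"msg":"disk full"}');

-- Common filters hit the typed columns; rarely used attributes
-- can still be pulled from the raw document on demand.
SELECT ts, user_id, JSONExtractString(raw, 'msg') AS msg
FROM events
WHERE level = 'error';
```

Queries that filter on the extracted columns avoid parsing JSON at read time, while the raw column preserves every attribute in case new ones need to be promoted to columns later.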

No.

But ClickHouse has skipping indexes (ngrambf_v1, tokenbf_v1, bloom_filter, etc.), which can reduce I/O by 95% or more. In addition, it’s unbeatable in full scans thanks to features like compression, vectorized query processing, and efficient distributed query. ClickHouse is filled with features designed to increase performance.
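
As a rough sketch of how a token-based skipping index is declared and used, consider the example below. The table and index names are hypothetical, and the bloom filter parameters (size in bytes, number of hash functions, seed) are illustrative values rather than tuned recommendations.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE app_logs
(
    ts      DateTime,
    message String,
    -- Bloom filter over the tokens in message, one filter per 4 granules.
    INDEX message_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY ts;

-- hasToken can use the index to skip blocks that cannot contain the token.
SELECT count()
FROM app_logs
WHERE hasToken(message, 'timeout');
```

The index does not locate matching rows directly. It lets ClickHouse rule out blocks of data that cannot match, so the benefit depends on how selective the searched token is.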
