Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay

eBay depends on Kafka to solve the impedance mismatch between rapidly arriving messages in event streams and efficient block inserts into ClickHouse clusters. Naïve loading procedures from Kafka to ClickHouse generate non-deterministic blocks, leading to data loss and incorrect results in applications. Learn how the eBay team solved this problem.

Analytics That Really Bring the Heat

Read why and how Pepper.com, the largest shopping community, chose ClickHouse over BigQuery to provide personalized customer experiences to 25 Million+ shoppers.

ClickHouse at LifeStreet: Performance Marketing is as Strong as Your Data Platform

We live in a rapidly changing world. The ability to discover and apply business-critical insights from petabyte datasets in real time is now a key factor in many businesses. Digital marketing is no exception. In fact, digital marketing is now one of the major sources of Big Data. In this article, we explain how the digital marketing company LifeStreet uses ClickHouse.

ClickHouse Cost-Efficiency in Action: Analyzing 500 Billion Rows on an Intel NUC

Jan 1, 2020
Cost-efficiency and performance are critical for big data analytics. For this reason, a recent blog post from the ScyllaDB team caught our attention. They collected over 500 billion data points and were able to query them at a scan rate of 1 billion rows per second. The test rig was a beefy and expensive packet.com cluster: 83 n2.xlarge.x86 instances, each with 28 cores and 384 GB of RAM. This is a nice demo of ScyllaDB cluster management. But looking at the numbers, we realized it is not very impressive as an example of efficient analytics. We can prove that using ClickHouse.

ClickHouse and ProxySQL queries rewrite

ProxySQL is a popular open-source, high-performance, protocol-aware proxy server for MySQL and its forks. Since September 2017, ProxySQL has supported ClickHouse as a backend, so clients can connect to ClickHouse via the MySQL protocol. In practice, this helps MySQL-aware applications start using ClickHouse without changes to the client library.

To work around some limitations of this approach, ProxySQL creator René Cannaò added query rewrite functionality. With his permission, we are cross-posting his article describing the new functionality on our blog.

Realtime replication from MySQL to ClickHouse in practice

July 2, 2018
Vladislav Klimenko from Altinity and Valery Panov from Ivinco recently presented a talk at the HighLoad Siberia 2018 conference. They described a real problem that Ivinco faced and how it was solved by migrating analytics from MySQL to ClickHouse using MySQL-to-ClickHouse replication. A few months ago we introduced the clickhouse-mysql tool on our blog, and Ivinco is the first company we know of that tried it and used it in production.

Big Data Analysis in Digital Marketing Research

Dec 6, 2017
Christian Hotz-Behofsits, Teaching & Research Associate at Vienna University of Business and Economics, is one of the creators of the RClickhouse package for R that we recently introduced on our blog. In this article, he describes the data analysis challenges his group faces and how ClickHouse helps in their research.