Open Source

What’s a Data Lake and What Does It Mean For My Open Source ClickHouse® Stack?

Data lakes built on open table formats are emerging as the go-to storage for large datasets in analytics, data science, and AI. This talk explains how data lakes work and how ClickHouse® is integrating them. We’ll introduce the key components of a data lake using a concrete example based on Parquet, the Iceberg open table format, and the Iceberg REST catalog. Next, we’ll look at how new ClickHouse® features adapt to data lakes, exploring specific issues like event stream ingestion, compaction, and queries. Finally, we’ll illustrate how to combine ClickHouse® with Apache Spark and Kafka to deliver fast analytics on massive, shared tables. The real-time data lake is arriving, and ClickHouse® is going to be a big part of it. Join us and bring your questions!