Real-Time Data Lakes Meetup at Sentry in San Francisco

On Tuesday, July 8th, we had the pleasure of participating in a meetup devoted to real-time data lakes at Sentry’s offices in San Francisco. The goal was to create an opportunity for data enthusiasts to share ideas for adapting real-time databases to the scalability offered by Apache Iceberg-based lakehouses. The result was three hours of geeking out on database technology and coming away with new ideas for building high-performance analytic apps on open source. In short: a resounding success!

The meetup included talks by Altinity, PostHog, and CelerData. We posted recordings to a YouTube playlist in case you missed the live versions. Here’s a very quick summary of the talks, including links to individual talk videos and slides.

Adapting ClickHouse® to use Apache Iceberg Storage – Robert Hodges, CEO @ Altinity. Project Antalya reduces the cost of storage and compute by using Iceberg as shared S3 storage for ClickHouse. Robert showed how Antalya integrates ClickHouse and Iceberg, then dug into benchmarks. Antalya query response on Parquet data is approaching parity with ClickHouse MergeTree speed. This is a big deal; two years ago Parquet was at least 10x slower. (Slides: here)

The PostHog Data Lakehouse – How we turned ClickHouse into our Lake House – James Greenhill, Chief Data Wrangler @ PostHog. Starting with the proposition that every database ends up as a business intelligence (BI) store, James walked through PostHog’s evolution from a shared-nothing architecture to a multi-tenant data lake that includes custom data from users. The end goal is NVMe block storage combined with Iceberg in an Antalya architecture. (Slides: here)

Achieving Data Warehouse Performance on Apache Iceberg – Sida Shen, Product Manager @ CelerData. Sida’s talk started with a nice explanation of Iceberg that included useful tips for high performance. He continued with several industry examples of operating StarRocks on Iceberg at scale. The talk highlighted clever StarRocks tricks like parallel query processing of Iceberg positional deletes using distributed anti-joins. It’s as cool as it sounds! (Slides: here)
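If "positional deletes via anti-join" sounds abstract, here is a toy sketch of the idea in Python. It is not StarRocks code and makes no claim about how StarRocks implements it. It assumes only what the Iceberg spec describes: a positional delete record names a data file and a row position in that file. Everything else (the in-memory dicts, the function name `apply_positional_deletes`, and a thread pool standing in for distributed workers) is made up for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_positional_deletes(data_files, delete_records, workers=4):
    """Return the live rows of each data file after positional deletes.

    data_files:     dict of file_path -> list of rows (row position = list index)
    delete_records: iterable of (file_path, pos) pairs
    """
    # Build side of the join: group deleted positions by data file.
    deletes_by_file = defaultdict(set)
    for path, pos in delete_records:
        deletes_by_file[path].add(pos)

    def scan(path):
        # Probe side: keep a row only if (path, pos) has NO match on the
        # build side. That "keep the non-matches" rule is the anti-join.
        dead = deletes_by_file.get(path, set())
        live = [row for pos, row in enumerate(data_files[path]) if pos not in dead]
        return path, live

    # Each file is handled independently, so the work parallelizes cleanly.
    # Threads here are a stand-in for workers on different nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scan, sorted(data_files)))


example_files = {
    "a.parquet": ["r0", "r1", "r2"],
    "b.parquet": ["s0", "s1"],
}
example_deletes = [("a.parquet", 1), ("b.parquet", 0)]
live_rows = apply_positional_deletes(example_files, example_deletes)
```

Because each delete is keyed by file path, the join can be partitioned on that key, which is what makes the distributed version attractive at scale.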

The best thing about open source culture is the open sharing of knowledge. There was plenty of that in conversations throughout the evening. For that we also have to thank the folks at Sentry, especially Pierre Massat and Melissa Cheng, for offering the Sentry space in San Francisco to make this event possible. Y’all are great.

We’ll be back soon with more deep dives into real-time data lakes. We’re looking for locations in New York, Atlanta, Seattle, and London, among other places. Do you want to sponsor a real-time data lake meetup in your location? Reach out and let’s make it happen.

P.S. Want to stay in the loop on our events? Join the Altinity Slack channel. We share updates there first!

ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.
