ClickHouse Data Integration — September 2020 Meetup Report

By Robert Hodges on September 22nd, 2020

Our last ClickHouse SF Bay Area meetup was on September 10th and focused on data integration. This turned out to be a popular subject.  We had well over 100 signups with 59 people in attendance. The article you are reading summarizes the talks and provides links to slides and videos.  

Most databases have the ability to read data from locations other than internal tables. ClickHouse is increasingly capable in this area, with multiple ways to read and write external data sources.  These not only simplify data loading, but also offer creative opportunities for integration with other products. The goal of the meetup was to dig into integration features and show the creative possibilities.

The meetup started with a lightning talk from Rahul Sharma covering Integration of ClickHouse with Apache Superset and Dremio. Rahul’s application loads health census data into ClickHouse and displays it in Superset dashboards. It was a nice lead-in for the main talks, which were the following: 

  • Polyglot ClickHouse — Robert Hodges from Altinity (video link here). Robert covered the ways that ClickHouse can connect to data sources like MySQL, Kafka, S3, and Snowflake. S3 is new but is one of the most powerful mechanisms, since it allows access to cloud data lakes. ClickHouse integration features are developing rapidly and new ones appear regularly. There’s even a new engine to allow ClickHouse to read from MySQL binlogs.
  • Splitgraph: Open data and beyond — Artjoms Iskovs and Miles Richardson from Splitgraph (video link here). Splitgraph publishes open datasets using the PostgreSQL protocol. Any application that can connect to PostgreSQL can now access them, which is a huge step forward in accessibility. Artjoms and Miles proved it by showing how ClickHouse users can connect to open data using the odbc table function. You can query data directly on Splitgraph or pull it into native MergeTree tables for fast local processing. The built-in ODBC capability of ClickHouse opens up some interesting doors!
  • MindsDB: Machine Learning in ClickHouse — Jorge Torres, Max Stephanov, and Zoran Pandovski from MindsDB (video link here). MindsDB has developed a machine learning engine that exposes machine learning models as MySQL tables. It integrates with ClickHouse in two very useful and easy-to-understand ways. First, users can access the models through the MySQL database engine and execute them using SQL SELECT statements. Second, MindsDB models can reach back into ClickHouse and use data in ClickHouse tables for training. Machine learning-to-database integration is a hot topic, and MindsDB gave a great example of how it can be done.
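To make the integration paths from these talks a bit more concrete, here is a minimal Python sketch that builds the kind of ClickHouse SQL involved: a query through the odbc table function, a copy of the result into a local MergeTree table, and a database backed by the MySQL database engine. It is an illustration under assumptions, not code shown at the meetup. The DSN, schema, table, host, and credential values are hypothetical placeholders, and the helper function names are invented for this example.

```python
import urllib.request


def sql_literal(value: str) -> str:
    """Quote a Python string as a ClickHouse string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def odbc_select(dsn: str, schema: str, table: str, limit: int = 10) -> str:
    """SELECT from an external table through the odbc table function.

    The DSN must already be defined in the ODBC configuration on the
    ClickHouse host; the names passed in here are placeholders.
    """
    source = (
        f"odbc({sql_literal('DSN=' + dsn)}, "
        f"{sql_literal(schema)}, {sql_literal(table)})"
    )
    return f"SELECT * FROM {source} LIMIT {int(limit)}"


def copy_to_mergetree(target: str, select_sql: str) -> str:
    """Materialize a SELECT into a local MergeTree table.

    ORDER BY tuple() means no sorting key; a real table would
    normally pick one that matches its query patterns.
    """
    return (
        f"CREATE TABLE {target} ENGINE = MergeTree "
        f"ORDER BY tuple() AS {select_sql}"
    )


def mysql_database(name: str, host_port: str, remote_db: str,
                   user: str, password: str) -> str:
    """Create a ClickHouse database backed by the MySQL database engine."""
    args = ", ".join(
        sql_literal(v) for v in (host_port, remote_db, user, password)
    )
    return f"CREATE DATABASE {name} ENGINE = MySQL({args})"


def run(query: str, url: str = "http://localhost:8123/") -> str:
    """POST a query to the ClickHouse HTTP interface (needs a live server)."""
    request = urllib.request.Request(url, data=query.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical names: print the SQL rather than executing it.
    remote = odbc_select("splitgraph", "public", "some_table", limit=5)
    print(remote)
    print(copy_to_mergetree("local_copy", remote))
    print(mysql_database("models", "mysql-host:3306", "models_db", "user", "secret"))
```

The script only prints the generated statements. Passing one of them to `run()` sends it to a ClickHouse server over the HTTP interface, which assumes a server listening on the default port 8123 and, for the ODBC case, a working ODBC driver and DSN on that host.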

In summary, ClickHouse polyglot capabilities enable data ingest, access, and egress with minimal effort. As the last two talks illustrated, they also open up fascinating possibilities for application integration using the MySQL and PostgreSQL connection protocols. In the Q&A we discussed how ClickHouse could enable better integration by pushing down operations like joins into ODBC and MySQL connections. There is a lot to digest here and a lot of possibilities.

Our next ClickHouse SF Bay Area Meetup will be virtual office hours with three ClickHouse stars: Alexey Milovidov, Alexander Zaitsev, and Denis Zhuravlev (aka Denny_Crane on Telegram). We’ll meet in late October. Start thinking of your questions now!
