Podcast: Combining Python And SQL To Build A PyData Warehouse

Podcast: Combining Python And SQL To Build A PyData Warehouse

Sept 5, 2019
Robert Hodges on The Python Podcast with Tobias Macey

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.

Managing ClickHouse Datasets with ad-cli

Managing ClickHouse Datasets with ad-cli

July 1, 2019

Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.

ClickHouse and Python: Jupyter Notebooks

ClickHouse and Python: Jupyter Notebooks

Jupyter Notebooks are an indispensable tool for sharing code between users in Python data science. For those unfamiliar with them, notebooks are documents that contain runnable code snippets mixed with documentation. They can invoke Python libraries for numerical processing, machine learning, and visualization. The code output includes not just text output but also graphs from powerful libraries like matplotlib and seaborn. Notebooks are so ubiquitous that it’s hard to think of manipulating data in Python without them.

ClickHouse support for Jupyter Notebooks is excellent. I have spent the last several weeks playing around with Jupyter Notebooks using two community drivers: clickhouse-driver and clickhouse-sqlalchemy. The results are now published on Github at https://github.com/Altinity/clickhouse-python-examples. The remainder of this blog contains tips to help you integrate ClickHouse data to your notebooks.

ClickHouse and Python: Getting to Know the Clickhouse-driver Client

ClickHouse and Python: Getting to Know the Clickhouse-driver Client

Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist you may wonder how to connect them. This post contains a review of the clickhouse-driver client.  It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda.  Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.