19 December, 2019?
Many years ago a wise customer once said, “every report wants to be an Excel spreadsheet when it grows up!” I have always had a soft spot for Excel–it’s one of the most useful programs that Microsoft ever released. To honor that long lost user this article will show how to bring joy to your ClickHouse reporting by pulling data into Excel.
The key is to use the ClickHouse ODBC driver. We’ll walk through how to install the ODBC driver, create a clickhouse data source, and finally fetch data into Microsoft Excel. In the best Windows tutorial tradition we’ll show screenshots of all steps.
The examples that follow use Windows 10 and MS Office 365 version 1910 (32 bit). The ClickHouse ODBC driver is version 1.1.2.
Installing the ODBC driver
Driver builds are located on the ClickHouse ODBC driver project on GitHub. Go to the Release tab to find build files. Pick either the latest 64- or 32-bit .msi file and download. In our case we’ll use the 32-bit driver to match the Excel build.
Important Note: Ensure you have the latest ODBC driver version 1.1.2 or above. It fixes a few bugs that show up in ODBC on Windows.
Click the .msi file icon to start installation. In Windows 10 you have to step past a screen like the following. Click on the “More info” link to see a “Run anyway” button and press it.
You’ll now see the clickhouse-odbc Setup Wizard, which looks like the following.
Click “Next,” accept the agreement, and step through another screen to get the “Install” button. Press it to complete installation. Windows 10 will ask you for permission to update files. Once you give permission the installation will finish in a couple of seconds. You will see the following screen after a successful install.
Press “Finish” to end the process. The ODBC driver is now installed and ready for use.
Creating a ClickHouse data source
ODBC uses preconfigured connections to databases called data sources. The idea is that you set up all the information to locate the ClickHouse server and authenticate. The data source has a name that you can refer to in Excel when connecting to the server.
The first step is to find ODBC. On Windows 10 we will do this the easy way: just drop down to the search bar and type “ODBC”, then select ODBC Data Sources (32-bit). If you are using 64-bit Excel you’ll need to select the 64-bit version. Either way, a screen like the following will pop up.
User DSNs are data sources that are visible only to the account you are using. System DSNs are data sources that are visible to anyone using the Windows machine. We’ll press “Add” to create a User DSN. You’ll see a screen like the following.
Pick ClickHouseDriver and press “Finish.” You will now see a screen with connection information. The easiest way to fill out the cluster connection information is to put in a URL. Or you can specify the host and port explicitly. The following examples show two identical definitions of an https ClickHouse connection with a self-signed certificate. The user is ‘default’ and it requires a password.
Here are a few more details about ClickHouse data sources.
The URL follows the ClickHouse HTTP interface format. If you use the URL form, you must provide the port for https connections.
If you choose the host form, enter the ClickHouse host, port, and database (optional). In this case the driver will construct a URL for you.
SSLmode has two values. They are:
require – check server X509 certificate and fail the connection if certificate cannot be verified.
allow – ignore self-signed and bad certificates. Use with caution. (It’s handy for development or demos like this one.)
The login and password are optional. If you leave them out Excel will prompt for them.
Timeout sets how long to wait for queries to ClickHouse. It defaults to 30 seconds.
Once you press OK, the data source is saved. We can now use it with Excel.
Connecting to ClickHouse from Excel
Selecting Data from Whole Tables
The final step is to connect an Excel spreadsheet to ClickHouse and fetch some real data. Let’s start by firing up Excel and creating a new spreadsheet. Once you are in the spreadsheet, select the Data tab and press ‘Get Data’ on the left, as shown below.
In the drop-down that appears under “Get Data,” select “From Other Sources” followed by “From ODBC.” A dialog like the following will appear with a list of available data sources.
We pick the parade-32-url data source. At this point Excel will try to load data. The first time you use the data source Excel will pop up a window to enter connection credentials. Here’s an example. You can press the ‘Default or Custom’ tab to proceed with the credentials from the ODBC data source itself. Or you can enter the credentials again and press ‘Connect’ as shown below.
Assuming the connection is OK, Excel will show a handy navigator listing all tables in your ClickHouse server. For small datasets you can just navigate down to a table and select it, as in the following example.
Press the load button and the data will be pulled into Excel. You can then play with the values to your heart’s content using your favorite features of Excel, such as graphs or pivot tables. Here is an example. Old hands at data science will recognize the famous Iris data set from Ronald Fisher’s 1936 paper.
Selecting Data from Queries
ClickHouse tables tend to be pretty big, so selecting all the data is perhaps not such a good idea. Instead, we can select a smaller quantity of data using a query. Let’s do that as follows.
Select the “Data” table on the top menu followed by “Get Data.” Follow the same steps as before in the drop downs to find the parade-32-url data source. When it appears press “Advanced options” and enter a query. Here’s an example.
You’ll need to go through the same steps to preview and load data but in the end you’ll have the query result in a new tab and can work on it as before. Here’s an example of what the results might look like.
If you are an Excel fan like me, I’m sure you’ll be happy to see this.
Here are a few more tips for success with Excel.
Fixing Data Sources
The ClickHouse ODBC driver does not currently offer a way to test connections before use. If there is a problem with the data source you’ll find out when you try to fetch data. If this happens, open up the ODBC driver utility, fix the data source, and try again.
Bad Credentials on Data Sources
As we showed above, Excel lets you enter credentials for connections that don’t have them in the ODBC definition. If you enter the wrong credentials or they change later, your queries will no longer work. To fix this problem find the ‘Data Source Settings’ menu and select it. Excel will pop up the credentials screen so you can fix the problem.
The ClickHouse ODBC driver will happily try to select from tables that are far too large for Excel. You will see the query pulling down hundreds of thousands of rows (or more). Fortunately, Excel lets you cancel the query. Retry with a query that selects less data.
Bugs, Feedback, and Improvements
Altinity maintains the ClickHouse ODBC driver and has made a number of improvements over the last few months. If you run into a problem or have a question please log an issue on GitHub. If you have a bug fix, please file a pull request. Windows ODBC usage is relatively new, so there will certainly be a few problems to fix in the future. Getting help from the community makes this easier.
Finally, if you need major improvements to the driver for your projects, Altinity can implement new features as a commercial service. Please contact [email protected] and we will be glad to help.
Thanks for reading this article. Now get out there and have fun with ClickHouse and Excel!
p.s., I would like to thank Denis Glazachev, the ODBC driver maintainer, for his help in preparing this article. Denis has done a lot of recent heavy lifting to make the ODBC driver work better on all platforms, including Windows.