Accessing ClickHouse from Excel using Mondrian ROLAP engine

ClickHouse is a very powerful database for analytics. Microsoft Excel is one of the world’s most popular business applications. There are several ways to bring ClickHouse data to Excel spreadsheets. In this article we will explain how to connect Excel to ClickHouse using the Mondrian OLAP server. This approach has been pioneered in Sergei Semenkov’s excellent article on the same topic. Sergei kindly helped us to adopt it for Altinity.Cloud.

Introduction

A couple of years ago in our blog we already explained how to connect Excel to ClickHouse using an ODBC driver. Users have to install the ODBC driver on their PCs and create ClickHouse data source in Excel. See this article for more detail on this approach.

The other way to explore ClickHouse data in Excel is using Pentaho Mondrian OLAP server as a middleware. This approach has several advantages. First, it moves configuration responsibility from the user to a DBA or other app developer. There is no need to manage ODBC connections any more. Second, it allows users to define a multi-dimensional model for ClickHouse data that is easier to understand and use than plain database tables. The data is modelled in OLAP way as cubes and dimensions and can be easily plugged to Excel PivotTable. Other applications may run custom MDX queries on ClickHouse data. 

Mondrian is an open source project, as well as ClickHouse. Open source allows projects to change rapidly and adapt to user needs. We will use the eMondrian fork that has some compatibility fixes for ADOMD.NET and Excel clients. eMondrian also already includes ClickHouse JDBC driver. Mondrian can be hosted on a variety of popular java servers (Apache Tomcat, Jetty, Tiny Java Web Server). 

For the purpose of this article we installed Mondrian server into Altinity.Cloud. It is running at https://emondrian.demo.trial.altinity.cloud/emondrian/xmla, and you are free to use it for your own testing. We will keep it running for the demo purposes but can not guarantee any SLA. Please refer to the original article if you would like to install Mondrian in your own environment.

Mondrian and Excel

Let’s start connecting Excel to the Mondrian demo Foodmart database. Open Excel, select the Data tab, then Get Data -> From Database -> From Analysis Services.

Unfortunately, if you are running MacOS it does not work. The MacOS version of Excel does not support Analysis Services. Not surprisingly, Windows is still a preferred home for Microsoft products. If your Excel supports Analysis Services, then enter the address of the Mondrian service as Server name and click Next. 

Then choose a cube and click Finish. The PivotTable with cube data will appear.

This is test data. Let’s see how we can plug in ClickHouse.

Mondrian configuration for ClickHouse

In order to expose ClickHouse to Excel via Mondrian, we need to create a Mondrian schema for some ClickHouse table. The popular OnTime dataset is a good test example, and we have one hosted at Altinity.Cloud as well. We define the schema as two OLAP cubes. Cube ‘OnTime’ is based on data from an ontime table. Cube ‘ViewOnTime’ is an example of how to create a cube based on a custom query instead of a table.

On the Altinity.Cloud instance we have already configured Mondrian for OnTime dataset, so it is ready to be used. But for curious minds this is what you have to do in order to duplicate the setup in other environments.

First, you need to configure ClickHouse datasource for Mondrian. It is defined at: 

$TOMCATDIR/webapps/emondrian/WEB-INF/datasources.xml

The Altinity.Cloud instance with ontime dataset is running at github.demo.trial.altinity.cloud, so we put server name and credentials in DataSourceInfo tag. You can use your own ClickHouse server as well.

<DataSources>
  <DataSource>
    <DataSourceName>OnTime</DataSourceName>
    <DataSourceDescription>ClickHouse Sample Data</DataSourceDescription>
    <URL>http://localhost:8080/emondrian/xmla</URL>
    <DataSourceInfo>
Provider=mondrian;Jdbc=jdbc:clickhouse://github.demo.trial.altinity.cloud:8443/default?ssl=true&#38;user=mondrian&#38;password=mondrian;JdbcDrivers=ru.yandex.clickhouse.ClickHouseDriver
    </DataSourceInfo>
    <ProviderName>Mondrian</ProviderName>
    <ProviderType>MDP</ProviderType>
    <AuthenticationMode>Unauthenticated</AuthenticationMode>
    <Catalogs>
        <Catalog name="OnTime">
            <Definition>/WEB-INF/schema/OnTime.xml</Definition>
        </Catalog>
    </Catalogs>
  </DataSource>
</DataSources>

Mondrian schema is defined in an XML file that should be deployed to a web server. In this example it should be placed into the emondrian web application folder as follows:

$TOMCATDIR/webapps/emondrian/WEB-INF/schema/OnTime.xml

<?xml version="1.0"?>
<Schema name="OnTime">
<Cube name="OnTime">
  <Table name="ontime"/>
  <Dimension name="Date">
    <Hierarchy hasAll="true" allMemberName="All Dates">
      <Level name="Year" column="Year" uniqueMembers="true" type="Numeric"/>
      <Level name="Quarter" column="Quarter" uniqueMembers="true" type="Numeric">
        <CaptionExpression>
          <SQL dialect="generic">toString(Quarter) || ' ' || toString(Year)</SQL>
        </CaptionExpression>
      </Level>
      <Level name="Month" column="Month" uniqueMembers="true" type="Numeric">
        <CaptionExpression>
<SQL dialect="generic">
transform(Month, 
[1,2,3,4,5,6,7,8,9,10,11,12], 
['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'],
'Unknown') || toString(Year)
</SQL>
        </CaptionExpression> 
      </Level> 
      <Level name="Day" column="DayofMonth" uniqueMembers="true">
        <CaptionExpression>
          <SQL dialect="generic">toString(Year) || '-' || toString(Month) || '-' || toString(DayofMonth)</SQL>
        </CaptionExpression>
      </Level>
    </Hierarchy>
  </Dimension>
  <Dimension name="Unique Carrier">
    <Hierarchy hasAll="true" allMemberName="All Unique Carriers">
      <Level name="Unique Carrier" uniqueMembers="true">
        <KeyExpression>
          <SQL dialect="generic">toStringCutToZero(UniqueCarrier)</SQL>
        </KeyExpression>
      </Level>
    </Hierarchy>
  </Dimension>
  <Dimension name="Destination">
    <Hierarchy hasAll="true" allMemberName="All Destinations">
      <Level name="Destination State Name" column="DestStateName" uniqueMembers="true"/>
      <Level name="Destination City" column="DestCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Origin">
    <Hierarchy hasAll="true" allMemberName="All Origins">
      <Level name="Origin State Name" column="OriginStateName" uniqueMembers="true"/>
      <Level name="Origin City" column="OriginCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Measure name="Flights" column="Flights" aggregator="sum" formatString="#,###"/>
  <Measure name="Air Time" column="AirTime" aggregator="sum" formatString="#,###"/>
  <Measure name="Distance" column="Distance" aggregator="sum" formatString="#,###"/>
  <Measure name="Dep Delay" column="DepDelay" aggregator="sum" formatString="#,###"/>
  <Measure name="Carrier Delay" column="CarrierDelay" aggregator="sum" formatString="#,###"/>
  <Measure name="Weather Delay" column="WeatherDelay" aggregator="sum" formatString="#,###"/>
</Cube>
<Cube name="ViewOnTime">
<View alias="ViewOnTime">
<SQL dialect="generic">
<![CDATA[select * from ontime]]>
</SQL>
</View>
  <Dimension name="Destination City Name" foreignKey="DestCityName">
    <Hierarchy hasAll="true" allMemberName="All Destinations">
      <Level name="Destination City Name" column="DestCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Origin City Name" foreignKey="OriginCityName">
    <Hierarchy hasAll="true" allMemberName="All Origin Cities">
      <Level name="Origin City Name" column="OriginCityName" uniqueMembers="true"/>
    </Hierarchy>
  </Dimension>
  <Measure name="Dep Delay" column="DepDelay" aggregator="sum" formatString="#,###"/>
</Cube>

</Schema>

You can see how dimensions, hierarchies and measures are defined.

Once all configuration is completed, make sure that files are owned by a tomcat user and restart the server. 

Trying everything together

Now we are ready to create a new connection to the Mondrian server in Excel and play with cubes from the OnTime schema. Excel will let you choose the cube. Pick one to browse its data in a PivotTable.

That’s it! We have ClickHouse ontime dataset exposed as an OLAP model in Excel in a user friendly way!

Conclusion

XMLA is a popular cross-platform protocol for querying multi-dimensional data using MDX query language. It is extensively used in Microsoft products, but also supported by popular BI tools such as Tableau, MicroStrategy and others. Having MDX integration layer for ClickHouse allows users of those products to access ClickHouse data without ODBC hassles. The OLAP model makes it easier to use as well. 

For years, the only possibility to turn ClickHouse into an OLAP engine was an open source Olaper project from WonderSoft, which could be adopted for ClickHouse. Mondrian is a much more mature solution, and it raises ClickHouse MDX integration to a new level. 

We will continue to use Mondrian with ClickHouse and test compatibility with other popular tools besides Excel. We are starting to try Tableau and PowerBI and hope to report on this later. Contact us at info@altinity.com if you have specific use cases that you would like to explore. We can work with you to make your preferred tools work better. Meanwhile, stay tuned! 

Share

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.