20 Sept 2018
This article shows different ways of how you can use ClickHouse in connection with other data sources to make queries use all of ClickHouse optimization features in order to make results come faster. Also, it is good practice when you have some infrastructure elements already linked to some other data sources or tools that supports ODBC.
Here we are going to show how one can use ClickHouse in order to get data from other databases using ODBC as well as how ClickHouse can provide the data stored within to the outside using ODBC.
Let’s say there is a case when you want to fetch the data from your external data source into ClickHouse and apply some grouping and sorting on top of that. The simplest way to achieve that is to use
odbc() function as the source for your query.
So the basic query for outside data using odbc() function would look like this:
SELECT * FROM odbc(‘DSN=connection’, ‘table’)
Here is one of the examples of querying the data from table ‘weather_observations’ defined in PostgreSQL database with psql DSN:
SELECT * FROM odbc('DSN=psql', '', 'weather_observations') LIMIT 10;
This return the set of 10 records from a respective table including all columns the table have. The tricky thing here is that you could not define the set of columns manually, it is possible to use the only wildcard at this time.
In fact before you can start using it you need to configure ODBC DSN connection first. And here is how you do it. In *nix operating systems you can use unixODBX application. This will require you to add sections in
odbcinst.ini files respectively. Here are examples of the setup for the outside database hosted on PostgreSQL:
odbc.ini [DEFAULT] Driver = psql [psql] Description = PostgreSQL connection Driver = PostgreSQL Unicode Database = nyc-taxi-data Servername = 192.168.11.3 UserName = nyc Password = nyc Port = 5432 Protocol = 9.3 ReadOnly = No RowVersioning = No ShowSystemTables = No odbcinst.ini [PostgreSQL ANSI] Description=PostgreSQL ODBC driver (ANSI version) Driver=psqlodbca.so Setup=libodbcpsqlS.so Debug=0 CommLog=1 UsageCount=1 [PostgreSQL Unicode] Description=PostgreSQL ODBC driver (Unicode version) Driver=psqlodbcw.so Setup=libodbcpsqlS.so Debug=0 CommLog=1 UsageCount=1
Once those are configured you can start using odbc() function in the way shown above. This query will select every record from table ‘table’ defined in the dedicated DB server connected through ODBC DSN.
ODBC Table Engine
When odbc() function is a way to start wheels spinning and try some things there is a time when you can realize that you want to create more complicated queries for the external data using ODBC in ClickHouse. In this case, you can use another option for ODBC in ClickHouse which is ODBC Table Engine.
Basically with ODBC Table Engine ClickHouse adds an abstraction layer on top of odbc() function implementation so you will be accessing external schemas using ClickHouse table syntax without including DSN connection details into the query all the time.
So here is a basic template of how you create a table using ODBC table engine:
CREATE TABLE xxx (<column_def1>, <column_def2>) Engine = ODBC('DSN=connection', '', '<table_name>’);
You should skip database name here when it is defined in odbc.ini configuration file. The pre-configuration of odbc.ini and odbcinst.ini files are required in the same way as for odbc() function (see odbc() function section for configuration examples).
Here is an example of a table and query to it based on an external table located in PostgreSQL:
CREATE TABLE weather ( station_id String, station_name String, date DateTime, max_temperature Int32, ) ENGINE = ODBC('DSN=psql', '', 'weather')
In this create query we assume that source ‘weather’ table has desired columns. Having that table created one can make a query like so:
SELECT station_name, max_temperature FROM weather WHERE date > today()-7
One important note to know about creating a table with ODBC engine is that column definitions should have compatible data types with the ones inside the source table and have the same names. This is also a key difference between
odbc() function and ODBC so you can manually set the list of columns to pull from the table on a per-query basis.
The definition of the ODBC dictionary requires one to specify odbc details inside the source section. So it is even less complicated definition in comparison to other dictionary sources (see more information about dictionaries here)
<dictionary> <source> <odbc> <table>taxi_zones</table> <connection_string>DSN=taxi_rides</connection_string> </odbc> </source> ... </dictionary>
As you can see all the information about connection details is still located in ODBC configuration files so all you need to do is to specify DSN string inside source section.
Yet again you need to specify dictionary structure using ClickHouse compatible data types for the columns inside the source database.
After all cases we use ClickHouse as a consumer of the data stored somewhere else here is a way you can use ClickHouse data from other database client using ODBC connector. There are DEB and RPM distributions available from Yandex and Altinity respectively. You can you use one of these in order to connect your ODBC client to a Clickhouse server.
Here we show how to use RPM package of ODBC Connector for ClickHouse.
It is easier to install packages from Altinity’s repository by adding its link to your package manager just like described here: https://packagecloud.io/Altinity/clickhouse/install
Once it’s done you can install connector as follows:
sudo yum install clickhouse-odbc
After that you can find examples of how to configure your ODBC DSN’s inside the package distribution here
/usr/local/share/clickhouse-odbc/unixodbc.ini and here
/usr/local/share/doc/odbc/examples/odbc.ini. Here is the source of
odbc.ini file just for example:
# Place this file to ~/.odbc.ini or /etc/odbc.ini [ODBC Data Sources] clickhouse_localhost = Clickhouse clickhouse_localhost_w = Clickhousew # Ansi [clickhouse_localhost] driver = Clickhouse description = Clickhouse driver #server = localhost #database = default #uid = default #port = 8123 #proto = http # Use https with 8443 port: #sslmode = require
As you can see there are some options are commented out so you may need to uncomment those in order to make the configuration setup properly. You can use those files as templates for your unixODBC installation tool odbcinst.
Once you configured ODBC DSN for clickhouse you can use any ODBC client to query a ClickHouse database. Here is a basic query you can start with using isql tool which comes with unixODBC package:
[[email protected] /]$ isql clickhouse +---------------------------------------+ | Connected! | | | | sql-statement | | help [tablename] | | quit | | | +---------------------------------------+ SQL> select * from test; +-----------+--------------------+-----+ | _date | _time | desc| +-----------+--------------------+-----+ | 2080-04-05| 2018-09-14 11:36:48| test| +-----------+--------------------+-----+ SQLRowCount returns 1 1 rows fetched
One more important note about version support for RPM ODBC connector package. Currently, clickhouse-odbc RPM only support ClickHouse server version starting from 18.09.x and it was tested only on Centos 7.