ClickHouse + Graphouse introduction

 

May 16, 2018

Introduction

A few weeks ago we described how to set up ‘ClickHouse monitoring with Graphite’. Today we are going to look at this from a different angle: how ClickHouse can be used as a Graphite backend.

Graphite

Graphite is positioned as an enterprise-scale monitoring tool that runs well on cheap hardware, and it has gained significant popularity lately. Graphite consists of 3 software components:

  • carbon – a daemon that listens for data
  • whisper – a database layer for storing data
  • graphite webapp – a webapp that renders graphs

Graphite works well until the incoming data stream gets big. What exactly counts as big depends on the setup, but at some point the storage layer (carbon + whisper) may start to perform poorly. In particular, it has the following problems, which become critical with large data volumes and mission-critical systems:

  • Lack of failover and high availability
  • High disk I/O utilization
  • High disk space utilization

Let’s look at these issues in more detail.

Lack of replication possibilities

When a monitoring system needs to be highly available, metrics data has to be replicated to another location for failover. The only way to do that with Graphite is to duplicate the data stream to each carbon + whisper instance running on a different server. That is possible but fragile: if one of the streams or copies fails, the monitoring data becomes incomplete and inconsistent. That can be a major issue, since the gaps may not be easy to recover, which significantly lowers the value of the ‘replica’. It would be much more convenient to have replication and consistency built natively into the monitoring system.

High disk I/O utilization

whisper is heavy on disk I/O and does not use it efficiently. Under some conditions, disk I/O utilization can reach 100%.

High disk space utilization

Graphite does not have enough control over retention of metrics data. That results in suboptimal disk space usage since different metrics often require different retention.

Of course, none of these issues is a huge problem per se, but the more data you have, the more they hurt. In short, Graphite does not scale well. But there is a way to change that and make it more scalable.

Graphouse

Graphouse allows you to use ClickHouse as Graphite storage.

ClickHouse is a very fast and scalable analytical DBMS. Used as a Graphite backend, it provides:

  • Data distribution and replication out-of-the-box
  • Efficient disk I/O usage
  • Lower disk space utilization

Solution architecture

As described earlier, Graphite consists of 3 software components:

  • carbon – a daemon that listens for data
  • whisper – a database layer for storing data
  • graphite webapp – a webapp that renders graphs

Graphouse replaces the carbon and whisper components with its own data accumulation daemon and ClickHouse as the data storage layer. Ultimately, you can replace the graphite webapp with another graphing tool as well, leaving no native Graphite components at all.

So let’s walk through the Graphouse installation and setup procedures. There are four major steps:

  • Install ClickHouse (used as the data storage layer)
  • Install Graphouse (used as the metrics processing layer)
  • Set up Graphouse – ClickHouse integration
  • Set up Graphite-web as a graphing tool

Let’s go.

Install ClickHouse

Most likely you already have ClickHouse installed; in that case, skip to the Configure ClickHouse section. ClickHouse installation is explained in several sources, such as the official ClickHouse documentation.

Configure ClickHouse

Let’s configure ClickHouse to be a data storage for Graphouse.

Create the rollup config file /etc/clickhouse-server/conf.d/graphite_rollup.xml.

This file contains the settings used for thinning (rolling up) Graphite data.

cat /etc/clickhouse-server/conf.d/graphite_rollup.xml

<yandex>
<graphite_rollup>
    <path_column_name>metric</path_column_name>
    <time_column_name>timestamp</time_column_name>
    <value_column_name>value</value_column_name>
    <version_column_name>updated</version_column_name>
    <pattern>
        <regexp>^five_sec</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>5</precision>
        </retention>
        <retention>
            <age>2592000</age>
            <precision>60</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>600</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^one_min</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>60</precision>
        </retention>
        <retention>
            <age>2592000</age>
            <precision>300</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>1800</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^five_min</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>300</precision>
        </retention>
        <retention>
            <age>2592000</age>
            <precision>600</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>1800</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^one_sec</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>1</precision>
        </retention>
        <retention>
            <age>2592000</age>
            <precision>60</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>300</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^one_hour</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>3600</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>86400</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^ten_min</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>600</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>3600</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^one_day</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>86400</precision>
        </retention>
    </pattern>

    <pattern>
        <regexp>^half_hour</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>1800</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>3600</precision>
        </retention>
    </pattern>

    <default>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>60</precision>
        </retention>
        <retention>
            <age>2592000</age>
            <precision>300</precision>
        </retention>
        <retention>
            <age>31104000</age>
            <precision>1800</precision>
        </retention>
    </default>
</graphite_rollup>
</yandex>

ClickHouse needs to be restarted after this.
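To make the rule selection concrete, here is a simplified Python model of how a precision could be picked for a data point, assuming the behaviour described in the ClickHouse documentation: the first pattern whose regexp matches the metric name is used (otherwise the default section applies), and within it the retention with the largest age not exceeding the point's age wins. The helper names load_rules and rollup_precision are hypothetical, and only part of the config above is embedded. It is a sketch for reasoning about the config, not the engine's actual code.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of graphite_rollup.xml: two patterns plus the default section.
ROLLUP_XML = """
<yandex>
<graphite_rollup>
    <pattern>
        <regexp>^one_sec</regexp>
        <function>any</function>
        <retention><age>0</age><precision>1</precision></retention>
        <retention><age>2592000</age><precision>60</precision></retention>
        <retention><age>31104000</age><precision>300</precision></retention>
    </pattern>
    <pattern>
        <regexp>^one_hour</regexp>
        <function>any</function>
        <retention><age>0</age><precision>3600</precision></retention>
        <retention><age>31104000</age><precision>86400</precision></retention>
    </pattern>
    <default>
        <function>any</function>
        <retention><age>0</age><precision>60</precision></retention>
        <retention><age>2592000</age><precision>300</precision></retention>
        <retention><age>31104000</age><precision>1800</precision></retention>
    </default>
</graphite_rollup>
</yandex>
"""


def load_rules(xml_text, section="graphite_rollup"):
    """Return ([(regexp, retentions)], default_retentions); retentions are sorted (age, precision) pairs."""
    rollup = ET.fromstring(xml_text.strip()).find(section)

    def retentions(node):
        return sorted((int(r.findtext("age")), int(r.findtext("precision")))
                      for r in node.findall("retention"))

    patterns = [(p.findtext("regexp"), retentions(p)) for p in rollup.findall("pattern")]
    return patterns, retentions(rollup.find("default"))


def rollup_precision(metric, age_seconds, patterns, default):
    """Precision (in seconds) for a point of `metric` that is `age_seconds` old."""
    rules = default
    for regexp, rets in patterns:
        if re.search(regexp, metric):  # first matching pattern wins
            rules = rets
            break
    precision = None  # None: no retention threshold reached yet
    for age, prec in rules:  # ascending ages: keep the last threshold already reached
        if age_seconds >= age:
            precision = prec
    return precision
```

For example, a point of a one_sec_cr_plain.* metric keeps 1-second precision for 30 days, is stored at 60 seconds until it is 360 days old, and at 300 seconds after that.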

Create ClickHouse tables

These are tables used to store metrics for Graphite.

CREATE DATABASE graphite;

CREATE TABLE graphite.metrics(
    date Date DEFAULT toDate(0),
    name String,
    level UInt16,
    parent String,
    updated DateTime DEFAULT now(),
    status Enum8(
        'SIMPLE' = 0,
        'BAN' = 1,
        'APPROVED' = 2,
        'HIDDEN' = 3,
        'AUTO_HIDDEN' = 4
    )
) ENGINE = ReplacingMergeTree(date, (parent, name), 1024, updated);

CREATE TABLE graphite.data(
    metric String,
    value Float64,
    timestamp UInt32,
    date Date,
    updated UInt32
) ENGINE = GraphiteMergeTree(date, (metric, timestamp), 8192, 'graphite_rollup');

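Note the use of the GraphiteMergeTree table engine, which is designed specifically for rolling up (thinning and aggregating/averaging) Graphite data. Its last parameter, 'graphite_rollup', refers to the rollup section defined in the config file above. More details can be found in the ClickHouse documentation.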
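A rough worked estimate of what thinning saves: for a metric matching ^one_sec, the first 30 days (2592000 s) are kept at 1-second precision and the following 330 days at 60-second precision. The sketch below counts stored points per metric over a 360-day horizon (31104000 s). points_per_metric is a hypothetical helper, and the estimate ignores raw points that have not been merged yet.

```python
from typing import List, Tuple


def points_per_metric(retentions: List[Tuple[int, int]], horizon_seconds: int) -> int:
    """Points stored for one metric over `horizon_seconds` of history.

    `retentions` holds (age, precision) pairs as in graphite_rollup.xml; the age
    band [age_i, age_{i+1}) is stored at precision_i.
    """
    rules = sorted(retentions)
    total = 0
    for i, (age, precision) in enumerate(rules):
        band_end = rules[i + 1][0] if i + 1 < len(rules) else horizon_seconds
        band_end = min(band_end, horizon_seconds)
        if band_end > age:
            total += (band_end - age) // precision
    return total


ONE_SEC = [(0, 1), (2592000, 60), (31104000, 300)]  # the ^one_sec pattern
HORIZON = 31104000  # 360 days

with_rollup = points_per_metric(ONE_SEC, HORIZON)
without_rollup = HORIZON // ONE_SEC[0][1]  # one point per second, never thinned
print(with_rollup, without_rollup)
```

It prints 3067200 31104000: about 3.07 million points instead of 31.1 million, roughly a tenfold reduction for one year of history.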

In a ClickHouse cluster, graphite.metrics and graphite.data can, of course, be converted to distributed and/or replicated tables.
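graphite.metrics uses ReplacingMergeTree with updated as the version column, so a metric's row can be changed by inserting a newer row with the same (parent, name) key; background merges eventually keep only the row with the largest updated. The following Python model shows the state merges converge to. The replacing_merge name and the sample rows are made up for illustration. Until a merge actually happens, older rows may still be visible; adding FINAL to a SELECT forces the deduplication at query time.

```python
from typing import Dict, List, Tuple

# A graphite.metrics row reduced to the relevant columns: (parent, name, updated, status).
Row = Tuple[str, str, int, str]


def replacing_merge(rows: List[Row]) -> List[Row]:
    """Keep one row per sorting key (parent, name): the one with the largest `updated`."""
    latest: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        key = (row[0], row[1])
        current = latest.get(key)
        # On equal versions this sketch keeps the later row in the input.
        if current is None or row[2] >= current[2]:
            latest[key] = row
    return sorted(latest.values())
```

The result does not depend on insertion order, which is what makes "insert a newer row" a safe way to update a metric's status.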

Install Graphouse

Add the Graphouse Debian repository.

In /etc/apt/sources.list (or in a separate file, like /etc/apt/sources.list.d/graphouse.list), add the repository deb http://repo.yandex.ru/graphouse/xenial stable main. On other Ubuntu versions, replace xenial with your release codename. For example:

sudo bash -c 'echo "deb http://repo.yandex.ru/graphouse/xenial stable main" >> /etc/apt/sources.list'
sudo apt update
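The codename can also be read from /etc/os-release instead of being typed by hand. A small sketch, assuming the standard UBUNTU_CODENAME / VERSION_CODENAME fields; ubuntu_codename and graphouse_repo_line are hypothetical helpers, and whether the repository actually publishes packages for a given codename is not checked.

```python
def ubuntu_codename(os_release_text: str) -> str:
    """Extract the release codename from the contents of /etc/os-release."""
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip('"')
    # Prefer the Ubuntu-specific field, fall back to the generic one.
    codename = fields.get("UBUNTU_CODENAME") or fields.get("VERSION_CODENAME")
    if not codename:
        raise ValueError("no codename found in os-release")
    return codename


def graphouse_repo_line(codename: str) -> str:
    """The sources.list entry from the article, with the codename substituted."""
    return f"deb http://repo.yandex.ru/graphouse/{codename} stable main"
```

For example, graphouse_repo_line(ubuntu_codename(open("/etc/os-release").read())) returns the line to add to sources.list on the current machine.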

Install JDK8.

sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install -y oracle-java8-installer

The oracle-java8-installer package contains a script that downloads and installs Java.

Set Java 8 as your default Java version.

sudo apt install -y oracle-java8-set-default

Let’s verify the installed version.

java -version 

java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

Set up JAVA_HOME and JRE_HOME variables

Set the JAVA_HOME and JRE_HOME environment variables, which many Java applications use to locate Java libraries at runtime. Add them to the /etc/environment file:

sudo bash -c 'cat >> /etc/environment <<EOL
JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
EOL'

Finally, install Graphouse:

sudo apt install graphouse

Configure Graphouse

Let’s configure Graphouse to store metrics in ClickHouse.

Edit the properties in the Graphouse config file /etc/graphouse/graphouse.properties.

Set up ClickHouse access and make sure graphouse.clickhouse.host is correct. WARNING: // comments are not valid in the properties file! Use # for comments instead.

graphouse.clickhouse.host=localhost
graphouse.clickhouse.hosts=${graphouse.clickhouse.host}
graphouse.clickhouse.port=8123
graphouse.clickhouse.db=graphite
graphouse.clickhouse.user=
graphouse.clickhouse.password=
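Because a stray // comment silently becomes part of a value, a quick pre-flight check can help. The sketch below is a simplified properties reader written for illustration (parse_properties and resolve are hypothetical names): it treats lines starting with # or ! as comments, as Java .properties files do, flags // remnants, and expands ${...} placeholders such as graphouse.clickhouse.hosts=${graphouse.clickhouse.host}, assuming they behave like plain substitutions. It ignores line continuations, escapes and ':' separators.

```python
import re
from typing import Dict, List, Tuple

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Parse simple key=value lines; return (properties, warnings)."""
    props: Dict[str, str] = {}
    warnings: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#!":  # real comment markers in .properties files
            continue
        if line.startswith("//"):
            warnings.append(f"line {lineno}: '//' is not a comment marker: {line}")
            continue
        key, sep, value = line.partition("=")
        if not sep:
            warnings.append(f"line {lineno}: no '=' found: {line}")
            continue
        # Values containing '://' (URLs) are skipped by this simple check.
        if "//" in value and "://" not in value:
            warnings.append(f"line {lineno}: value contains '//', probably a stray comment: {line}")
        props[key.strip()] = value.strip()
    return props, warnings


def resolve(props: Dict[str, str], key: str, depth: int = 0) -> str:
    """Expand ${other.key} placeholders recursively, refusing to loop forever."""
    if depth > 10:
        raise ValueError(f"placeholder nesting too deep (cycle?) at {key}")
    return PLACEHOLDER.sub(lambda m: resolve(props, m.group(1), depth + 1), props[key])
```

Running it over /etc/graphouse/graphouse.properties before starting the daemon reports every line where a // comment would end up inside a value.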

Set graphouse.clickhouse.retention-config to graphite_rollup, the section name we defined earlier in /etc/clickhouse-server/conf.d/graphite_rollup.xml:

<yandex>
  <graphite_rollup>
    ...
  </graphite_rollup>
</yandex>

I.e.:

graphouse.clickhouse.retention-config=graphite_rollup

Note that the value of graphouse.clickhouse.retention-config is a config section name, not the path to /etc/clickhouse-server/conf.d/graphite_rollup.xml. Use one of the names listed in the ClickHouse system.graphite_retentions table.
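The available names can be listed through the ClickHouse HTTP interface on port 8123, the same port set in graphouse.clickhouse.port. A minimal sketch, assuming the default user with an empty password (credentials are not handled) and that system.graphite_retentions exposes a config_name column; the names may only show up once a GraphiteMergeTree table such as graphite.data uses the config. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import List

QUERY = "SELECT DISTINCT config_name FROM system.graphite_retentions FORMAT TabSeparated"


def retention_config_names_url(host: str = "localhost", port: int = 8123) -> str:
    """ClickHouse HTTP-interface URL that runs QUERY as a read-only GET request."""
    return f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": QUERY})


def parse_names(tsv: str) -> List[str]:
    """One config name per line of TabSeparated output."""
    return [line for line in tsv.splitlines() if line]


def fetch_retention_config_names(host: str = "localhost", port: int = 8123) -> List[str]:
    with urllib.request.urlopen(retention_config_names_url(host, port), timeout=5) as resp:
        return parse_names(resp.read().decode("utf-8"))
```

With the setup above, fetch_retention_config_names() is expected to include graphite_rollup.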

Start Graphouse

sudo /etc/init.d/graphouse start

At this point, Graphouse is up and running and ready to accept metrics and store them in ClickHouse.

Next, let’s close the loop and set up ClickHouse to write its own metrics into Graphouse.

Set up ClickHouse to report metrics into Graphouse

In a recent article, ‘ClickHouse monitoring with Graphite’, we described how to set up ClickHouse monitoring with Graphite. Since Graphouse is Graphite with different internals, ClickHouse can be configured to write its metrics to Graphouse, which then stores them in ClickHouse itself. This is probably not the best idea from an operational standpoint (if ClickHouse goes down, its monitoring data goes with it), but it is still worth considering.

Edit /etc/clickhouse-server/config.xml and add something like the following inside the root <yandex> element:

<graphite>
    <host>127.0.0.1</host>
    <port>2003</port>
    <timeout>0.1</timeout>
    <interval>60</interval>
    <root_path>one_min_cr_plain</root_path>

    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
</graphite>
<graphite>
    <host>127.0.0.1</host>
    <port>2003</port>
    <timeout>0.1</timeout>
    <interval>1</interval>
    <root_path>one_sec_cr_plain</root_path>

    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>false</asynchronous_metrics>
</graphite>

Settings description:

  • host – host where the Graphite receiver (here, Graphouse) is running.
  • port – plaintext receiver port (2003 by default).
  • interval – how often ClickHouse sends data, in seconds.
  • timeout – timeout for sending data, in seconds.
  • root_path – prefix for the metric names sent to Graphite.
  • metrics – whether to send data from the system.metrics table.
  • events – whether to send data from the system.events table.
  • asynchronous_metrics – whether to send data from the system.asynchronous_metrics table.

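Multiple <graphite> clauses can be configured to send different data at different intervals. Note that each root_path starts with the name of a rollup pattern from graphite_rollup.xml: one_min_cr_plain matches ^one_min and one_sec_cr_plain matches ^one_sec, so each stream gets the retention that fits its sending interval.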
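The <graphite> sections speak the Graphite plaintext protocol: one "path value timestamp" line per point, sent over TCP to port 2003. To test the Graphouse side by hand, the same can be done from Python. format_line and send_metrics are hypothetical helpers, and the example path one_min.test.value is made up.

```python
import socket
import time
from typing import Iterable, Optional, Tuple


def format_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One point in the Graphite plaintext protocol, newline-terminated."""
    if any(c.isspace() for c in path):
        raise ValueError("metric path must not contain whitespace")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(points: Iterable[Tuple[str, float, Optional[int]]],
                 host: str = "127.0.0.1", port: int = 2003, timeout: float = 5.0) -> None:
    """Send all points over a single TCP connection to the plaintext receiver."""
    payload = "".join(format_line(p, v, t) for p, v, t in points).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, send_metrics([("one_min.test.value", 42.0, None)]) pushes a single point stamped with the current time, which should then show up in graphite.data.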

ClickHouse needs to be restarted for the settings to take effect.

sudo /etc/init.d/clickhouse-server restart

ClickHouse now writes its metrics to Graphouse, which in turn stores them back in ClickHouse tables. This is easy to check:

clickhouse-client -q "select count(*) from graphite.data"
clickhouse-client -q "select count(*) from graphite.metrics"

Graphite-web

Graphite-web is a UI for visualizing metrics. Normally it is installed as part of Graphite, but with Graphouse it has to be installed separately, without carbon or whisper. It is not required by Graphouse: once metrics are in ClickHouse, any visualization tool can be used, e.g. Grafana. The rest of this article is relevant only if you really need graphite-web.

Install Graphite-web

Graphite has detailed installation and configuration documentation.

The following steps install and configure graphite-web.

sudo apt install -y python3-pip python3-dev libcairo2-dev libffi-dev build-essential
export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"
sudo pip3 install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master

sudo apt install -y gunicorn3
sudo apt install -y nginx

sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*

Create the nginx config file /etc/nginx/sites-available/graphite with the following content:

upstream graphite {
    server 127.0.0.1:8080 fail_timeout=0;
}

server {
    listen 80 default_server;

    server_name HOSTNAME;

    root /opt/graphite/webapp;

    access_log /var/log/nginx/graphite.access.log;
    error_log  /var/log/nginx/graphite.error.log;

    location = /favicon.ico {
        return 204;
    }

    # serve static content from the "content" directory
    location /static {
        alias /opt/graphite/webapp/content;
        expires max;
    }

    location / {
        try_files $uri @graphite;
    }

    location @graphite {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://graphite;
    }
}

Set server_name to your host name (replace the HOSTNAME placeholder) in the nginx config:

sudo vim /etc/nginx/sites-available/graphite

Enable configuration for nginx:

sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default

Then reload nginx to apply the new configuration:

sudo service nginx reload

Set up the graphite-web database

We need to create the database tables used by the graphite webapp.

sudo bash -c 'PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb'

Make sure the database was created:

ls -l /opt/graphite/storage/graphite.db
-rw-r--r-- 1 root root 98304 Apr 19 22:49 /opt/graphite/storage/graphite.db

If your webapp is running as the ‘nobody’ user, you will need to fix the permissions like this:

sudo chown nobody:nogroup /opt/graphite/storage/graphite.db

Set up the connection to Graphouse

Add the Graphouse plugin /opt/graphouse/bin/graphouse.py to your graphite webapp root directory. For example, if your directory is /opt/graphite/webapp/graphite/, use:

sudo ln -fs /opt/graphouse/bin/graphouse.py /opt/graphite/webapp/graphite/graphouse.py

Configure the storage finder: open /opt/graphite/webapp/graphite/local_settings.py in an editor and add:

STORAGE_FINDERS = (
    'graphite.graphouse.GraphouseFinder',
)

Start Graphite-web

cd /opt/graphite/conf
sudo cp graphite.wsgi.example graphite.wsgi
sudo bash -c 'export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"; gunicorn3 --bind=127.0.0.1:8080 graphite.wsgi:application'

Checking it works

Point your browser to the host where graphite-web is running. You should see something like this:

Graphite screenshot
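Besides the browser, the graphite-web render API can be queried directly to confirm that data flows end to end. A sketch assuming graphite-web is reachable at the nginx address configured above and returns the usual format=json output (a list of series, each with a target and [value, timestamp] datapoints). render_url, latest_points and fetch_latest are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def render_url(base: str, target: str, since: str = "-1h") -> str:
    """URL of the graphite-web render API returning JSON for `target`."""
    query = urllib.parse.urlencode({"target": target, "from": since, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"


def latest_points(render_json: str) -> List[Tuple[str, float, int]]:
    """For each returned series, its most recent non-null (value, timestamp)."""
    result = []
    for series in json.loads(render_json):
        points = [(v, ts) for v, ts in series["datapoints"] if v is not None]
        if points:
            value, ts = points[-1]
            result.append((series["target"], value, ts))
    return result


def fetch_latest(base: str, target: str) -> List[Tuple[str, float, int]]:
    with urllib.request.urlopen(render_url(base, target), timeout=10) as resp:
        return latest_points(resp.read().decode("utf-8"))
```

For example, fetch_latest("http://HOSTNAME", TARGET), with TARGET set to a metric path visible in the graphite-web tree, should return a recent timestamp if ClickHouse, Graphouse and graphite-web are all wired together.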


Conclusion

We have demonstrated how to install and use Graphouse with ClickHouse as the data storage layer. This provides much more efficient storage for Graphite while preserving all its functionality. For simplicity, we configured a non-replicated GraphiteMergeTree table, but it can certainly be replaced with ReplicatedGraphiteMergeTree for mission-critical monitoring applications. Also, if one server is not enough, monitoring data can be distributed across multiple shards. That is not possible with Graphite, but it becomes easy with Graphouse thanks to the ClickHouse backend.

 
