May 16, 2018
Introduction
A few weeks ago we described how to set up ‘ClickHouse monitoring with Graphite’. Today we are going to look at this from a different angle: how ClickHouse can be used as a Graphite backend.
Graphite
Graphite is positioned as an enterprise-scale monitoring tool that runs well on cheap hardware, and it has gained significant popularity lately. Graphite consists of 3 software components:
- carbon – a daemon that listens for data
- whisper – a database layer for storing data
- graphite webapp – a webapp that renders graphs
All goes well with Graphite until the incoming data stream gets big. What exactly counts as big depends on the setup, but at some point the storage layer, carbon + whisper, starts to perform poorly. In particular, it has the following problems, which become critical at high volumes and for critical systems:
- Lack of failover and high availability
- High disk I/O utilization
- High disk space utilization
Let’s walk through these issues in more detail.
Lack of replication possibilities
When a monitoring system needs to be highly available, metrics data has to be replicated to another location for failover. The only way to do that with Graphite is to duplicate the data stream to each instance of carbon + whisper running on different servers. That is possible but fragile: if one of those streams or copies fails, the monitoring data becomes incomplete and inconsistent. That can be a major issue, since recovery may not be easy, which significantly lowers the value of the ‘replica’. It would be much more convenient to have replication and consistency built natively into the monitoring system.
High disk I/O utilization
whisper consumes disk I/O heavily and not very efficiently. Under some conditions it can drive disk I/O utilization to 100%.
High disk space utilization
Graphite does not have enough control over retention of metrics data. That results in suboptimal disk space usage since different metrics often require different retention.
None of these issues is a huge problem on its own, but the more data you have, the more they hurt. In short, Graphite does not scale well. But there is a way to make it more scalable.
Graphouse
Graphouse allows you to use ClickHouse as Graphite storage.
ClickHouse is a very fast and scalable analytical DBMS. Used as a Graphite backend, it provides:
- Data distribution and replication out-of-the-box
- Efficient disk I/O usage
- Lower disk space utilization
Solution architecture
As described earlier, Graphite consists of 3 software components:
- carbon – a daemon that listens for data
- whisper – a database layer for storing data
- graphite webapp – a webapp that renders graphs
Graphouse replaces the carbon and whisper components with its own data accumulation daemon and with ClickHouse as the data storage layer. Ultimately, you can replace graphite webapp with another graphing tool as well, leaving no native Graphite components at all.
So let’s walk through the Graphouse installation and setup procedures. There are four major steps:
- Install ClickHouse (it will be used as the data storage layer)
- Install Graphouse (it will be used as the metrics processing layer)
- Set up Graphouse – ClickHouse integration
- Set up Graphite-web as a graphing tool
Let’s go.
Install ClickHouse
Most likely you already have ClickHouse installed; in that case just move on to the Configure ClickHouse section. ClickHouse installation is explained in several sources, such as:
- for deb-based systems
- for rpm-based systems
Configure ClickHouse
Let’s configure ClickHouse to be the data storage for Graphouse.
Create the rollup config file /etc/clickhouse-server/conf.d/graphite_rollup.xml.
This file contains the settings for thinning Graphite data.
cat /etc/clickhouse-server/conf.d/graphite_rollup.xml
<yandex>
<graphite_rollup>
<path_column_name>metric</path_column_name>
<time_column_name>timestamp</time_column_name>
<value_column_name>value</value_column_name>
<version_column_name>updated</version_column_name>
<pattern>
<regexp>^five_sec</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>5</precision>
</retention>
<retention>
<age>2592000</age>
<precision>60</precision>
</retention>
<retention>
<age>31104000</age>
<precision>600</precision>
</retention>
</pattern>
<pattern>
<regexp>^one_min</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>2592000</age>
<precision>300</precision>
</retention>
<retention>
<age>31104000</age>
<precision>1800</precision>
</retention>
</pattern>
<pattern>
<regexp>^five_min</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>300</precision>
</retention>
<retention>
<age>2592000</age>
<precision>600</precision>
</retention>
<retention>
<age>31104000</age>
<precision>1800</precision>
</retention>
</pattern>
<pattern>
<regexp>^one_sec</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>1</precision>
</retention>
<retention>
<age>2592000</age>
<precision>60</precision>
</retention>
<retention>
<age>31104000</age>
<precision>300</precision>
</retention>
</pattern>
<pattern>
<regexp>^one_hour</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>3600</precision>
</retention>
<retention>
<age>31104000</age>
<precision>86400</precision>
</retention>
</pattern>
<pattern>
<regexp>^ten_min</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>600</precision>
</retention>
<retention>
<age>31104000</age>
<precision>3600</precision>
</retention>
</pattern>
<pattern>
<regexp>^one_day</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>86400</precision>
</retention>
</pattern>
<pattern>
<regexp>^half_hour</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>1800</precision>
</retention>
<retention>
<age>31104000</age>
<precision>3600</precision>
</retention>
</pattern>
<default>
<function>any</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>2592000</age>
<precision>300</precision>
</retention>
<retention>
<age>31104000</age>
<precision>1800</precision>
</retention>
</default>
</graphite_rollup>
</yandex>
ClickHouse needs to be restarted after this.
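To make the rollup rules above more concrete, here is a minimal Python sketch of how a precision could be chosen for a data point. It is not ClickHouse code: the precision_for and rounded_timestamp helpers are hypothetical, only a few of the patterns are copied from the config, and the sketch assumes a simplified model in which the first matching pattern wins and the retention entry with the largest age not exceeding the data point's age applies. For reference, 2592000 seconds is 30 days and 31104000 seconds is 360 days.

```python
import re

# A subset of the rules from graphite_rollup.xml: (regexp, [(age, precision), ...]).
# Ages and precisions are in seconds.
ROLLUP_RULES = [
    (r"^five_sec", [(0, 5), (2592000, 60), (31104000, 600)]),
    (r"^one_min", [(0, 60), (2592000, 300), (31104000, 1800)]),
    (r"^one_hour", [(0, 3600), (31104000, 86400)]),
]

# The <default> section of the same file.
DEFAULT_RETENTION = [(0, 60), (2592000, 300), (31104000, 1800)]


def precision_for(metric, age_seconds):
    """Return the precision (seconds) that applies to a point of the given age.

    Simplified model (an assumption, not the engine's actual code): the first
    matching pattern wins, then the entry with the largest age <= age_seconds.
    """
    retention = DEFAULT_RETENTION
    for pattern, rules in ROLLUP_RULES:
        if re.search(pattern, metric):
            retention = rules
            break
    precision = retention[0][1]
    for min_age, step in retention:
        if age_seconds >= min_age:
            precision = step
    return precision


def rounded_timestamp(timestamp, precision):
    """Round a Unix timestamp down to a multiple of the precision."""
    return timestamp - timestamp % precision
```

Under these assumptions, a metric whose name starts with one_min is kept at 60-second precision for the first 30 days, at 300 seconds up to 360 days, and at 1800 seconds after that.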
Create ClickHouse tables
These are tables used to store metrics for Graphite.
CREATE DATABASE graphite;
CREATE TABLE graphite.metrics(
date Date DEFAULT toDate(0),
name String,
level UInt16,
parent String,
updated DateTime DEFAULT now(),
status Enum8(
'SIMPLE' = 0,
'BAN' = 1,
'APPROVED' = 2,
'HIDDEN' = 3,
'AUTO_HIDDEN' = 4
)
) ENGINE = ReplacingMergeTree(date, (parent, name), 1024, updated);
CREATE TABLE graphite.data(
metric String,
value Float64,
timestamp UInt32,
date Date,
updated UInt32
) ENGINE = GraphiteMergeTree(date, (metric, timestamp), 8192, 'graphite_rollup');
Note the use of the GraphiteMergeTree table engine, which is explicitly designed for rollup (thinning and aggregating/averaging) of Graphite data. More details can be found in the ClickHouse documentation.
For a ClickHouse cluster, graphite.metrics and graphite.data can certainly be converted to distributed and/or replicated tables.
Install Graphouse
Add the Graphouse Debian repo.
In /etc/apt/sources.list (or in a separate file, like /etc/apt/sources.list.d/graphouse.list), add the repository: deb http://repo.yandex.ru/graphouse/xenial stable main. On other versions of Ubuntu, replace xenial with your version. For example:
sudo bash -c 'echo "deb http://repo.yandex.ru/graphouse/xenial stable main" >> /etc/apt/sources.list'
sudo apt update
Install JDK8.
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install -y oracle-java8-installer
The oracle-java8-installer package contains a script to install Java.
Set Java 8 as your default Java version.
sudo apt install -y oracle-java8-set-default
Let’s verify the installed version.
java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
Set up JAVA_HOME and JRE_HOME variables
You must set the JAVA_HOME and JRE_HOME environment variables, which are used by many Java applications to find Java libraries at runtime. Set these variables in the /etc/environment file:
sudo bash -c 'cat >> /etc/environment <<EOL
JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
EOL'
Finally, install Graphouse.
sudo apt install graphouse
Configure Graphouse
Let’s configure Graphouse to store metrics in ClickHouse.
Edit the properties in the Graphouse config /etc/graphouse/graphouse.properties.
Set up ClickHouse access. Make sure graphouse.clickhouse.host is correct. WARNING: // comments are erroneous in the properties file!
graphouse.clickhouse.host=localhost
graphouse.clickhouse.hosts=${graphouse.clickhouse.host}
graphouse.clickhouse.port=8123
graphouse.clickhouse.db=graphite
graphouse.clickhouse.user=
graphouse.clickhouse.password=
Set graphouse.clickhouse.retention-config to graphite_rollup, the name we defined earlier in:
<yandex>
<graphite_rollup>
...
</graphite_rollup>
</yandex>
I.e.:
graphouse.clickhouse.retention-config=graphite_rollup
Note that the value of graphouse.clickhouse.retention-config is a config name, not the file path /etc/clickhouse-server/conf.d/graphite_rollup.xml. Use one of the names from the ClickHouse system.graphite_retentions table.
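One way to list the available names is to query that table over the ClickHouse HTTP interface. The Python sketch below is only an illustration: it assumes ClickHouse listens on localhost:8123 (the host and port used in graphouse.properties above), that no credentials are required, and that the table exposes a config_name column. The helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: system.graphite_retentions has a config_name column.
RETENTION_NAMES_QUERY = "SELECT DISTINCT config_name FROM system.graphite_retentions"


def retention_names_url(host="localhost", port=8123):
    """Build the HTTP interface URL that runs the query above."""
    params = urlencode({"query": RETENTION_NAMES_QUERY}, quote_via=quote)
    return f"http://{host}:{port}/?{params}"


def fetch_retention_names(host="localhost", port=8123):
    """Return the config names reported by a running ClickHouse server."""
    with urlopen(retention_names_url(host, port), timeout=5) as response:
        return response.read().decode("utf-8").split()


# Example call, commented out because it needs a running server:
# print(fetch_retention_names())
```

If the setup above is in place, graphite_rollup should be among the returned names.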
Start graphouse
sudo /etc/init.d/graphouse start
At this point, Graphouse is up and running and ready to accept metrics and store them in ClickHouse.
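A quick way to confirm that metrics are accepted is to send a test data point by hand. The Python sketch below uses the Graphite plaintext protocol, where each line has the form "path value timestamp". It assumes Graphouse listens for plaintext metrics on 127.0.0.1:2003; the helper names and the metric path are hypothetical.

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Return one plaintext-protocol line: path, value, Unix timestamp."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="127.0.0.1", port=2003, timestamp=None):
    """Send a single data point to a plaintext receiver over TCP."""
    if timestamp is None:
        timestamp = time.time()
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call, commented out because it needs a running receiver:
# send_metric("one_min.test.graphouse_smoke", 42)
```

The one_min prefix in the example path is chosen so that the metric matches the ^one_min rollup pattern defined earlier.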
Next, let’s close the loop and set up ClickHouse to write its own metrics into Graphouse.
Set up ClickHouse to report metrics into Graphouse
In a recent article, ‘ClickHouse monitoring with Graphite’, we described how to set up ClickHouse monitoring with Graphite. Since Graphouse is Graphite with different internals, it is possible to configure ClickHouse to write metrics to Graphouse, which then stores them back in ClickHouse itself. It is probably not the best thing from an operational standpoint, but it is still worth considering.
Edit /etc/clickhouse-server/config.xml
and append something like the following:
<graphite>
<host>127.0.0.1</host>
<port>2003</port>
<timeout>0.1</timeout>
<interval>60</interval>
<root_path>one_min_cr_plain</root_path>
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
</graphite>
<graphite>
<host>127.0.0.1</host>
<port>2003</port>
<timeout>0.1</timeout>
<interval>1</interval>
<root_path>one_sec_cr_plain</root_path>
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>false</asynchronous_metrics>
</graphite>
Settings description:
- host – host where Graphite is running.
- port – plain text receiver port (2003 is default).
- interval – interval for sending data from ClickHouse, in seconds.
- timeout – timeout for sending data, in seconds.
- root_path – prefix used by Graphite.
- metrics – should data from the system.metrics table be sent.
- events – should data from the system.events table be sent.
- asynchronous_metrics – should data from the system.asynchronous_metrics table be sent.
Multiple <graphite> clauses can be configured for sending different data at different intervals.
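The root_path values in the two <graphite> sections appear to be chosen so that each prefix matches a rollup pattern whose initial precision equals the sending interval. The Python sketch below checks that correspondence. It copies only the two relevant patterns from graphite_rollup.xml, and the metric suffix is a made-up placeholder, not an actual ClickHouse metric name.

```python
import re

# Patterns from graphite_rollup.xml with the precision used for fresh data (age 0).
FRESH_PRECISION = {
    r"^one_min": 60,
    r"^one_sec": 1,
}

# root_path and interval values from the two <graphite> sections.
SECTIONS = [
    {"root_path": "one_min_cr_plain", "interval": 60},
    {"root_path": "one_sec_cr_plain", "interval": 1},
]


def matching_precision(metric_path):
    """Return the fresh-data precision of the first matching pattern, or None."""
    for pattern, precision in FRESH_PRECISION.items():
        if re.search(pattern, metric_path):
            return precision
    return None


for section in SECTIONS:
    # Hypothetical metric name: only the root_path prefix matters here.
    path = section["root_path"] + ".example.metric"
    print(path, matching_precision(path), section["interval"])
```

If a root_path matched none of the patterns, the <default> rules from the rollup config would apply instead.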
ClickHouse needs to be restarted for the settings to take effect.
sudo /etc/init.d/clickhouse-server restart
Now ClickHouse starts writing metrics to Graphouse, which in turn stores them back in ClickHouse tables. This can be easily checked:
clickhouse-client -q "select count(*) from graphite.data"
clickhouse-client -q "select count(*) from graphite.metrics"
Graphite-web
Graphite-web is a UI for visualizing metrics. Normally it is installed as part of Graphite, but with Graphouse it needs to be installed separately, without carbon or whisper. It is not required for Graphouse: once metrics are in ClickHouse, any visualization tool can be used, e.g. Grafana. So the rest of the article makes sense only if graphite-web is really necessary.
Install Graphite-web
Graphite has detailed installation docs and configuration docs.
The following steps are required to install and configure graphite-web.
sudo apt install -y python3-pip python3-dev libcairo2-dev libffi-dev build-essential
export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"
sudo pip3 install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master
sudo apt install -y gunicorn3
sudo apt install -y nginx
sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*
Create the nginx config file /etc/nginx/sites-available/graphite with the following content (available as a file):
upstream graphite {
server 127.0.0.1:8080 fail_timeout=0;
}
server {
listen 80 default_server;
server_name HOSTNAME;
root /opt/graphite/webapp;
access_log /var/log/nginx/graphite.access.log;
error_log /var/log/nginx/graphite.error.log;
location = /favicon.ico {
return 204;
}
# serve static content from the "content" directory
location /static {
alias /opt/graphite/webapp/content;
expires max;
}
location / {
try_files $uri @graphite;
}
location @graphite {
proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_connect_timeout 10;
proxy_read_timeout 10;
proxy_pass http://graphite;
}
}
Set the server_name host name in the nginx config:
sudo vim /etc/nginx/sites-available/graphite
Enable configuration for nginx:
sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default
And reload nginx to use new configuration:
sudo service nginx reload
Set up graphite-web DB
We need to create the database tables used by the graphite webapp.
sudo bash -c 'PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb'
Ensure the database was created:
ls -l /opt/graphite/storage/graphite.db
-rw-r--r-- 1 root root 98304 Apr 19 22:49 /opt/graphite/storage/graphite.db
If your webapp is running as the ‘nobody’ user, you will need to fix the permissions like this:
sudo chown nobody:nogroup /opt/graphite/storage/graphite.db
Set up connection to Graphouse
Add the graphouse plugin /opt/graphouse/bin/graphouse.py to your graphite webapp root dir. For example, if your dir is /opt/graphite/webapp/graphite/, use:
sudo ln -fs /opt/graphouse/bin/graphouse.py /opt/graphite/webapp/graphite/graphouse.py
Configure the storage finder: open /opt/graphite/webapp/graphite/local_settings.py in an editor and add:
STORAGE_FINDERS = (
'graphite.graphouse.GraphouseFinder',
)
Start Graphite-web
cd /opt/graphite/conf
sudo cp graphite.wsgi.example graphite.wsgi
sudo bash -c 'export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"; gunicorn3 --bind=127.0.0.1:8080 graphite.wsgi:application'
Checking it works
Point your browser to the host where graphite-web is running. You should see the Graphite web interface.
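Besides the browser, the setup can be checked from a script through the graphite-web render API. The Python sketch below is only an illustration: it assumes graphite-web is reachable on port 80 of localhost through the nginx config above, and the target expression is a hypothetical example based on the one_min_cr_plain prefix configured earlier. The helper names are hypothetical as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(target, host="localhost", period="-1h"):
    """Build a render API URL that returns the last period of data as JSON."""
    params = urlencode({"target": target, "from": period, "format": "json"})
    return f"http://{host}/render?{params}"


def fetch_series(target, host="localhost", period="-1h"):
    """Return the decoded JSON response for the given target."""
    with urlopen(render_url(target, host, period), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, commented out because it needs a running graphite-web:
# for series in fetch_series("one_min_cr_plain.*"):
#     print(series["target"], len(series["datapoints"]))
```

An empty list in the response usually means that no metric matched the target, so it is worth adjusting the expression to the metric names that are actually stored.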
Conclusion
We have demonstrated how Graphouse with ClickHouse as a data storage layer can be installed and used. It provides much more efficient storage for Graphite while preserving all the functionality. For simplicity, we configured a non-replicated GraphiteMergeTree table, but it can certainly be replaced with ReplicatedGraphiteMergeTree for mission-critical monitoring applications. Also, if one server is not enough, it is possible to distribute monitoring data across multiple shards. That is not possible with Graphite but becomes easy with Graphouse thanks to the ClickHouse backend.
I am trying to use graphouse, but it seems like graphouse drops metrics.
Ex: I am sending more than 1000 metrics from a container to graphouse, but instead of saving all of them it saves only 800. I can see that the graphouse container receives all the metrics, but graphouse does not send all of them to clickhouse.
If I send the same metrics from the shell, everything works fine.