Building a DIY Prometheus HTTP Endpoint in ClickHouse

Today, Prometheus is a popular solution for monitoring various applications, and ClickHouse is no exception. ClickHouse acquired a built-in Prometheus endpoint only in late 2019. Before then, users had to find other ways to collect ClickHouse metrics. clickhouse-exporter is one of the first examples of such a tool; it polls ClickHouse and exports metrics in Prometheus format. The Altinity clickhouse-operator has its own metrics exporter that collects metrics from all clusters controlled by the operator. There are others as well. Every exporter is accompanied by one or more Grafana dashboards that visualize the data in a convenient form.

With such a wide family of different endpoints and exporters, users face one issue: different exporters expose different metrics and, more importantly, use different naming schemas. This means that users cannot reuse the Grafana dashboard developed for clickhouse-operator to visualize metrics from the ClickHouse Prometheus endpoint, for example. Yet this is quite a common request from our clients who do not run ClickHouse in Kubernetes environments.

In this article, we will explain how to set up a Prometheus endpoint in ClickHouse that can export any metric in any format and mimic clickhouse-operator behavior.

Overview

In order to export metrics from ClickHouse in Prometheus format, we need to perform three steps:

  • Expose an HTTP endpoint that can be scraped by Prometheus
  • Format the metrics returned via the HTTP endpoint in Prometheus format
  • Extract data from ClickHouse that maps to the metrics properly

Let’s start with the HTTP endpoint.

Predefined HTTP Interface

Regular ClickHouse users may not know that ClickHouse is not just a DBMS; it can also be turned into a web service. The trick is the Predefined HTTP Interface.

In short, the ClickHouse HTTP interface allows defining HTTP handlers in three different ways that provide different levels of customization:

  • Static: just a page, no customization
  • Predefined: the query is static, but may accept parameters
  • Dynamic: the query is dynamically generated from request parameters

ClickHouse already uses HTTP handlers for pre-configured web services, for example:

  • /play – just a simple HTML page
  • /replica_status – shows the replication status of a ClickHouse node, may accept ‘verbose’ as a parameter
  • /query – the default HTTP endpoint of ClickHouse that can run any SQL

As you may have guessed already, we are going to configure a new ‘/metrics’ endpoint using the Predefined HTTP Interface. The endpoint will trigger a number of queries to ClickHouse in order to extract metrics data.

Okay, we know we can define a handler, but how do we render data in Prometheus format?

FORMAT Template

ClickHouse can return data in many different formats, and if you do not find one that matches your use case, there is the customizable Template format. For Prometheus it is very simple:

# HELP ${name:Raw} ${help:Raw}
# TYPE ${name:Raw} ${type:Raw}
${name:Raw}{${labels:Raw}}    ${value:Raw}

It expects data with five columns: name, value, help, labels and type. We will discuss how to generate those in a bit. Once defined, the format definition needs to be stored in ‘/var/lib/clickhouse/format_schemas/’ (this is the default path and can be changed if needed). Let’s store it in ‘/var/lib/clickhouse/format_schemas/row.format’ and try the following SQL:

SELECT
    concat('chi_clickhouse_event_', event) AS name,
    CAST(value, 'Float64') AS value,
    description AS help,
    'hostname' || '=' || hostName() AS labels,
    'counter' AS type
FROM system.events
SETTINGS format_template_row='row.format'
FORMAT Template

# HELP chi_clickhouse_event_Query Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.
# TYPE chi_clickhouse_event_Query counter
chi_clickhouse_event_Query{hostname=chi-taxi9-taxi9-0-0-0}    525

# HELP chi_clickhouse_event_SelectQuery Same as Query, but only for SELECT queries.
# TYPE chi_clickhouse_event_SelectQuery counter
chi_clickhouse_event_SelectQuery{hostname=chi-taxi9-taxi9-0-0-0}    516
...

We have data in Prometheus format now!
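To make the template mechanics concrete, here is a small Python sketch (an illustration, not ClickHouse code) of what row.format does: it substitutes each row's name, help, type, labels and value into the three-line Prometheus block.

```python
# Hypothetical model of the row.format template: one rendered block per row.
def render_row(name: str, value: float, help_text: str,
               labels: str, mtype: str) -> str:
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {mtype}\n"
        f"{name}{{{labels}}}    {value}\n"
    )

print(render_row("chi_clickhouse_event_Query", 525.0,
                 "Number of queries.",
                 'hostname="chi-taxi9-taxi9-0-0-0"', "counter"))
```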

It was pretty easy so far; the only concern is how to generate multiple labels correctly. Prometheus expects that label keys are not quoted, while label values are enclosed in double quotes. We can make it work if we provide labels as a Map and then apply some arrayMap + arrayStringConcat magic:

WITH map('hostname','my-host','shard','0') as labels_map
SELECT labels_map, 
       arrayStringConcat(arrayMap((k, v) -> concat(k, '="', v, '"'), mapKeys(labels_map), mapValues(labels_map)), ',') as labels

┌─labels_map─────────────────────────┬─labels───────────────────────┐
│ {'hostname':'my-host','shard':'0'} │ hostname="my-host",shard="0" │
└────────────────────────────────────┴──────────────────────────────┘
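The same transformation can be sketched in Python (an illustration, not part of the ClickHouse setup), including the escaping of backslashes, quotes and newlines in label values that the Prometheus exposition format requires:

```python
# Sketch: build a Prometheus label string from a dict of labels,
# escaping backslash, double quote and newline inside label values.
def escape_label_value(v: str) -> str:
    return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def format_labels(labels: dict) -> str:
    # Equivalent of arrayStringConcat(arrayMap(...), ','): k="v" pairs joined by commas
    return ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())

print(format_labels({"hostname": "my-host", "shard": "0"}))
# hostname="my-host",shard="0"
```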

Now we are ready to extract all the metrics. But ClickHouse can make it even easier.

FORMAT Prometheus

The Template format is very flexible, but starting from version 22.5, ClickHouse can export data in Prometheus format by itself. This is the easiest way possible: no templates to deal with, native support for Map as labels, and even the histogram metric type!

Our query from Template example can be as simple as this:

SELECT
    concat('chi_clickhouse_event_', event) AS name,
    CAST(value, 'Float64') AS value,
    description AS help,
    map('hostname', hostName()) AS labels,
    'counter' AS type
FROM system.events
FORMAT Prometheus

The only thing we need is to generate all metrics with a single SQL query, and that’s it!

Extracting Metrics Data

In order to export the metrics required by the clickhouse-operator dashboard, we can refer to the queries in the clickhouse-operator source code. Most of the queries are pretty straightforward, but there are several challenging ones. For example, look at the query to extract table-level metrics from system.parts:

SELECT
    database,
    table,
    active AS active,
    uniq(partition) AS partitions,
    count() AS parts,
    sum(bytes) AS bytes,
    sum(data_uncompressed_bytes) AS uncompressed_bytes,
    sum(rows) AS rows,
    sum(bytes_on_disk) AS metric_DiskDataBytes,
    sum(primary_key_bytes_in_memory_allocated) AS metric_MemoryPrimaryKeyBytesAllocated
FROM system.parts
GROUP BY active, database, table

One row here contains multiple metrics. That is not a problem at all if the exporter is written in Golang, but since we only have SQL at hand, we need to transpose, or unpivot, the result set. The approach is described in this Altinity Knowledge Base article.

The idea is to combine the metric data into two arrays: metric names and metric values. Then we use the arrayJoin function to turn the arrays into rows. Here is the code to do it:

WITH
    ['chi_clickhouse_table_partitions', 'chi_clickhouse_table_parts', 'chi_clickhouse_table_parts_bytes', 'chi_clickhouse_table_parts_bytes_uncompressed', 'chi_clickhouse_table_parts_rows', 'chi_clickhouse_metric_DiskDataBytes', 'chi_clickhouse_metric_MemoryPrimaryKeyBytesAllocated'] AS names,
    [uniq(partition), count(), sum(bytes), sum(data_uncompressed_bytes), sum(rows), sum(bytes_on_disk), sum(primary_key_bytes_in_memory_allocated)] AS values,
    arrayJoin(arrayZip(names, values)) AS tpl
SELECT
    tpl.1 AS name,
    CAST(tpl.2, 'Float64') AS value,
    '' AS help,
    map('database', database, 'table', table, 'active', toString(active), 'hostname', hostName()) AS labels,
    'gauge' AS type
FROM system.parts
GROUP BY
    active,
    database,
    table
ORDER BY name ASC
FORMAT Prometheus

# TYPE chi_clickhouse_metric_DiskDataBytes gauge
chi_clickhouse_metric_DiskDataBytes{active="1",database="system",hostname="chi-taxi9-taxi9-0-0-0",table="metric_log"} 5269278
chi_clickhouse_metric_DiskDataBytes{active="1",database="system",hostname="chi-taxi9-taxi9-0-0-0",table="trace_log"} 8968717
chi_clickhouse_metric_DiskDataBytes{active="0",database="system",hostname="chi-taxi9-taxi9-0-0-0",table="trace_log"} 663067
…
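For illustration, the unpivot trick can be modeled in Python (with made-up sample values): arrayZip pairs each name with its value, and arrayJoin expands the resulting array into rows, just like zip and a loop.

```python
# Model of arrayJoin(arrayZip(names, values)): one aggregated row fans out
# into one (name, value) row per metric. Values here are sample data.
names = ["chi_clickhouse_table_partitions", "chi_clickhouse_table_parts"]
values = [3.0, 42.0]

rows = list(zip(names, values))   # arrayZip: pair each name with its value
for name, value in rows:          # arrayJoin: expand the array into rows
    print(f"{name} {value}")
```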

In order to combine multiple metric types into one query, we will create a VIEW over multiple queries connected with UNION ALL, like follows.

CREATE OR REPLACE VIEW system.operator_compatible_metrics
(
    `name` String,
    `value` Float64,
    `help` String,
    `labels` Map(String, String),
    `type` String
) AS
/* system.events query */
UNION ALL
/* system.metrics query */
UNION ALL
/* system.asynchronous_metrics query */
UNION ALL
/* system.dictionaries query */
UNION ALL
/* system.processes query */
UNION ALL
/* system.parts query */
UNION ALL
/* system.mutations query */
UNION ALL
/* system.detached_parts query */

See the full SQL source in the GitHub repository.

That’s it. Now ‘SELECT * FROM system.operator_compatible_metrics FORMAT Prometheus’ will dump all metrics in Prometheus format that matches clickhouse-operator’s Grafana dashboard!

Getting Everything Together

Now, let’s put everything together in a predefined HTTP handler. The configuration file below should be placed into the /etc/clickhouse-server/config.d/ folder.

http_handlers.xml:

<clickhouse>
    <http_handlers>
        <rule>
            <url>/metrics</url>
            <methods>POST,GET</methods>
            <handler>
                <type>predefined_query_handler</type>
                <query>SELECT * FROM system.operator_compatible_metrics FORMAT Prometheus</query>
                <content_type>text/plain; charset=utf-8</content_type>
            </handler>
        </rule>
        <defaults/>
    </http_handlers>
</clickhouse>

Here we defined a new ‘/metrics’ endpoint that runs the monitoring query and returns the result. The <defaults/> section is required in order to enable all default ClickHouse endpoints. We can test it as follows, using localhost or a real host name:

curl http://localhost:8123/metrics
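To sanity-check the scrape programmatically, a small Python sketch can parse the payload into a dict of metric values (the sample text below stands in for a real HTTP response, reusing a line from the earlier examples):

```python
# Sketch: parse Prometheus exposition text into {metric_with_labels: value}.
# The sample string stands in for a real scrape of the /metrics endpoint.
sample = """\
# TYPE chi_clickhouse_event_Query counter
chi_clickhouse_event_Query{hostname="chi-taxi9-taxi9-0-0-0"} 525
"""

def parse_metrics(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        key, value = line.rsplit(" ", 1)
        metrics[key] = float(value)
    return metrics

print(parse_metrics(sample))
```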

So the problem is solved! We have taught ClickHouse to expose metrics without clickhouse-operator and can now plug in the Grafana dashboard! There is one caveat, though. The ‘default’ user may require a password or may not be usable from the outside, which is quite a reasonable restriction. In that case, we will need to create a dedicated user and pass the credentials via basic_auth in the Prometheus config:

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: clickhouse
    static_configs:
      - targets:
        - localhost:8123
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: "/metrics"
    basic_auth:
      username: 'username'
      password: 'password'

Conclusion

In this article, we showed how to add a predefined HTTP endpoint to ClickHouse that exposes server metrics in a Prometheus format compatible with clickhouse-operator. This approach has a number of advantages, since it is easy to extend: the built-in Prometheus endpoint is carved in stone and cannot be changed quickly, while the predefined HTTP handler allows us to add new metrics at any time by modifying the system view.

Building a Prometheus endpoint is not the only use case for HTTP handlers. In the next article, we will use it for yet another ClickHouse extension. Stay tuned!

See the GitHub repository for the examples used in this article.
