[Joint-Webinar] How to Build Fast, Scalable Application Monitoring using Open Source

Recorded: Tuesday, January 24, 2023
Presenters: Robert Hodges, CEO at Altinity, and Roman Khavronenko, co-Founder at VictoriaMetrics

In this joint webinar, Roman Khavronenko (VictoriaMetrics co-Founder) and Altinity CEO Robert Hodges present two complementary open-source approaches to monitoring, with live demos of both and a candid comparison. Roman introduces VictoriaMetrics, a Prometheus-compatible time series database that stores metrics as schema-less tuples compressed to ~0.4 bytes per sample, demonstrating data ingestion via HTTP, MetricsQL queries, and live Linux host monitoring through node_exporter and Grafana in under 20 minutes.

Robert then builds the same kind of pipeline in ClickHouse® from scratch—a Python script reading vmstat as JSON, a partitioned MergeTree table, a curl-based insert, and a Grafana dashboard via the Altinity plugin—and shows analytical queries that go beyond dashboards plus four strategies for evolving schemas.

The takeaway: VictoriaMetrics excels at Prometheus-style monitoring of system state at regular intervals, while ClickHouse excels at analyzing events alongside business data with full SQL. Altinity runs both internally, and both presenters agree the pairing is a common, sensible combination.

Here are the slides:

Application-Monitoring-using-Open-Source-VictoriaMetrics-ClickHouse Download

Key Moments (Timestamps)

Key moments generated with AI assistance.

0:04 – Introduction and housekeeping
1:22 – Speaker introductions: Robert Hodges (Altinity CEO) and Roman Khavronenko (VictoriaMetrics co-founder)
2:13 – Why monitoring matters: diagnosing errors with specific, measurable questions
3:30 – The three requirements for monitoring: the right question, the information, the respondent
4:24 – The TSDB as the “wizard” in the monitoring system
5:50 – What is VictoriaMetrics? Open-source TSDB, Prometheus-compatible, horizontal scaling
7:19 – When to use VictoriaMetrics: Kubernetes, hardware, APM, IoT, alerting
7:45 – Metric structure: name, labels, value, timestamp
9:45 – VictoriaMetrics data model: schema-less, columnar storage, DoubleDelta + Gorilla compression
11:25 – Compression and performance: 0.4 bytes per sample, 300K samples/sec/core ingest
13:17 – Pull model vs push model: how VictoriaMetrics collects metrics
14:08 – Push model protocols: Prometheus remote write, InfluxDB, Graphite, JSON, CSV
14:29 – MetricsQL: VictoriaMetrics query language (not SQL) and why
15:51 – Demo begins: starting VictoriaMetrics binary, pushing HTTP metrics
17:20 – Demo: vmui web interface, querying metrics, plotting graphs
18:28 – Demo: node_exporter collecting real CPU/memory/network/disk metrics
19:53 – Demo: importing pre-built Grafana dashboard (22M downloads)
22:16 – VictoriaMetrics FAQ: SQL exporters, application instrumentation, cost (~$400/year for 100 instances), Kubernetes support
25:29 – Q&A: cardinality explosion protection using Bloom filters and per-window limits
27:08 – Handoff to Robert Hodges: using ClickHouse for monitoring
27:16 – ClickHouse is not a TSDB: it’s a real-time analytic database
27:28 – ClickHouse key characteristics: SQL, open source, shared-nothing, columnar, parallel, scalable
30:00 – ClickHouse load speed and ingestion: millions of events/sec, Kafka integration
31:10 – Raw unaggregated tables and materialized views for pre-computation
32:30 – Dozens of built-in input formats: JSONEachRow, CSV, Parquet, Protobuf, and more
33:50 – Time data types: Date, DateTime, DateTime64; time functions
35:18 – Building a host monitoring system with ClickHouse: vmstat + Python + Grafana
36:26 – The Python vmstat collector: reading vmstat output as JSON with timestamp and hostname
37:28 – Designing the MergeTree table: dimensions, metrics, partitioning, ordering
39:03 – Loading data: INSERT via HTTP with JSONEachRow format, curl or Python script
40:14 – Building the Grafana dashboard with the Altinity plugin for ClickHouse
41:05 – The Altinity Grafana plugin history: originally developed by Roman at Vertamedia
43:16 – Demo: vmstat Grafana dashboard, identifying a host at 100% CPU from a sysbench test
47:39 – Demo: analytical query — hosts with over 25% average load for any minute in 24 hours
48:00 – How to handle schema changes in ClickHouse time series
48:09 – Option 1: raw JSON string column + materialized columns
48:43 – Option 2: Map column for key-value pairs
49:24 – Option 3: paired arrays (keys array + values array)
49:48 – Option 4: experimental JSON data type
51:35 – When to use VictoriaMetrics vs ClickHouse: state vs events, MetricsQL vs SQL
53:53 – How Altinity uses both: VictoriaMetrics for Prometheus data, ClickHouse for custom analytics
54:05 – Roman: the core difference — VictoriaMetrics captures state, ClickHouse stores events
56:58 – Q&A: correlating metrics, logs, and traces in ClickHouse (BigTable model, single-scan aggregation)
1:00:08 – Closing remarks and invitation for future deep-dive webinar

Webinar Transcript

[0:04] — Introduction and Housekeeping

Robert: Hi everybody. Welcome to our webinar on building fast, scalable app monitoring with open source. My name is Robert Hodges; I’m CEO of Altinity. I’m joined today by Roman Khavronenko from VictoriaMetrics. We’re going to take about an hour to walk through different approaches to building monitoring systems using different types of databases.

This webinar is being recorded. Everyone who signed up will receive a link to the recording and the slides within about 24 hours. For questions, use the Q&A box in Zoom or the chat — whatever’s convenient, toss them in and we’ll answer them.

[1:22] — Speaker Introductions

Robert: My background: I’m a database geek, been working on databases since 1983. My day job is CEO of Altinity but I still do a lot of hands-on database work.

Roman: Hello everyone. My name is Roman Khavronenko. I’m a distributed systems engineer and a big fan of monitoring and observability. I’m also an engineer and co-founder of VictoriaMetrics, a time series database and monitoring solution. Welcome, and I hope you’ll find this useful.

[2:13] — Why Monitoring Matters

Roman: Let’s begin with what application monitoring actually is. Why would someone need monitoring? Imagine you’re running a user-facing service and your users suddenly start receiving errors. You ask yourself: why? And then you go to more specific questions: when did the errors start? How many users are affected? How many errors are there total? Which region or country was affected? When you have answers to those questions it’s really helpful to find the culprit and fix it as quickly as possible. Monitoring is a super useful thing. If you want to run something reliable, stable, and predictable, you need monitoring.

[3:30] — The Three Requirements for Monitoring

Roman: To make monitoring work well you need only three things. First, the right question: for example, “in which country are users getting errors?” Second, the information which contains the answer — if you don’t have data about errors per country or region, you can’t answer the question. Third, the respondent: something that can access the information and answer your questions.

Schematically, the respondent is a TSDB (a time series database) sitting in the middle. It accepts queries from you, gets access to information, does some processing, and provides a response. There are many TSDBs out there. You need to choose wisely for your specific case, because the right one will provide correct answers that help you build stable and reliable applications.

[5:50] — What Is VictoriaMetrics?

Roman: VictoriaMetrics is an open-source time series database and monitoring solution. It can scale both vertically and horizontally, so you can build a large cluster to process hundreds of millions of data points per second. It’s very simple to operate. It’s cost-efficient. It’s Prometheus-compatible, which makes it useful with all the tools that work with Prometheus. It’s open source, free forever, and you can start using it right now.

Most of our users use it for Kubernetes monitoring, hardware or infrastructure monitoring, APM, IoT, edge computing, and alerting — basically anything related to metrics.

VictoriaMetrics as a monitoring solution has a built-in TSDB, a user interface and query interface for asking questions, and its own ways to collect metrics both via a pull model and a push model.

[7:45] — What Is a Metric?

Roman: A metric is a numeric measure or observation of something. For example: number of served requests, number of received errors, request latency, CPU usage of my laptop, or even the number of participants in this webinar.

A metric consists of a metric name (something humans can read and understand — requests_total suggests it measures the number of served requests), optional labels that add context (status code 200, status code 403), a value showing how many requests were served with each code, and a timestamp recording when the observation was taken.
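As a minimal sketch of the four parts just described, the Python below models one observation and renders it as a line of Prometheus text format. The class name Sample, the label name code, and the numbers are illustrative assumptions, not values from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One observation: metric name, optional labels, value, timestamp."""
    name: str
    value: float
    timestamp_ms: int
    labels: dict = field(default_factory=dict)

    def to_text(self) -> str:
        # Labels render as {key="value",...}; the braces are omitted when empty.
        label_part = ""
        if self.labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
            label_part = "{" + pairs + "}"
        return f"{self.name}{label_part} {self.value} {self.timestamp_ms}"


# Two series share a metric name and differ only in the (assumed) "code" label.
ok = Sample("requests_total", 1027, 1674561600000, {"code": "200"})
denied = Sample("requests_total", 3, 1674561600000, {"code": "403"})
print(ok.to_text())
print(denied.to_text())
```

Each distinct combination of name and labels is its own time series, which is why the two samples above would be stored as two series.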

[9:45] — VictoriaMetrics Data Model and Compression

Roman: Inside VictoriaMetrics the data model is schema-less. There’s no notion of schema, table, or database. You just store everything you put in. A user is free to update metrics whenever they want without making any schema changes — you only need to change your application. Updating or changing metrics is very simple.

Internally, VictoriaMetrics benefits from a columnar approach for storing data: value, timestamp, and series ID are stored in separate columns, which allows a different compression codec for each. Timestamps are compressed with DoubleDelta encoding. Values are compressed with a modified Gorilla compression algorithm. These techniques result in a storage footprint of around 0.4 bytes per sample, an ingest speed of around 300,000 samples per second per CPU core, and a scanning speed of around 50 million samples per second per CPU core. Combined with clustering and horizontal scalability, this provides effectively unlimited storage for time series processing.
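To see why DoubleDelta suits timestamps, here is a small Python sketch of delta-of-delta encoding. It illustrates the general idea only; it is not VictoriaMetrics' encoder, and the bit-packing that turns the small numbers into few bytes is left out. The function names and sample timestamps are assumptions.

```python
def double_delta_encode(timestamps):
    """Return [first, first_delta, delta_of_delta, ...] for a list of ints."""
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # near zero for regular intervals
        prev_delta = delta
    return out


def double_delta_decode(encoded):
    """Invert double_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for delta_of_delta in encoded[2:]:
        delta += delta_of_delta
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Samples collected every 15 seconds, with one arriving a second late.
scrape_times = [1000, 1015, 1030, 1046, 1060]
encoded = double_delta_encode(scrape_times)
print(encoded)  # [1000, 15, 0, 1, -2]
assert double_delta_decode(encoded) == scrape_times
```

Because metrics are collected at regular intervals, most of the encoded values are zero or close to it, and such values compress well.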

[13:17] — Pull Model and Push Model

Roman: VictoriaMetrics inherited the pull model from Prometheus: the monitoring system collects metrics from applications on its own, going to each application at each interval to collect what it needs. Applications don’t need to know the monitoring system exists. This approach is very robust.

VictoriaMetrics also supports a push model where applications decide when and at what volume to deliver metrics. For the push model we support Prometheus remote write, InfluxDB protocol, Graphite, JSON, and CSV, among others.

[14:29] — MetricsQL: Why Not SQL?

Roman: VictoriaMetrics doesn’t support SQL. It supports its own query language called MetricsQL. In a recent talk at OSiCon 2022 I explained why SQL isn’t the best choice for time series data. But let me give some quick examples.

The first query requests all metrics with the name requests_total and returns two time series in response. The second query filters to only time series with status code 200 using label selectors in curly braces. The third query does an aggregation, summing all time series named requests_total and grouping them by the path label. Simple and intuitive for time series use cases.
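The slides with the three queries are not part of this transcript, so the Python sketch below reconstructs them from the spoken description and shows how each could be sent to the Prometheus-compatible instant query endpoint. The label name code and the base URL are assumptions.

```python
from urllib.parse import urlencode

# Query text reconstructed from the description above; the label name
# "code" is an assumption.
QUERIES = {
    "all series": "requests_total",
    "only status 200": 'requests_total{code="200"}',
    "sum per path": "sum(requests_total) by (path)",
}


def instant_query_url(query, base="http://localhost:8428"):
    """Build a URL for the Prometheus-compatible /api/v1/query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': query})}"


for description, query in QUERIES.items():
    print(f"{description}: {instant_query_url(query)}")
```

The sketch only builds URLs; fetching them requires a running VictoriaMetrics instance at the assumed address.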

[15:51] — Demo: VictoriaMetrics from Zero to Dashboard

Roman: Let me show you how to run VictoriaMetrics and visualize data. I have the VictoriaMetrics binary already downloaded from GitHub. Let me start it:

./victoria-metrics-prod

VictoriaMetrics is running. Now let me push some data. Here’s a simple shell script that runs in a loop from 1 to 100 and sends a POST request with a metric named http_requests with a label path=/home every five seconds:

curl -d 'http_requests{path="/home"} 1' http://localhost:8428/api/v1/import/prometheus
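The shell loop itself is not reproduced in the transcript. A rough Python equivalent, assuming the loop counter is sent as the metric value and VictoriaMetrics listens on localhost:8428, might look like this. The function names are hypothetical.

```python
import time
import urllib.request

IMPORT_URL = "http://localhost:8428/api/v1/import/prometheus"


def http_post(url, body):
    """Send one POST request with a text body and return the HTTP status."""
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def push_counter(count, interval_s=5.0, send=http_post, sleep=time.sleep):
    """Push http_requests{path="/home"} with values 1..count, one per interval."""
    for value in range(1, count + 1):
        line = f'http_requests{{path="/home"}} {value}\n'
        send(IMPORT_URL, line)
        if value < count:
            sleep(interval_s)


# push_counter(100)  # uncomment with VictoriaMetrics running locally
```

The send and sleep parameters exist only so the loop can be exercised without a server or real delays.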

To query the data I visit the VictoriaMetrics web UI (vmui) running at localhost. I type http_requests, execute the query, and see our metric incrementing once every five seconds. Looking at the last five minutes we can see the metric plotted correctly.

But that artificial metric isn’t very interesting. Let me show what I’ve been doing since the webinar started: collecting real metrics from my laptop using node_exporter, the widely used open-source system metrics collector running in virtually every data center in the world.

Let me query node_cpu_seconds_total, a metric that tracks CPU usage broken down by CPU number and mode. I can filter to mode="idle" and apply the rate() function to convert the cumulative counter into a per-second rate:

rate(node_cpu_seconds_total{mode="idle"}[1m]) * 100

We can see I’m spending about 99% of the time in idle mode during this webinar — my CPU is doing very little.
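For intuition about what rate() does with a cumulative counter, here is a deliberately simplified Python sketch. The real function also handles counter resets and window boundaries, which this version ignores, and the sample values are invented for illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp_s, value) pair."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


# Hypothetical idle-mode counter for one CPU, sampled every 15 s over one minute.
idle_counter = [(0, 5000.0), (15, 5014.9), (30, 5029.8), (45, 5044.6), (60, 5059.4)]
idle_percent = simple_rate(idle_counter) * 100
print(f"{idle_percent:.1f}% idle")
```

The counter grows by about 0.99 CPU-seconds per wall-clock second, so multiplying the rate by 100 gives the idle percentage.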

Now let me import a pre-built Grafana dashboard for node_exporter. I connect Grafana to VictoriaMetrics, import the most popular Grafana dashboard (over 22 million downloads), and now I have CPU usage, memory usage, network usage, disk space usage, and many more metrics — all from open-source compatibility, no custom work required.

[22:16] — VictoriaMetrics FAQ

Roman: A few common questions.

Can I monitor my MySQL server, PostgreSQL, or other software? Probably yes. There are many open-source exporters for almost every popular software, including ClickHouse, Redis, Nginx, and more. They’re all free.

Can I monitor my applications? Yes. There are many instrumentation libraries for different programming languages to expose the metrics you want.

How expensive is monitoring? For 100 instances each emitting 1000 metrics every 30 seconds stored for one year: about 100 GB of disk space, which costs around $54/year on AWS. Plus a medium AWS instance for computation at about $360/year. Total: around $400/year. That’s the cost efficiency of VictoriaMetrics.
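The arithmetic behind that estimate can be checked in a few lines of Python. The 0.4 bytes per sample figure comes from earlier in the talk; the talk's round figure of about 100 GB is larger than the raw estimate computed here, and the talk does not break down the difference.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

instances = 100
metrics_per_instance = 1000
scrape_interval_s = 30
bytes_per_sample = 0.4  # compression figure quoted earlier in the talk

samples_per_year = instances * metrics_per_instance * SECONDS_PER_YEAR // scrape_interval_s
raw_estimate_gb = samples_per_year * bytes_per_sample / 1e9

disk_usd, compute_usd = 54, 360  # yearly AWS prices quoted in the talk
total_usd = disk_usd + compute_usd

print(f"{samples_per_year:,} samples per year")
print(f"about {raw_estimate_gb:.0f} GB at {bytes_per_sample} bytes per sample")
print(f"${total_usd} per year")
```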

Can I run VictoriaMetrics in Kubernetes? Yes. We have Helm charts and a Kubernetes operator.

[25:29] — Q&A: Cardinality Explosion Protection

Robert: There’s a question about how VictoriaMetrics protects itself from metrics that introduce extreme cardinality labels.

Roman: That’s a really big problem in monitoring, called cardinality explosion. We have protection on our collecting agents which can limit the number of unique time series per time window — per hour or per day. When this limit is reached, the agents start rejecting those metrics. We internally use Bloom filters to register new time series over time periods, which helps protect the main database from cardinality explosions. There are limits you can apply to your clients to protect against this kind of issue.
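The mechanism Roman describes can be sketched in Python as a per-window limiter backed by a Bloom filter. This is an illustration of the idea under assumed names and sizes, not the code of the VictoriaMetrics agents.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class SeriesLimiter:
    """Admit at most max_series distinct series keys per time window."""

    def __init__(self, max_series):
        self.max_series = max_series
        self.reset()

    def reset(self):
        # Call at the start of each window (for example hourly or daily).
        self.seen = BloomFilter()
        self.count = 0

    def allow(self, series_key):
        if series_key in self.seen:
            return True   # already registered in this window
        if self.count >= self.max_series:
            return False  # new series over the limit: reject
        self.seen.add(series_key)
        self.count += 1
        return True


limiter = SeriesLimiter(max_series=2)
for key in ['http_requests{path="/home"}', 'http_requests{path="/cart"}',
            'http_requests{path="/home"}', 'http_requests{path="/user/42"}']:
    print(key, "->", "accepted" if limiter.allow(key) else "rejected")
```

A Bloom filter keeps memory use fixed no matter how many series arrive, at the cost of occasionally treating a new series as already seen.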

[27:08] — Handoff to Robert Hodges: ClickHouse for Monitoring

Robert: Thank you Roman. Now I’m going to talk about using ClickHouse for building monitoring systems.

ClickHouse is not a time series database. It’s a real-time analytic database designed to get very fast responses on very large amounts of data. For observability and logging with ClickHouse, it’s a flexible and powerful foundation.

[27:28] — ClickHouse Architecture

Robert: Key characteristics: it understands SQL, it’s open source under Apache 2.0, it’s a general-purpose database handling everything from financial analytics to security monitoring to log management, and it handles time series data very well because most large data sets tend to be time-ordered.

A few things that make ClickHouse useful for fast response on large data sets. First, columnar storage: data for each column is stored together, highly compressed. Primary key indexes allow locating sections quickly, and skip indexes and inverted indexes allow efficient access patterns. This compressed columnar storage reduces IO and makes it possible to scan enormous amounts of data with relatively few resources.

Second, parallelization: ClickHouse parallelizes extremely well both across hosts and within a host. If you allow it, it will use every core and hardware thread on your system, throwing 100% of CPU resources at queries. This gives very fast answers by maximizing hardware utilization.

Third, built-in sharding and replication: data can extend to hundreds of machines with multiple replicas for each shard.

ClickHouse can load millions of events per second. For large monitoring problems — multiple tenants, large cloud installations — it’s very good at ingestion. It reads from Kafka natively and reads from data lakes.

[31:10] — Unaggregated Tables and Materialized Views

Robert: Typically with ClickHouse you take your ingest data and put it into an unaggregated table — every record stored as a raw row. For questions you ask constantly that need very fast answers, ClickHouse gives you materialized views: transformations or triggers that fire every time a block of data arrives in the source table, run a query across it, and put the results into a materialized view table. On very large data sets this gives answers in milliseconds for predictable question patterns.
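The trigger-like behavior can be modeled in plain Python: a raw table keeps every row, and an attached view updates its aggregate using only the newly inserted block. This is a conceptual sketch with hypothetical class names; it does not model how ClickHouse stores or merges view results.

```python
from collections import defaultdict


class MinuteAverageView:
    """Running sum and count per (hostname, minute), updated per inserted block."""

    def __init__(self):
        self.state = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]

    def on_insert(self, block):
        for row in block:
            key = (row["hostname"], row["timestamp"] // 60 * 60)
            self.state[key][0] += row["cpu_us"]
            self.state[key][1] += 1

    def average(self, hostname, minute_start):
        total, count = self.state[(hostname, minute_start)]
        return total / count if count else None


class RawTable:
    """Stores every row unaggregated and notifies attached views on insert."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, block):
        self.rows.extend(block)
        for view in self.views:
            view.on_insert(block)  # the view sees only the new block


table = RawTable()
view = MinuteAverageView()
table.views.append(view)
table.insert([{"hostname": "logos3", "timestamp": 120, "cpu_us": 10},
              {"hostname": "logos3", "timestamp": 150, "cpu_us": 30}])
table.insert([{"hostname": "logos3", "timestamp": 179, "cpu_us": 20}])
print(view.average("logos3", 120))  # 20.0
```

Reading the precomputed average touches one small entry instead of rescanning all raw rows, which is where the millisecond answers come from.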

[32:30] — Dozens of Built-in Input Formats

Robert: ClickHouse has dozens of built-in input formats. You specify INSERT, give a table name, give a format name, and ClickHouse consumes it. Tab-separated values, normal SQL tuples, JSONEachRow, Protobuf, Parquet, ORC — you name it, ClickHouse probably reads it. New formats are added frequently. Regardless of what form your data is in, there’s a good chance ClickHouse can just read it and stick it in a table.

[33:50] — Time Data Types and Functions

Robert: ClickHouse has a huge number of functions for storing and manipulating time data. Three commonly used types: Date for day-level precision, DateTime for a Unix timestamp with one-second precision (recommended when using Grafana), and DateTime64 for sub-second precision down to nanoseconds. If you’re working with Grafana, DateTime is your friend.

There are also dozens of functions to transform time data: toYYYYMM to extract year and month as an integer for bucketing by month, toStartOfYear, toStartOfMinute, and many others. Almost any operation you want to do on time data, ClickHouse has a function for it.
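To make the rounding behavior of these functions concrete, here are rough Python equivalents. They mirror the results only; the return types of the ClickHouse functions may differ, and the snake_case names are my own.

```python
from datetime import datetime


def to_yyyymm(ts: datetime) -> int:
    """Year and month as one integer, useful for bucketing by month."""
    return ts.year * 100 + ts.month


def to_start_of_minute(ts: datetime) -> datetime:
    """Round down to the beginning of the minute."""
    return ts.replace(second=0, microsecond=0)


def to_start_of_year(ts: datetime) -> datetime:
    """Round down to the first instant of the year."""
    return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)


ts = datetime(2023, 1, 24, 14, 30, 45)
print(to_yyyymm(ts))           # 202301
print(to_start_of_minute(ts))  # 2023-01-24 14:30:00
print(to_start_of_year(ts))    # 2023-01-01 00:00:00
```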

[35:18] — Building a Host Monitoring System with ClickHouse

Robert: Let me show you how to build a simple host monitoring system with ClickHouse. I’ll use vmstat — the popular Linux system statistics utility — as the data source.

The workflow is: collect vmstat data, feed it into ClickHouse, and build a Grafana dashboard to view and analyze it.

[36:26] — The Python vmstat Collector

Robert: Here’s the collector, about 13 lines of Python. It opens a vmstat process, reads the output, adds a timestamp and hostname, and converts everything to JSON. The output looks like this:

{"timestamp": "2022-10-15T14:30:00", "hostname": "logos3", "procs_r": 1, "procs_b": 0, "memory_swpd": 0, "memory_free": 1234567, …}

This JSON output is generated every second. It includes the timestamp, hostname, and all vmstat variables, each named by joining its group and column from the vmstat header lines (for example, memory_free).
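Robert's script is not included in the transcript. A collector along the lines he describes could be sketched as follows, assuming the default 17-column vmstat layout; the column list and function names are my assumptions, not his code.

```python
import json
import socket
import subprocess
from datetime import datetime

# Group-prefixed names for the default 17-column vmstat layout (an assumption;
# some vmstat versions print additional columns).
COLUMNS = [
    "procs_r", "procs_b",
    "memory_swpd", "memory_free", "memory_buff", "memory_cache",
    "swap_si", "swap_so", "io_bi", "io_bo",
    "system_in", "system_cs",
    "cpu_us", "cpu_sy", "cpu_id", "cpu_wa", "cpu_st",
]


def vmstat_line_to_json(line, hostname, now):
    """Convert one vmstat data line to a JSON string, or None for header lines."""
    fields = line.split()
    if len(fields) != len(COLUMNS) or not fields[0].isdigit():
        return None  # header line or unexpected layout
    record = {"timestamp": now.isoformat(timespec="seconds"), "hostname": hostname}
    record.update(zip(COLUMNS, map(int, fields)))
    return json.dumps(record)


def collect(interval_s=1):
    """Run vmstat and print one JSON record per sample."""
    hostname = socket.gethostname()
    with subprocess.Popen(["vmstat", "-n", str(interval_s)],
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            record = vmstat_line_to_json(line, hostname, datetime.now())
            if record:
                print(record, flush=True)
```

Calling collect() on a Linux host prints one JSON line per second until interrupted.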

[37:28] — Designing the MergeTree Table

Robert: Unlike VictoriaMetrics, ClickHouse is SQL-based and has schemas. But ClickHouse tables can be much more flexible than conventional SQL tables. Here’s the table design for the vmstat data. The first three columns — timestamp, hostname, and some identifier — are dimensions. Everything else is measurements. When building these tables we do a few things specific to ClickHouse:

CREATE TABLE vmstat (
    timestamp DateTime,
    hostname String,
    procs_r UInt32,
    procs_b UInt32,
    memory_swpd UInt64,
    -- … all other vmstat columns
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (hostname, timestamp)

MergeTree is the storage engine designed for very large tables. PARTITION BY breaks the data up by month, and ORDER BY defines both the sort order and the sparse primary key index that allows finding sections of data quickly.

[39:03] — Loading Data

Robert: Loading data into ClickHouse is very easy. Issue an INSERT via HTTP specifying the JSONEachRow format:

curl -X POST "http://localhost:8123/?query=INSERT+INTO+vmstat+FORMAT+JSONEachRow" \
    --data-binary @vmstat_data.json

ClickHouse reads the stream of JSON lines and assigns each field to the matching table column. Or write a Python script that accumulates 10 records at a time and batch-inserts them. I had this script running for about a day and a half, generating all my demo data.
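The batching script is mentioned but not shown. One way it might look, assuming ClickHouse listens on localhost:8123 without authentication and JSON lines arrive on an input stream, is sketched here; the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

CLICKHOUSE_URL = "http://localhost:8123/"
INSERT_SQL = "INSERT INTO vmstat FORMAT JSONEachRow"


def post_batch(lines):
    """POST newline-delimited JSON rows to the ClickHouse HTTP interface."""
    url = CLICKHOUSE_URL + "?" + urlencode({"query": INSERT_SQL})
    body = "\n".join(lines).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        response.read()


def load(stream, batch_size=10, send=post_batch):
    """Read JSON lines from stream and insert them batch_size at a time."""
    batch, sent = [], 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        json.loads(line)  # fail early on malformed input
        batch.append(line)
        if len(batch) == batch_size:
            send(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        send(batch)
        sent += len(batch)
    return sent
```

Passing sys.stdin as the stream lets the loader sit at the end of a pipe fed by the collector. Batching matters because ClickHouse handles fewer, larger inserts better than many single-row inserts.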

[40:14] — Grafana Dashboard with the Altinity Plugin

Robert: Now for the dashboard. I built a Grafana dashboard in about 90 minutes. If you’re using Grafana with ClickHouse there are two plugins available. The most popular by far is the Altinity Grafana plugin for ClickHouse — which is actually named “vertamedia” in the Grafana registry because it was originally developed by Roman, my co-presenter today, when he was working at Vertamedia a few years ago. When he left he kindly transferred maintenance to Altinity and we’ve been maintaining it ever since. There’s also a native Grafana plugin maintained in collaboration with the ClickHouse community. I use the Altinity plugin because it’s the one we maintain and know well.

For a full tutorial on creating beautiful Grafana dashboards on ClickHouse, see the Altinity blog.

[43:16] — Demo: Grafana Dashboard

Robert: Here’s the dashboard with vmstat data from several machines in my home setup. I’m tracking average CPU usage per host. I can see one particular host — logos3 — hitting 100% CPU utilization during a specific time interval. By selecting logos3 specifically I can drill down and see: memory usage, CPU usage, and I can recognize that I was running a sysbench performance test at that time that drove the CPU to 100%.

For the ClickHouse monitoring KB and additional monitoring guidance, the Altinity Knowledge Base is the best reference.

[47:39] — Going Beyond Dashboards: Analytical Queries

Robert: One key difference between ClickHouse and a system like VictoriaMetrics is that you can also do analytical queries that reach deep into the data and ask arbitrary questions.

For example, to find which hosts averaged over 25% CPU load for any full minute in the last 24 hours:

SELECT hostname, count() AS minutes_over_limit
FROM (
    SELECT
        hostname,
        toStartOfMinute(timestamp) AS minute,
        avg(100 - cpu_id) AS avg_cpu
    FROM vmstat
    WHERE timestamp >= now() - INTERVAL 24 HOUR
    GROUP BY hostname, minute
    HAVING avg_cpu > 25
)
GROUP BY hostname
ORDER BY minutes_over_limit DESC

My home servers aren’t very busy, but running sysbench did show up in the results. This is the kind of exploratory, slicing-and-dicing analysis that makes ClickHouse useful for going beyond operational dashboards to deeper business intelligence.
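For readers less used to nested SQL, the same logic can be written as an in-memory Python function. It is meant as a reading aid for the query, not as a description of how ClickHouse executes it; the host logos2 and the sample rows are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def minutes_over_limit(rows, now, limit=25.0):
    """rows: iterable of (hostname, timestamp, cpu_id), where cpu_id is idle %."""
    cutoff = now - timedelta(hours=24)
    per_minute = defaultdict(list)
    for hostname, ts, cpu_id in rows:
        if ts >= cutoff:  # WHERE timestamp >= now() - INTERVAL 24 HOUR
            minute = ts.replace(second=0, microsecond=0)
            per_minute[(hostname, minute)].append(100 - cpu_id)

    counts = defaultdict(int)
    for (hostname, _minute), loads in per_minute.items():
        if sum(loads) / len(loads) > limit:  # HAVING avg_cpu > 25
            counts[hostname] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


now = datetime(2023, 1, 24, 12, 0, 0)
rows = [
    ("logos3", datetime(2023, 1, 24, 11, 0, 10), 0),   # busy minute
    ("logos3", datetime(2023, 1, 24, 11, 0, 40), 10),
    ("logos3", datetime(2023, 1, 24, 11, 1, 5), 95),   # quiet minute
    ("logos2", datetime(2023, 1, 24, 11, 0, 10), 98),  # hypothetical idle host
    ("logos3", datetime(2023, 1, 22, 11, 0, 10), 0),   # older than 24 hours
]
print(minutes_over_limit(rows, now))  # [('logos3', 1)]
```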

[48:00] — Handling Schema Changes in ClickHouse

Robert: An important question: how do you handle changes in schema when your data structure evolves — new labels, changing label names, and so on?

ClickHouse doesn’t always have to store everything in fixed columns. There are at least four approaches for handling evolving schema:

Option 1: Raw JSON string column with materialized columns. Store the complete JSON payload as a string column. Then define materialized columns that extract specific fields using JSON functions. The extracted columns appear in queries just like regular columns. For new fields, you dig into the JSON.

Option 2: Map column. Store the entire payload as a Map of key-value pairs. Pull specific keys into regular columns if needed. Queries on Map columns are efficient and flexible.

Option 3: Paired arrays. Extract keys into one array and their values into a matching array in the same order. Run queries on the arrays. Still efficient for sparse or variable schemas.

Option 4: Experimental JSON data type. ClickHouse introduced a JSON column type that accepts raw JSON documents and automatically assigns each key to internal sub-columns. Good for semi-structured data but less efficient for very diverse or deeply nested JSON.

All of these options let you simulate time series behavior — a pool of data with changing properties — without rigid schema requirements.
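As one illustration of these options, the Python sketch below reshapes an incoming JSON record into the paired-arrays layout (option 3) and reads a value back out. It shows the data shape only, not ClickHouse syntax; the field gpu_util is a hypothetical new measurement.

```python
import json

FIXED = ("timestamp", "hostname")  # kept as regular columns


def to_paired_arrays(payload):
    """Split a JSON record into fixed fields plus parallel key/value arrays."""
    record = json.loads(payload)
    fixed = {name: record.pop(name) for name in FIXED}
    keys = sorted(record)
    return {**fixed, "keys": keys, "values": [record[k] for k in keys]}


def lookup(row, key, default=None):
    """Read one measurement back out of the paired arrays."""
    try:
        return row["values"][row["keys"].index(key)]
    except ValueError:
        return default


old = to_paired_arrays(
    '{"timestamp": "2022-10-15T14:30:00", "hostname": "logos3", "cpu_id": 99}')
new = to_paired_arrays(
    '{"timestamp": "2022-10-15T14:30:01", "hostname": "logos3", "cpu_id": 97, "gpu_util": 12}')
print(lookup(old, "gpu_util"), lookup(new, "gpu_util"))  # None 12
```

The second record carries a field the first one lacks, yet both fit the same row shape, so no table change is needed when a new measurement appears.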

[51:35] — VictoriaMetrics vs ClickHouse: When to Use Which

Robert: Let me share a comparison.

ClickHouse talks SQL, stores any kind of data as rows in tables, is very easy to load data into, can pull from Kafka and object storage, supports very versatile queries, has incredibly strong aggregation capabilities, and virtually all open-source BI tools can talk to it.

Roman: VictoriaMetrics is also similar in some ways: it doesn’t talk SQL but it does talk MetricsQL and has Graphite integration. It’s optimized for storing huge volumes of time series data specifically. It’s schema-less. It has integrations with different protocols for collection. MetricsQL is not as versatile as SQL but is much more advanced for time series-specific processing. It integrates with any BI tool that speaks PromQL.

Robert: I didn’t even bother putting my performance numbers on the comparison because VictoriaMetrics is pretty comparable to ClickHouse per core depending on what you’re doing. And one thing I should say: if I were building a system, what would I choose? Both, actually. We run Altinity.Cloud and we use ClickHouse for the custom parts of our monitoring and VictoriaMetrics to store all our Prometheus data. We’re happy with both.

Roman: I would add that the difference I use to explain it is this: VictoriaMetrics is for capturing and processing the state of your application or hardware at each specific moment in time. ClickHouse can store events — when an event happens you record it and then process it. VictoriaMetrics is for storing and processing the state of the system. ClickHouse can process state but is really designed for processing events. And since it’s a general-purpose database, it can handle both.

[55:59] — How Altinity and VictoriaMetrics Can Help

Roman: VictoriaMetrics is mostly open source. About 99% of all the functionality I mentioned is available in the open-source single-node and cluster versions, free for commercial use without limitations. You can build really large systems using only open source. We also have an enterprise part that adds security improvements and automation for enterprise companies, and a managed cloud solution where you get a ready VictoriaMetrics cluster in a couple of clicks.

Robert: At Altinity we have a pretty similar business model layered on top of ClickHouse. The Kubernetes Operator for ClickHouse, Altinity Stable Builds with three years of support and production certification — it’s all open source. We run Altinity.Cloud for managed ClickHouse both in our VPCs and in your own accounts. We offer enterprise support and on-prem/self-managed support, and we do education for ClickHouse. Visit both of these; they’re great solutions.

[58:14] — Q&A: Correlating Metrics, Logs, and Traces in ClickHouse

Robert: There’s a question: when using ClickHouse to store metrics, logs, and traces, is there an ergonomic way to correlate them?

Depending on what you’re storing, the most common way to handle this in ClickHouse is to put them all in one table. Log messages can coexist with metrics data in the same table — nothing stops you from doing that. ClickHouse has very powerful aggregation functions that allow you to correlate this data through aggregation in a single scan. I have a talk on the BigTable model for ClickHouse which covers this in detail. ClickHouse is very focused on not having to scan data multiple times and not having to do self-joins because they’re slow.

For reference tables — the types of lookups you might join on — those tend to be separate tables and you join on them explicitly. For enormous data sets that won’t fit on a single machine, you’d consider bucketing by tenant or some similar approach to keep related data in a single shard where it can be queried efficiently.

[1:00:08] — Closing Remarks

Robert: Thank you very much for a great presentation, Roman. I learned a lot about VictoriaMetrics today.

Roman: Thank you for the opportunity. It would be a pleasure to continue such webinars. We only discussed basic principles today. There’s so much more to cover about building a good monitoring solution.

Robert: Absolutely. If you want to hear more, contact us. Our goal here is to educate. Get out and try both systems — they’re great. Tell the community how they work for you and contact us for help.

FAQ

When should I use VictoriaMetrics versus ClickHouse for monitoring?

VictoriaMetrics is designed for monitoring the state of systems at regular intervals using a Prometheus-compatible pull or push model, MetricsQL for time series analysis, and schema-less storage that accepts any metric without upfront definition. ClickHouse is a general-purpose analytic database that stores events rather than state snapshots, supports full SQL including joins across tables, and is well-suited when you want to combine monitoring data with other business data, build custom analytical queries, or store logs, traces, and metrics in a unified schema. Many organizations use both: VictoriaMetrics for Prometheus-style infrastructure monitoring and ClickHouse for custom application analytics. Altinity uses this combination for its own cloud platform.

What is MetricsQL and how does it differ from SQL for time series queries?

MetricsQL is VictoriaMetrics’ query language, designed specifically for time series processing. It is Prometheus-compatible but extends PromQL with additional functions. Unlike SQL, it doesn’t require defining tables, schemas, or databases — you just reference metric names and filter by labels in curly braces. It’s less versatile than SQL for joins and arbitrary aggregations, but more concise and expressive for the specific operations needed in time series monitoring, such as calculating rates, applying rolling averages, and aggregating over label dimensions.

How do I load vmstat or other JSON data into ClickHouse?

Design a MergeTree table with columns matching the JSON fields, a PARTITION BY clause (typically by month using toYYYYMM(timestamp)), and an ORDER BY clause that defines the sparse primary key index. Then issue an INSERT via HTTP using the JSONEachRow format and pipe your JSON data to the endpoint with curl or a simple Python script. ClickHouse parses the JSON and assigns each field to the matching column. A Python script that accumulates records in batches of 10 before inserting is a practical approach for continuous streaming collection.

How does VictoriaMetrics protect against cardinality explosion?

VictoriaMetrics collecting agents can be configured with a limit on the number of unique time series per time window (per hour or per day). When the limit is reached, the agent rejects new time series rather than passing them to the database. Internally, VictoriaMetrics uses Bloom filters to track new time series registrations over time periods, providing efficient cardinality protection at the ingestion layer before any data reaches the main database.

How do I handle evolving schemas (changing metric labels or fields) in ClickHouse?

ClickHouse supports at least four approaches. First, store the raw JSON payload as a String column and use materialized columns to extract specific fields on insert using JSON functions. Second, use a Map column to store all key-value pairs and query them as needed. Third, use paired arrays where one array holds the keys and a matching array holds the values. Fourth, use the experimental JSON data type which automatically parses and sub-columns the JSON documents. Each approach has tradeoffs in storage efficiency, query performance, and flexibility for very diverse or deeply nested data.

What is the Altinity Grafana plugin for ClickHouse and how is it related to VictoriaMetrics?

The Altinity Grafana plugin for ClickHouse (also known as the vertamedia-clickhouse-datasource in the Grafana registry) was originally developed by Roman Khavronenko, co-founder of VictoriaMetrics, when he was working at Vertamedia. When Roman left Vertamedia, he transferred maintenance rights to Altinity, which has continued developing it. It is the most widely used Grafana plugin for ClickHouse and is maintained by Altinity.

© 2022 Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc. Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
