Five Ways to Make Your ClickHouse® Slow (and How to Avoid Them)

Recorded: February 18 @ 08:00 am PST
Presenters: Robert Hodges
In this webinar, Altinity CEO Robert Hodges and ClickHouse® Architect Mikhail Filimonov cover five performance problems that trip up ClickHouse® users at every stage of adoption, from initial schema design through large-scale production deployments and migrations from competing systems.
The session opens with a concise overview of why ClickHouse® is fast in the first place: columnar storage, aggressive compression (typically 3,000x to 4,000x less data read from disk compared to row-oriented databases), and parallel execution. These properties make a correctly configured ClickHouse® instance extraordinarily fast, but they also mean that the wrong choices in schema design or workload management can eliminate those gains entirely.
The five problems covered are: (1) poor schema design, specifically partition keys that create too many small parts rather than large, evenly-sized ones; (2) small inserts that overwhelm the merge machinery and consume CPU and I/O that should be serving queries; (3) bad query patterns, including the use of overly precise aggregate functions when approximations would work and joins that force large hash tables to be carried throughout the scan phase; (4) insufficient hardware resources, including the frequently under-resourced ClickHouse Keeper service; and (5) migration from non-relational systems such as Splunk and Elasticsearch, which use inverted full-text indexes that have no direct equivalent in ClickHouse® and require careful rethinking of query patterns and secondary index design.
Mikhail adds expert commentary on Kafka-based ingestion and async inserts, materialized view fan-out problems, dictionary use as an alternative to joins, and sharding key selection. The session closes with a practical summary: test on real data, test concurrently, monitor proactively, scale quickly when problems arise, and build in extra time whenever you are migrating from a non-relational database.
Here are the slides:
Key Moments (Timestamps)
Key moments generated with AI assistance.
- 00:00 – Welcome and housekeeping
- 01:14 – About Altinity: enterprise support, Altinity.Cloud, the Kubernetes Operator, Project Antalya
- 03:54 – ClickHouse architecture overview: shared-nothing, MergeTree, ClickHouse Keeper
- 06:07 – The canonical use case: real-time event pipelines
- 07:29 – Why ClickHouse is fast: columnar storage, compression, and parallel reads
- 12:43 – Problem 1: Bad schema design — partition keys, parts, and the sparse index
- 16:04 – Demo: partitioning by carrier turns a sub-second query into a 5-second crawl
- 19:48 – Schema design cheat sheet: partition by time, sort by tenant
- 20:49 – Compression: data types, codecs, LZ4 vs. ZSTD
- 22:45 – System tables for measuring compression
- 23:34 – Problem 2: Too many small inserts and write amplification
- 28:39 – Solution: batching large CSV/Parquet inserts
- 29:18 – Solution: async inserts and user-level settings profiles
- 29:48 – Q&A: async inserts with Kafka (Mikhail)
- 31:25 – Problem 3: Bad queries — aggregate function selection and join rewriting
- 36:28 – Choosing approximate aggregates: uniqExact vs. uniqHLL12 (HyperLogLog)
- 38:07 – Join optimization: pushing aggregation into a subquery
- 39:29 – Reading query performance stats with system.query_log
- 40:08 – Test on real data and only fix what is actually slow
- 41:05 – Problem 4: Not enough resources — hardware, CPU, I/O, and ClickHouse Keeper
- 45:00 – The four concurrent workloads: queries, inserts, merges, updates
- 46:42 – Q&A: how to monitor ClickHouse Keeper health (Mikhail)
- 49:35 – Problem 5: Migration from non-relational databases (Splunk, Elasticsearch)
- 53:05 – Inverted indexes vs. ClickHouse skip indexes and bloom filters
- 57:09 – Summary: five anti-patterns and how to avoid each
- 59:52 – Extended Q&A: joins vs. spilling to disk, dictionaries, materialized view fan-out, sharding keys
Webinar Transcript
[00:00] – Welcome and Housekeeping
Robert: Hello everybody and welcome to our webinar on five ways to make your ClickHouse slow and how to avoid those ways. My name is Robert Hodges. I’m going to be here today with Mikhail Filimonov. Let me give people 60 seconds to join, and then we’ll get started.
All right, let’s get going on five ways to make your ClickHouse slow and how to avoid them. My name is Robert Hodges. I’m CEO of Altinity. I have with me today my colleague Mikhail Filimonov, who is our ClickHouse architect, and we are going to be talking about ClickHouse performance.
Before we dive in, a few brief things that will make this webinar more fun and enjoyable. First, it is being recorded. We will give you that recording as well as a copy of the slides. No need to make hasty notes. You’re going to get it all in email within a few hours of this talk ending. We also have time for questions. Mikhail is here and can help answer questions both in the foreground and in the background. If you have a question as we’re going along, please go ahead and pop it into the Q&A box or the chat. We will answer them no matter where you put them, and we’ll have time at the end for extra questions.
[01:14] – About Altinity
Robert: A quick introduction to Altinity. We are a vendor for ClickHouse. We are not the same as ClickHouse, Inc. We collaborate on open source, but we are competitors in the market. What do we do? We started out offering enterprise support: 24/7 support for ClickHouse for any purpose, from embedded systems to huge cloud clusters with hundreds of nodes. We also run Altinity.Cloud, which was the first ClickHouse cloud introduced in North America on Amazon, has been going strong for six years. We are the authors of the Altinity Kubernetes Operator for ClickHouse, a very popular way of running ClickHouse on Kubernetes. And we also have Project Antalya, where we’re adapting ClickHouse to use Iceberg as native storage. We’d be happy to talk about any of those. Today, though, we’re going to talk about performance.
[03:54] – ClickHouse Architecture Overview
Robert: To get started, a quick overview of the ClickHouse architecture. Most people on the call are probably familiar with this, but let’s review it just in case.
ClickHouse is what we call a real-time analytics database. It’s designed to read large quantities of data very quickly, to ingest them quickly, and to answer questions across data sets that extend into trillions of rows. The architecture is what’s called shared-nothing. This is a traditional architecture for analytic systems that has been around for decades. You have ClickHouse servers with attached storage, for example, EBS on Amazon, NVMe SSD on rack machines, connected across a network. When we create tables, we can have them replicated so that parts loaded into one replica automatically replicate to others.
The way replication works is through ClickHouse Keeper, which was originally based on ZooKeeper but has since been rewritten in C++ as a native ClickHouse project. Keeper maintains a list of which replica has which parts, so ClickHouse knows how to move data around so that every replica has its own copy. Over time, ClickHouse has also become very good at reading data from object storage, particularly S3, as well as the native APIs for Azure and Google. We cover that in other webinars.
[06:07] – The Canonical Use Case: Real-Time Event Pipelines
Robert: The canonical system that people build with ClickHouse is what we call a real-time event pipeline. Imagine a security management system reading DNS requests. Those requests are collected from individual machines, fed into an event stream like Kafka to buffer them and separate producers from consumers, and then delivered to ClickHouse, where they land in MergeTree tables. MergeTree is the workhorse engine for large-scale analytic data. This data, which can arrive at millions of rows per second, then feeds real-time dashboards, APIs for applications, and alerting systems. This has become very popular.
[07:29] – Why ClickHouse Is Fast
Robert: Let’s look at a small example. I’ll be using the on-time data set for most of this talk. It has a row for every commercial flight in the United States over the last 30 to 40 years, roughly 200 million rows in our subset. This query asks for the carriers with the worst cancellation rates. The carrier G4 had 16% of its flights canceled in 2020. That query scanned just under 200 million rows and returned in less than a second.
Now, we did everything possible to make this slow. We’re running on an M7G.xlarge Graviton with four vCPUs and 16 gigs of memory, with EBS storage throttled to 125 megabytes per second, which is like an old SATA disk drive. We forced the query to do cold reads using direct I/O, bypassing the buffer cache entirely. And yet we still get results under a second.
How does it do this? The secret is columnar storage, columnar storage that’s compressed, and access that’s parallelized. In a row database like PostgreSQL or MySQL, with 109 columns in this table, touching any column value for a row means reading the entire row. In ClickHouse, each column is stored separately in its own files. If your query touches three columns, you only read those three column files. For that query, touching three columns, a row database would read 61 gigabytes. ClickHouse, because it’s columnar, cuts that down by 98.5% to about a gigabyte. But the data actually read from disk is only 17 megabytes because of compression. That’s a reduction of 3,000x to 4,000x from what a row database would read. And ClickHouse parallelizes this scan across multiple threads. This is why ClickHouse is fast, particularly when combined with optimized hardware and lots of memory.
[12:43] – Problem 1: Bad Schema Design
Robert: So it seems like life is good, and we don’t even need to talk about performance. Well, it’s not quite that simple. Let’s find out what can actually go wrong.
The first problem people tend to run into is table design. This is the first place people land when they come to ClickHouse, and if you haven’t used it before, this is something that’s really important to understand.
ClickHouse breaks tables up into pieces called parts. Here’s a short-form table definition. We have a MergeTree table with two important clauses. First, PARTITION BY: this breaks the table into sections divided by year. Second, ORDER BY: within those partitions, parts will be ordered by carrier, followed by flight date. On disk, you’ll see parts with names like 2017_N_N_N, where the numbers indicate how many times those parts have been merged.
ClickHouse asks you to make this decision yourself. It doesn’t partition automatically. As a rule of thumb, you want to get roughly a thousand parts. You can have more or fewer, and it’ll work fine, but if you have a lot more, it won’t work well.
Within a part, there’s what we call the sparse index, sometimes called the primary key index. It’s designed to fit in memory, which means it doesn’t store an entry for every single row but for roughly every 8,000 rows by default. These index entries point to offsets in the column files, inside regions called granules, which are where the actual column data is stored.
[16:04] – Demo: Partitioning by Carrier Turns a Sub-Second Query into a 5-Second Crawl
Robert: Now, let’s make a change. What if we partition this table by carrier and break parts up by month? On the face of it, this seems reasonable. If you’ve worked in multi-tenant systems with MySQL, it’s very common to put each tenant in its own database. Why not put each tenant in their own parts? Then, if we get a GDPR request, we can just drop all the parts for that tenant.
This has some bad effects on performance. When we create the table with this partition scheme and insert the same data, that same query goes from under a second to 5 seconds: six times slower. Simply because of that partitioning choice, we’ve massively increased the number of parts, and each part has files for all 109 columns. ClickHouse now has a vast number of files to open and read to do the scan.
Why is this? For airline flights, roughly the same number of flights happen every year. So when we partition by year, we get partitions of similar size: about 5 million flights per year. If we split by carrier instead, there’s a huge amount of variation. Southwest Airlines accounts for about a seventh of all commercial flights in the United States, while many other carriers are tiny. At the tail, you get a huge number of very small parts. Small parts are slower to open and read. That’s what kills performance here.
[19:48] – Schema Design Cheat Sheet
Robert: When you want to avoid performance problems, follow this cheat sheet.
When you partition a table, partition by time. Pick a reasonable time period. If you’re holding data for 14 days, partition by day. If you’re holding it for weeks, partition by week. In our case, with 30 years of data, we partition by year. Check out the guide to how to pick an ORDER BY, PRIMARY KEY, and PARTITION BY for MergeTree for detailed rules.
The second thing, and this is something you have to fight against if you’re used to other databases: don’t split data for tenants; sort by tenant. Put the tenant column in your ORDER BY. That arranges tenants into contiguous sections of data, so queries against a specific tenant read a localized part of storage. The trade-off is that if you need to remove a tenant, you’ll have to update the parts. But in most cases, like observability, that’s a small price to pay, especially if you don’t keep data for too long.
Other schema performance levers: use proper data types, use codecs (which transform data before compression), and if I/O is the bottleneck and you have idle CPU capacity, consider switching from the default LZ4 compression to ZSTD, which compresses more aggressively with reasonable CPU cost.
[22:45] – System Tables for Measuring Compression
Robert: One of the things I love about ClickHouse is that it has the best system tables of any database I’ve ever worked with. There’s a typical query you can run to find out the compression ratios for the specific columns you’re scanning. The readable_size format makes it human-readable. You should absolutely become familiar with these system tables. You will soon love them.
[23:34] – Problem 2: Too Many Small Inserts and Write Amplification
Robert: The next common problem is too many inserts. This is very common in analytic databases, because if you learned SQL on relational databases like MySQL, you’re used to inserts that look like INSERT INTO some_table VALUES (...), inserting a row or two at a time. Small data. If you’re doing this in ClickHouse, you’re doing it wrong.
The problem is that small inserts create a large number of tiny parts. ClickHouse will then try to coalesce them through a background process called merging. What merging does is: ClickHouse finds groups of small contiguous parts and rewrites them into bigger ones. Over time, this produces large, efficiently-queryable parts. But while this is happening, it consumes CPU and I/O bandwidth in what we call write amplification, meaning data is written again and again as parts get merged into progressively larger ones.
If you’re constantly sending small inserts, you can overwhelm the merge machinery and starve your queries of the CPU and I/O they need. This is a really important property of ClickHouse.
There are a couple of ways to get around this.
First, batch the data yourself. Put your data into large CSV or Parquet files, a million rows at a time, and post them to ClickHouse’s HTTP interface. ClickHouse does not care how big these things are. You can use any reasonable size and just push it up. For more background, the Altinity blog post on async inserts also covers the underlying write amplification problem.
Second, use async inserts. When you enable them, a bunch of clients can insert small amounts of data simultaneously. ClickHouse puts the incoming rows into a single buffer. After a reasonable period of time, say half a second, or after the buffer reaches a certain size, it flushes the whole buffer to storage as a single, properly-sized part. Here’s how you set it up: you create a user with a settings profile that automatically enables async inserts, and all inserts from that account get merged automatically. See the Altinity Knowledge Base article on async inserts for configuration details and best practices.
[29:48] – Q&A: Async Inserts with Kafka
Robert: There’s a great question from Carol: is it possible to enable async inserts for Kafka-based ingestion? Mikhail, do you want to take that one?
Mikhail: Yes. If you use the Kafka table engine inside ClickHouse, it already has similar buffering functionality built into the engine itself, so you don’t need async inserts, and they will not work if you try to enable them for Kafka. Kafka itself works in a very similar way, so you don’t need async inserts for the Kafka engine. But if you’re doing inserts from an external component, for example, Kafka Connect or a Golang application that reads from Kafka and writes to ClickHouse, then for ClickHouse it’s a usual insert, and you can use async inserts in that case.
Robert: That’s exactly the answer I would have given. There’s kind of a balance here. A lot of people use Kafka because it does simplify systems in one sense: it takes care of the small insert problem and uses connections more efficiently. The flip side is that you have to manage Kafka. You can decide which one is better for your situation.
[31:25] – Problem 3: Bad Queries
Robert: All right, on to bad queries. Good queries are all similar; bad queries are all different, each in their own way. Let me give some examples and then generalize.
The basic thing to understand is that very small differences in how you ask for data can make huge differences in performance.
Here’s a simple example: a query asking for average departure delay broken out by carrier. That’s very simple. Now, a more detailed form asking for average departure delay broken down by both carrier and flight date, collecting also the tail numbers associated with the delays. That’s a reasonable ask for more information. But the second query is 4.25 times slower, and more interestingly, it uses over 20,000 times as much memory. The reason: averages are very compact in memory, but the exact unique count function stores each individual value in a hash table that grows with every unique tail number.
To understand why query shape matters so much, it helps to know what happens when ClickHouse processes a query. The query gets parsed and planned, then a scan runs across the table. In our example server with four CPU threads, each thread builds a hash table in memory holding the grouped aggregation data. At the end, those hash tables get merged, sorted, and handed back to the client. The size of those hash tables is dominated by the aggregation: more group-by keys, bigger hash tables. More memory consumed, more time to merge.
[36:28] – Choosing Approximate Aggregates: uniqExact vs. uniqHLL12
Robert: Here’s a really interesting example of how aggregate function choice matters. Counting unique values over a period of time, like unique visitors to a website per hour, or in our case, unique tail numbers associated with flights on a particular date.
If we use uniqExact, it builds a hash table and stores every individual value. For every tail number, the hash table gets a bit bigger. If we switch to uniqHLL12, which uses HyperLogLog, we get an approximate count but a fixed-size data structure. As a result, switching to HyperLogLog gives us almost exactly the same answer but 15% faster performance and five times less RAM. This is a great example of making a very simple change and getting much better results.
[38:07] – Join Optimization: Pushing Aggregation into a Subquery
Robert: ClickHouse does joins reasonably well, contrary to reputation, but there are places where you can get big performance advantages if you look closely. Here’s an example where we’re joining to fill in an airport name alongside each IATA code.
In the original query, we join directly. In an optimized version, we break out the query so the heavy scan and aggregation happen in a subquery first, returning a much smaller result set, and then join the small result set against the airport names table. The query runs much faster and uses less RAM.
Interestingly, ClickHouse has an increasingly good query analyzer, but it doesn’t make this optimization automatically at least in this case. You still have to do it yourself. What was happening was that the scanning threads were each dragging along the full airport name data in their hash tables throughout the entire scan, consuming memory and time unnecessarily. Knowing a little about how ClickHouse processes queries internally gives you insight into what you can do to fix it.
[39:29] – Reading Query Performance Stats
Robert: How do we get these performance stats? You can read them from the ClickHouse client output, but you can get even more information from system.query_log. This should be a system table you live in if you’re optimizing queries. It provides great information, and it’s all available through SQL. The Altinity Knowledge Base has excellent guidance on query analysis using system tables.
[40:08] – Test on Real Data and Only Fix What Is Actually Slow
Robert: One more thing on query optimization: test on real data. Run your queries against a realistic data set. Take the queries you don’t like and start to optimize them using the tricks we’ve discussed. But only apply optimizations if the query is actually slow. If it’s fast enough for your users and doesn’t consume too much capacity, leave it alone. ClickHouse gets a lot of stuff right all by itself. In the cases where it doesn’t, dig in and figure out why, and use optimizations to make it much faster. It is very rare that you can’t make a ClickHouse query faster, but it requires testing against real data.
[41:05] – Problem 4: Not Enough Resources
Robert: Problem number four: simply not having enough hardware resources available to ClickHouse.
Sometimes, after hours of query optimization, if you can’t improve performance further, you may have no choice but to throw hardware at the problem. On a 4-vCPU machine with 125 MB/s EBS, upgrading to a 32-vCPU instance and increasing I/O bandwidth dramatically improves cold reads, getting faster than the properly-optimized schema on constrained hardware. Adding resources is always an option when you need to buy time.
But there’s one thing that’s really important to understand about resource management with ClickHouse: it’s not just queries. ClickHouse servers are simultaneously doing four things.
Queries: answering analytical questions.
Inserts: loading data, which can be computationally expensive, for example, if you’re parsing JSON.
Merges: coalescing small parts into larger ones. This burns CPU and storage bandwidth.
Updates: ClickHouse data tends to be immutable, but not always. GDPR tenant removal, late-arriving data corrections, schema changes via mutations. In real systems, updates are pretty common.
All four of these workloads happen at the same time on the same server. They compete for CPU, RAM, storage bandwidth, and network. A particularly important thing not to forget: heavy inserts and merges put a lot of pressure on ClickHouse Keeper. And if Keeper is slow to respond, your inserts slow down. If Keeper gets overloaded enough, your replicated tables go read-only, and inserts stop entirely. This is bad.
[46:42] – Q&A: How to Monitor ClickHouse Keeper Health
Robert: There’s a great question about how to monitor ClickHouse Keeper pressure and state. Mikhail?
Mikhail: To monitor Keeper pressure, start with the usual metrics: CPU, disk I/O, things like that. One of the most important Keeper-specific metrics is outstanding requests, meaning how many processed requests are in the queue. There’s also a metric for fsync time, which shows the disk round-trip latency. Keeper relies heavily on fsync, so disk latency needs to be very low. If you have no problems, it just works. If you start to see outstanding requests climbing or fsync times going up, that’s your signal.
Robert: It’s really important to keep Keeper from getting overloaded because if Keeper starts crashing, your tables go read-only and inserts stop. We know from experience that it’s tempting to under-resource Keeper, because when the system is lightly loaded, Keeper isn’t a heavy user of resources at all. The problem is that once it gets overloaded, it’s hard to get back out of it. You need to scale it up at a time when that’s difficult. My personal recommendation: have good metrics on the things Mikhail mentioned, add a somewhat bigger machine than you think you need, and then don’t worry about it.
[49:35] – Problem 5: Migration from Non-Relational Databases
Robert: The last problem is one we’ve had to grapple with a lot over the years: migration from non-compatible databases.
Let’s clarify what “compatible” means. PostgreSQL and MySQL are compatible. If you’re running SELECT AVG(departure_delay) FROM flights GROUP BY carrier on MySQL, the same query will run on ClickHouse and return the same results, typically 1,000 to 2,000 times faster. The reason is that these are both relational, tabular systems. They have the same data types, similar SQL dialects, and similar query planning. These migrations tend to be fairly straightforward: dump the data out of MySQL or PostgreSQL and load it into ClickHouse.
Where it gets harder is migration from systems that organize data very differently.
Splunk is a popular one to migrate away from. Splunk takes incoming log messages and stores them as events in a full-text index. Splunk’s data isn’t stored in table-with-columns format. Instead, everything goes into an inverted index, which stores the values themselves (not just pointers to rows) and lets you search for any combination of terms anywhere in the data. Questions like “find all records where the tail number starts with these four characters” come back very quickly in Splunk. When you get to ClickHouse, this is a problem because ClickHouse organizes data in a completely different way.
ClickHouse does have skip indexes designed to skip reading data that can’t match a query. You can add a full-text index on specific columns that need it. ClickHouse also has Bloom filter indexes, which are powerful but need careful tuning. There’s a great Altinity blog post on Bloom filter indexes that goes into the theory and parameter selection. And ClickHouse’s LIKE operator performs very well, reportedly using SIMD instructions on Intel hardware. But for any of these options you need to first identify which columns people are actually running searches against, design the right index for those columns, and then test it on real data to confirm your choice solves the problem.
The broader point we’re making here: when you’re moving from a non-relational database, you’re moving to a different data organization with different expectations. Many companies have done this successfully and we’ve helped many customers through it, but it doesn’t happen by magic. Test realistic query patterns. Understand what people liked about the old system and figure out how to replicate it in ClickHouse. And Altinity’s migration support services can help you navigate this efficiently.
[57:09] – Summary: Five Anti-Patterns and How to Avoid Each
Robert: Let’s wrap up. We’ve hit the end of the five performance problems. These are very common when people come to ClickHouse or when their systems grow.
Bad schema: tune your schema to reduce I/O. Put realistic amounts of data in your test environment early. ClickHouse has powerful system tables to help you optimize. Partition by time, sort by tenant.
Small inserts: always make inserts as big as possible, but not so big that you’re delaying data by more than your SLA allows. If you need data visible within 30 seconds, don’t buffer for 30 seconds; buffer for 5 and leave a margin for the rest of the pipeline. Use batching or async inserts.
Bad queries: test on real data and fix the slow ones. Only optimize queries that are actually causing problems.
Hardware: test hardware on realistic workloads, including concurrency and data size. Be able to increase it before you hit problems. Have monitoring metrics that warn you before things fail.
Non-relational migrations: plan time into the schedule. Think about testing. Think about what it is that people actually value in Splunk or Elasticsearch. There is a reason why people pay enormous amounts of money for Splunk: it’s a pretty good database. When you come to ClickHouse, take into consideration what those systems do well. You don’t just drop data into ClickHouse and start saving money. There is a trade-off between how different databases organize data, and you need to adapt to ClickHouse’s model to get the full value from the investment.
And finally: don’t assume ClickHouse will be fast. When it’s been properly tested and tuned, it’s the best and fastest database I’ve ever worked with. But you need to test it, prove it for yourself, and make the investment of time and effort to get the speed.
[59:52] – Extended Q&A
Robert: Let me turn it over to Mikhail for a couple of questions that came in.
On joins vs. spilling to disk:
Mikhail: The question is about the new ability to spill join hash tables to disk and whether we should still prefer to fit the right-hand table into RAM. The answer is yes. If you can fit the right-hand table in RAM, that’s always better. Every kind of spilling to disk is always worse in performance. You do it because you have no other choice. If your join is not fitting into RAM, spilling to disk lets the query finish without running out of memory. But if you can fit it in RAM and do it the old-school way, that’s faster. Enabling spill-to-disk makes joins slower, but they will eventually finish rather than dying with out-of-memory errors.
Robert: One thing we didn’t cover in the slides, but that’s worth mentioning, is dictionaries. These were developed early on in ClickHouse and are still a really powerful feature. They allow you to keep the right-hand side of a join pre-loaded in server memory at all times.
Mikhail: Right. For slowly changing dimensions in a star schema, like resolving a client ID to a client name, or for rules-type lookups, dictionaries are the best possible approach. It can be much better than joins. But if your tooling generates joins automatically, the joins should also work. Just remember that dictionaries exist as an option in ClickHouse.
On materialized views with 10 to 20 target tables:
Mikhail: Having many materialized views triggered by a single table is generally not a good practice. Every single insert into the original table triggers inserts into all target tables. So every insert becomes 20 inserts, ClickHouse needs to deal with merges inside 20 tables, and so on. It’s like a fan-out of writes. If you can avoid it, please avoid it. The typical way to avoid it is either to push something inside the main table itself using projections, or if you need bronze/silver/gold tiers, trigger the movement between tiers not on every single insert but on a schedule, say every 10 seconds or every minute. That way your insert speed stays fast and the cost of filling the silver layer doesn’t impact the insert rate.
On sharding keys:
Mikhail: Sharding is a broad topic, but one of the main decision points is whether you have joins across tables. If you can shard so that related data is on the same shard, you can do joins locally, which is very important. If you can’t do local joins, the shards need to exchange large amounts of data across the network. Network speed is not something you can easily scale, and it will always be much slower than reading from RAM. If nodes need to exchange significant data, the query won’t scale linearly.
Robert: One thing we’ve run into at scale: when using a distributed table to query across shards, you need to be very careful about what happens on the initiator node where the query lands. If you write complex queries with many CTEs, it can force a lot of processing onto the initiator that gets serialized. I’ve seen cases where the actual scans across shards took a few seconds in aggregate across dozens of threads, but then it took 100 seconds to process the data on the distributed table engine because some operations ended up serialized. Keep your distributed queries simple where possible.
On adding shards in real time: if you partition by time, you can add shards simply by adding more hardware and letting new data naturally distribute across them. If you’re keeping data for 14 days, after 14 days, your new shards are equally balanced with the old ones. The sharding key design is still something to think hard about, but in the best case, you just add hardware and let the shards fill up.
Thank you all so much for joining us. If you have more questions, reach out to us at altinity.com, on our Slack channel, or on LinkedIn. We love databases and we love to help.
Mikhail: Thank you, everyone.
FAQ Section
Q: What is the most common ClickHouse schema mistake, and how do I fix it?
The most common mistake is choosing a PARTITION BY key that creates too many small parts, typically by partitioning on a high-cardinality column like tenant ID or by using a time granularity that’s too fine for the data volume. Each part requires ClickHouse to open and read files for every column in the table. Too many small parts means too many file opens and too little data per file, dramatically increasing scan time. The fix is to partition by time at a granularity that produces roughly a thousand parts total, then put tenant or high-cardinality dimensions into the ORDER BY clause instead. See the MergeTree key selection guide for detailed rules.
Q: When should I use async inserts vs. batching inserts at the application level?
Batching at the application level, sending large CSV or Parquet files of a million rows or more, is the preferred approach when feasible because you have full control over batch size, timing, and error handling. Use async inserts when you have hundreds or thousands of small concurrent writers that you cannot easily modify to batch, such as many microservices each writing a few rows at a time. Async inserts are not appropriate for the ClickHouse Kafka table engine, which already handles buffering internally.
Q: How do I identify slow queries and understand why they are slow?
system.query_log is the most important system table for query optimization. It records execution time, memory used, rows and bytes read, and a detailed breakdown of which parts of the query were expensive. The Altinity Knowledge Base section on useful queries has a collection of pre-built queries for analyzing system.query_log. For index effectiveness, check the server log or trace output to see how many granules your skip indexes are pruning.
Q: What are the biggest pitfalls when migrating from Splunk or Elasticsearch to ClickHouse?
Splunk and Elasticsearch use inverted full-text indexes that allow arbitrary substring or token searches across all fields with very low latency. ClickHouse organizes data in sorted columns and relies on the sort order for fast lookups. To replicate Splunk-style search patterns, you need to identify which columns are searched, add skip indexes such as Bloom filters or full-text indexes to those columns, and test extensively on real data. Bloom filters are powerful but require careful parameter tuning; see the Altinity Bloom filter blog post for theory and examples. Also, plan for engineering time: migration from a non-relational database to ClickHouse is not a drop-in replacement. Altinity’s migration support services can accelerate this process significantly.
Q: How should I think about resources for ClickHouse Keeper, and what metrics should I watch?
ClickHouse Keeper should not be under-resourced, even if it looks idle under light load. When Keeper gets overloaded under high insert rates, it causes slow inserts and eventually makes replicated tables go read-only. Key metrics to watch are: disk fsync latency (should be very low; Keeper does an fsync on every committed write), outstanding requests queue depth, and CPU utilization. Run Keeper on dedicated hardware if possible and size it larger than you think you need.
Q: What is write amplification, and why does it matter?
Write amplification is the ratio of data actually written to storage vs. data logically inserted. In ClickHouse, every small insert creates a new part. The background merge process then reads and rewrites those small parts into progressively larger ones, repeating this process until parts reach a size at which merging stops. Each rewrite amplifies the total I/O caused by the original insert. If too many small inserts arrive too quickly, the merge process can consume the I/O bandwidth and CPU cycles that queries need, slowing everything down. The solution is to send fewer, larger inserts or to use async inserts to let ClickHouse do the batching server-side.
© Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.