New Tips and Tricks that Every ClickHouse® Developer Should Know

Recorded: April 12 @ 10:00 am PT
Presenter: Robert Hodges (CEO @Altinity).
Robert Hodges, Altinity CEO, delivers seven practical ClickHouse® developer tips designed to improve performance, reduce storage costs, and unlock powerful but underused features. The session builds directly on his 2019 “Tips and Tricks” talk, adding techniques that have either matured significantly or are entirely new.
Highlights include reading S3 Parquet files at scale using named collections and wildcards, dramatically reducing storage with column codecs and compression, and a subtle but high-impact ORDER BY trick that repeats the time column to allow the sparse index to skip large portions of a part. Robert also covers ReplacingMergeTree for real-time deduplication and upserts, storing RBAC definitions in ZooKeeper so they replicate automatically across cluster nodes, and TTL recompression to progressively shrink aging data without deleting it.
The final and most powerful tip shows how aggregation with groupArray and ARRAY JOIN can replace large-table joins entirely, delivering sub-second answers on hundreds of millions of rows. Robert closes with a recommendation to build materialized views on top of this pattern for workloads that repeat the same aggregation under high insert load.
Here are the slides:
Key Moments (Timestamps)
Key moments generated with AI assistance.
- 0:04 – Introduction and housekeeping
- 1:19 – Speaker introduction and Altinity overview
- 3:37 – ClickHouse overview: columnar, shared-nothing, vectorized
- 6:09 – Reference to the 2019 tips and tricks talk; what’s new
- 8:05 – Tip 1: Read S3 data fast with wildcards and named collections
- 9:51 – Named collections: encapsulating S3 credentials and URLs
- 11:13 – Using max_threads to parallelize S3 reads; benchmark results
- 13:04 – Parallel inserts from S3 on a single host and with s3Cluster
- 15:25 – Tip 2: Reduce storage size using codecs and compression
- 16:05 – Before picture: unoptimized sensor data table
- 16:42 – How column blocks, compression, and codecs work together
- 17:18 – LowCardinality encoding, DoubleDelta, aliases, and ORDER BY
- 20:55 – Storage results: from 4.5 bytes to ~1 byte per row
- 22:00 – Q&A: LZ4 vs. ZSTD trade-offs; codec comparison table
- 24:31 – Tip 3: Repeat time column in ORDER BY to speed up time-based queries
- 25:48 – Why time-based partitioning with high-cardinality ORDER BY causes full part scans
- 28:34 – The fix: adding toStartOfDay(time) higher in the sort order
- 29:09 – MergeTree internals: parts, mark files, compressed blocks, sparse index
- 31:48 – Q&A: Does the primary key enforce uniqueness? (Answer: no)
- 32:58 – Tip 4: Handle mutable data with ReplacingMergeTree
- 33:34 – How ReplacingMergeTree deduplicates rows based on ORDER BY
- 34:01 – Use cases: upserts, stateful event streams, MySQL replication
- 35:19 – Version column best practice; insert, update, and delete patterns
- 37:27 – Why deduplication is lazy; when FINAL is needed
- 39:05 – Row policies to filter deleted rows; recent RMT improvements
- 41:43 – Tip 5: Store RBAC in ZooKeeper (or ClickHouse Keeper)
- 42:01 – The problem: SQL-created users don’t propagate to cluster nodes
- 43:32 – The fix: enabling replicated user_directories with a ZooKeeper path
- 44:36 – Querying RBAC data from system.zookeeper
- 45:21 – Tip 6: Reduce data size and costs with TTL clauses
- 46:26 – TTL row deletion: automatic drop after 12 months
- 46:05 – TTL extensions: aggregation, moves, and recompression
- 47:09 – TTL recompression example: ZSTD at 1 and 6 months
- 47:50 – Viewing compression savings in system.parts
- 48:13 – Tip 7: Use aggregation to simulate joins
- 49:36 – Why large table joins are an anti-pattern in real-time analytics
- 51:52 – Using groupArray and ARRAY JOIN to aggregate master/detail data
- 53:14 – The if-combinator: filtering arrays by entity type during aggregation
- 54:34 – Why this works: one scan, no data movement, sub-second results
- 56:43 – Q&A: Use a materialized view to cache aggregation results
- 1:00:47 – Q&A: Row-level and table-level permissions with RBAC and row policies
- 1:02:25 – Q&A: AggregatingMergeTree and SimpleAggregateFunction tips
Webinar Transcript
[0:04] — Introduction and Housekeeping
Robert: Hi everyone, and welcome to our webinar on new tips and tricks that every ClickHouse® developer should know. My name is Robert Hodges. I work for Altinity and I’ll be your presenter today. I’m backed up by the Altinity team, who we’ll introduce shortly.
Before we go too far, let me talk about a few things that will help you enjoy this webinar more. First, it is being recorded, and we will send a link to the recording as well as to the slides to everyone who has signed up, after the webinar is over. You’ll probably get them within about 24 hours. Second, we have time for questions. You can enter them in the Q&A box or put them in chat. My colleague Kiara Tosselli is watching the chat and the questions and will highlight them as they come up. If they’re relevant to what I’m talking about I’ll answer them right then and there; otherwise we can defer them to the end.
[1:19] — Speaker Introduction and Altinity Overview
Robert: Let me do some quick introductions. My name is Robert Hodges. I’m a database geek. I’ve been working on databases for 40 years. I started with pre-relational systems and worked at places like Sybase and on MySQL for many years. Now I’m working on ClickHouse, which along with Sybase and MySQL is one of my three favorite databases of all time. My day job is Altinity CEO, so I’m responsible for running the business, but I really like database technology and I do a lot of these presentations.
The only way I can do this is that I’m backed up by the Altinity engineering team. It’s a great team with an enormous amount of database experience. Cumulatively we have many centuries of experience running database systems, particularly analytic systems, which are the focus of today’s talk.
A little about Altinity: we are an enterprise provider for ClickHouse. We have enterprise support, and we also have a cloud platform, Altinity.Cloud. It was the first cloud for ClickHouse introduced on both Amazon and GCP, and has grown very rapidly over the last two and a half years. We’re also major contributors to open source. We contribute to ClickHouse itself, and we’re the authors of the Altinity Kubernetes Operator for ClickHouse®, which some of the people on this call almost certainly use. We’re very heavily focused on Kubernetes, and in fact all of our cloud technology is based on it.
[3:37] — ClickHouse Overview
Robert: Let’s introduce ClickHouse for the few people on this call who’ve never heard of it. It’s a real-time analytic database. It understands SQL, so in that sense it’s a bit like MySQL or PostgreSQL. It runs everywhere, from bare metal to cloud, and it’s open source with an Apache 2.0 license, which is a very favorable and well-understood license. You can use it and distribute it any way you want, with very few limitations.
On the analytics side, it has a shared-nothing architecture, where compute nodes connected by a network each own and manage patches of storage. ClickHouse is iteratively moving toward a separated compute and storage model, but that’s still a work in progress. Like most analytic databases it stores data in highly compressed columns, and it’s extremely good at executing things in parallel. Vectorized execution means we break a column of data into arrays, push it onto the CPU, and take advantage of things like single instruction multiple data (SIMD) to do sums and other mathematical operations quickly. It can also spread queries across many nodes, combining I/O and compute capacity from multiple machines to answer a query quickly. It scales to many petabytes. Companies like Cloudflare have tens of petabytes on ClickHouse, we have customers with hundreds of nodes, and it underlies analytics at Microsoft Bing and Microsoft Clarity.
[6:09] — Reference to the 2019 Talk
Robert: Before I get into the new tips, let me mention the original tips and tricks talk, which I did about three and a half years ago. It was a very popular talk, originally a meetup talk and then put on YouTube, and it has thousands of views. It suggested a bunch of basic things you could do to just make ClickHouse work better, faster, and more cost efficiently. Most of those tips are still good today. The only one I’d qualify a bit is “use replication instead of backups.” That’s changed: ClickHouse now has better backup mechanisms and we do recommend using them. But it’s a great place to start.
In this new talk I have seven new tips. I’ll repeat a little bit of one: encodings, because they’re so critical. But all the other stuff is new. As I go through the tips I’ll also try to explain what’s going on behind the scenes, because if you understand how a tip works you can do other things with ClickHouse, because you understand the plumbing underneath.
[8:05] — Tip 1: Read S3 Data Fast with Wildcards and Named Collections
Robert: The first tip is something that’s a big change over the last few years: the ability for ClickHouse to read S3 data fast with wildcards and named collections. S3 support has been around in some form for about three years, but it wasn’t very good at the beginning. It has gotten way better.
A typical way to read S3 data, either for a direct query or to read data lake information and put it into a table, is using the s3() table function. Table functions are functions you can call in ClickHouse that return tabular results, similar to table engines but used inline in a query. You’ll set it up pointing to S3 and specify the format, in these examples Parquet.
There’s a feature in ClickHouse called named collections, introduced about a year and a half ago. These are super useful because previously with the S3 function you had to supply URLs, credentials, and all that information inline in the SQL call. Named collections let you store that information on the server in a configuration file. In this particular case, the configuration encapsulates the S3 bucket location, the access credentials, and the format. Importantly, it also supports wildcards in the URL, so you can point to something like orders/*.parquet and ClickHouse will read all matching files. This keeps your code clean and doesn’t expose credentials.
Once you have the named collection set up, the s3() function invocation is very simple. You just reference the collection name and ClickHouse handles the rest. You can then run aggregates directly over Parquet, and here’s the interesting part: by adjusting the max_threads setting you control the number of vCPUs assigned to process the query. I ran this query repeatedly, doubling max_threads from 2 to 32. The first three doublings basically doubled performance. By the time we reached 16 it tailed off, probably due to bandwidth limitations on the host. This was a query over 150 million rows in Parquet, and the parallelization effect is clear.
For loading data, you can do an INSERT ... SELECT directly from S3 using max_threads and the companion max_insert_threads setting, which adds parallel threads for the insert side. You can insert 50 million rows per second on a single host that way. For clusters with multiple shards, there’s s3Cluster(), which distributes the SELECT and INSERT across all machines. Note that s3Cluster() doesn’t currently support named collections, so you have to provide the arguments directly, but it’s a very fast way to load data at scale.
Robert: There’s a question from Prashant: what’s the maximum value you can set for max_threads?
You can set it to practically any value, but whether it helps depends on your hardware. By default, ClickHouse sets max_threads to half your vCPU count. You can set it up to the full vCPU count. If you set it above that it won’t do any good. And as we saw, you often see performance flatten before you even hit that ceiling. Try it out and see how far it goes.
[15:25] — Tip 2: Reduce Storage Size Using Codecs and Compression
Robert: Tip two is reducing storage size using codecs and compression. This was in my last talk, but it’s such a great feature that it’s something everybody should always be thinking about.
Let’s start with a before picture. This is a table of sensor data readings, with about a trillion rows of test data. The basic table in this case is not very optimized: it doesn’t pay particular attention to data types, doesn’t care about partitioning or ordering, and does nothing to transform the data for efficient storage.
When you store a column, the data is stored as a big array in a file. That file gets compressed. Values in the column are divided into blocks, and each block is compressed using by default LZ4, or if you specify it, ZSTD1. What codecs do is transform the data before compression, so it becomes smaller before the compression algorithm even touches it.
LowCardinality encoding is a form of codec. If you have string values in your column, for example an airline tail number of six characters, running it through LowCardinality will turn it into dictionary encoding: all the strings in the part get a dictionary with integer numbers. That string of six bytes becomes two bytes, and then ZSTD compresses integer values very well, so you get something pretty small.
To apply this, you just modify your table definition using the CODEC() keyword. For example, the DoubleDelta codec for an integer column stores not the value itself but the change of the change, the slope of the slope. For monotonically increasing integers like monitoring data, this can reduce your data by 99 percent or more. Then you hand the remainder to ZSTD and in ideal cases you can get compression rates of 99.8 percent. LowCardinality is typically effective when you have 10,000 or fewer unique string values within a part.
One additional trick: the ALIAS keyword means a column isn’t actually stored but is computed from another column on the fly. It saves space without losing any query functionality. Partitioning and ordering also affect compression because sorting values, for example monotonically increasing integers, causes codecs like DoubleDelta to work much better.
The effect is really impressive. This dataset runs over a trillion rows. The unoptimized table was already pretty good at 4.5 bytes per row due to repetition in the data. After optimizing data types and adding codecs with LZ4 we got it down to 1.3 bytes per row. Using ZSTD brings it close to 1 byte per row. That means a trillion rows of data can be stored in a little over a terabyte, and not only does that save on storage costs but queries run faster because they simply have to read less data.
Kiara: Robert, there’s a question: what are the performance impacts of using this kind of compression at query time?
Robert: Yes, clearly there’s a trade-off. ZSTD gives better compression than LZ4 but is more expensive, particularly on writes. If you’re ingesting a lot of data, ZSTD will slow down your ingestion because data has to be compressed on write. You also pay that price again whenever blocks are merged. So if you want high ingest performance but don’t mind a bit more storage, go for LZ4. If you’re really looking for deeper compression, ZSTD is the way to go. ZSTD is particularly good at compressing integers. The LowCardinality string combined with ZSTD is actually a great combination, because ZSTD is better than LZ4 at compressing the output of LowCardinality encoding.
One more thing: at any time on existing data you can ALTER TABLE and put in new codecs. ClickHouse will apply them on any newly arriving data and merges. It’s easy to change the compression and codec scheme on the fly.
There was also a question about partitioning: are there partitioning schemes supported apart from time-based? Yes. Partitioning is how you divide a table up so ClickHouse knows how to break it into pieces. You can use any value you want. Time-based partitioning is the most common because it results in relatively large parts that often align well with how you manage data, for example keeping monitoring data for seven days and then dropping it. Monthly partitioning is a very common pattern that results in about the ideal number of parts per server. But any formula that can be computed from the table works. As for table size limits, there are no practical limits. If the table gets very large you can shard it across multiple servers.
[24:31] — Tip 3: Repeat the Time Column in ORDER BY to Speed Up Time-Based Queries
Robert: Tip three is a cool one I ran into recently, and actually one where I gave someone some misinformation on the ClickHouse Slack workspace. Let me explain what’s going on and hopefully the right answer.
Time is a common component of table ordering. Big data tends to have a time dimension, and that dimension is very significant in data management. The most recent data is the most interesting; the oldest is the least. Organizing by time helps address that. We’ll often order data in large tables and include time in the ordering, which also helps compression because monitoring values that increase over time will be nicely arranged.
We’ll also often choose to partition by some value that gets us about a thousand parts on the server. ClickHouse likes that number. It balances the number of files it has to open on startup and gives it very long ranges of data to scan. Larger parts are generally faster to scan. Partitioning by month is a very common pattern to get this ideal number of parts.
But this can lead to some bad results. If we’re partitioning by month for overall efficiency but we’re most interested in data from the first day of the month, we’ll have queries like: select some aggregates where we reduce a time value to the start of a day and filter on that. Well, what’s the problem? Looking at the logs, when ClickHouse actually executes this query, because time has the highest cardinality it ends up on the right side of the ORDER BY. To go and find all records that have the time we’re interested in, because they’re spread across user IDs and session IDs, ClickHouse has to read the entire part. The log messages confirm it: we had to read all 28,544 marks just to answer a question about data from one day.
The fix is to add the time value again higher up in the sort order. Specifically, add toStartOfDay(time) near the top of the ORDER BY, before the high-cardinality columns. This puts time up there twice, and ClickHouse can now find all data for a single day in one contiguous section of the part.
Let’s look briefly at what’s going on under the covers in a MergeTree table, because this will explain why this works. For each column there’s a pair of files: a narrow mark array that indexes compressed blocks, and the compressed blocks themselves. There’s also the sparse index, which ClickHouse calls the primary key index. Unlike a primary key in MySQL, it’s not a uniqueness constraint. Its real goal is to be very small and locate ranges of data efficiently, fitting entirely in memory regardless of how big the table is. If we have a column in the sparse index we can use it to read only the relevant blocks and skip the rest.
After making the change and adding toStartOfDay(time) to the ORDER BY, the same query only scanned less than 5,000 of 61,000 marks, a roughly 12x improvement in data read. The query ran in about a third of the time.
One important note: the question of whether the primary key enforces a constraint comes up a lot. The answer is no. ClickHouse has no notion of referential integrity. This is common among data warehouses. You can get duplicate data. I’ll talk in another tip about how to avoid that problem.
[32:58] — Tip 4: Handle Mutable Data with ReplacingMergeTree
Robert: How do we deduplicate data if there are no constraints?
This is a great time to talk about ReplacingMergeTree, possibly the least known but most interesting of the MergeTree table variants. When you use ReplacingMergeTree it automatically deduplicates rows based on the ORDER BY. Here’s an example with a Sakila film table organized by language_id, studio_id, and film_id. Any row that arrives with the same values for these columns will replace the older one.
Use cases: upserts, where you want to either update or insert. ReplacingMergeTree (RMT) handles that. Another use case is stateful data, for example mobile phone calls where you’re reading call detail records and you want to know the current state of the call: the last record is the one that wins. Another very convenient use is for data replicated from other databases like MySQL, which is where this Sakila example comes from.
The best practice is to use a version column. This is optional, but it allows you to specify that the highest version number wins. Without this ClickHouse has an internal number it can use, but there are corner cases where it might not work correctly. With an explicit version you’re absolutely sure which record wins.
To insert: you add a record with version 0 and sign +1. To update: you insert a row with the current version number and sign -1 to remove the old one, then a row with version +1 to add the new one. To delete: a final row with the highest version number and sign -1.
When you do a SELECT right after this you’ll actually see two rows. That’s because RMT has a lazy approach to merging. The algorithm works even when data is spread across multiple parts inserted at different times. ClickHouse merges in the background, and when parts merge it keeps only the newest rows. But you can’t count on merges happening, because parts can reach a size where ClickHouse leaves them alone permanently. A change arriving after that point will never merge.
The solution is the FINAL keyword, placed after the table name in the query. It ensures everything is properly merged before returning results. There’s also a newer setting that automatically appends FINAL so you don’t have to change your application code.
For the deletion case, ClickHouse doesn’t actually delete rows: it leaves the last one in. But there’s a trick using row policies, a feature that adds a predicate automatically inserted into any query. You set a row policy that says: if you’re looking at this table and the sign is -1, don’t show it. That makes the deleted rows disappear from all queries.
Recent RMT improvements worth knowing: the FINAL setting can now be applied automatically. And there’s a new variant contributed by ContentSquare that automatically garbage-collects deleted rows after a period of time, eliminating the need for row policies.
[41:43] — Tip 5: Store RBAC in ZooKeeper or ClickHouse Keeper
Robert: Tip five is storing RBAC in ZooKeeper or ClickHouse Keeper, which turns out to be super useful. It’s a recent feature contributed by folks at Yandex.Cloud.
The problem: when you use role-based access control you can create users, roles, and profiles using SQL. But if you execute CREATE USER on one ClickHouse server, it gets stored by default in a local directory on that server. If you have a cluster with other ClickHouse servers, they won’t get it. When users try to connect to those other nodes, their account won’t exist.
The workaround is to use CREATE USER ... ON CLUSTER, which executes the command on every server in the cluster. But you have to repeat that every time a new server is added. And because some servers may already have the user and some may not, you have to add IF NOT EXISTS. This is painful, and people have done it and lived to tell the tale, but there’s a better way.
In your ClickHouse configuration you can enable replication of RBAC data through ZooKeeper. You add a replicated tag with a ZooKeeper path in your users.xml configuration. Whenever you now create or modify users with SQL, those changes propagate automatically to all servers in the cluster. We use this in our own cloud. It’s a great feature and we’ve seen no stability problems with it.
One security note: this does mean you need to protect ZooKeeper, because previously this information was hidden on the file system. If someone can access ZooKeeper they can see user definitions. If you had unencrypted passwords, which is a bad idea, they would be exposed.
You can also inspect the actual RBAC settings by selecting from system.zookeeper using the configured path.
[45:21] — Tip 6: Reduce Data Size and Costs with TTL Clauses
Robert: TTLs, or time-to-live rules, are great for limiting data growth. They started as a way to time out rows: you define a TTL using a time column and an interval, and ClickHouse’s background process will fire at regular intervals, find rows that match the condition, and drop them. It’s relatively efficient because ClickHouse rewrites the affected parts.
But TTLs have evolved into a much richer feature. You can also aggregate data as it becomes older, collapsing source records into summarized records over a longer period. You can do TTL moves, moving data across different types of storage. And you can recompress data.
Here’s a more interesting TTL that does all of these: recompress data after one month using ZSTD level 1, recompress the data after a month again with a higher level of ZSTD after six months, and then delete after 12 months. You can see the effect very easily in system.parts. The most recent data at default compression is around 600,000 bytes. After a month it drops to 327,000, almost a 50 percent reduction. After six months it drops by another quarter to a third. By six months out you’ve saved over 50 percent of storage with no loss of information. And of course this can also make queries faster, but the key thing is it saves storage.
[48:13] — Tip 7: Use Aggregation to Simulate Joins
Robert: This is my favorite tip by far: using aggregation to simulate joins. It’s a very powerful and mind-bending capability.
A little background. When you have very large databases with different types of entities, the standard relational approach is to have a separate table for each entity with its own primary key. But in ClickHouse we have a bias for large tables, putting all entities into a single table. The reason is that you can see them all at once with a single scan, without going to multiple tables.
This is a very important type of relationship: master/detail, like invoice header and invoice rows. In our example, a transaction header that describes the transaction in general, and transaction state changes that record what happened during the transaction. Typically in MySQL or PostgreSQL you would just do a join. But joins like this are an anti-pattern in big data systems when you want real-time behavior. To make a join work you may have to move data to different locations so join keys line up correctly, as happens with shuffles in Spark. This is slow and variable. What we’re trying to do with real-time analytics on ClickHouse is get very fast, consistent responses, with 95 percent of them under a second.
Instead of a join, we can use aggregation. Aggregation does a single scan of the data to collect what we’re aggregating. ClickHouse has powerful extensions that let aggregation simulate a join. We use GROUP BY as our join key, and ClickHouse’s groupArray aggregate function accumulates values into arrays in the order they arrive. Because we have paired values arriving in the same order, we can use ARRAY JOIN to explode them back out to normal rows, putting header information into each detail row.
There’s also the if-combinator, which says: if I’m looking at a message type of a specific value, stick it in the array; otherwise ignore the row. Without this the groupArray wouldn’t help because we have different entity types and we only want to collect values for the relevant entities. And then there’s anyIf, an aggregate function that grabs any value it sees for a particular column, used to pull header values into the detail rows.
Here’s what the result looks like: the transaction ID is shared, the change information came from the detail record, and the start time came from the header. This is blindingly fast. You have one scan through the data, and this works no matter how broadly spread out your data is. Instead of a join taking seconds or minutes, you can get the answer in a fraction of a second even across multiple hosts. It works because ClickHouse’s vectorized scan is doing exactly the same accumulation in hash tables that a join would do under the covers.
That’s it for the seven tips. Each one, if you understand how it works, gives you real insight into big data and distributed systems and how ClickHouse operates.
[56:43] — Q&A: Materialized Views for High-Ingest Aggregation
Robert: There’s a great question: the aggregation looks awesome for near-real-time analytics, but if you’re pumping thousands of new records every second into a MergeTree, would the aggregation work or would it refresh every time a new record is added?
In the example I showed, the aggregation runs dynamically. Every time you ask the question, you run the full scan. If you ask it 20 times, you pay the scan cost 20 times.
There is another way. You can put this query into a materialized view. If you know you’ll be asking the same question constantly, you can build a materialized view that runs this aggregation on every arriving block of data and puts the results into a new table. Then you query that table instead. You’ve got the data already pre-aggregated. If you have heavy insert load and constantly need fast responses, build a materialized view based on this query and query that. This is exactly the optimization our support team pointed out when reviewing this presentation.
[1:00:47] — Q&A: Row-Level and Table-Level Permissions
Robert: Another question: if you’re taking an ELT approach and ingesting data from various sources directly into ClickHouse, is there a way to enable row-level or table-level permissions so only platform engineers have access to raw data before it gets encrypted?
The short answer is yes. RBAC gives you the ability to enforce table-level permissions, similar to the MySQL security model but with some enhancements. For row-level control there are row policies, which add predicates automatically inserted into queries. One thing to note: row policies are currently one policy per table per user, so with a lot of users and a lot of tables they can be a bit difficult to manage. We’re actually working on a feature to template them so a smaller number of policies can work for everybody.
[1:02:25] — Q&A: AggregatingMergeTree and SimpleAggregateFunction
Robert: There’s a question about interesting examples of using AggregatingMergeTree. The main thing to know is that AggregatingMergeTree is used to pre-compute aggregates. They’re a bit complicated to use because they expose intermediate SQL results using special syntax. However, there’s something called SimpleAggregateFunction, a data type that can be used when you’re computing simple aggregates like max, min, or count. These can be queried without special merge syntax, which makes AggregatingMergeTree much easier to work with for these cases. Beyond that I’ll think about this question and come back with more tricks in a future talk. It’s a really powerful table type.
Robert: Thank you all for the questions and for your attention. Please do contact me directly. You can DM me on the ClickHouse Slack workspace or the Altinity Slack workspace. We’re happy to answer questions. Come to altinity.com, contact us, or if you’d like to talk more about your apps we’re here to help. Thanks again, and thank you Kiara for your help on the webinar today.
FAQ
How does ClickHouse’s s3() table function work with named collections and wildcards?
Named collections allow you to store S3 bucket URLs, credentials, and format information in a server-side configuration file rather than embedding them in SQL. You then reference the collection by name in the s3() table function call, keeping code clean and credentials out of queries and logs. Wildcards in the URL path, for example orders/*.parquet, tell ClickHouse to read all matching files in one call. Parallelism can be controlled with the max_threads and max_insert_threads settings, and for multi-node clusters the s3Cluster() function distributes the work across all shards.
What is the difference between LZ4 and ZSTD compression in ClickHouse and when should I use each?
LZ4 is ClickHouse’s default compression algorithm. It is fast to compress and decompress, making it a good choice for high-ingest workloads where write speed is critical. ZSTD produces significantly smaller files but requires more CPU on writes and during background merges. ZSTD is particularly effective at compressing integers and LowCardinality-encoded strings. For time-series or monitoring data where storage cost matters more than ingest speed, ZSTD at level 1 or higher is usually the right choice. Codecs can be changed on existing tables with ALTER TABLE ... MODIFY COLUMN without a full rewrite.
Why should I repeat the time column in ORDER BY and how does it help performance?
If a table is ordered by multiple columns and time is at the far right (because it has the highest cardinality), ClickHouse must scan the entire part to find all rows matching a time-based filter. By adding an expression like toStartOfDay(time) higher in the ORDER BY, you give ClickHouse a way to use the sparse index to skip large portions of the part. This can reduce the number of marks read from the full part down to a small fraction, cutting query time proportionally. This optimization is most valuable on large parts where most queries target a narrow time window.
What is ReplacingMergeTree and when should I use it instead of regular MergeTree?
ReplacingMergeTree automatically deduplicates rows that share the same ORDER BY key, keeping only the newest version. It is useful for upsert patterns, tracking the current state of stateful entities such as phone calls or transactions, and synchronizing mutable data from OLTP databases like MySQL. Deduplication is lazy, meaning it happens during background merges and cannot be guaranteed at query time. Use the FINAL keyword to force deduplication at query time, or enable the setting that appends FINAL automatically. Use an explicit version column to ensure deterministic behavior when rows arrive out of order.
How do TTL recompression rules work in ClickHouse?
TTL recompression rules progressively apply stronger compression to data as it ages. You define multiple TTL clauses in the table definition, each specifying a time interval and a target codec. For example, data can be recompressed with ZSTD level 1 at one month and ZSTD level 6 at six months, then deleted at twelve months. ClickHouse applies these rules during background merges and part processing. You can verify the effect by querying system.parts and comparing compressed byte counts across partitions of different ages. In practice this can reduce storage by 50 percent or more over the data lifecycle with no loss of information.
How does aggregation with groupArray replace large-table joins in ClickHouse?
Instead of joining two related entity types across multiple rows, you use GROUP BY as the join key and groupArray to accumulate paired column values into arrays during a single table scan. The ARRAY JOIN clause then explodes the arrays back into individual rows, with header values attached to each detail row. The if combinator filters which rows contribute to each array by entity type. This approach avoids data movement, requires only one scan regardless of how many rows there are, and can answer questions in a fraction of a second that would take seconds or minutes with a traditional join on large tables.
© 2023 Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc. Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.