ClickHouse® Best Practices: Ingestion, MergeTree Storage, and ORDER BY Optimization

Introduction

Dive into ClickHouse Best Practices, specifically focused on data ingestion and MergeTree storage optimization, with the Altinity engineering team in our January Office Hours.

The first half of the session covers ingestion: which methods ClickHouse supports (SQL inserts, streaming engines, Kafka, file loading, and connectors), the trade-offs between formats like JSONEachRow, CSV, Parquet, and Protobuf, and how to choose based on your use case. The conversation then goes into how parts, merges, and background thread pools interact with ingestion performance, including the two part-count thresholds that trigger insert delays and refusals, how to prevent too many small parts through batching and asynchronous inserts, and when to tune settings like max_insert_block_size and insert quorum.

The second half shifts to ORDER BY design in MergeTree tables: how the ORDER BY drives compression ratios, primary index efficiency, disk layout, and query execution. The team walks through how to pick and evaluate ORDER BY keys using a real customer case—a UUID-and-date table where swapping column order pushed the compression ratio from 1.46 up to 18x—and explains the interplay between cardinality ordering, skip indexes, data type sizing, and merge speed. The session closes with a live audience question on max_avg_part_size_for_too_many_parts and when that setting is actually relevant.

Office Hours Highlights

[1:37] Ingestion Methods: SQL Inserts, Kafka, File Loading, and Connectors
[3:31] Choosing Between Formats: JSONEachRow vs. CSV, TSV, Parquet, and Protobuf
[5:20] How Parts, Merges, and Background Threads Affect Ingestion Performance
[8:48] Too Many Small Parts: Thresholds, Throttling, and How to Prevent It
[11:54] Async Inserts, Buffer Tables, and wait_for_async_insert
[14:14] Tuning max_insert_block_size and When Insert Quorum Makes Sense
[21:23] The Role of ORDER BY in ClickHouse Storage and Query Execution
[26:54] Skip Indexes, Cardinality, and When Index Analysis Costs More Than a Full Scan
[28:25] Primary Index Memory Sizing, Data Types, and UUID Encoding Pitfalls
[34:38] How ORDER BY Choice Drove a 140 GB Disk Saving in a Real Customer Case
[39:12] Evaluating and Iterating on ORDER BY Keys in Production
[44:41] Audience Q&A: max_avg_part_size_for_too_many_parts Explained

Office Hours Transcript

[00:00:00.00]

Without further ado, I’m going to go ahead and start this meeting. My name is Patrick Galbraith. I’m on the ClickHouse support team within Altinity, and I’m very pleased to be here today. I have two esteemed colleagues here to help answer questions.

Today is January 21st, 2026. Can’t believe it’s already here. If we haven’t yet told you, Happy New Year! One of our updates is that we have a best practices guide for ClickHouse TTLs available at the link below. I’m excited about it; there are a number of questions I have myself. Please do check it out, and hopefully you’ll find some good information in there.

Now we’re going to move on to our questions. These look like pretty good questions that I’ve been reviewing over the last several days. The topic today is ClickHouse Best Practices. First and foremost, we’ll start with something we always need to do: load data into ClickHouse.

[00:01:37.24]

The first question is: What ingestion methods does ClickHouse support, and how do you choose between them in practice?

[00:01:49.07]

That’s quite a topic. ClickHouse is compatible with standard SQL INSERT statements, so you can use those. But it’s not the best approach, as it can be slow. ClickHouse also supports variations of the INSERT statement with several input formats.

You can use streaming engines as well, and they work fine. I like the approach with Kafka or Redpanda for ingestion, but it doesn’t cover all use cases. If you have a lot of tables, you might end up spending a lot of CPU moving data from Kafka into ClickHouse. You can also use connectors such as the ClickHouse sink for Kafka Connect. Loading files via the ClickHouse client is quite convenient for one-shot ingestion.
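As a rough illustration of those options, here is a hedged sketch in ClickHouse SQL. The table `events`, the broker address, and the topic and consumer-group names are all placeholders, not anything from the session:

```sql
-- Hypothetical table used for illustration.
CREATE TABLE events (
    ts   DateTime,
    user UInt64,
    msg  String
) ENGINE = MergeTree ORDER BY (user, ts);

-- Plain SQL insert: convenient, but slow for bulk loads.
INSERT INTO events VALUES (now(), 42, 'hello');

-- One-shot file loading (FROM INFILE is handled by clickhouse-client):
INSERT INTO events FROM INFILE 'events.csv' FORMAT CSV;

-- Streaming: a Kafka table engine feeding the target table via a
-- materialized view.
CREATE TABLE events_queue (ts DateTime, user UInt64, msg String)
ENGINE = Kafka('broker:9092', 'events-topic', 'ch-group', 'JSONEachRow');

CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, user, msg FROM events_queue;
```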

How to choose in practice? It depends on your use case. If you’re using a traditional ETL approach, the choice is tied to how the driver interacts with ClickHouse. For example, if you’re using a Java tool like Pentaho, you can configure batch sizes on the Pentaho side, but the interaction with ClickHouse will be handled by the ClickHouse JDBC driver.

[00:03:31.18]

By default, the JDBC driver uses raw binary formats to send data, which is fast. Two things I consider when picking a method are convenience and performance. For example, when collecting log data with tools like Fluent Bit, you may be restricted in which message formats you can send to Kafka, so sometimes you pick the format that’s most convenient.

Generally, ingestion performance is the main factor. Some formats, like JSONEachRow, are very slow. Others are convenient and fast enough, like CSV and TSV. Some offer additional features. Parquet, for instance, or Protobuf, which lets you add domain constraints to the schema. Those are also very fast.

In practice: if I need real-time ingestion, I go with streaming ingestion, particularly with Kafka. If I’m doing ETL, I investigate what formats and features the tool I’m using supports for interacting with ClickHouse.

[00:05:08.10]

Thank you. That’s a great question, and one that could easily be discussed for hours.

[00:05:20.00]

The second question: How do parts, merges, and background threads affect ingestion performance?

[00:05:30.16]

Parts are essentially the storage units in ClickHouse. Partitions are a logical concept, which we’ll discuss in a later topic. Every time you do an insert, you create a single part. Generally that’s fine, but high-frequency inserts will create a lot of parts, which pushes ClickHouse to merge them in order to reduce the active part count for a table.

There are two thresholds for inserts. One is when ClickHouse begins delaying inserts, and the other is when it starts refusing them. If the number of active parts for a partition being actively ingested exceeds the first threshold (the `parts_to_delay_insert` setting, currently around 1,000 active parts by default), ClickHouse introduces delays to throttle ingestion. This applies regardless of ingestion method: whether you’re doing direct inserts or using Kafka, those delays will occur. If you exceed the maximum number of active parts per partition (`parts_to_throw_insert`), inserts will be refused entirely.
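You can inspect both thresholds and see how close a table is to them; a minimal sketch, where `events` is a placeholder table name:

```sql
-- The throttling thresholds are MergeTree-level settings:
SELECT name, value
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert',
               'max_parts_in_total');

-- Count active parts per partition to see how close a table is:
SELECT partition, count() AS active_parts
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY active_parts DESC;
```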

As for background threads: ClickHouse allocates fixed thread pools for background activities. Merges and mutations share the same pool. Other tasks draw from the background common pool. Fetching parts for replication uses a dedicated pool.

[00:07:11.21]

If you keep those thread pools saturated, you may see high CPU usage, which impacts insert performance. On insert, ClickHouse parses the data, sorts it, forms blocks, and writes them to disk. If you’re using replication, it also commits that information to Keeper. All of that can impact ingestion throughput.

If you’re running too many merges: for example, if you’ve bumped the background pool size close to or beyond your CPU count, you’ll keep all cores busy with merging. When you’re simultaneously ingesting at high speed, these processes compete for memory, CPU, and I/O. If write latency increases due to heavy merging, inserts will slow down as well.

Moving on to too many small parts and how to prevent them. If you’re creating too many small parts, you’ll hit those same two thresholds. There’s also one thing I should mention.

There’s another limit in ClickHouse: the total number of active parts for a table, controlled by `max_parts_in_total`. The default is fairly high: 100,000. This is also related to how you partition your table, since a very granular partition key can contribute to this.

If you’re inserting too many small parts, you’re constantly triggering the merging of level-zero parts. Ideally, the part creation rate on a server should be around one part per second. You can handle up to 10-20 parts per second depending on how many CPUs you have and how good your storage is, before things start to slow down. Beyond that, ClickHouse starts throttling and may refuse inserts. You’ll also accumulate a high number of active parts because ClickHouse is constantly prioritizing the merging of those level-zero parts.
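One way to measure your actual part creation rate is the `system.part_log` table, a sketch assuming part logging is enabled in the server config:

```sql
-- Parts created per second, averaged per minute over the last hour.
SELECT
    toStartOfMinute(event_time) AS minute,
    count() / 60 AS new_parts_per_second
FROM system.part_log
WHERE event_type = 'NewPart'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;
```

If this hovers well above one part per second per server, that is the signal to batch more aggressively on the client side.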

The key idea with MergeTree is to give ClickHouse enough time to create large parts. To prevent this situation, you need to batch more data. If you create bigger parts, ClickHouse will merge those level-zero parts into larger parts, and over time it will merge those into even larger parts, up to a defined limit.

[00:10:28.13]

If you don’t allow enough time for ClickHouse to merge old partitions, or if you have no partitioning at all, parts will remain small and numerous. So even if you’re not experiencing issues with too many parts right now, you may start to in the future if you’re not giving ClickHouse room to merge.

Aggressive merging is not something I recommend. It tends to use a lot of CPU and will impact both read and write queries. Having a high number of parts increases query complexity. If instead of 10 parts you have 1,000 parts to scan for the same query, ClickHouse has to open and close files across 1,000 directories, which significantly increases query execution time.

To prevent this: if you control ingestion, batch more. Create bigger batches on the client side and send them to ClickHouse at a lower frequency.

[00:11:54.17]

If you don’t control the ingestion client directly and you’re not using Kafka—because with a Kafka table engine you can actually control the flush interval and block size—one important thing that’s often overlooked is having a good partitioning strategy. If the average part size is large enough, too-many-parts issues won’t occur, though ingestion can still get slow.

Preventing too many small parts comes down to batching more. If you can’t do that on the client side, you can use Asynchronous Inserts, which help avoid creating too many small parts. There’s also the option of Buffer tables. Buffer tables carry a risk of data loss, as do Asynchronous Inserts by default, but you can set `wait_for_async_insert` to 1, which causes your application to wait until the insert is committed to the table, avoiding data loss. The core idea is the same: create larger batches of data to commit to the table.
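A minimal sketch of the async-insert settings discussed above; the flush-tuning values are purely illustrative, and `events` is a placeholder table:

```sql
-- Per-session (or per-user profile) settings:
SET async_insert = 1;          -- server buffers small inserts, flushes them as one part
SET wait_for_async_insert = 1; -- the INSERT only returns after the flush commits

-- Optional knobs controlling when the buffer flushes (illustrative values):
SET async_insert_busy_timeout_ms = 1000;
SET async_insert_max_data_size = 10485760;

INSERT INTO events VALUES (now(), 42, 'one small row');
```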

[00:13:38.22]

Larger parts are the goal. One factor that can interfere here is partitioning, but we’ll come back to that later.

[00:13:49.11]

I see that at the bottom of the question list. Thank you very much. You touched on the Kafka question I was going to ask about, since we get that from customers and engineers all the time: Why do I have so many small parts? The controls are right there in the Kafka settings you mentioned. Thank you.

[00:14:14.18]

Question number four: How do settings like `max_insert_block_size`, async insert, or insert quorum affect data loading, and when would you tune them?

[00:14:29.07]

Not all the time, actually. It depends on which format and ingestion method you’re using. For formats like CSV or Values, the server controls block emission and block sizes. The `max_insert_block_size` default is generally fine for most workloads. It targets blocks of up to one million rows or 256 MB. There are also `min_insert_block_size_bytes` and `min_insert_block_size_rows` settings that interact with each other: a block is emitted once either the byte-size or the row-count threshold is reached.

If ClickHouse controls that process, you can tune it to create bigger parts. For a daily ETL load or a backfill, it’s a very good idea to raise the defaults. Fewer parts are better overall: you won’t hit the active-parts-per-partition threshold, you won’t see insert failures from too many parts, and you won’t see delayed inserts from a high active part count.
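A sketch of checking the defaults and raising them for a backfill session; the raised values and the `events`/`events_staging` table names are illustrative assumptions:

```sql
-- Current values (defaults may differ across ClickHouse versions):
SELECT name, value FROM system.settings
WHERE name IN ('max_insert_block_size',
               'min_insert_block_size_rows',
               'min_insert_block_size_bytes');

-- For a backfill, raise them for the loading session only:
SET max_insert_block_size       = 4194304;    -- ~4M rows
SET min_insert_block_size_rows  = 4194304;
SET min_insert_block_size_bytes = 1073741824; -- 1 GiB

INSERT INTO events SELECT * FROM events_staging;
```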

[00:16:06.05]

For workloads with a constant or growing rows-per-second rate, you may encounter the same issue. If you control the ingestion process—that is, you’re inserting directly into ClickHouse rather than using a Kafka engine—you can raise those three settings to create bigger parts. The right time to tune them is when your pipeline’s rows-per-second rate increases and you need to avoid flooding ClickHouse with too many parts.

For async inserts: there are legacy applications where you can’t control how they interact with the database, and they may end up doing one insert at a time, or emitting just one row per insert. One common approach to avoid this is to use a broker like RabbitMQ or Kafka to hold those events and handle the batching. If you can’t use such a component to abstract that layer, then you need to use asynchronous inserts to avoid creating many parts from single-row inserts.

Insert quorum is something I generally don’t recommend.

[00:17:47.04]

That said, there are cases where you need to ensure that once data is ingested, it’s available on a certain number of replicas, preferably all replicas. The problem is that it’s harder to work with in a fault-tolerant system. By default, ClickHouse is quite resilient: it sacrifices strong consistency in favor of availability and partition tolerance. You can keep ingesting data, and it will eventually become consistent once the issues preventing replication are resolved. For example, after a network partition, replicas will resynchronize when connectivity is restored. It’s not always perfect, sometimes you need to intervene manually, but the default behavior is robust.

Insert quorum forces ClickHouse to operate in a synchronous mode, essentially a distributed commit. The impact on data loading is that it increases the latency of each insert. For example, with a two-replica cluster, the minimum quorum is one, so there’s no real point in using quorum there. But with three replicas and a minimum quorum of two, in a DR scenario where one replica is in a remote data center with 50-100ms latency, every single part commit will incur that additional network latency.

[00:19:25.23]

If the geographically closest replica is down, and your quorum is two, you’re now forced to use that distant replica. You can also set quorum to `auto`, which dynamically probes replicas, but that probing process has its own cost and will increase ingestion latency. I would recommend insert quorum only for sensitive data where consistency guarantees are critical, not for high-throughput workloads.
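For completeness, a hedged sketch of the quorum settings mentioned; `events_replicated` is a placeholder for a ReplicatedMergeTree table, and the timeout value is illustrative:

```sql
-- Only for sensitive, low-throughput writes on a replicated table:
SET insert_quorum = 2;              -- wait until 2 replicas have the part
SET insert_quorum_timeout = 60000;  -- ms to wait before failing the insert

-- 'auto' targets a majority of replicas instead of a fixed number:
-- SET insert_quorum = 'auto';

INSERT INTO events_replicated VALUES (now(), 42, 'must land on 2 replicas');
```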

[00:20:06.02]

It goes back to that quote: there are no perfect solutions, only trade-offs.

[00:20:12.08]

Exactly. I prefer the asynchronous approach, and then you either design your application to handle occasional data that isn’t yet available, or you can use ClickHouse’s sharding features to abstract some of that. For example, by distributing reads across replicas and skipping replicas that are behind by more than X seconds. That said, one common mistake is to ingest all data into a single replica. That replica won’t have anything to fetch from its peers, meaning if you’re querying across your cluster, it will rarely be behind and may end up serving more read queries than the other replicas in the shard. That can actually work in favor of read scaling, but again, it’s a trade-off. It may make sense for small tables.

[00:21:23.15]

That’s why we like ClickHouse. It gives us so many options. The next section is about performance at the table level with ORDER BY. Choosing ORDER BY keys: The first question in this section is, what role does ORDER BY play in ClickHouse storage and query execution?

[00:21:51.04]

For storage, the biggest impact is compression ratio. Second is that having a column sorted in a particular order helps when you need to scan it. ClickHouse can also use your ORDER BY to optimize certain queries via two mechanisms: `optimize_read_in_order` and `optimize_aggregation_in_order`. If you have `ORDER BY (A, B, C)` and your query groups by A, B, and C, ClickHouse can split the data and generate aggregation states across multiple threads without needing a shared data structure. When data is not read in a consistent order, ClickHouse has to maintain a shared hash table that all threads access simultaneously. When reading in order and the GROUP BY matches your ORDER BY, you can apply aggregation without that shared overhead.
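A sketch of the in-order aggregation described above, assuming the hypothetical `events` table with `ORDER BY (user, ts)`:

```sql
-- With ORDER BY (user, ts), this GROUP BY can be pipelined over sorted
-- ranges, so each thread aggregates its range without a shared hash table:
SELECT user, ts, count()
FROM events
GROUP BY user, ts
SETTINGS optimize_aggregation_in_order = 1;

-- EXPLAIN PIPELINE shows whether the in-order variant was actually chosen:
EXPLAIN PIPELINE
SELECT user, count() FROM events GROUP BY user
SETTINGS optimize_aggregation_in_order = 1;
```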

[00:23:15.24]

If you omit the PRIMARY KEY, it defaults to being equal to your ORDER BY. That’s actually common: most tables we review don’t have an explicit PRIMARY KEY for that reason. So for this question we can assume ORDER BY equals PRIMARY KEY.

The primary index is how ClickHouse efficiently searches your table. When columns in the ORDER BY or PRIMARY KEY appear in query predicates, ClickHouse can skip large portions of data. Due to how ClickHouse’s MergeTree is organized, it locates segments of data, called granules, and then scans and applies matching operations based on your query predicates. ClickHouse always scans at least one granule. The granule size defaults to 8,192 rows but adapts dynamically based on row size, and can grow up to 10 MB of data.

[00:24:43.07]

With that in mind, not crafting your primary key carefully has a severe impact on performance. Also, the PRIMARY KEY must be a prefix of your ORDER BY; the two are strictly related. It’s important to pick a good one so you achieve both good compression and good filtering at query time. It’s not just one role among many; it is the central optimization lever in ClickHouse.

[00:25:21.21]

And it also determines how data is laid out on disk, obviously.

[00:25:28.12]

Yes. That ties into how you read data. If you’re reading parts that weren’t sorted, you have to scan entire granules because you don’t know where the data is located. If your ORDER BY is, say, a random tuple, you have no way to optimize queries; the data has no useful locality. Data locality is how the primary index in ClickHouse optimizes queries. If you choose an ORDER BY that increases data locality, you can scan efficiently even for columns that aren’t especially large. Columns correlated with ORDER BY columns also get implicitly sorted, and one way to verify that is to look at the compression ratio for those columns. Good compression generally means low entropy, low cardinality, or correlation with another ORDER BY column.
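The per-column compression ratios mentioned here can be read straight from `system.columns`; a sketch, with `events` as a placeholder table:

```sql
-- Per-column compression ratio: low ratios on ORDER BY columns are a
-- warning sign, especially on large columns.
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY data_compressed_bytes DESC;
```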

[00:26:47.08]

I think you just answered question six there. How does column cardinality…? No?

[00:26:54.13]

Actually, yes, since skip indexes also come into play here. If the column you’re targeting with a skip index doesn’t correlate with the primary index, you end up not skipping many granules because the matching data is scattered throughout the table. What a skip index says is that a granule might contain matching data: Bloom-filter-based types such as `bloom_filter` and `tokenbf_v1`, for instance, have false positives. When an index says a granule doesn’t contain data, there are no false negatives, so that’s reliable. But whatever remains may or may not have a match.

Suppose your table has 1,000 granules, and you have a column that isn’t in the ORDER BY but has a skip index on it. Instead of reading those 1,000 granules, the skip index might only help you avoid 1, so you still read 999. That’s not a good outcome. Index analysis can cost more than just scanning the data. One useful trick: when you run `EXPLAIN indexes = 1` on a query, the time that operation takes reflects the skip index analysis time. If your EXPLAIN takes 20 seconds, your query will take at least 20 seconds plus the actual data scan.
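The `EXPLAIN` trick above looks like this in practice; `events` and the predicate are placeholders:

```sql
EXPLAIN indexes = 1
SELECT count() FROM events WHERE user = 42;
-- The output lists, per index (primary key plus any skip indexes), how many
-- granules were selected out of the total. If a skip index only drops a
-- handful of granules, it is likely costing more than it saves, and the
-- wall-clock time of the EXPLAIN itself approximates the index analysis cost.
```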

[00:28:25.07]

The PRIMARY KEY and ORDER BY also impact the size of the primary index in memory. High-cardinality columns and certain data types can make the primary key consume more memory. Nowadays ClickHouse lazily loads the primary key; it won’t read all of it at startup. But once you start querying a table, it loads the primary key for the affected parts.

The size of data types matters significantly: the bigger the data type, the larger its in-memory representation. ClickHouse organizes the primary index most efficiently when you go from the lowest-cardinality to the highest-cardinality column. This reduces the number of entries ClickHouse needs to maintain in memory and makes your primary key smaller. Consider that ClickHouse is designed to handle terabytes or even petabytes of data.

[00:29:54.10]

You can’t create a primary key that represents 5% of your table’s size and expect that to scale. For a 100 GB table, 5% is 5 GB—that might fit in memory on a 128 GB machine, and you can bump the memory setting to accommodate it. But for truly large datasets, that’s not feasible, especially for data you’re not actively querying, where ClickHouse won’t bother loading the primary key.

For example, if you’re storing UUIDs as strings, each entry is 36 bytes. A better pattern: if you have a UUID, encode it as a UUID type. Even better is to use hashing, like `cityHash64`, to hash the value into a 64-bit integer, which halves the memory footprint. Computing a range on a string representation of a UUID is also non-trivial for ClickHouse.
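The three encodings can be compared side by side; a hedged sketch with throwaway table names:

```sql
-- Three ways to key on a UUID, from largest to smallest index entry:
CREATE TABLE k_str  (id String) ENGINE = MergeTree ORDER BY id; -- 36 bytes/entry
CREATE TABLE k_uuid (id UUID)   ENGINE = MergeTree ORDER BY id; -- 16 bytes/entry
CREATE TABLE k_hash (id String) ENGINE = MergeTree
    ORDER BY cityHash64(id);                                    -- 8 bytes/entry
```

The hash variant trades away the ability to do ordered range scans on the original value, which is fine when the access pattern is point lookups.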

[00:31:13.07]

A range is essentially a comparison from a start to an end in a certain order. If you sort your data with columns A, B, and C, where A has the lowest entropy or fewest distinct values, and those values correlate with B and C, you can navigate those ranges efficiently. But if A has only three distinct values, it won’t provide much selectivity, which means it’s not particularly useful as the first column in your primary key.

You can actually build the primary key in a way that allows individual columns, A, B, or C, to be used independently. The further left a column is in the ORDER BY, the more efficiently it can be used for search. This affects both the size of the primary index and how efficiently query planning works. After planning, ClickHouse knows exactly which granules to read; then it moves from searching to retrieving and scanning.

[00:32:39.21]

If you have a primary key that is either too large or provides poor selectivity, you’re left with one of two problems: fast planning but excessive scanning because you’re not filtering much, or slow planning because the primary index is too large. The same logic applies to data skip indexes.

To put it simply: if you’re searching with a query that uses only one column from your primary index, the further right that column appears in the ORDER BY, the more granules ClickHouse has to scan to satisfy the query. Think about cardinality ordering: going from lowest to highest cardinality lets you identify matching ranges more efficiently. But the reverse ordering can sometimes be necessary too, depending on your access patterns. The most important predicate in your query should guide the ORDER BY. Don’t assume that lowest-to-highest cardinality will always be optimal.

[00:34:18.15]

If it is the other way around, you end up with many correspondences to the high-cardinality column that relate to your predicate, and you end up scanning more data. That’s also a problem.

[00:34:38.13]

How does the choice of ORDER BY impact compression ratios and disk usage?

[00:34:44.16]

Greatly, actually. I’ll use a UUID example because this was a recent customer tuning case. The table had two columns, essentially a UUID and a date, and they ordered by date first, then UUID, based on the cardinality of each column. Date is the lower-cardinality column compared to the UUID. But the number of UUIDs per day was very high, and they were encoding the UUID as a string.

The first thing we did was keep the same ORDER BY but switch to a proper UUID type. The compression ratio went from 1.34 up to 1.46. For three months of data, that translated to a significant saving, around 140 GB on disk. The access pattern was always filtering on the UUID. Since fixed-width binary values compress far better than strings (more repetition, simpler structure), we swapped the two columns in the ORDER BY.

[00:36:11.10]

After swapping, the compression ratio for that UUID column went from 1.46 up to 18. The ratio of rows per UUID was roughly 1:45 to 1:100 depending on the month. That drastically improved compression.

Since the use case always queried three months of data, essentially scanning the whole table, trying to optimize the primary key for search was pointless. Partition pruning was already handling that. So the main gain came from compression alone.
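The before/after shape of that customer table can be sketched like this; the table and column names are invented for illustration, not taken from the actual case:

```sql
-- Before: low-cardinality column first, UUID stored as String
-- (compression ratio ~1.34; ~1.46 after switching to the UUID type).
CREATE TABLE t_before (d Date, id String)
ENGINE = MergeTree ORDER BY (d, id);

-- After: proper UUID type, and the always-filtered column moved left so
-- identical UUIDs sort adjacently (compression ratio ~18).
CREATE TABLE t_after (d Date, id UUID)
ENGINE = MergeTree ORDER BY (id, d);
```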

The process I’d suggest: start with the lowest-to-highest cardinality ordering for columns used in your predicates, then check compression ratios. If a column is used in nearly every query, pushing it to the left in the ORDER BY can improve query performance. But it’s an exploratory process. Keep in mind that lower compression means more disk space, which works against one of the core optimizations of columnar storage: reducing the amount of data loaded from disk.

[00:37:51.05]

A low compression ratio significantly impacts query performance. The wrong sort order yields low compression ratios across columns overall. That said, some cases are simply impossible to optimize. Certain complex data types are inherently high-entropy, packed doubles, for instance. Aggregation states stored in AggregatingMergeTree tables can also be difficult to compress, and adjusting the ORDER BY there is unlikely to help much. But that’s one of the less frequent optimization scenarios you’ll encounter.

[00:39:12.24]

Moving to the next question: How do you evaluate whether a given ORDER BY key is optimal for a real workload?

[00:39:23.19]

I can share how I approach it. It’s not a strictly scientific method, but it works well in practice. The first step is to think about your queries. Your access patterns inside the table will determine which columns belong in your ORDER BY and PRIMARY KEY. That’s where you begin; it cuts down your exploration space significantly.

After that, start loading data. One effect an ORDER BY has is on merge speed. A very large ORDER BY means merges will be slower. Once data is merged, you don’t need to re-sort it. But a heavy ORDER BY might turn merges that normally take a few seconds into multi-minute jobs, and for row-level TTL deletes (as opposed to dropping whole parts with `ttl_only_drop_parts`), processing large amounts of data means reading all of it, removing expired rows, re-sorting, and writing back to disk. That definitely impacts merging.

[00:40:46.00]

So my second step is to run an ingestion pipeline with expected production volumes and monitor how long merges are taking. After that, and I’ll stop trying to count steps, I look at compression ratios. Overall table compression is a good starting point. If you see something like 5x or 10x, that’s fine; higher is better.

Then I look at column-level compression efficiency. If I see ORDER BY columns with poor compression ratios, that’s a warning sign, especially for large columns. A column that totals 100 MB uncompressed for a huge table is not a big deal. But a few terabytes uncompressed is a big deal. Start playing with the worst-compressed column; try moving it.

[00:42:18.07]

If you move the worst-compressed column to the first position in the ORDER BY, it will generally improve compression. To validate, grab a sample, like a single partition, and create a few test tables with variations of the ORDER BY, then compare compression ratios.
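A sketch of that validation loop, assuming the hypothetical `events` table and an illustrative one-month sample filter:

```sql
-- Candidate tables: same schema, different ORDER BY variants.
CREATE TABLE events_v1 AS events;               -- same ORDER BY as production
CREATE TABLE events_v2 (ts DateTime, user UInt64, msg String)
ENGINE = MergeTree ORDER BY (ts, user);          -- candidate ordering

-- Load the same sample (e.g. one month) into each candidate:
INSERT INTO events_v1 SELECT * FROM events WHERE toYYYYMM(ts) = 202601;
INSERT INTO events_v2 SELECT * FROM events WHERE toYYYYMM(ts) = 202601;

-- Compare overall compression per candidate:
SELECT table,
       round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND table IN ('events_v1', 'events_v2')
GROUP BY table;
```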

Also, after any change, benchmark your representative queries. Check the query plan to see if it’s efficiently using the primary index, and measure actual query speed. A good query plan in ClickHouse doesn’t guarantee a fast query. One reason: in observability workloads, you often have very large text columns. You can get an excellent query plan, but scanning that text column can expand from a few gigabytes to several terabytes of decompressed data, and that will be slow regardless.

Once everything looks good, I leave it alone and only revisit if a new primary use case emerges that the schema wasn’t initially designed for, in which case you need to reevaluate the ORDER BY or consider a secondary schema.

[00:43:59.07]

Thank you very much. Normally I wait until the very end for questions, but given that we’re tight on time, let’s see if anyone has questions at this point.

[00:44:20.09]

There’s a question in the chat: What happens if your average part size is greater than `max_avg_part_size_for_too_many_parts`? I’ll have to verify the exact setting name, but I’ll let you go ahead.

[00:44:35.19]

I can take this one.

[00:44:41.09]

This setting was designed for large tables that don’t use partitioning. We generally advise using partitions because they’re very efficient for data retention. But if for some reason you need a table without partitions, there’s a consideration: the too-many-parts check is measured per partition. If your table has no partitions and a lot of data, the total part count can be quite large. This setting was implemented to prevent the too-many-parts exception from triggering incorrectly in that scenario, by only accounting for smaller parts in the threshold calculation.

The question is whether SELECTs are affected by the number of parts, and the answer is: yes, they are. But since you’re not partitioning the table, the part count is spread across the whole table rather than per-partition. Imagine a huge table with a trillion rows and a thousand partitions—the number of parts per partition would be divided accordingly. If you choose not to partition, or if you’re running a query that has to read through all parts regardless of partitioning, the query performance will be the same either way.

[00:46:49.03]

To clarify: partitioning helps with queries when you apply a filter on the partition expression; ClickHouse will prune partitions accordingly. But if you’re querying the table without filtering on the partition expression, it doesn’t matter whether you have partitions or not, query performance will be the same. The number of parts is what matters in that case.

So this setting doesn’t affect query speed directly, the table design does. The setting exists specifically to prevent the insert-throttling mechanism from activating for tables structured this way. Parts larger than the threshold configured in this setting will be excluded from the too-many-parts count.

The default value is reasonable; you probably shouldn’t change it unless you have a large unpartitioned table where this mechanism is triggering unnecessarily.
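To check whether this setting is even in play for a given table, you can compare it against the table's average active part size; a sketch with a placeholder table name:

```sql
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'max_avg_part_size_for_too_many_parts';

-- Average active part size for a (hypothetical) unpartitioned table; parts
-- above the threshold are excluded from the too-many-parts count:
SELECT table,
       formatReadableSize(avg(bytes_on_disk)) AS avg_part_size,
       count() AS active_parts
FROM system.parts
WHERE active AND table = 'big_unpartitioned_table'
GROUP BY table;
```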

[00:48:46.13]

I’ve never used it, because I’ve always worked toward reducing the parts-per-second rate and haven’t run into this situation.

[00:48:53.14]

It’s not a normal situation.

[00:48:59.10]

It’s a corner case.

[00:49:02.09]

It’s fairly unusual, though it would probably come up more often in ClickHouse Cloud, because there are no shards there. All data lives in a single shared storage, and tables are not split across servers. So in ClickHouse Cloud, large tables tend to be even larger in a single context, making this scenario somewhat more relevant.

[00:49:41.20]

Right. Makes sense. Thank you.

[00:49:51.07]

Thank you very much. Looking at the clock, we’re right at the top of the hour. I think we should save the partitioning questions for the next Office Hours. There are some good ones there, and we could easily spend another hour on those topics alone.

[00:50:24.08]

I want to thank Anselmo for answering questions, and Tatiana and Diego for being here and contributing. I’m very privileged to work with these people; they are walking encyclopedias of ClickHouse.

We also have the Altinity Slack channel. You’re always welcome to join and ask questions; we love being helpful. If you have any questions related to this session or anything else, please feel free to ask. We like to put smiles on people’s faces when it comes to ClickHouse.

Thank you very much for being here today. The feedback form link is in the chat. It’s Hump Day, so have a great rest of the week, and we’ll see you again next month!

Listen to the full conversation on the Altinity YouTube Channel. For more insights on open-source ClickHouse and real-time data architecture, visit our blog.