Amping up Real-Time Query on Data Lakes with ClickHouse® and Antalya Swarms

Recorded: May 21 @ 08:00 am PDT
Presenters: Robert Hodges & Altinity Engineering

This webinar introduces Project Antalya, Altinity’s open-source initiative to extend ClickHouse® with Apache Iceberg as shared storage and swarm clusters as elastic compute. Presented by Altinity CEO Robert Hodges and CTO Alexander Zaitsev, the session walks from problem to solution to live demo and roadmap.

The problem is straightforward: data volumes have grown exponentially, and the replicated block storage model underlying open-source ClickHouse now costs more than 10x as much as object storage at scale. At the same time, ClickHouse’s single-binary architecture bundles inserts, merges, mutations, and queries together on the same nodes, leading to overprovisioning and wasted compute.

Project Antalya addresses both problems. On the storage side, it integrates Apache Iceberg as a first-class table format, replacing expensive replicated MergeTree storage with shared Parquet files on cheap S3-compatible object storage that any Iceberg-compatible application can read. On the compute side, it introduces swarm clusters: pools of stateless ClickHouse nodes that register themselves via ClickHouse Keeper auto-discovery, receive delegated Parquet file scan tasks from a native ClickHouse initiator, and scale up or down in seconds.

The session covers the full setup flow on Amazon EKS using the Altinity Kubernetes Operator for ClickHouse®, the four-layer cache architecture that keeps S3 API costs and latency under control, performance introspection via system tables, and tips for using spot instances to reduce compute costs by roughly 50%. Alexander demonstrates a preview of Altinity.Cloud’s swarm and Iceberg catalog UI, compares feature availability between upstream ClickHouse and Project Antalya builds, and walks through the roadmap: Altinity Ice (open-source Iceberg REST catalog and data loading tool), AWS S3 Tables support, write support (with joint work with the upstream ClickHouse team expected around Q3 2025), and tiered MergeTree-to-Iceberg storage via natural TTL syntax, targeted for production readiness in Q4 2025.

Here are the slides:

Real-Time Query on Data Lakes with Antalya Swarms-2025-05-21

Key Moments (Timestamps)

Key moments generated with AI assistance.

00:06 – Welcome and housekeeping
01:23 – About Altinity: Altinity.Cloud and enterprise support
02:31 – What is ClickHouse? Quick overview
04:46 – The problem: exploding data volumes, expensive block storage, compute overprovisioning
08:04 – Introducing Project Antalya: Iceberg storage, swarm clusters, 100% open source
10:40 – Deployment architecture on Amazon EKS with the Altinity Operator
12:04 – Setting up a Project Antalya cluster: Terraform, manifests, and autodiscovery
13:09 – How the swarm query model works: initiator, swarm nodes, Iceberg catalog
18:00 – Querying Hive and Iceberg tables with the object_storage_cluster setting
21:32 – What is Apache Iceberg? Metadata, catalogs, time travel
23:52 – Connecting ClickHouse to an Iceberg REST catalog with CREATE DATABASE
26:40 – Cache architecture: Iceberg metadata, S3 list objects, Parquet metadata, S3 filesystem cache
29:40 – Cache configuration settings and how to put them in profiles
32:15 – Altinity.Cloud preview: launching swarms and catalogs with a single button
37:21 – Spot instances: cost savings, scaling speed, and interruption handling
40:00 – Operator settings for faster swarm startup: max concurrency
41:49 – Performance introspection via system.query_log and system.events
43:32 – Exploring catalog contents with virtual columns
45:54 – Feature comparison: upstream ClickHouse vs. Project Antalya builds
48:02 – Roadmap: Altinity Ice, AWS S3 Tables, write support, tiered MergeTree-to-Iceberg
52:00 – Resources, GitHub repo, and closing summary
55:32 – Q&A: serverless/Fargate swarms, spot instance failure handling, cost-performance analysis

Webinar Transcript

[00:06] – Welcome and Housekeeping

Robert: Hi everybody, and welcome to our webinar on amping up real-time query on data lakes with ClickHouse® and Project Antalya swarms. My name is Robert Hodges. I’ll be one of your presenters today. I’m joined by Alexander Zaitsev, who is CTO of Altinity. I’m the CEO. We’re both pretty interested in databases and have done a great deal of work on this, so we hope to present what we’ve accomplished and give you a chance to try it out yourself.

Before we dive in, a couple of housekeeping items. This webinar is being recorded. You will get a link to the recording as well as the slides after it’s done, usually within a few hours. You don’t have to scramble frantically. It’s all coming to you. We do have plenty of time for questions. Type them into the Q&A box on the Zoom control bar, or into the chat. We don’t stand on ceremony. If it’s a question that’s interesting, we’ll try to answer it as we go along, but we should have some time at the end as well.

[01:23] – About Altinity

Robert: Let’s talk briefly about Altinity. We have been in business as a provider of support and cloud operation for ClickHouse since 2017. We have two main offerings. The first is Altinity.Cloud, a popular cloud platform for ClickHouse. It was the first to run ClickHouse in Amazon, GCP, Azure, and Hetzner. It runs not only in a SaaS model in our account, but also has a very popular BYOC offering where you can run it in your own account. The second is enterprise support for those of you who love to run ClickHouse yourself. You can run it anywhere from Amazon to your own on-prem locations. We’ll help you do it better.

[02:31] – What Is ClickHouse?

Robert: For those who don’t know ClickHouse, here is a very quick review. It is a famous real-time analytic database and probably the most popular open-source analytic database at this time, with over 40,000 GitHub stars, roughly the same as Apache Spark, which has been around considerably longer.

Why have people adopted ClickHouse? You can think of it as combining the best parts of MySQL and a traditional data warehouse designed for scalable, fast reads. From the MySQL side: understanding SQL, running practically anywhere, open source with a permissive Apache 2.0 license. From the analytics side: a shared-nothing architecture where you have a bunch of nodes with attached storage, columnar data storage for more efficient IO and better compression, parallel and vectorized execution, and the ability to scale to many petabytes. Even early on, ClickHouse was already in the petabyte range for some customers, including Cloudflare, which adopted it around 2017 and was a big influence on ClickHouse’s adoption in the North American market.

[04:46] – The Problem: Storage Costs and Compute Overprovisioning

Robert: ClickHouse was a huge advance when it arrived. It could query data hundreds or even thousands of times faster than PostgreSQL or MySQL. But we’re now nine years on from when ClickHouse was open-sourced on GitHub, and data sizes are beginning to outstrip the capacity of open-source ClickHouse.

When I started working on ClickHouse in January 2019, it was pretty rare for us to have a customer with 100 terabytes of data. We now have customers who add two petabytes of data per day to their ClickHouse clusters. This exponential increase in data is causing two practical problems.

First, storage costs. The shared-nothing architecture uses block storage, ordinary filesystems attached to the nodes ClickHouse runs on. To provide high availability and allow more users to query the data, ClickHouse replicates it. The big problem is that block storage is very expensive. If you’re using EBS on Amazon, by the time you account for replication, it is over 10 times more expensive than Amazon S3. So as data sets grow to hundreds of terabytes or petabytes, storage becomes the dominating cost.
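
As an illustration (not shown in the webinar), the cost gap can be worked through in a few lines of Python. The per-gigabyte prices and the three-replica count below are assumptions chosen for the sketch, not figures quoted in the talk; substitute the list prices for your own region and your own replication factor.

```python
# Illustrative monthly storage cost: replicated block storage vs. one shared
# copy in object storage. Prices and replica count are assumptions.

EBS_USD_PER_GB_MONTH = 0.08   # assumed block storage list price
S3_USD_PER_GB_MONTH = 0.023   # assumed object storage list price
REPLICAS = 3                  # assumed number of ClickHouse replicas


def monthly_cost_usd(data_gb: float, price_per_gb: float, copies: int = 1) -> float:
    """Cost of keeping `copies` full copies of `data_gb` for one month."""
    return data_gb * price_per_gb * copies


def block_to_object_ratio(data_gb: float) -> float:
    """How many times more replicated block storage costs than one shared copy."""
    block = monthly_cost_usd(data_gb, EBS_USD_PER_GB_MONTH, REPLICAS)
    shared = monthly_cost_usd(data_gb, S3_USD_PER_GB_MONTH)
    return block / shared


if __name__ == "__main__":
    one_petabyte_gb = 1_000_000
    print(f"ratio: {block_to_object_ratio(one_petabyte_gb):.1f}x")  # ratio: 10.4x
```

Under these assumptions the ratio does not depend on the data size. It is driven entirely by the price difference multiplied by the number of replicas.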

Second, compute overprovisioning. ClickHouse has a nice model in the sense that there’s a single binary that does everything: inserts, merges, mutations, queries. But these all run inside a single cluster, so each node needs to be sufficiently provisioned to handle all of these potentially simultaneously. This leads to overprovisioning and wasted resources. ClickHouse cannot currently scale compute up and down quickly. I want to emphasize that we’re talking about open-source ClickHouse here. Some of these features are available in certain commercial clouds, but if you’re running open-source ClickHouse, either yourself or with a vendor, this is a problem you will face as your data grows.

[08:04] – Introducing Project Antalya

Robert: As engineers, our job is not to complain about these problems but to do something about them. Specifically, what we’re doing is Project Antalya, which solves the storage problem by allowing ClickHouse to use shared data in object storage and provides a flexible separation of storage and compute.

Antalya is an extension to ClickHouse built out of our own repos, using the same CI/CD pipelines as our Altinity Stable® Builds for ClickHouse®. You can think of Project Antalya as a branch of ClickHouse, which is literally what it is, and it contains several important capabilities.

First, we’re adding Iceberg for shared storage. If you’ve used ClickHouse Cloud, you may be familiar with SharedMergeTree, another type of shared storage based on the MergeTree table format. What we are doing is integrating Apache Iceberg, which is now widely used as a standard for querying data in object storage. It’s cheap, globally accessible, and can be shared with other applications because the standard is open and anyone with an appropriate library can read it. That’s a really important difference and a key reason Iceberg is a good solution for the shared data problem.

Second, we’re adding swarm clusters. Swarm clusters are where we do the compute-storage separation and allow you to pour on additional compute capacity to make queries on shared data run faster.

Third, all of this is 100% open source. Anything we talk about today is something you can grab, see the code, change, and bend to your will. Our basic philosophy is that the building blocks of these analytic systems are fully open source, and what we do as a business is manage and support them.

[10:40] – Deployment Architecture on Amazon EKS

Robert: When you set up a system that uses shared storage, you’re typically going to be running in the cloud. Here’s an example of a SaaS deployment on Amazon. We’re going to run on Amazon EKS. Kubernetes is a very popular way to run databases in the cloud. Our Altinity.Cloud is built on it, and we also wrote the Altinity Kubernetes Operator for ClickHouse®.

If you’re running on EKS, the steps are as follows, and we have code examples for them in the Antalya examples repo. You set up an EKS cluster with compute, storage, and networking resources attached. You install the ClickHouse operator. Optionally you add Prometheus and Grafana for monitoring. Then you set up ClickHouse and Keeper.

It’s pretty simple. We have a project called Antalya examples that shows you how to run Project Antalya. It has documentation, code samples for Terraform, Kubernetes manifests, Docker, and so on. The first step is to clone the project. If you’re bringing it up on Kubernetes on Amazon, just cd into the Terraform directory and run terraform apply. Then update your kubeconfig. These are probably familiar commands to anyone working on Amazon. There are similar commands for GKE and other systems.

[13:09] – How the Swarm Query Model Works

Robert: Here’s the typical configuration we support. We start with a ClickHouse cluster that in most ways is identical to ClickHouse as you know it today. It can do everything that upstream ClickHouse can do. One of the things we’re very focused on is maintaining 100% compatibility with upstream ClickHouse as we add these capabilities.

This ClickHouse cluster has extensions so that when we’re reading Parquet data on object storage, we can delegate the file scans out to the swarm cluster. The swarm cluster is a bunch of stateless ClickHouse nodes that register themselves when they come up. We use Keeper to do this: as a swarm cluster node comes up, it goes to a path inside Keeper and registers its hostname. The native ClickHouse cluster that receives queries can then find the swarm nodes so it can send queries to them. Everything beyond that is just query planning and delegating scans to the swarm cluster, which then reads data from Iceberg or directly from S3.
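
As an illustration (not shown in the webinar), the register-then-delegate flow can be modeled in Python. This is a toy model, not the real protocol: the `Registry` class stands in for the Keeper path, the hostnames and file names are hypothetical, and the round-robin assignment is an assumption made to keep the sketch short.

```python
# Toy model of swarm autodiscovery and scan delegation (not the real protocol).
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    """Stands in for the Keeper path where swarm nodes record their hostnames."""

    hosts: set[str] = field(default_factory=set)

    def register(self, hostname: str) -> None:
        self.hosts.add(hostname)

    def deregister(self, hostname: str) -> None:
        self.hosts.discard(hostname)


def delegate_scans(files: list[str], registry: Registry) -> dict[str, list[str]]:
    """Spread Parquet file scans across registered nodes (round-robin assumption)."""
    nodes = sorted(registry.hosts)
    if not nodes:
        raise RuntimeError("no swarm nodes registered")
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    for index, path in enumerate(files):
        plan[nodes[index % len(nodes)]].append(path)
    return plan


if __name__ == "__main__":
    registry = Registry()
    for host in ("swarm-0", "swarm-1"):
        registry.register(host)
    print(delegate_scans(["a.parquet", "b.parquet", "c.parquet"], registry))
```

Because the initiator looks up the registry at planning time, adding or removing a node changes the next plan without any change to the initiator's own configuration.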

Setting up the swarm cluster is very simple. You have four manifest files to run. These are adapted for Amazon but can be changed in a few minutes to run on any Kubernetes, including Minikube. Once it’s up, you can log in and verify that all nodes are up and can see each other. For anyone with basic Kubernetes knowledge, this is quick and easy.

The swarm cluster ClickHouseInstallation resource is in most ways identical to a regular ClickHouse cluster definition using the Altinity operator. One key thing is that swarm nodes only have shards, not replicas. There is just a single replica per shard. If you want to make the swarm bigger or smaller, you change that number and Kubernetes will adjust accordingly. There is also a reference to Keeper for autodiscovery.

Autodiscovery is a great ClickHouse feature that we extended so that you can actually have multiple Keeper instances: one for replication and an additional one just for swarm registration. The configuration adds a cluster_discovery tag that sets the Keeper path everyone agrees on for registering or finding swarm members.

[18:00] – Querying Hive and Iceberg Tables with the Swarm

Robert: Once you have this set up, you can begin running queries on data in object storage. It can be plain files on S3, files using Hive partitioning where partition column values are baked into the file path, or Iceberg tables which use a catalog to store metadata.

For Hive tables, there’s a special query setting called object_storage_cluster where you give it the name of your swarm. There’s also a setting similar to max_threads that specifies how many swarm nodes to use. If you don’t set it, we default to all available nodes. You can also have different swarms for different purposes. Underneath the covers, the query is being distributed out to potentially many servers, but from the user’s perspective it just runs and hands back results.

[21:32] – What Is Apache Iceberg?

Robert: With Hive, we go straight to S3 and take the data as we find it, either inferring the structure or reading it directly. Iceberg makes this much easier because it stores metadata. It has a way of defining metadata for files: which ones are in the table, what are their partition keys, and what is the current state of the table. One of the things that’s really cool about Iceberg is time travel. It shows the state of the table over time and you can go to different versions.

One thing to get used to about Iceberg is there’s no database in the traditional sense. Iceberg is really like database storage that has been detached from the database and made available through an open protocol that anyone can read. That’s defined in the Iceberg spec, which describes how metadata is managed and the process for doing things like adding files or changing them.

Most people use a catalog to interact with Iceberg. In the examples we’ll show, the catalog is a REST server that allows you to ask which tables are available and what their metadata looks like. The table files themselves are stored on S3, as is the actual Iceberg metadata content. A Python data science app using libraries like pyiceberg and pyarrow can read Iceberg metadata and select the matching files, and ClickHouse does exactly the same thing.
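
As an illustration (not shown in the webinar), time travel can be sketched as picking a snapshot from a table's history. The model is deliberately simplified: real Iceberg snapshots point to manifest lists and manifests rather than directly to data files, and the class and field names here are hypothetical.

```python
# Simplified model of Iceberg table history: each snapshot names the data
# files that make up the table at the moment it was committed.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: tuple[str, ...]


def snapshot_as_of(snapshots: list[Snapshot], timestamp_ms: int) -> Snapshot:
    """Return the latest snapshot committed at or before `timestamp_ms`."""
    eligible = [s for s in snapshots if s.timestamp_ms <= timestamp_ms]
    if not eligible:
        raise LookupError("table has no snapshot at or before that time")
    return max(eligible, key=lambda s: s.timestamp_ms)


if __name__ == "__main__":
    history = [
        Snapshot(1, 1_000, ("f1.parquet",)),
        Snapshot(2, 2_000, ("f1.parquet", "f2.parquet")),
    ]
    print(snapshot_as_of(history, 1_500).data_files)  # ('f1.parquet',)
```

A reader that resolves the snapshot first and only then opens the listed files sees a consistent version of the table, whichever engine or library it is.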

[23:52] – Connecting ClickHouse to an Iceberg REST Catalog

Robert: The first thing you do is tell ClickHouse about the catalog. In ClickHouse, a catalog is really a database. It’s just a list of tables. When connecting ClickHouse to Iceberg, you define a database with a CREATE DATABASE statement and point it at the catalog. The key settings are catalog_type = 'rest', the location of the catalog server, the path to the files on S3, and authorization information such as a bearer token. Once you do that, the tables in the Iceberg catalog become visible just as if they were tables inside ClickHouse itself.
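
As an illustration (not shown in the webinar), here is a small Python helper that assembles such a statement. Only `catalog_type = 'rest'` comes from the talk. The engine name and the `auth_header`, `storage_endpoint`, and `warehouse` setting names are assumptions based on the upstream DataLakeCatalog engine and may differ in the build you run, so check them against its documentation. The URLs, bucket, and token in the usage are placeholders.

```python
# Builds a CREATE DATABASE statement for an Iceberg REST catalog.
# Engine and setting names other than catalog_type are assumptions.


def sql_string(value: str) -> str:
    """Quote a value as a single-quoted SQL string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_catalog_database(
    name: str,
    catalog_url: str,
    warehouse: str,
    storage_endpoint: str,
    bearer_token: str,
    engine: str = "DataLakeCatalog",  # assumed engine name
) -> str:
    """Return the statement text; `name` must be a plain identifier."""
    if not name.isidentifier():
        raise ValueError(f"not a plain identifier: {name!r}")
    settings = {
        "catalog_type": "rest",
        "auth_header": f"Authorization: Bearer {bearer_token}",
        "storage_endpoint": storage_endpoint,
        "warehouse": warehouse,
    }
    settings_sql = ",\n    ".join(
        f"{key} = {sql_string(value)}" for key, value in settings.items()
    )
    return (
        f"CREATE DATABASE {name}\n"
        f"ENGINE = {engine}({sql_string(catalog_url)})\n"
        f"SETTINGS\n    {settings_sql}"
    )


if __name__ == "__main__":
    print(create_catalog_database(
        "ice", "http://catalog.example:5000", "s3://example-bucket",
        "http://storage.example:9000", "placeholder-token",
    ))
```

The helper only produces text. Run the result through whichever ClickHouse client you already use.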

One thing to notice is that these tables have double-barreled names. That’s because Iceberg places tables inside namespaces, an extra level of scoping that ClickHouse doesn’t natively understand. So when you query them, you need to put those double-barreled names in backtick quotes. By the way, the data set we’re using in examples is the Amazon public blockchain data available on S3 in Hive format, which is a great way to quickly test things when you’re bringing this up.

Once you’re at this point, querying this data on object storage looks just like querying a MergeTree table inside ClickHouse, except you add SETTINGS object_storage_cluster = 'swarm' to tell it how to fetch the data.
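
As an illustration (not shown in the webinar), the two habits described here, backtick-quoting the double-barreled table name and appending the `object_storage_cluster` setting, can be wrapped in a small Python helper. The database name `ice` and table name `nyc.taxis` in the usage are hypothetical.

```python
# Builds a SELECT that quotes a namespaced Iceberg table name and asks the
# initiator to delegate file scans to a swarm cluster.


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks; refuse names that would need escaping."""
    if not name or "`" in name or "\\" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def swarm_query(select_list: str, database: str, table: str, swarm: str = "swarm") -> str:
    """Return a SELECT whose scans are delegated to the named swarm."""
    if not swarm or "'" in swarm or "\\" in swarm:
        raise ValueError(f"unsupported cluster name: {swarm!r}")
    return (
        f"SELECT {select_list} "
        f"FROM {quote_identifier(database)}.{quote_identifier(table)} "
        f"SETTINGS object_storage_cluster = '{swarm}'"
    )


if __name__ == "__main__":
    print(swarm_query("count()", "ice", "nyc.taxis"))
    # SELECT count() FROM `ice`.`nyc.taxis` SETTINGS object_storage_cluster = 'swarm'
```

Dropping the SETTINGS clause leaves a query that still runs, with the scans handled by the initiator cluster instead of the swarm.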

[26:40] – Cache Architecture: Four Layers to Control Costs and Latency

Robert: Caches are a really important topic. Object storage is great, but if you have to go to S3 every single time you run a query to download files, things will be catastrophically slow. And one of the dirty secrets of S3 is that you are charged for API calls. Keep hammering the S3 API and two things could happen: you could get rate limited, or you could get a huge bill from Amazon. Reducing the number of API calls is job one to make this stuff run correctly and cost-effectively.

We have a four-layer cache architecture. For the initiator node, two important caches are the Iceberg metadata cache and the S3 list objects cache.

The Iceberg metadata cache stores parsed table definitions in memory. Iceberg metadata files are stored in voluminous JSON format that is slow to parse. If you have to re-parse it on every query, it can add seconds of overhead. The S3 list objects cache is also critical. The ListObjectsV2 API call in S3 is notoriously slow, and for queries that scan a small number of files out of a large collection, caching the object list can boost speed by 100x or more.
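
As an illustration (not shown in the webinar), the effect of a list objects cache can be sketched in Python. This is a toy model and not the ClickHouse implementation: `list_fn` stands in for the S3 listing call, and the time-to-live expiry policy is an assumption.

```python
# Toy list-objects cache: repeat listings of the same prefix are served from
# memory until a time-to-live expires, so the API call count stays low.
from __future__ import annotations

import time
from typing import Callable


class ListObjectsCache:
    def __init__(
        self,
        list_fn: Callable[[str], list[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._list_fn = list_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}
        self.api_calls = 0  # how many times the backing listing call ran

    def list_objects(self, prefix: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(prefix)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no API call
        self.api_calls += 1
        keys = self._list_fn(prefix)
        self._entries[prefix] = (now, keys)
        return keys


if __name__ == "__main__":
    cache = ListObjectsCache(lambda prefix: [prefix + "part-0.parquet"])
    for _ in range(100):
        cache.list_objects("table/date=2025-05-21/")
    print(cache.api_calls)  # 1
```

The trade-off in any such cache is freshness: files added under a prefix are not seen until the cached listing expires.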

For the swarm nodes themselves, there are two more caches. The Parquet metadata cache keeps column metadata, min/max statistics, and Bloom filter indexes in memory so the swarm doesn’t have to reopen files every time to look inside them. If no blocks are needed, the swarm skips the file entirely. The S3 filesystem cache stores actual blocks read from object storage on local disk. This is a long-standing ClickHouse feature that stores S3 file blocks so they don’t have to be re-read. The goal with disk-based caching is to get those blocks into the OS buffer cache in memory, which is where things really speed up.
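
As an illustration (not shown in the webinar), skipping files from cached min/max statistics comes down to a range-overlap test. The sketch keeps one integer range per file for a single column. Real Parquet statistics are kept per row group and per column, and the names here are hypothetical.

```python
# Decide which Parquet files need scanning using cached min/max statistics.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Cached min/max of one column in one Parquet file."""

    path: str
    min_value: int
    max_value: int


def files_to_scan(stats: list[ColumnStats], lower: int, upper: int) -> list[str]:
    """Keep files whose [min, max] range overlaps the filter range [lower, upper]."""
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


if __name__ == "__main__":
    cached = [
        ColumnStats("jan.parquet", 1, 31),
        ColumnStats("feb.parquet", 32, 59),
        ColumnStats("mar.parquet", 60, 90),
    ]
    print(files_to_scan(cached, 40, 45))  # ['feb.parquet']
```

With the statistics held in memory, the files that fall outside the filter are never opened at all.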

Most of these caches are on by default. The filesystem cache does require some configuration: you need to specify the cache path on disk and its maximum size. Since swarm nodes need local disk space for the cache, they do need some storage provisioned even though they’re otherwise stateless.

A great way to handle these settings is to put them into ClickHouse profiles so that users get them automatically without having to know about them.

[32:15] – Altinity.Cloud Preview: Swarms and Catalogs with a Single Button

Alexander: Hello everybody. I’m going to explain how we are integrating these features into Altinity.Cloud and show you a preview of that integration, as well as some other features that apply not just to Altinity.Cloud but to anyone running Project Antalya.

As a cloud provider for ClickHouse, we help users operate it with the least possible effort. We’ve started to bake Project Antalya features into Altinity.Cloud. This is a preview. To launch a swarm, you just need a name, instance type, and number of nodes. You can configure spot instances in advance. If configured as spot, it will run on spot instances if available.

Under the covers, a lot of what Robert just explained is done automatically: the swarm gets registered in ClickHouse Keeper, the latest Project Antalya build is deployed, the swarm is configured for best performance with file cache enabled, and it’s registered in the cluster discovery registry.

Once the swarm is running, you can enable it for an existing ClickHouse cluster with a single button click. This adds the necessary Keeper registry configuration and discovery settings to the ClickHouse cluster. After a restart, the swarm shows up in your system.clusters table and is ready to use.

We’ve also developed our own REST catalog in Altinity.Cloud, which will be open-sourced. We make it available very easily. If you want to connect your existing ClickHouse instance to the catalog, there’s a wizard that generates the CREATE DATABASE statement for you. Once configured, you can immediately run SHOW TABLES and see the tables in the catalog.

Here’s an example: a simple query with grouping takes half a second on a five-node swarm. If you run the same query on a single node, it runs for 15 to 20 seconds. That shows how easily you can extend your compute resources to access Iceberg data.

[37:21] – Spot Instances: Cost Savings and Trade-offs

Alexander: Some interesting tips. Spot instances often promise a 90% cost reduction compared to on-demand prices. In reality, the price is variable and depends on current demand and availability. You can expect something around 50% savings, which is still very good.

Spot instances are also very quick to scale up and scale down, on the order of seconds even in EKS without an autoscaler. You also pay for the exact seconds of usage, not full hours. If you use on-demand instances, you typically pay for a full rounded hour even if you ran the instance for only five minutes. With spot instances, you pay only for the seconds you use, which is really good for autoscaling scenarios.

There are of course downsides. Spot instances can be interrupted at any time. If AWS needs to fulfill an on-demand request of the same type and there’s nothing available, it will terminate a spot instance. And sometimes spot instances of a certain type are simply not available. What would be ideal is a controller smart enough to fail over to on-demand instances when spot is not available, but that’s outside the scope of this talk.

[40:00] – Operator Settings for Faster Swarm Startup

Alexander: If you’re using the Altinity Kubernetes Operator for ClickHouse® and the examples we’ve shown, there are settings you can tune to start swarms faster. By default the operator is tuned for reliability and makes changes conservatively to ensure the cluster never goes offline. But for swarms, which don’t need storage, some of those guarantees can be relaxed.

One thing you can raise is the concurrency ratio. With the settings shown, you can tell the operator to start the swarm with 100% parallelization: create all nodes at once. If you request five or twenty nodes, it will try to create them all simultaneously, and it’s then a matter of whether the cloud provider can provide those resources at that moment.

With default settings, the operator goes one by one or at about 50% concurrency. In a new operator version we’re releasing this week or next, you’ll be able to configure max concurrency on a per-resource basis. Your regular ClickHouse clusters keep their safe sequential upgrade logic while swarms apply changes as fast as possible.
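
As an illustration (not shown in the webinar), the effect of a concurrency percentage on startup can be sketched in Python. This is a simplified model under the assumption that nodes start in equal-sized batches; the operator’s actual scheduling may differ, and the function name is hypothetical.

```python
# Simplified model: how many swarm nodes start in each batch for a given
# maximum concurrency percentage.


def rollout_batches(node_count: int, max_concurrency_percent: int) -> list[int]:
    """Return the number of nodes started in each successive batch."""
    if node_count < 0 or not 1 <= max_concurrency_percent <= 100:
        raise ValueError("node_count must be >= 0 and percent within 1..100")
    # At least one node per batch, otherwise the rollout would never finish.
    per_batch = max(1, node_count * max_concurrency_percent // 100)
    batches: list[int] = []
    remaining = node_count
    while remaining > 0:
        step = min(per_batch, remaining)
        batches.append(step)
        remaining -= step
    return batches


if __name__ == "__main__":
    print(rollout_batches(20, 100))  # [20]
    print(rollout_batches(20, 50))   # [10, 10]
    print(rollout_batches(5, 50))    # [2, 2, 1]
```

In this model, 100% concurrency collapses the rollout to a single batch, so startup time is bounded by how fast the cloud provider can supply the instances.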

[41:49] – Performance Introspection via System Tables

Alexander: ClickHouse has everything built in for introspection. The most important system table is system.query_log. If you run a query or set of queries, it has a special ProfileEvents section tied to each query where you can see what was happening. You can see metrics like how many times the Iceberg metadata file cache was used, and how many requests were sent to S3. Our goal is always to reduce that S3 interaction as much as possible.
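
As an illustration (not shown in the webinar), a query over system.query_log that pulls selected ProfileEvents counters can be assembled in Python. The counter names `S3GetObject` and `S3ListObjects` are given as examples of S3 request counters. The Iceberg-specific and swarm-specific counter names depend on the build and are not listed here, so look them up in system.events on your own cluster.

```python
# Builds a query that reads per-query ProfileEvents counters from
# system.query_log for the most recent finished queries.


def profile_events_query(events: list[str], last_n: int = 10) -> str:
    """Return SQL selecting the given ProfileEvents counters per query."""
    if not events:
        raise ValueError("at least one event name is required")
    for event in events:
        if not event.isidentifier():
            raise ValueError(f"unexpected event name: {event!r}")
    if last_n < 1:
        raise ValueError("last_n must be positive")
    columns = ",\n    ".join(f"ProfileEvents['{e}'] AS {e}" for e in events)
    return (
        "SELECT\n"
        "    query,\n"
        "    query_duration_ms,\n"
        f"    {columns}\n"
        "FROM system.query_log\n"
        "WHERE type = 'QueryFinish'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {last_n}"
    )


if __name__ == "__main__":
    print(profile_events_query(["S3GetObject", "S3ListObjects"], last_n=5))
```

Running the same query before and after warming the caches gives a direct view of how many S3 requests each cache layer is saving.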

system.events, system.metrics, and system.asynchronous_metrics also have interesting counters specific to Iceberg and the swarm. We’re going to build Grafana dashboards based on these tables to help you see what’s happening inside. We’re not inventing anything new here; we’re building on the existing system table infrastructure that ClickHouse already provides.

[43:32] – Exploring Catalog Contents with Virtual Columns

Alexander: One interesting tip is how to explore what’s inside the catalog. ClickHouse has virtual columns, and using a specific query you can check all the paths that exist in the catalog for a particular table. You’ll see the filenames and can calculate the size of your data, which is not trivial to do otherwise. This data about all paths and sizes is already cached and stored in metadata, and in the future we’ll add extra logic to extract it even faster.

It’s also possible to query Iceberg data directly using either the Iceberg database engine or a table function, and they will produce the same results because they’re reading the same underlying data.

[45:54] – Feature Comparison: Upstream ClickHouse vs. Project Antalya Builds

Alexander: Here is the comparison of what works with upstream ClickHouse versus Project Antalya builds. Some features work in both, like swarm discovery and Iceberg catalogs. Some features, like running swarm queries with the object_storage_cluster syntax, do not work in upstream. You can use table functions in upstream for S3 to some extent, but for Iceberg with our authentication scheme, it’s not straightforward.

Some of the caches we implemented have been merged to upstream ClickHouse. Others have not. For example, the Parquet metadata cache, which is really important for S3 queries, is not supported in upstream. The full set of features is only available in Project Antalya builds.

Having said that, the ClickHouse team is working hard on catalog support as well. They’ve generalized it as a DataLakeCatalog database engine that represents Iceberg and other catalog types in a single definition. We will definitely adopt that as it matures.

[48:02] – Roadmap

Alexander: First, we’re announcing Altinity Ice. This is the open-source toolset for Iceberg REST catalogs that includes ice-rest-catalog for running a catalog and ice for loading data. Right now it’s not easy to load data into Iceberg from ClickHouse, and other tools like PyIceberg or Spark require a lot of infrastructure. With ice, you give it data and it loads it to the catalog with a single command. It’s a simple command-line utility.

In June, we’re targeting support for AWS S3 Tables. AWS announced last year their own extension of S3 called S3 Tables, which are catalog-managed Iceberg tables. They’ve since added an Iceberg REST endpoint to access this data in addition to AWS Glue. Our next goal is to test it and make any changes needed for ClickHouse to support it natively.

The Altinity.Cloud UI I previewed will be available in June, giving cloud users access to swarm launching, hosted catalogs, and all currently available features.

We’re also working on performance testing and comparisons with other technologies, including upstream ClickHouse and competitors like Trino. We plan to publish a performance report in June.

The write story is very important and currently in progress. We’re working on write support for Parquet first, which is a prerequisite for writing to Iceberg efficiently. Somewhere around Q3 2025, we expect to combine our efforts with the upstream ClickHouse team on this.

Robert: The main feature that Project Antalya stands for is tiered MergeTree and Iceberg tables. We’ve started work on this, but there are many parts to it. We expect it to be fully production-ready by Q4 this year. What it will allow is extending MergeTree to Iceberg: if you want data older than, say, one month to go to Iceberg, it will be automatically moved there using natural ClickHouse TTL syntax. On the query side, you’ll still query them with your regular SQL. No query rewrite will be needed.

[52:00] – Resources and Summary

Robert: Go to the Antalya examples repo. If you want to log an issue, log it on the Altinity repo, not upstream ClickHouse. They won’t know what you’re talking about. If you want to submit pull requests, we gladly welcome them there. We also wrote a blog article three weeks ago that gives a lot of the getting-started information. And join our Slack. If you’re a customer, reach out directly through the Slack or file support cases.

In summary: we’ve talked about the problems we’re solving, storage and compute costs, and how we address them by extending ClickHouse to use Iceberg as table storage. This is a good time to do it. There’s a lot of industry adoption of Iceberg, and the fact that it’s shareable is a huge win because it allows other applications to use the same data, enabling completely new architectures.

Swarm clusters are the basic mechanism for compute-storage separation. They work now on reads. Writes are coming as fast as we can do them. Caches are critical to enabling the performance and cost efficiency that makes this approach worthwhile. As time goes on we hope you’ll need to understand these details less and less, but for now it’s useful to know what’s happening underneath.

Alexander showed a bunch of additional tricks, from spot instances to diagnostic queries. Two upcoming attractions: Project Antalya features are going into Altinity.Cloud as fast as we can build them, and Altinity Ice is already public, with a proper blog article coming shortly.

If you’d like to sign up and try this stuff out in Altinity.Cloud, contact us. That’s going to make a lot of this very easy to access and use.

[55:32] – Q&A

Robert: There’s a great question from Nathan: has anyone looked at using a managed Kubernetes-native provider such as AWS Fargate or GCP Cloud Run for running swarm nodes?

I love this question. We’ve been talking about Kubernetes today because most of our stuff runs on Kubernetes, but everything we showed you when it comes to swarm clusters will run anywhere. Swarms can be built on anything that can spin up instances. As long as those swarm nodes can see the Keeper and register themselves, they will become accessible to the initiators when they need to delegate queries.

Use your imagination. I find Cloudflare Workers to be a fascinating option. Cloudflare recently also announced Iceberg support for their storage buckets, which is another area we’re planning to dig into. If you have ideas about where to create swarm clusters, try it out, report what you find, and help us fix any issues in the Project Antalya code so it works with those providers.

Alexander: On the question of what happens if a spot instance goes away: we did test that, and the query fails. This is on our to-do list to solve. There’s a fundamental corner case where if you’ve already started receiving results from a swarm node and it goes away mid-stream, you either get half-baked results or you fail. We may put a switch in for that. What we will do first is make the more obvious cases more robust: if we try to connect to a swarm node when planning the query and it doesn’t answer before giving any results back, we’ll drop it from the list and try another.

In general, we cannot avoid failure very easily when a swarm has already started streaming to the client. What we’ll do is make the failure window as small as possible and then probably add a setting where you can say whether you care about completeness or not. There are certain types of queries, especially in observability, where if a small part of the query returns partial results, it’s no big deal. So we’ll expose that as a configurable option.

Robert: On the cost-performance analysis question: we’re working on it. There was a great article from ClickHouse Incorporated recently on querying Parquet directly. ClickHouse is now about 2x slower on Parquet than on MergeTree, except in cases where queries can use a primary key index, where MergeTree still has a big advantage. But Parquet query performance is catching up fast, and the interesting comparison is against other systems like DuckDB and Trino as well.

Alexander: I’d add that storage is about four times less expensive when you go to object storage from local block storage. So the typical cost-performance analysis shows you might be two times slower but at four times less cost. And you can add extra swarm compute to compensate for that slowness. That’s exactly where the swarms are such an interesting idea: you can dial in exactly the price-to-performance point you want.

Robert: And for workloads that are time-sensitive, like trading applications where you need fast answers during market hours, you spin up a swarm designed to hit that latency target and then spin it back down when the trading day is done. Those are exactly the kinds of use cases this architecture enables.

Thank you so much to everybody in the audience for attending. It’s a real pleasure to see the enthusiasm around this and to hear these great questions and ideas. Alexander, thank you so much for helping on this webinar. Please do join us and help make it better. Good day, everybody, and we hope to see you soon working on a Project Antalya cluster.

FAQ Section

Q: What is Project Antalya and what problems does it solve?

A: Project Antalya is an open-source extension to ClickHouse® built by Altinity that solves two problems becoming acute at petabyte scale. First, replicated block storage used by open-source ClickHouse is more than 10x more expensive per gigabyte than S3 object storage. Project Antalya integrates Apache Iceberg as a first-class table format so ClickHouse can store and query data in cheap, shared Parquet files on S3. Second, ClickHouse’s single-binary architecture bundles queries, inserts, merges, and mutations together, making it hard to elastically scale compute. Project Antalya introduces swarm clusters, pools of stateless ClickHouse nodes that can scale up or down in seconds and be pointed at Iceberg data on demand.

Q: How do swarm clusters work and how are they set up?

A: Swarm clusters are pools of stateless ClickHouse nodes that register themselves through ClickHouse Keeper auto-discovery when they start up. A native ClickHouse cluster, called the initiator, receives a query, reads the Iceberg catalog to identify which Parquet files need to be scanned, then delegates those individual file scans to the swarm nodes in parallel. The swarm nodes stream results back to the initiator, which performs the final merge and sort and returns results to the user. Setting up a swarm cluster using the Altinity Kubernetes Operator for ClickHouse® requires only a few manifest files, and the definition is nearly identical to that of a regular ClickHouse cluster, except that swarm nodes have no replicas and no persistent MergeTree storage.
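
The delegate-then-merge flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Project Antalya code: the in-memory FILES dictionary stands in for Parquet files listed by the Iceberg catalog, a thread pool stands in for the swarm, and the names scan_file and run_query are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Parquet files on S3: each "file" is a list of
# (key, value) rows. A real initiator would get the file list from the
# Iceberg catalog instead.
FILES = {
    "part-0.parquet": [("a", 3), ("b", 1)],
    "part-1.parquet": [("a", 2), ("c", 5)],
    "part-2.parquet": [("b", 4)],
}


def scan_file(name):
    """Swarm-node side: scan one file and return a partial sum per key."""
    partial = {}
    for key, value in FILES[name]:
        partial[key] = partial.get(key, 0) + value
    return partial


def run_query(file_names, swarm_size=2):
    """Initiator side: delegate one scan task per file, then merge and sort."""
    # The thread pool plays the role of the swarm; each worker is a "node".
    with ThreadPoolExecutor(max_workers=swarm_size) as swarm:
        partials = list(swarm.map(scan_file, file_names))

    # Final merge and sort happen on the initiator, not on the swarm nodes.
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return sorted(merged.items())


result = run_query(list(FILES))
print(result)
```

Changing swarm_size changes how many scans run at once but not the result, which mirrors the idea that swarm nodes hold no state and can be added or removed without touching the data.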

Q: Why are caches so important for object storage query performance?

A: Without caching, every query must download fresh data from S3, which is both slow and expensive: Amazon charges per API call, and the ListObjectsV2 API in particular is notoriously slow. The Project Antalya cache architecture has four layers: an Iceberg metadata cache on the initiator node (avoiding slow JSON re-parsing on every query), an S3 list objects cache (which can boost performance by 100x on queries scanning small portions of a large collection), a Parquet metadata cache on swarm nodes (holding column statistics and Bloom filter indexes to skip unnecessary file reads), and an S3 filesystem cache on local disk that stores actual data blocks so they don’t have to be re-fetched from S3. Most of these caches are on by default, but the filesystem cache requires a small amount of configuration to specify its path and maximum size.
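
To make the benefit of a list objects cache concrete, here is a minimal Python sketch of the general idea: remember the listing for a prefix for a limited time so that repeated queries skip the slow, billable list call. It is an assumption-laden illustration, not the Project Antalya implementation; the class ListObjectsCache, the function fake_list_objects, and the 60-second time-to-live are all hypothetical.

```python
import time


class ListObjectsCache:
    """Cache listing results per prefix for a fixed time-to-live (TTL)."""

    def __init__(self, list_fn, ttl_seconds=60.0, clock=time.monotonic):
        self._list_fn = list_fn      # the slow, billable backend call
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # prefix -> (expires_at, listing)
        self.backend_calls = 0       # how many times the backend was hit

    def list(self, prefix):
        now = self._clock()
        entry = self._entries.get(prefix)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no backend call
        self.backend_calls += 1      # cache miss or expired entry
        listing = self._list_fn(prefix)
        self._entries[prefix] = (now + self._ttl, listing)
        return listing


def fake_list_objects(prefix):
    """Hypothetical stand-in for an S3 list call over a small bucket."""
    keys = [
        "events/2025/05/a.parquet",
        "events/2025/05/b.parquet",
        "events/2025/06/c.parquet",
    ]
    return [k for k in keys if k.startswith(prefix)]


cache = ListObjectsCache(fake_list_objects, ttl_seconds=60.0)
first = cache.list("events/2025/05/")
second = cache.list("events/2025/05/")   # served from the cache
print(first, cache.backend_calls)
```

The TTL is the design trade-off: a longer value saves more list calls but delays the moment at which newly written files become visible to queries.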

Q: What is the difference between Project Antalya and upstream ClickHouse for Iceberg queries?

A: Some capabilities overlap. Both upstream ClickHouse and Project Antalya support integrating with Iceberg REST catalogs and include Parquet Bloom filter and Iceberg partition pruning. However, the object_storage_cluster setting that dispatches swarm queries is a Project Antalya extension not available in upstream. Additionally, the Parquet metadata cache and S3 list objects cache were contributed to upstream but declined; they are only available in Project Antalya builds. The full set of performance-optimized features, including swarm query dispatch and the complete cache stack, requires running a Project Antalya build.

Q: What is Altinity Ice and how does it help with loading data into Iceberg?

A: Altinity Ice is an open-source command-line tool that makes it easy to start an Iceberg REST catalog and load data into it. It consists of two executables: ice-rest-catalog, which runs a REST catalog backed by etcd or a database, and ice, which loads Parquet files into an Iceberg catalog with a single command. Before Ice, loading data into Iceberg required either PyIceberg, Spark, or other heavy infrastructure that is complex to set up. Ice is designed to be a simple wrapper over the Iceberg Java API and works anywhere: Kubernetes, EKS, or local environments.

Q: What is the roadmap for writing data to Iceberg from ClickHouse in Project Antalya?

A: As of this webinar, Project Antalya supports reads from Iceberg. Write support is in active development. The first step is improving Parquet write support in ClickHouse, which is a prerequisite for efficient Iceberg writes. This work will be combined with the upstream ClickHouse team’s parallel efforts, targeting around Q3 2025. The larger goal is tiered storage: the ability to automatically move MergeTree data older than a defined threshold into Iceberg using natural ClickHouse TTL syntax, with no query rewrite required to continue reading from both MergeTree and Iceberg tiers simultaneously. This feature is targeted to be fully production-ready in Q4 2025.

© Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
