Project Antalya Progress — ALTER TABLE EXPORT and Other Cool Features

Aurora Borealis bent
Her arch around the skies,
And up the wondering billows gazed
With phosphorescent eyes
Lydia Sigourney – Icebergs
Last month we released a long-awaited update to Altinity Antalya based on ClickHouse® 25.6. The new release is an important step in the Project Antalya roadmap, delivering better integration with data lakes and improved performance for queries over Iceberg data. Full release notes are available here. Let us highlight some of the important features that are now available to users.
MergeTree Part Export to S3
The new SQL command is an extension of ALTER TABLE:
ALTER TABLE <MergeTree table> EXPORT PART <part_name> TO TABLE <S3 table>
The export is asynchronous, similar to mutations. It preserves MergeTree ordering and minimizes the resources required for the export. In that respect it resembles ClickHouse moves between volumes, except that the original data is not deleted. To track progress, there is a new system.exports table.
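For example, a simple progress query against system.exports might look like the sketch below. The column names follow the sample row shown later in this article, but the exact schema may vary between builds, so verify with DESCRIBE first.

```sql
-- Sketch: watch export progress (column names assumed from a sample
-- system.exports row; check your build with DESCRIBE system.exports)
SELECT
    part_name,
    elapsed,
    rows_read,
    total_rows_to_read,
    round(100 * rows_read / total_rows_to_read, 1) AS pct_done
FROM system.exports
ORDER BY create_time;
```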
Let’s try exporting the popular Star Schema Benchmark dataset from ClickHouse to Parquet using this new feature. We will later use it to run some performance tests.
Star Schema Benchmark, or SSB for short, first appeared in 2007 as a modification of the TPC-H benchmark. Vadim Tkachenko from Percona used it in 2017 in order to run one of the first ever independent benchmarks of ClickHouse. Later on we modified it into the denormalized version, and it became our standard benchmark to compare performance of cloud VM types. See this or this, for example.
The original approach assumed data generation, but we stored the dataset in S3 a while back. To load the dataset into ClickHouse, we can use this simple command:
CREATE TABLE ssb.lineorder_wide
ENGINE = MergeTree
PARTITION BY LO_ORDERMONTH
PRIMARY KEY (S_REGION, C_REGION, P_MFGR, S_NATION, C_NATION, P_CATEGORY)
ORDER BY (S_REGION, C_REGION, P_MFGR, S_NATION, C_NATION, P_CATEGORY, LO_CUSTKEY, LO_SUPPKEY)
AS
SELECT *, toYYYYMM(LO_ORDERDATE) as LO_ORDERMONTH FROM
s3('s3://altinity-clickhouse-data/ssb/data/lineorder_wide_bin/*.bin.zstd', NOSIGN, 'Native')
SETTINGS max_threads=4, max_insert_threads=4, input_format_parallel_parsing=0;
The table contains 600 million rows and takes 73GB of storage. Note that toYYYYMM(LO_ORDERDATE) is materialized as a separate column. This is a limitation of the export functionality that will be fixed later.
Once the data is loaded, we also run OPTIMIZE FINAL to reduce the number of parts.
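For reference, that is just the standard statement:

```sql
-- Merge active parts to reduce the part count before export
OPTIMIZE TABLE ssb.lineorder_wide FINAL;
```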
We are ready to export parts into Iceberg. Since we are testing it in Altinity.Cloud, we have a default catalog available and can create an Iceberg database like this:
CREATE DATABASE ice
ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000')
SETTINGS
catalog_type = 'rest',
warehouse = 's3://altialya-2fv4arm7-iceberg',
auth_header = '[HIDDEN]'
The export can be done in the following steps:
Step 1: Create Destination S3 Table
Antalya cannot export directly to Iceberg yet, so we have to export to an S3 table first:
CREATE TABLE ssb.lineorder_wide_export as ssb.lineorder_wide
ENGINE = S3(
's3://altialya-2fv4arm7-iceberg/ssb/lineorder_wide/external_data',
format = 'Parquet',
partition_strategy = 'hive'
)
PARTITION BY LO_ORDERMONTH
Note that we use the S3 bucket of the Iceberg warehouse catalog directly. In Altinity.Cloud, access is granted to the ClickHouse process automatically, so no extra credentials are needed. Also, for convenience, we use a data path that matches the location of the table we are going to export.
It is important that the partitioning be exactly the same for the MergeTree and S3 tables. This is why we had to extract the LO_ORDERMONTH column. The S3 engine table does not support partitioning expressions for Hive tables yet; this is going to be fixed in future releases.
Step 2: Generate EXPORT PART Commands
It looks ugly, but until EXPORT TABLE or EXPORT PARTITION commands are available, we have to generate the export commands ourselves. It is not that hard:
SELECT 'ALTER TABLE ssb.lineorder_wide EXPORT PART \'' || name || '\' TO TABLE ssb.lineorder_wide_export settings allow_experimental_export_merge_tree_part=1;'
FROM system.parts
WHERE active and (database, table) in ('ssb','lineorder_wide')
FORMAT TSVRaw
It will produce EXPORT PART statements for every part in the table:
ALTER TABLE ssb.lineorder_wide EXPORT PART '199201_1_81_2' TO TABLE ssb.lineorder_wide_export settings allow_experimental_export_merge_tree_part=1;
We are finally ready to run those commands and monitor progress in the system.exports table. Here is what a typical row looks like:
source_database: ssb
source_table: lineorder_wide
destination_database: ssb
destination_table: lineorder_wide_export
create_time: 2025-10-15 13:34:57
part_name: 199209_152_242_2
destination_file_path: ssb/lineorder_wide/external_data/LO_ORDERMONTH=199209/199209_152_242_2_75F9C52EB98248D1DE79F824B4289EA7.parquet
elapsed: 12.728964259
rows_read: 4530176 -- 4.53 million
total_rows_to_read: 7470835 -- 7.47 million
total_size_bytes_compressed: 915958664 -- 915.96 million
total_size_bytes_uncompressed: 1292033922 -- 1.29 billion
bytes_read_uncompressed: 904616045 -- 904.62 million
memory_usage: 660317081 -- 660.32 million
peak_memory_usage: 826682461 -- 826.68 million
By default, the export uses 8 parallel threads, and it takes about 5 minutes to export 600M rows into an S3 bucket on an 8 vCPU VM. Now we are ready for the final step.
Step 3: Create Table With ice
Since the data is already inside the warehouse location, we only need to register the files, without copying them, using ice.
ice insert ssb.lineorder_wide -p \
's3://altialya-2fv4arm7-iceberg/ssb/lineorder_wide/external_data/*' \
--no-copy --thread-count=10 \
--partition='[{"column":"LO_ORDERMONTH"}]'
It takes a few seconds, and we can see the table in ClickHouse! It is even smaller than the MergeTree source, taking only 56GB – thanks to more aggressive Parquet default compression.
The typical query runs in half a second from Iceberg:
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ice."ssb.lineorder_wide"
WHERE toYear(LO_ORDERDATE) = 1993
AND LO_DISCOUNT >= 1 AND LO_DISCOUNT <= 3 AND LO_QUANTITY < 25
We are going to discuss benchmarking in a separate article. Now let’s look into other new features.
Iceberg Improvements
We mentioned in the previous chapter that the Iceberg table takes 56GB. How did we know that? From system.tables, of course! This is yet another improvement in Altinity Antalya: a user can now see more useful information about Iceberg tables in system.tables.
So if we run:
SELECT * FROM system.tables WHERE database = 'ice' and table = 'ssb.lineorder_wide'
We will see row counts, size, and even the partition and sorting keys (the latter is empty since we did not define it):
Row 1:
──────
database: ice
name: ssb.lineorder_wide
engine: IcebergS3
...
partition_key: LO_ORDERMONTH
sorting_key:
...
total_rows: 600038145 -- 600.04 million
total_bytes: 56616254533 -- 56.62 billion
...
Let’s now try this one:
SELECT LO_ORDERMONTH, count() FROM ice."ssb.lineorder_wide"
GROUP BY 1 ORDER BY 1
┌─LO_ORDERMONTH─┬─count()─┐
1. │ 199201 │ 7733051 │
2. │ 199202 │ 7233211 │
3. │ 199203 │ 7721819 │
4. │ 199204 │ 7476931 │
5. │ 199205 │ 7731519 │
...
It runs in less than 0.3s, thanks to a count() optimization based on Iceberg metadata. The optimization is controlled by the allow_experimental_iceberg_read_optimization setting, which is enabled by default in Antalya builds. If you disable it, the query above takes a long eight seconds!
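If you want to measure the difference yourself, the setting can be disabled on a per-query basis:

```sql
-- Re-run the aggregation with the metadata-based optimization turned off
SELECT LO_ORDERMONTH, count()
FROM ice."ssb.lineorder_wide"
GROUP BY 1
ORDER BY 1
SETTINGS allow_experimental_iceberg_read_optimization = 0;
```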
What’s Next?
We have shown some of the new features of Altinity Antalya. As you can see, this is a work in progress, and some functionality is still incomplete. So in 25.8 we are going to release a few more important updates:
- EXPORT PARTITION – allows users to export a full table partition without generating part-specific SQL statements
- Partition expressions in export destination tables, so we can use toYYYYMM(LO_ORDERDATE) without the need for an extra column
- Even faster count() queries on Iceberg tables
But hold on. Why do we need to keep two versions of the data – one in MergeTree and another one in Iceberg? One of the main goals of Project Antalya is to have hot data stored in MergeTree, and cold in Iceberg. The EXPORT is an important step. The ultimate goal is to query data consistently from both sources. This is already coming. Get ready for something really cool!
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.