The Tournament of AWS CPUs in Altinity.Cloud

Bright shone the lists, blue bent the skies,
And the knights still hurried amain
To the tournament under the ladies’ eyes,
Where the jousters were Heart and Brain.

― Sidney Lanier. The Tournament

AWS keeps adding new hardware to its cloud services. A few months back we tested the new ultra-fast Graviton3 ARM instances, which showed outstanding results for ClickHouse. Since then AWS has added the 7th generation of Intel- and AMD-powered VMs, so we could not resist hailing them into the lists for an honorable contest. Today, we challenge Intel m7i, AMD m7a, and Graviton m7g in the ClickHouse benchmark tournament! TL;DR: we’ve got a new champion!

The Three Knights

Let me introduce today's challengers, quoting the AWS website:

M7a.xlarge

“Amazon EC2 M7a instances, powered by 4th generation AMD EPYC processors, deliver up to 50% higher performance compared to M6a instances. These instances support AVX-512, VNNI, and bfloat16, which enable support for more workloads, use Double Data Rate 5 (DDR5) memory to enable high-speed access to data in memory, and deliver 2.25x more memory bandwidth compared to M6a instances.”

M7i.xlarge

“Amazon EC2 M7i instances are next-generation general purpose instances powered by custom 4th Generation Intel Xeon Scalable processors (code named Sapphire Rapids) and feature a 4:1 ratio of memory to vCPU. EC2 instances powered by these custom processors, available only on AWS, offer the best performance among comparable Intel processors in the cloud – up to 15% better performance than Intel processors utilized by other cloud providers.”

M7g.xlarge

“Amazon EC2 M7g instances, powered by the latest generation AWS Graviton3 processors, provide the best price performance in Amazon EC2 for general purpose workloads. M7g instances are ideal for applications built on open-source software such as application servers, microservices, gaming servers, mid-size data stores, and caching fleets. They offer up to 25% better performance over the sixth-generation AWS Graviton2-based M6g instances. M7g instances feature Double Data Rate 5 (DDR5) memory, which provides 50% higher memory bandwidth compared to DDR4 memory to enable high-speed access to data in memory.”

The last entry was our previous winner: it outperformed the 6th-generation m6i instances.

Altinity.Cloud infrastructure allows users to add new node types with a single click, so I added the new m7i.xlarge and m7a.xlarge node types.

After 5 minutes, the new node types were available to start a cluster. Note: adding node types dynamically is currently available only for Altinity.Cloud Anywhere users; please contact Altinity support if you need extra node types in your Altinity.Cloud account.

To give all three node types even chances, I started 3 equally sized single-node ClickHouse instances, each with 4 vCPUs, 16GB of RAM, and 200GB of gp3 EBS storage.

For benchmarks, we always use the latest available ClickHouse version, which is 23.8.2.7 at the time of writing.

Once started, the 3 instances look great next to each other on my screen, only missing coats of arms on their shields.

The Field of Battle

One of the standard benchmarks we use to test ClickHouse and hardware performance is the 600M-row Star Schema Benchmark (SSB). We will follow the same procedure as described in the articles “Ultra-fast Data Loading and Testing in Altinity.Cloud” and “Altinity.Cloud Extends Managed Service to ARM”.

For every instance, I created a ‘lineorder_wide’ table using this script. Data was loaded from an S3 bucket. There is one small difference compared to previous tests: in the latest ClickHouse versions, the NOSIGN parameter needs to be added in order to access public S3 buckets:

INSERT INTO lineorder_wide
SELECT * FROM
s3('https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/ssb/data/lineorder_wide_bin/*.bin.zstd', NOSIGN, 'Native')
SETTINGS max_threads=4, max_insert_threads=4;

It is worth mentioning that max_threads=4 was the optimal setting for 4 vCPU instances. I tried higher and lower values but those resulted in slower loading times. With 4 threads the loading took 520-530s for every instance.

I also ran OPTIMIZE FINAL so that background merges would not affect the results.

Once the data is loaded, we can run the test queries. We use the same benchmark script as in the previous articles. It performs one warmup run for every query, then takes 3 test runs, and extracts the average query time from the query_log table, identifying the runs by SQL comments embedded in the queries. Taking the time from the server log excludes network latency. The test run command for the ‘m7a’ cluster is the following:

$ TRIES=3 CH_CLIENT=clickhouse-client CH_HOST=m7a.us-east1.dev.altinity.cloud CH_USER=admin CH_PASS=*** CH_DB=default QUERIES_DIR=flattened/queries ./bench.sh

flattened/queries: 91d01b043e3e345d65172dbee7de9107  -
Q1.1.sql ... 0.357
Q1.2.sql ... 0.034
Q1.3.sql ... 0.054
Q2.1.sql ... 0.349
Q2.2.sql ... 0.187
Q2.3.sql ... 0.16
Q3.1.sql ... 0.268
Q3.2.sql ... 0.208
Q3.3.sql ... 0.257
Q3.4.sql ... 0.007
Q4.1.sql ... 0.138
Q4.2.sql ... 0.052
Q4.3.sql ... 0.051
Total :	2.124
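The measurement procedure described above can be sketched in a few lines of Python. This is only an illustration of the comment-tagging idea, not the actual bench.sh: the tag format, the helper names (tag_query, plan_runs, average_time_sql), and the exact query_log filter are assumptions made for this example. It also assumes the client passes the comment through to the server unchanged.

```python
# Sketch of the "SQL comment" technique: tag each run with a comment, then
# average the server-side duration from system.query_log.
# All helper names and the tag format are hypothetical.

def tag_query(sql: str, tag: str) -> str:
    """Prefix a query with a SQL comment so it can be located in the query log."""
    return f"/* {tag} */ {sql}"


def plan_runs(name: str, sql: str, run_id: str, tries: int = 3) -> list[str]:
    """Return one warmup statement followed by `tries` timed statements.

    The warmup carries a different tag, so it is left out of the average.
    """
    warmup = tag_query(sql, f"warmup:{run_id}:{name}")
    timed = [tag_query(sql, f"bench:{run_id}:{name}") for _ in range(tries)]
    return [warmup] + timed


def average_time_sql(name: str, run_id: str) -> str:
    """Build the query that averages the timed runs, in seconds.

    query_duration_ms is recorded by the server, so client and network
    latency are not part of the result. The second condition keeps this
    lookup query itself (which also contains the tag) out of the average.
    """
    tag = f"bench:{run_id}:{name}"
    return (
        "SELECT round(avg(query_duration_ms) / 1000, 3) "
        "FROM system.query_log "
        "WHERE type = 'QueryFinish' "
        f"AND position(query, '{tag}') > 0 "
        "AND position(query, 'system.query_log') = 0"
    )


if __name__ == "__main__":
    for statement in plan_runs("Q1.1", "SELECT 1", run_id="r1"):
        print(statement)
    print(average_time_sql("Q1.1", run_id="r1"))
```

A unique run_id per benchmark invocation keeps earlier runs of the same query out of the average.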

The benchmark was run the same way for m7i and m7g. The results are summarized below:

Total query time for all queries (sec):

Instance            Time (sec)
23.8 m7a.xlarge     2.124
23.8 m7i.xlarge     2.975
23.8 m7g.xlarge     3.773

As you can see, the AMD m7a wins the tournament hands down! Intel m7i comes second, still cutting total time by about 27% compared with the m6i series (2.975 s versus 4.055 s in the re-run reported further down). Graviton m7g, which was the winner over m6 instances, is trailing far behind. Technology progress in CPU speed is remarkable.

It would be interesting to learn why AMD performs so well for ClickHouse workloads. We’ve seen similar results in GCP as well.

For comparison, I re-ran the tests on the 6th generation of instances. Here are the total test times (sec):

Instance            Time (sec)
23.8 m6a.xlarge     3.705
23.8 m6i.xlarge     4.055
23.8 m6g.xlarge     5.565
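To put the generation-over-generation gains into numbers, here is a small Python sketch that compares the 6th- and 7th-generation totals reported in this article. The helper name improvement is made up for the example; the figures are the ones listed above.

```python
# Total SSB times in seconds, copied from the two result sets in this article.
gen6 = {"a": 3.705, "i": 4.055, "g": 5.565}  # m6a, m6i, m6g
gen7 = {"a": 2.124, "i": 2.975, "g": 3.773}  # m7a, m7i, m7g


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (percent less time, speedup factor) when going from old to new."""
    return (1 - new / old) * 100, old / new


if __name__ == "__main__":
    for family in gen6:
        less_time, speedup = improvement(gen6[family], gen7[family])
        print(f"m6{family} -> m7{family}: {less_time:.0f}% less time, {speedup:.2f}x faster")
```

With these numbers the 7th generation needs roughly 43% (AMD), 27% (Intel), and 32% (Graviton) less total time than its predecessor.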

Note that the full experiment from starting ClickHouse to finishing the tests took me about 30 minutes. The speed of testing different hardware configurations in Altinity.Cloud is unbeatable.

The Cost

It is easy to get stellar performance in the cloud nowadays, but it sometimes may cost you a fortune. For easier comparison let’s use normalized values, taking the minimum time and cost as 1, and adjusting others proportionally:

                                              23.8 m7a.xlarge   23.8 m7i.xlarge   23.8 m7g.xlarge
SSB Benchmark time (sec)                      2.124             2.975             3.773
Normalized time                               1                 1.4               1.78
On-demand cost per hour in us-east-1 (USD)    0.2318            0.2016            0.1632
Normalized cost                               1.42              1.24              1
Score (normalized time x cost, best = 1)      1                 1.22              1.25

As you can see, the m7a is not only the fastest but also the most expensive. Its performance more than compensates for the price difference, so its price-performance score is the best. The last row shows our score, where lower is better: it is the product of normalized time and normalized cost, rescaled so that the best instance scores 1. The m7g is the cheapest but also the slowest, so its price-performance is the worst of the three. The m7i is only slightly better than the m7g.
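The normalization can be reproduced with a short Python sketch. The times and prices are the ones quoted in this article. The helper name normalize is made up for the example, and the final rescaling step (so that the best score equals 1) is inferred from the table rather than stated in the text.

```python
# Benchmark totals (seconds) and on-demand prices (per hour) from this article.
times = {"m7a.xlarge": 2.124, "m7i.xlarge": 2.975, "m7g.xlarge": 3.773}
prices = {"m7a.xlarge": 0.2318, "m7i.xlarge": 0.2016, "m7g.xlarge": 0.1632}


def normalize(values: dict[str, float]) -> dict[str, float]:
    """Scale the values so that the smallest one becomes 1."""
    smallest = min(values.values())
    return {name: value / smallest for name, value in values.items()}


norm_time = normalize(times)
norm_cost = normalize(prices)
# Multiply normalized time by normalized cost, then normalize once more
# so that the best (lowest) score is exactly 1.
score = normalize({name: norm_time[name] * norm_cost[name] for name in times})

if __name__ == "__main__":
    for name in times:
        print(
            f"{name}: time {norm_time[name]:.2f}, "
            f"cost {norm_cost[name]:.2f}, score {score[name]:.2f}"
        )
```

Because both factors are ratios, the score is equivalent to comparing time multiplied by hourly price directly, so any instance type can be added to the two dictionaries without changing the method.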

Conclusion of the Tournament

AWS constantly delivers new features in its cloud, and more powerful machines are a good example. M7g performance seemed really outstanding a few months ago, but it has now been surpassed by the newer m7a and m7i instance types. The performance of m7a was well above our expectations: it looks like the new best choice for ClickHouse!

Unfortunately, the availability of the 7th generation is still limited, but we expect it to reach more regions soon. Please log a support case if you want to try these instances out.

The flexibility and agility of Altinity.Cloud make it possible to run ClickHouse with any instance type in any AWS, GCP, or Microsoft Azure region, and, with Altinity.Cloud Anywhere, in any customer environment as well. Start a free trial if you would like to try it out!
