Bright shone the lists, blue bent the skies,
And the knights still hurried amain
To the tournament under the ladies’ eyes,
Where the jousters were Heart and Brain.
― Sidney Lanier. The Tournament
AWS keeps adding new hardware to its cloud services. A few months back we tested the new ultra-fast Graviton3 ARM instances, which showed outstanding results for ClickHouse. Since then AWS has added the 7th generation of Intel- and AMD-powered VMs, so we could not resist calling them into the lists for an honorable contest. Today, we challenge Intel m7i, AMD m7a, and Graviton m7g in the ClickHouse benchmark tournament! TL;DR: we’ve got a new champion!
The Three Knights
Let me introduce today’s challengers. I will quote from the AWS website:
“Amazon EC2 M7a instances, powered by 4th generation AMD EPYC processors, deliver up to 50% higher performance compared to M6a instances. These instances support AVX-512, VNNI, and bfloat16, which enable support for more workloads, use Double Data Rate 5 (DDR5) memory to enable high-speed access to data in memory, and deliver 2.25x more memory bandwidth compared to M6a instances.”
“Amazon EC2 M7i instances are next-generation general purpose instances powered by custom 4th Generation Intel Xeon Scalable processors (code named Sapphire Rapids) and feature a 4:1 ratio of memory to vCPU. EC2 instances powered by these custom processors, available only on AWS, offer the best performance among comparable Intel processors in the cloud – up to 15% better performance than Intel processors utilized by other cloud providers.”
“Amazon EC2 M7g instances, powered by the latest generation AWS Graviton3 processors, provide the best price performance in Amazon EC2 for general purpose workloads. M7g instances are ideal for applications built on open-source software such as application servers, microservices, gaming servers, mid-size data stores, and caching fleets. They offer up to 25% better performance over the sixth-generation AWS Graviton2-based M6g instances. M7g instances feature Double Data Rate 5 (DDR5) memory, which provides 50% higher memory bandwidth compared to DDR4 memory to enable high-speed access to data in memory.”
The last entry, m7g, was our previous winner: it outperformed the 6th-generation m6i instances.
The Altinity.Cloud infrastructure allows users to add new node types with a single click, so I added the new m7i.xlarge and m7a.xlarge node types.
After 5 minutes, the new node types were available to start a cluster. Note: adding node types dynamically is currently available only for Altinity.Cloud Anywhere users; please contact Altinity support if you need extra node types in your Altinity.Cloud account.
To give all three node types an even chance, I started three equally sized single-node ClickHouse instances, each with 4 vCPUs, 16 GB of RAM, and 200 GB of gp3 EBS storage.
For benchmarks, we always use the latest available ClickHouse version, which is 18.104.22.168 at the time of writing.
Once started, the 3 instances look great next to each other on my screen, only missing coats of arms on their shields.
The Field of Battle
One of the standard benchmarks we use to test ClickHouse and hardware performance is the 600M-row Star Schema Benchmark (SSB). We will use the same procedure as described in the “Ultra-fast Data Loading and Testing in Altinity.Cloud” and “Altinity.Cloud Extends Managed Service to ARM” articles.
For every instance, I created a ‘lineorder_wide’ table using this script. Data was loaded from an S3 bucket. There is one small difference compared to previous tests: in the latest ClickHouse versions, the NOSIGN parameter needs to be added in order to access public S3 buckets:
INSERT INTO lineorder_wide SELECT * FROM s3('https://s3.us-east-1.amazonaws.com/altinity-clickhouse-data/ssb/data/lineorder_wide_bin/*.bin.zstd', NOSIGN, 'Native') SETTINGS max_threads=4, max_insert_threads=4;
It is worth mentioning that max_threads=4 was the optimal setting for the 4 vCPU instances. I tried higher and lower values, but those resulted in slower loading times. With 4 threads, loading took 520-530s on every instance.
I also ran OPTIMIZE FINAL so that background merges would not interfere with the measurements.
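As a rough sanity check on the loading step, the reported times can be turned into an ingest rate. This is a minimal sketch that assumes the table holds a round 600 million rows (the text only says "600M"); the function name is illustrative.

```python
# Assumption: a round 600M rows, since the benchmark is described as "600M row".
ROWS = 600_000_000

def rows_per_second(rows: int, seconds: float) -> float:
    """Average ingest rate for a load that took `seconds`."""
    return rows / seconds

# Loading took 520-530 s on every instance.
for seconds in (520, 530):
    rate = rows_per_second(ROWS, seconds)
    print(f"{seconds} s -> {rate / 1e6:.2f} million rows/s")
```

Under that assumption, each 4 vCPU instance ingested a bit over a million rows per second.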
Once the data is loaded, we can run test queries. We use the same benchmark script as in the previous articles. It performs one warmup run for every query, then takes 3 test runs, and extracts the average query time from the query_log table, using the SQL comment technique to identify the queries, in order to exclude network latency. The test run command for the ‘m7a’ cluster is the following:
$ TRIES=3 CH_CLIENT=clickhouse-client CH_HOST=m7a.us-east1.dev.altinity.cloud CH_USER=admin CH_PASS=*** CH_DB=default QUERIES_DIR=flattened/queries ./bench.sh
flattened/queries: 91d01b043e3e345d65172dbee7de9107 -
Q1.1.sql ... 0.357
Q1.2.sql ... 0.034
Q1.3.sql ... 0.054
Q2.1.sql ... 0.349
Q2.2.sql ... 0.187
Q2.3.sql ... 0.16
Q3.1.sql ... 0.268
Q3.2.sql ... 0.208
Q3.3.sql ... 0.257
Q3.4.sql ... 0.007
Q4.1.sql ... 0.138
Q4.2.sql ... 0.052
Q4.3.sql ... 0.051
Total : 2.124
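The measurement idea described above (tag each query with a SQL comment, do one warmup plus three timed runs, then average the server-side durations recorded in query_log) can be sketched as follows. This is not the actual bench.sh: the tag format, the function names, and the sample rows are made up for illustration. The rows stand in for what a real run would read from the query and query_duration_ms columns of system.query_log, assuming the comment is preserved in the logged query text.

```python
from statistics import mean

def tag_query(sql: str, run_id: str, name: str) -> str:
    """Prepend a SQL comment so the run can be found in the query log later."""
    return f"/* bench:{run_id}:{name} */ {sql}"

def average_seconds(log_rows, run_id: str, name: str, warmups: int = 1) -> float:
    """Average server-side duration of the tagged runs, skipping the warmups.

    `log_rows` is assumed to be in execution order; each row mimics the
    `query` and `query_duration_ms` columns of system.query_log.
    """
    tag = f"/* bench:{run_id}:{name} */"
    durations_ms = [
        row["query_duration_ms"] for row in log_rows if row["query"].startswith(tag)
    ]
    timed = durations_ms[warmups:]  # drop the warmup run(s)
    return round(mean(timed) / 1000, 3)

# Made-up log rows: one warmup and three timed runs of a hypothetical query,
# plus an unrelated query that must be ignored.
sql = tag_query("SELECT count() FROM lineorder_wide", "run42", "Q0")
sample_log = [
    {"query": sql, "query_duration_ms": 500},  # warmup
    {"query": "SELECT 1", "query_duration_ms": 1},
    {"query": sql, "query_duration_ms": 100},
    {"query": sql, "query_duration_ms": 110},
    {"query": sql, "query_duration_ms": 120},
]
print(average_seconds(sample_log, "run42", "Q0"))
```

Because the durations come from the server’s own log rather than from the client’s clock, network latency between the client and the cluster does not affect the result.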
The same command was run for m7i and m7g. The results are summarized in this chart:
Total query time for all queries (sec):
[Chart; series: 23.8 m7a.xlarge, 23.8 m7i.xlarge, 23.8 m7g.xlarge]
As you can see, the AMD m7a wins the tournament hands down! Intel m7i comes second, though it is still more than a 50% improvement over the m6i series. Graviton m7g, which beat the m6 instances, is trailing far behind. The progress in CPU speed is remarkable.
It would be interesting to learn why AMD performs so well for ClickHouse workloads. We’ve seen similar results in GCP as well.
For comparison, I re-ran the tests on the 6th-generation instances. Here are the total test times:
[Chart; series: 23.8 m6a.xlarge, 23.8 m6i.xlarge, 23.8 m6g.xlarge]
Note that the full experiment from starting ClickHouse to finishing the tests took me about 30 minutes. The speed of testing different hardware configurations in Altinity.Cloud is unbeatable.
It is easy to get stellar performance in the cloud nowadays, but it may sometimes cost you a fortune. For easier comparison, let’s use normalized values, taking the minimum time and cost as 1 and adjusting the others proportionally:
| | m7a.xlarge | m7i.xlarge | m7g.xlarge |
| SSB benchmark time (sec) | 2.124 | 2.975 | 3.773 |
| On-demand cost per hour in us-east-1 (USD) | 0.2318 | 0.2016 | 0.1632 |
| Score (normalized time x cost) | 1 | 1.22 | 1.25 |
As you can see, the m7a is not only the fastest but also the most expensive. The last row is our price-performance score, where lower is better. The higher price of the m7a is more than compensated by its performance, so it ends up with the best score. The m7g is the cheapest but also the slowest, so its score is about 25% worse. The m7i, at about 22% worse, is slightly better than the m7g.
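The score row can be reproduced from the time and cost rows. The sketch below normalizes time and cost by their minimums, multiplies them, and then rescales so that the best instance scores 1. That last rescaling step is inferred from the published numbers rather than stated in the text, and the function and key names are illustrative.

```python
# Benchmark time (sec) and on-demand cost per hour, copied from the table.
results = {
    "m7a.xlarge": {"time_s": 2.124, "cost_per_hour": 0.2318},
    "m7i.xlarge": {"time_s": 2.975, "cost_per_hour": 0.2016},
    "m7g.xlarge": {"time_s": 3.773, "cost_per_hour": 0.1632},
}

def price_performance_scores(results: dict) -> dict:
    """Normalized time x normalized cost, rescaled so the best score is 1."""
    min_time = min(r["time_s"] for r in results.values())
    min_cost = min(r["cost_per_hour"] for r in results.values())
    raw = {
        name: (r["time_s"] / min_time) * (r["cost_per_hour"] / min_cost)
        for name, r in results.items()
    }
    best = min(raw.values())  # assumption: scores are shown relative to the best one
    return {name: round(value / best, 2) for name, value in raw.items()}

scores = price_performance_scores(results)
for name, score in scores.items():
    print(f"{name}: {score}")
```

Since both normalizations are plain divisions by a constant, the final ranking is the same as ranking by time multiplied by hourly cost, which is proportional to what one benchmark run costs.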
Conclusion of the Tournament
AWS constantly delivers new features in its cloud, and more powerful instance types are a good example. The m7g performance seemed really outstanding a few months ago, but it is now surpassed by the newer m7a and m7i instance types. The performance of m7a was well above our expectations: it looks like the new best for ClickHouse!
Unfortunately, the availability of 7th-generation instances is still limited, but we expect them to be available in more regions soon. Please log a support case if you want to try them out.
The absolute flexibility and agility of Altinity.Cloud make it possible to run ClickHouse with any instance type in any AWS, GCP, or Microsoft Azure region, and, with Altinity.Cloud Anywhere, in any customer environment as well. Start a free trial if you would like to try it out!