Slash CI/CD Bills (Part 1): Using Hetzner Cloud GitHub Runners for ClickHouse Builds

Continuous integration and continuous delivery (CI/CD) pipelines are critical to any software development project, and ClickHouse is no exception. Like many other great open-source projects, ClickHouse development occurs on GitHub, and it uses GitHub Actions to implement its CI/CD pipeline. The CI/CD pipeline is configured to run on each new pull request (PR) and each new commit inside a PR. The pipeline consists of many workflows and takes significant computing resources to complete. The costs of each CI/CD run quickly add up, and we’ve faced the challenge of controlling those costs, a challenge that many of you might be familiar with.

As mentioned in previous blogs, we produce our own builds of ClickHouse, Altinity Stable Builds, and we constantly execute our modified version of the ClickHouse CI/CD pipeline in our fork. If you are not yet familiar with Altinity Stable Builds, check out our 2023 blog article, Why Every ClickHouse User Should Appreciate Altinity Stable Builds, which provides a nice introduction. While porting the ClickHouse CI/CD pipeline to our fork, we faced many challenges and explored different options for where the actual runs should be performed. In the end, we succeeded with my TestFlows GitHub Hetzner Runners project, which was originally developed for personal needs. The project quickly matured from providing runners for TestFlows test programs to running our heavyweight ClickHouse pipeline, and it is now featured in the Awesome Hetzner Cloud GitHub repository as one of the awesome projects that you can use with your Hetzner Cloud account.

So if controlling CI/CD bills is essential for you, or you’re interested in exploring non-AWS-oriented runner solutions for your projects hosted on GitHub, we invite you to read on and see how using Hetzner Cloud GitHub runners helped us control our ClickHouse CI/CD bills. It might be a good fit for you, too! In the second part of this series, we will show you how to set this up for your own GitHub repositories so you can try it out for yourself.

Why do we need self-hosted runners?

GitHub provides free CI/CD minutes and storage that work for simple open-source projects. You can find the exact limits on the About billing for GitHub Actions page. The current limits are as follows:

But if you run out of those free minutes, then you will have to look at the Per-minute rates. The current rates for the first three cheapest Linux instances are:

Taking the first Linux instance with just 2 vCPUs, the price is $0.008 per minute, which is $0.48 per hour. A quick lookup of the current cheapest Hetzner Cloud instance shows that a CCX13 with 2 vCPUs would cost you €0.02 per hour, which at the current exchange rate is about $0.022. This is a whopping 21.8x (0.008 * 60 / 0.022) difference! If we quickly look at the AWS on-demand hourly rates, we can see that a t4g.large, with 2 vCPUs and 8GB RAM, would cost us $0.0672 per hour. However, with AWS, you can use spot instances, which currently show a price of $0.0307 per hour for the same t4g.large instance. Nonetheless, AWS pricing is complicated, and spot instances come with quirks, such as spot instance interruption notices, which can mess with your CI/CD jobs.
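The comparison above can be reproduced with a few lines of Python. This is only a sketch that uses the prices quoted in this article. The dictionary keys and the `cost_ratios` helper are illustrative names, and the Hetzner figure is the article's own dollar conversion, so the ratios will drift as prices and exchange rates change.

```python
# Hourly prices quoted in the text above (USD). The Hetzner Cloud figure is
# the article's own conversion of EUR 0.02 per hour at the then-current rate.
HOURLY_USD = {
    "GitHub-hosted (2 vCPU Linux)": 0.008 * 60,  # $0.008 per minute
    "Hetzner Cloud CCX13": 0.022,
    "AWS t4g.large on-demand": 0.0672,
    "AWS t4g.large spot": 0.0307,
}


def cost_ratios(prices: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each hourly price divided by the baseline's hourly price."""
    base = prices[baseline]
    return {name: price / base for name, price in prices.items()}


ratios = cost_ratios(HOURLY_USD, "Hetzner Cloud CCX13")

# Print the options from cheapest to most expensive.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:30s} ${HOURLY_USD[name]:.4f}/h  {ratio:5.1f}x")
```

With these inputs, the GitHub-hosted runner comes out at roughly 21.8 times the Hetzner Cloud price, AWS on-demand at about 3.1 times, and AWS spot at about 1.4 times.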

Therefore, the first reason for using self-hosted runners is to minimize CI/CD costs. The more time your CI/CD pipeline takes and the bigger instances it requires, the more it will cost you for each run. The second reason is security. Do you want to build critical binaries on some unknown public GitHub Actions runner? Most likely, no! So, in this case, you must use self-hosted runners.

For ClickHouse CI/CD, both of the points above apply. The pipeline requires significant resources and time, and security concerns leave self-hosted runners as the only available option.

Problems with standard solutions

Once using self-hosted runners for your GitHub repository becomes inevitable, there are many options to choose from. First, you would read the About self-hosted runners page and most likely want to adopt an autoscaling solution. The Supported autoscaling solutions section now mentions just one: Actions Runner Controller, which runs on Kubernetes. However, until recently, the official documentation also recommended the Terraform module Self-Hosted Scalable GitHub Actions runners on AWS. If you google around or search on GitHub, you will find other projects you could try.

Out of the two, we did not try the Kubernetes solution, for two reasons. First, while we love Kubernetes and our cloud and customers use it extensively, we did not want the complexity of managing Kubernetes outside of our production environments and our infrastructure team, as these runners would be primarily used and managed by the ClickHouse developers and QA engineers. Second, the code appeared to be complex, given that it implements a full Kubernetes operator, which would make it hard for us to add any functionality we needed or to fix issues in the code if any arose.

However, we did give the Terraform module Self-Hosted Scalable GitHub Actions runners on AWS a try and got it working after some obstacles. Unfortunately, we did not stick with it for the following reasons. First, in some cases we found that runners would not be created or would be stolen by other jobs. Second, managing this service was complex, as it involved AWS permissions and services such as AWS Lambda, where we would hit its limits. Third, the code was again complex, and it was hard to understand what it was actually doing. The fourth and final reason was that once, during two days of heavy CI/CD runs, we ran up a $2k AWS bill even though the configuration indicated that we were using spot instances. Maybe we messed up the configuration, and maybe we could have spent more time figuring things out, but why? Instead, we picked a simpler and more attractive solution using Hetzner Cloud.

Why Hetzner Cloud?

Hetzner Cloud positions itself as a “Truly thrifty cloud hosting” platform. Indeed, we found it to be quite thrifty. We can see it from the rough cost analysis we already did in the “Why do we need self-hosted runners?” section above, where we had the following costs for a basic Linux, 2 vCPU instance.

| Name | Price per hour |
|---|---|
| GitHub instances | $0.48 |
| Hetzner Cloud | $0.022 |
| AWS on-demand hourly rate | $0.0672 |
| AWS spot instance | $0.0307 |

Hetzner Cloud wins, and AWS spot instances come close in second place. However, remember the spot instance interruption notices you could receive if you use one of those spot instances. While it might not be a big deal, it still needs to be considered, as Hetzner Cloud instances provide the same availability as AWS on-demand hourly rate machines.

Another big reason we went with Hetzner Cloud for our CI/CD pipeline is simple and predictable billing. Am I the only one who feels AWS always tries to nickel and dime you? While cost calculators can help, it was a challenge to feel we were in control of the bill.

This is not the case with Hetzner Cloud, where you are billed in hourly increments that include the price for the instance plus the IPv4 address, which is clearly shown on their price page. That CCX13 instance with 2 vCPUs priced at €0.02 per hour already includes the IPv4 address. Provisioned with only an IPv6 address, the same instance is slightly cheaper and costs only €0.0192 per hour. So, highly competitive prices and predictable billing were a win for Hetzner Cloud.

However, the disadvantage of the Hetzner Cloud is the lack of the fantastic instance type selection that AWS Cloud provides. But, once we found the instance types that did the job for us, we did not find a need to look any further. Hetzner Cloud was a winner for us to run our ClickHouse CI/CD pipeline.

Resources needed for ClickHouse CI/CD

To demonstrate the resources needed to run our ClickHouse CI/CD pipeline, I will take a run for a recently updated 23.3.19.33.altinitystable release. We can look at the usage statistics showing how long each job took to execute in the run. Here is the summary:

| Statistics | Value |
|---|---|
| Number of jobs | 156 |
| Total duration | 5d 9h 38m |

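So the 23.3.19.33.altinitystable pipeline took 5 days, 9 hours, and 38 minutes of compute time according to the usage statistics, and the per-instance durations in the cost table below add up to about 129 hours 45 minutes. That is a lot of compute time! Visualization of the pipeline is as follows: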

Of course, we did not wait five days for the pipeline to complete. Instead, as seen from the graph above, the pipeline is heavily parallelized, with different jobs running simultaneously on different runners, giving us a total execution time of 5 hours 39 minutes. Using multiple runners for different jobs gives us a roughly 23x speed up (about 129.75 hours of compute time finished in 5.65 hours of wall-clock time)! Also, we must remember that the pipeline builds and tests both x86 and ARM64 binaries and requires both x86 and ARM runner instances.
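The speed-up is simply the total compute time divided by the wall-clock time of the run. A minimal sketch, using the 129 hours 45 minutes of summed job durations and the 5 hours 39 minutes of execution time quoted in this article (the variable names are illustrative):

```python
from datetime import timedelta

# Figures quoted in this article: summed job durations across all runners
# versus the wall-clock time the parallelized pipeline needed to finish.
compute_time = timedelta(hours=129, minutes=45)
wall_clock = timedelta(hours=5, minutes=39)

# Dividing one timedelta by another yields a plain float ratio.
speedup = compute_time / wall_clock
print(f"speed up: {speedup:.1f}x")
```

With these inputs, the ratio comes out at roughly 23x.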

Costs to run the pipeline

For the example run above, the pipeline used a mix of the following Hetzner Cloud instances:

| Name | Type | Price per hour | Duration (h:m:s) | Worst case | Best case |
|---|---|---|---|---|---|
| CPX51 | x86 shared, 16 vCPU, 32GB RAM | €0.0888 | 82:04:00 | €12.232 | €7.222 |
| CAX41 | ARM64 shared, 16 vCPU, 32GB RAM | €0.0395 | 45:00:42 | €2.962 | €1.778 |
| CCX53 | x86 dedicated, 32 vCPU, 128GB RAM | €0.3085 | 2:29:10 | €2.159 | €0.767 |
| CPX41 | x86 shared, 8 vCPU, 16GB RAM | €0.0417 | 0:11:43 | €0.208 | €0.008 |

The total estimated cost for the run is as follows:

| Type | Cost | Average per hour |
|---|---|---|
| Worst | €17.56 | €0.1353 |
| Best | €9.77 | €0.0753 |

The worst-case estimate assumes that runners are never reused, so each job's duration is rounded up to the next full hour, given that servers are billed per hour. The best-case estimate assumes that servers are fully reused, which effectively gives you per-minute billing. The average per-hour cost is obtained by dividing each estimated cost by the total run duration of 129.75 hours.
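The two estimates can be sketched in Python as follows. The per-instance durations and the two cost totals are taken from this article, while the `to_hours` and `estimate` helpers and the three example jobs are hypothetical and only illustrate the rounding. The real per-job durations of the run are not listed here, so this does not reproduce the worst-case column.

```python
import math


def to_hours(hms: str) -> float:
    """Convert an 'h:m:s' duration string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def estimate(jobs: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (worst, best) cost for jobs given as (price_per_hour, hours).

    Worst case: every job gets a fresh server that is billed per started hour.
    Best case: servers are fully reused, so only the time actually used counts.
    """
    worst = sum(price * math.ceil(hours) for price, hours in jobs)
    best = sum(price * hours for price, hours in jobs)
    return worst, best


# Durations per instance type, as listed in the table above.
durations = ["82:04:00", "45:00:42", "2:29:10", "0:11:43"]
total_hours = sum(to_hours(d) for d in durations)
print(f"total compute time: {total_hours:.2f} h")

# Average cost per hour, using the two totals quoted in this article.
for label, cost in (("worst", 17.56), ("best", 9.77)):
    print(f"{label}: EUR {cost / total_hours:.4f} per hour")

# Hypothetical jobs, NOT taken from the real run: three 20-minute jobs on a
# server priced at EUR 0.0395 per hour.
worst, best = estimate([(0.0395, 20 / 60)] * 3)
print(f"hypothetical jobs: worst EUR {worst:.4f}, best EUR {best:.4f}")
```

In the hypothetical example, three short jobs on fresh servers are billed for three full hours, while the same jobs on one reused server are billed for a single hour, which is why runner recycling pulls the real cost toward the best case.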

The real cost is somewhere between these two extremes, and our empirical observations show that it is closer to the best-case scenario, around €10 to €11 per run, given that we have runner server recycling enabled.

We find that the Hetzner Cloud CI/CD costs are competitive and highly cost-effective, considering the significant amount of resources required to execute our ClickHouse CI/CD pipeline. Using GitHub-provided runners would be cost-prohibitive, and AWS spot instances would be the next best thing, but spot instance prices constantly fluctuate, so keeping control of our bills would be more complex.

Conclusion

Keeping control of bills while running ever larger CI/CD pipelines is a challenge many companies face. Using Hetzner Cloud for our ClickHouse CI/CD pipeline has helped us keep these bills under control. The ClickHouse pipeline is vast and needs to be executed many times; therefore, minimizing costs per run is critical. The simplicity of setting up self-hosted runners using the TestFlows GitHub Hetzner Runners project was a massive win for us. If you are interested, stay tuned for the second part of this series, where we will show you how to quickly set up self-hosted GitHub runners using Hetzner Cloud for your own GitHub repositories, so you can see whether it brings you the same simplicity and cost benefits. In the end, getting control of your CI/CD bills makes your development and QA teams happy, as they do not have to worry about pipeline growth or the number of runs they can perform daily.
