Building ClickHouse® and Making Your First Contribution: A Tutorial

Recorded: October 6, 2021
Presenters: Robert Hodges, CEO, and Vasily Nemkov, Software Engineering Team Lead, Altinity
In this hands-on tutorial, Altinity lead server engineer Vasily Nemkov walks through the complete process of contributing code to ClickHouse from scratch. Vasily has merged approximately 70 pull requests into ClickHouse, with notable features including IPv4/IPv6 data types, DoubleDelta and Gorilla codecs, the DateTime64 data type, and the system.session_log table.
The session covers four main areas. First, the motivation for contributing: ClickHouse is the fastest-growing open-source project by external commits, the code is unusually clean and well-organized C++, and contributors are recognized in the system.contributors table and can receive a personal badge from ClickHouse creator Alexey Milovidov. Second, the mechanics of adding a new function, demonstrated by implementing a classic helloWorld() function: creating a CPP file in the src/Functions/ directory, implementing the IFunction interface, returning a constant string column, and registering the function with the FunctionFactory in registerFunctions.cpp. Third, two build methods, Docker-based and on-host, with detailed discussion of prerequisites (CMake, Ninja, Clang 11, at least 32 GB of RAM with 60 GB recommended), build times (30–120 minutes), and the resulting package structure. Fourth, the full CI and review workflow: running unit tests (~40,000 cases), stateless SQL-based tests (~3,500 cases), writing a reference file for a new test, submitting to GitHub from a fork, signing the CLA, and navigating the ~50 CI checks that run after a maintainer approves the PR for CI.
The Q&A covers submodule changes, distributed compilation, running style checks locally, release version targeting, and backport criteria.
If you’d like to receive the PowerPoint presentation, please contact us at marketing@altinity.com.
Key Moments (Timestamps)
Key moments generated with AI assistance.
- 0:03 – Introduction: Robert Hodges hosts, Vasily Nemkov presents
- 1:34 – Vasily introduction: Altinity engineer, ~70 merged PRs, key features contributed
- 2:49 – What is ClickHouse? OLAP database, open source, Fortune 500 use cases
- 4:10 – Why contribute: fastest-growing open source project, clean C++ codebase
- 5:49 – Recognition: system.contributors table, contributor badge from Alexey Milovidov
- 6:55 – Four ways to contribute: benchmarks, docs, tests, code
- 8:27 – The development process overview: clone, change, build, test, push
- 8:55 – Cloning the repository and updating submodules (~80+, takes 10–15 min)
- 9:25 – Making changes: the complexity hierarchy from easy (functions) to hard
- 9:54 – Adding a helloWorld() function: the “venerable tradition” demo
- 10:05 – Where functions live: src/Functions/ directory
- 10:35 – Minimal function implementation: IFunction interface, getReturnType, executeImpl
- 11:41 – Registering the function in registerFunctions.cpp and FunctionFactory
- 12:26 – Two ways to build: Docker-based vs. on-host
- 12:33 – Docker build: convenient for packages, works on any Docker-capable OS
- 13:00 – On-host build: faster, better for development, requires Linux/Ubuntu
- 13:48 – Hardware requirements: 60 GB RAM recommended, 32 GB minimum, multi-core
- 14:17 – Disk space requirements: ~20 GB for on-host binary builds
- 15:31 – Docker build command walkthrough: output directory, source directory, compiler selection
- 17:48 – Build duration: 35 minutes (fast AWS) to 90–120 minutes on typical hardware
- 18:51 – Docker build output: deb/rpm/tgz packages, clickhouse-common-static, debug info
- 21:07 – On-host build prerequisites: CMake, Ninja, Clang, tzdata
- 21:43 – Out-of-source build directories for multiple build flavors
- 22:18 – CMake configuration and Ninja build commands
- 23:32 – Smoke testing: running the server, connecting the client, SELECT version()
- 26:06 – Types of tests: unit tests, stateless tests, integration tests, performance tests
- 27:38 – Unit tests: C++/Google Test, ~40,000 cases, ~4 minutes to run
- 27:50 – Stateless tests: SQL/bash/Python end-to-end, ~3,500 cases, ~45 minutes
- 28:35 – Integration tests: distributed queries, MySQL/PostgreSQL connectors
- 28:56 – Adding a new stateless test: SQL file + reference file
- 30:55 – Running tests with clickhouse-test: path to binary, filter by test name
- 32:01 – Creating a branch, committing, and pushing to a fork
- 33:09 – Creating the pull request: PR description template, changelog category
- 34:42 – Signing the CLA: GitHub bot check, one-time process
- 35:25 – CI checks: ~50 jobs — style, fast test, various builds, stateless/integration tests
- 37:44 – How to read CI check pages: build logs, test logs, debugging failures
- 39:50 – Addressing maintainer review comments, getting approval, merging
- 40:22 – Merged PR: purple tag, node on timeline, and release timing
- 40:53 – Release process: monthly releases, LTS releases (twice yearly), backports
- 41:57 – Q&A begins
- 42:00 – Q&A: submodule changes and local modifications
- 43:44 – Q&A: building with distcc (not officially supported)
- 44:44 – Q&A: running style checks locally before submitting
- 45:32 – Q&A: bonus materials — how to run unit, stateless, integration, performance tests
- 46:04 – Q&A: Which release will a merged fix appear in?
- 48:00 – Robert on monthly vs. LTS releases and Altinity Stable Build support windows
- 49:00 – Q&A: Who handles backports, and what is the criterion?
- 50:18 – Q&A: testing frameworks in ClickHouse (TestFlows, Google Test, custom harnesses)
- 51:21 – Q&A: future topic suggestion — tracing a SELECT through the codebase
- 53:30 – Q&A: resources for learning ClickHouse code architecture
- 55:02 – Robert: hiring announcement and OSA Con conference announcement
- 56:51 – Closing remarks
Webinar Transcript
[0:03] — Introduction
Robert: Welcome to “Building ClickHouse and Making Your First Contribution: A Tutorial.” My name is Robert Hodges. I’m the CEO of Altinity. I’ll be your host today. The actual content is going to be delivered by my colleague Vasily Nemkov, who is a lead server engineer and has implemented many features in ClickHouse. If you’re a ClickHouse user, you’ve probably used a few of them already.
Before I turn it over to Vasily, a couple of housekeeping notes. This is being recorded. We will publish the recording along with a link to the slides to everyone who signed up after the webinar, within 24 hours. We have plenty of time for questions: post them in the Q&A box, and I’ll be monitoring it while Vasily is talking. If something is relevant to what we’re covering at the time, we can answer it right then. You can also type things into the chat box. With that, Vasily, it’s all yours.
[1:34] — Speaker Introduction: Vasily Nemkov
Vasily: Hi, Robert, thank you. My name is Vasily Nemkov. I’ve been working on ClickHouse for about three years and have merged about 70 pull requests. Probably the most noticeable features are the IPv4 and IPv6 data types, the DoubleDelta and Gorilla codecs, the DateTime64 data type with extended range for DateTime, the system.session_log table, and many other minor fixes and features. I work at Altinity, the number-one enterprise ClickHouse provider. We provide extraordinary support for our users, implement custom features and custom bug fixes for our customers, and we also recently opened the Altinity Cloud, which is ClickHouse in the cloud. We’re also a major committer and contributor to the ClickHouse repository in the US and EU.
[2:49] — What Is ClickHouse?
Vasily: So what is ClickHouse? It’s an astonishing piece of software. First of all, it’s an OLAP database, which means it really excels at managing a lot of data and providing valuable insights. If you’ve ever seen Google Analytics or Yandex Metrica, it runs mostly on something like ClickHouse. It’s able to go through large amounts of data and provide aggregates, for example how many users of a given country were visiting your website. It’s widely used by Fortune 500 companies for log processing, time series, financial transaction management like stock market analysis, and much more. It’s also open-source software, which makes it even greater.
It’s the fastest-growing open-source project right now. If I recall correctly, in 2021 it surpassed Elasticsearch in the number of committers and number of commits from outside contributors. It’s written in highly optimized modern C++. It’s about half a million lines of code if you exclude all the third-party libraries. It utilizes about 80 to 84 submodules and components. It has lots of tests and provides very useful, detailed documentation which is also in the repository and which you can contribute to and verify.
[4:10] — Why Contribute?
Vasily: Why should you contribute? Well, it’s probably the fastest way to get a new feature implemented or a specific bug fixed. ClickHouse is well tested, the most common features are already implemented, and the most common bugs are already fixed, but if you have some corner case that isn’t covered you can figure it out yourself and provide useful feedback and a useful fix to the community.
Since it’s very well written, you will learn a lot in the process. It’s really clean code, not much smelly code. It’s a joy to work with.
You will also gain some fame. ClickHouse has a specific system table called system.contributors. If you have any merged PR in the ClickHouse repository, you can find your name in that table. Also, if you are able to merge anything into the ClickHouse master you have an opportunity to receive a ClickHouse contributor badge from Alexey Milovidov himself, the originator of ClickHouse. And ClickHouse participated in the Arctic Code Vault program, so if you merged anything before the next snapshot to that program you’ll receive a really cool badge.
[6:55] — Four Ways to Contribute
Vasily: There are multiple ways to contribute. First, you can run a benchmark on your hardware. There’s a specific webpage that shows all the hardware used to run a defined set of benchmarks on ClickHouse, which allows you to compare how different hardware competes against each other. Second, you can update docs. Documentation is the most outdated thing in any software project. You can find a typo or some missing piece, fix it in the documentation, create a PR, and get yourself a contributor badge. Third, you could add some tests. If you have a specific corner case you’d love to ensure never breaks, especially if your critical system depends on it, you can contribute the test to ClickHouse and make sure it always runs in CI/CD. Fourth, and probably the most useful, you can contribute code. We’re going to walk through contributing code today.
The development process is pretty straightforward: you write code, prepare a review, and it gets released. Breaking it down: you clone the repository, make your changes, build ClickHouse, run the tests, add some tests, and push your work to GitHub.
[8:55] — Cloning the Repository
Vasily: Cloning is pretty easy and standard. You clone the official ClickHouse repository and update all the submodules. As I mentioned there are about 80-plus submodules, so it will take about 10 to 15 minutes depending on how close you are to the GitHub CDN.
[9:25] — Making Changes
Vasily: Making changes is a bit harder. According to Alexey, there’s a sort of complexity hierarchy. Things on the left side are relatively easy to add; things on the right are relatively hard. We’re going to honor the venerable tradition of my people and start with adding a helloWorld() function: a function that prints the simple string “hello world.”
There are multiple guides on how you should write simple C++ code in ClickHouse, and you can check those out later from the links in the slides. Since I already navigate the ClickHouse source pretty well, I know that all the functions go to the src/Functions/ directory. We add a new CPP file to that directory.
Here’s the minimal piece of code you can add in order to have a function implemented in ClickHouse. The function is a class that implements a specific interface, with a lot of boilerplate stuff to get it up and running. The two most important parts are the getReturnType method, which tells ClickHouse what the return type of the function is going to be, and the executeImpl method, which actually processes all the inputs and produces the outputs. In this case the function takes zero arguments and only produces a column of a constant string: “hello world.”
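The shape of such a function can be illustrated with a self-contained sketch. This is not the code from the slides and it does not use the real ClickHouse headers: `IToyFunction` and `ToyColumn` are hypothetical, heavily simplified stand-ins for the actual interface and column types, kept only to show how the two methods named above divide the work. The returned text follows the reference file described later in the session.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a column of strings. Illustration only: the real
// ClickHouse column types are considerably richer.
using ToyColumn = std::vector<std::string>;

// Hypothetical, cut-down analogue of the function interface.
class IToyFunction
{
public:
    virtual ~IToyFunction() = default;

    virtual std::string getName() const = 0;
    virtual std::size_t getNumberOfArguments() const = 0;

    // Tells the engine what type of value the function produces.
    virtual std::string getReturnType() const = 0;

    // Processes the inputs and produces one output value per input row.
    virtual ToyColumn executeImpl(
        const std::vector<ToyColumn> & arguments,
        std::size_t input_rows_count) const = 0;
};

class ToyFunctionHelloWorld : public IToyFunction
{
public:
    std::string getName() const override { return "helloWorld"; }
    std::size_t getNumberOfArguments() const override { return 0; }
    std::string getReturnType() const override { return "String"; }

    // Takes no arguments; every row gets the same constant string.
    ToyColumn executeImpl(
        const std::vector<ToyColumn> & /*arguments*/,
        std::size_t input_rows_count) const override
    {
        return ToyColumn(input_rows_count, std::string("Hello, World!"));
    }
};
```

Calling `ToyFunctionHelloWorld().executeImpl({}, 3)` in this sketch yields three rows that all hold the same string, which is roughly what a constant column means for the caller.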
You also need a register function that registers your function in the FunctionFactory. The FunctionFactory is basically a singleton that dispatches and allows the ClickHouse code to access the function instance you want to run. You forward-declare your register function and then in the main registerFunctions.cpp file you execute it against the given factory instance. That’s pretty easy.
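The registration pattern described here can be sketched in isolation. The following is a toy model under stated assumptions, not the ClickHouse source: `ToyFunctionFactory` and `ToyFunction` are hypothetical names, and a "function" is reduced to a callable that returns a string. Only the structure mirrors the talk: a singleton factory, one register function per SQL function, and a central routine that calls each of them.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in: a "function" is just a callable returning a string.
using ToyFunction = std::function<std::string()>;

class ToyFunctionFactory
{
public:
    // Singleton access: one shared registry for the whole process.
    static ToyFunctionFactory & instance()
    {
        static ToyFunctionFactory factory;
        return factory;
    }

    void registerFunction(const std::string & name, ToyFunction function)
    {
        if (!functions.emplace(name, std::move(function)).second)
            throw std::logic_error("function already registered: " + name);
    }

    // Lets the rest of the code look a function up by its SQL name.
    const ToyFunction & get(const std::string & name) const
    {
        auto it = functions.find(name);
        if (it == functions.end())
            throw std::out_of_range("unknown function: " + name);
        return it->second;
    }

private:
    ToyFunctionFactory() = default;
    std::map<std::string, ToyFunction> functions;
};

// Would live next to the function implementation.
void registerFunctionHelloWorld(ToyFunctionFactory & factory)
{
    factory.registerFunction("helloWorld", [] { return std::string("Hello, World!"); });
}

// Central registration routine: one call per register function.
void registerFunctions()
{
    auto & factory = ToyFunctionFactory::instance();
    registerFunctionHelloWorld(factory);
}
```

In a multi-file layout, the central file would only need a forward declaration of `registerFunctionHelloWorld` before calling it, which is the step the talk refers to.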
[12:26] — Building ClickHouse
Vasily: What’s going to be a little bit harder is building it. There are two ways.
The first is the built-in Docker build, which is really convenient if you want to produce .deb, .rpm, and .tgz packages. It’s easy to set up and requires only Docker. It can run on almost any OS that supports Docker. However, it’s somewhat slower than the native build.
The second, more usual way is to build on the host directly. This is most suitable for the development process. It can be done on any Linux-based distribution, and Ubuntu or Ubuntu-based distributions work fine. You need some development packages pre-installed, and it’s about two times faster on my machine than building with Docker, because in Docker you’re not only building the ClickHouse binary itself, you’re also building all the packages.
Since ClickHouse is a pretty big project with a lot of templates and heavily optimized C++, you need a pretty decent machine. It’s recommended to have at least 32 GB of RAM. 60 GB is much better, and you should have multiple cores. More is always better. On my laptop with six cores it’s okay, but the better hardware you have the less time it will take to build. For disk space, if you only build the binaries on the host, it takes about 20 GB total: about 5 GB for the ClickHouse source code with all the dependencies and about 10 to 15 GB of intermediary files produced by the compiler. If you plan to build packages you’ll need even more space.
[15:31] — Docker Build Command
Vasily: Here’s an example of the Docker build command. A few important things to notice. The output directory is where all your packages are going to end up. I’m running this from the root of the ClickHouse source directory and I have a special directory for the packages. That’s mounted to the output directory in the Docker machine. You also need to mount the source directory, which is the root of the ClickHouse source, to the build directory inside Docker. You need to remember to not only clone the repository but to update and fetch all the submodules.
Next, you select the compiler. As of the time of recording, the most recommended compiler is Clang 11. ClickHouse can also be built with GCC 9 and Clang 12, but that’s not the officially recommended way. You can optionally produce RPM and tgz packages, or remove those flags if you don’t want them. Then you run that on the specific Docker image provided by the ClickHouse builder, using the latest image from the ClickHouse Docker Hub.
Build time: anywhere from 30 to 90, possibly 120, minutes depending on your hardware. On my machine it builds in about an hour. The best time I’ve seen on a beefy AWS machine was 35 minutes.
The result is a lot of files in your package directory: .deb packages for multiple components. The most important one is clickhouse-common-static, which contains the ClickHouse binary itself. The clickhouse-client and clickhouse-server packages depend on this one. They’re pretty slim packages containing mostly configs. The biggest package is the one with debug info, which can be several hundred megabytes and is useful if you want to debug your installation. There’s also clickhouse-test, which contains the stateless tests. All of this is also available as RPM packages and tgz packages.
[21:07] — On-Host Build
Vasily: The on-host build requires several packages pre-installed: a compiler, CMake, the LLVM-based lld linker (which makes linking at least twice as fast on my machine), Ninja for fast parallel builds, and tzdata for timezone information for DateTime and DateTime64 classes. Install those on your Ubuntu-based system and you’re ready to go.
I prefer to have a build directory out-of-source because I tend to have multiple build directories: a release flavor, a debug flavor, a flavor with Address Sanitizer on, a flavor with Memory Sanitizer on, or with a specific set of features disabled. All built from the same sources. For today we’ll have a single default build directory.
The build process: make the directory, descend into it, select the compiler, run CMake against the sources pointing to the ClickHouse source directory, and then simply type ninja clickhouse. It takes from 30 minutes to about 120 minutes depending on your machine. The result is in the programs/ directory inside your build folder.
An important note about ClickHouse: the single binary can be used both as a server and as a client. It chooses which mode to run in depending on which arguments you provide.
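The idea of one binary serving several roles can be sketched as a small dispatch on the first argument. This is an illustrative model only, assuming the mode is passed as the first command-line argument (as in running the binary with `server` or `client`); `Mode` and `selectMode` are hypothetical names, and the real binary's dispatch logic is not shown in the talk.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Mode { Server, Client, Unknown };

// args mirrors argv: args[0] is the program name, args[1] the requested mode.
Mode selectMode(const std::vector<std::string> & args)
{
    if (args.size() < 2)
        return Mode::Unknown;
    if (args[1] == "server")
        return Mode::Server;
    if (args[1] == "client")
        return Mode::Client;
    return Mode::Unknown;
}
```

A `main` function built on this sketch would construct the vector from `argv` and branch on the returned mode.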
[23:32] — Smoke Testing the Build
Vasily: To verify whether your build was successful, the simplest way is to smoke test it. To run ClickHouse as a server, you need a set of configs. Luckily ClickHouse itself includes a predefined set of configs sufficient to run on your local machine. Go to the programs/server/ directory inside the source tree and start the ClickHouse binary in server mode from that directory. Then open another terminal and execute the ClickHouse binary in client mode from any directory.
You’ll see the client connect to the server on localhost using the native protocol, reporting the server version and producing some warnings. Since we built in debug mode, there’ll be a warning that the server will not perform at full speed because it was built with debug info. That’s expected.
Let’s type a simple query: SELECT version(). That sends the query from the client to the server, the server executes the version function, and returns the exact server version string. We’ve successfully smoke-tested whether the build works. But for a database system, that’s not nearly enough.
[26:06] — Types of Tests
Vasily: ClickHouse has a plethora of tests. The most important ones for today are unit tests and stateless tests.
Unit tests are written in C++ using the Google Test framework. There are about 40,000 test cases and they take about 4 minutes to run.
Stateless tests are SQL, bash, or Python-based tests that test ClickHouse end to end, sending a request from the client or from a MySQL or HTTP client and verifying what the server actually returns. There are about 3,500 of those tests and they take about 45 minutes to run.
Integration tests verify ClickHouse against more complex scenarios: distributed queries, interconnection with MySQL, PostgreSQL, or other systems. And there are performance tests, which verify that ClickHouse works at or above expected performance levels against predefined prior-version baselines.
For information on how to run all of these, the bonus materials in the slides provide exact commands and details on running integration tests as well.
[28:56] — Adding a New Stateless Test
Vasily: Since we added new functionality, we want to add a new test that covers it. The simplest way is to add a simple SQL-based stateless test. To do that we need to add two files to the ClickHouse source directory.
The test itself goes to tests/queries/0_stateless/ with a number and name and the .sql extension. It contains a single SQL statement: SELECT helloWorld();. Other SQL tests can have any number of statements, some really complex, some as simple as this.
The other required file is a reference file with the .reference extension. This contains exactly what the client should print upon executing the command. In our case: Hello, World!.
[30:55] — Running Stateless Tests
Vasily: Please note that running stateless tests requires a running server. You can run it the way I showed before, or if you’ve built and installed a .deb package you can start it that way. What matters is that you have a server up and running and clients can connect to it.
To run your tests, go to the tests/ directory and use the special test driver program clickhouse-test. The most important arguments are the path to the ClickHouse binary, which will be used as the client, and the set of tests you want to execute. If you omit the test filter, all stateless tests in the directory will run, which takes a long time. To run only our new test, pass the test name as a filter.
It takes a couple of seconds for the test harness to boot up, process the test, connect to the server, and verify the result. As you can see, the output of the client generated by feeding the SQL file contents matched the reference file, so the test has passed. In debug mode this took about two seconds; in release mode it would be much faster.
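The pass/fail rule behind a stateless test, that client output must equal the reference file, can be sketched as a line-by-line comparison. This is a simplified model rather than how clickhouse-test is implemented: `splitLines` and `firstMismatch` are hypothetical helpers, and this sketch ignores a missing trailing newline, which a real comparison may not.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits text into lines; a trailing newline does not create an empty line.
std::vector<std::string> splitLines(const std::string & text)
{
    std::vector<std::string> lines;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line))
        lines.push_back(line);
    return lines;
}

// Returns 0 when the output matches the reference line for line,
// otherwise the 1-based number of the first differing line.
std::size_t firstMismatch(const std::string & actual, const std::string & reference)
{
    const auto a = splitLines(actual);
    const auto r = splitLines(reference);
    const std::size_t common = a.size() < r.size() ? a.size() : r.size();
    for (std::size_t i = 0; i < common; ++i)
        if (a[i] != r[i])
            return i + 1;
    // All shared lines match; any extra line on either side is a mismatch.
    return a.size() == r.size() ? 0 : common + 1;
}
```

In this model the helloWorld test passes when `firstMismatch` returns 0 for the client output and the contents of the .reference file; a nonzero result points at the first line to inspect.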
[32:01] — Committing and Pushing to a Fork
Vasily: Now it’s time to commit and push your work to the repository. The way you do it is the same as any other project. Create a new branch for the feature, add all the files you’ve added and modified, commit your code, and push to your fork. The reason you push to a fork is that you most likely don’t have write access to the ClickHouse repository. You need to create a pull request from your fork.
Once you execute git push, GitHub gives you a nice URL you can follow to create the pull request.
[33:09] — The Pull Request and Review Process
Vasily: Here’s a pull request I already pushed beforehand. The first thing you’ll see when you open a pull request is a template for the PR description and changelog. You have to edit this to describe what’s actually going on, what you’re modifying, and what benefit it brings to ClickHouse. You also need to pick the appropriate category for your PR. For our simple function: the category is New Feature and the changelog entry is simple because the functionality is simple.
[34:42] — Signing the CLA
Vasily: Once you’ve set the description and created the pull request, the next thing to do is sign the CLA (Contributor License Agreement). There’s a GitHub check that verifies any user contributing to ClickHouse has signed the CLA. If you haven’t signed yet, you’ll see a comment from a bot with a button to follow to the CLA page and sign it. It’s pretty simple. Once signed, the check is green.
[35:25] — CI Checks
Vasily: After someone from the ClickHouse team reviews the PR and confirms it’s safe to run in CI, they put a special tag that triggers the CI bot to run tests against your pull request. This runs roughly 50 checks, which can be grouped as:
Style checks, which verify your code follows the ClickHouse style guide. Fast test, which is a simple subset of ClickHouse built and executed against a small subset of stateless tests. Various forms of builds against different compilers and option sets. Stateless tests run against various builds: debug, release, with Address Sanitizer, Memory Sanitizer, and so on. And integration tests.
You can check out more information about the types of checks in the reference documentation in the slides.
Once the tag is applied, checks execute gradually. Not all 50 jobs run simultaneously. First the fast test runs, then others as resources become available. If all checks are green, you’re in good shape. If some fail, you can check the details by clicking the link on the right side of the check to see exactly what went wrong. Each check page has a link to the PR, the list of executed checks, and links to build logs, test logs, and all the information you need to debug and fix failures.
[39:50] — Getting Reviewed and Merged
Vasily: Once you have all the checks green, you should address all the change requests from the ClickHouse maintainers: adding more tests, doing something differently, whatever the nature of your PR requires. Once you have the maintainer’s approval, someone from the ClickHouse team will actually merge your pull request.
Once merged, it has a nice purple tag. Your PR was merged, and you can pat yourself on the back. But once your pull request is merged to master it’s not going to be released just yet. You have to wait for the next upcoming release. Or if it was a valuable fix that needed to be back-ported, it’ll be included in a patch release as well.
[40:53] — Monthly Releases and LTS Releases
Robert: I want to add something about the release process. There’s a distinction between monthly releases and long-term support releases. LTS releases happen twice a year, roughly in March and August. We make an effort not to have too much destabilizing content in LTS releases, and then we swarm them and test them heavily. Bug fixes will get back-ported into LTS releases for up to a year from the community. At Altinity we support them longer through Altinity Stable Builds for ClickHouse: production-certified builds with up to three years of support, where we go beyond the community LTS window to continue back-porting critical fixes.
[41:57] — Q&A: Submodule Changes
Robert: Ian Watts asks: if we’re making changes in submodules, is there an easy way to redirect the build to use a local copy?
Vasily: Submodules in ClickHouse are a fragile thing, but if you want to make local changes and not push them upstream, you can modify your local version of the submodule and the local build system, whether on host or in Docker, will use your local folder. If you want those changes to go into ClickHouse officially you’d need to push a pull request to the appropriate submodule repository and then update the reference in the ClickHouse repository.
[43:44] — Q&A: Building with distcc
Robert: Boris asks: is it possible to build with a distributed compiler like distcc?
Vasily: I’m not aware of anyone building ClickHouse with distcc. It’s probably possible, but since it’s not an officially supported build path it’s not verified in CI. It might work out of the box, but you’d probably face some issues, especially on exotic architectures. And since it’s not in CI, any issue you work around could come back after a ClickHouse update. It’s not a recommended path.
[44:44] — Q&A: Running Style Checks Locally
Robert: Matias asks: is it possible to run some checks offline before creating the pull request, for example running the style check beforehand?
Vasily: For the style check there’s a utility directory in the ClickHouse sources, I believe utils/check-style/, with a check-style script you can run against your local changes. It’s somewhat noisy because it also checks some third-party code, but it’s workable.
And in the bonus materials in the slides, which you’ll receive afterward, there are instructions for how to run unit tests, stateless tests, integration tests, and performance tests all locally against your own builds.
[46:04] — Q&A: Which Release Will a Fix Appear In?
Robert: Ian asks: if we have a fix and want to know which release it’ll be in, is there a way to see which targeted branch it’s planned for?
Vasily: This can be tricky. Bug fixes get back-ported to currently supported LTS versions. As of October 2021, the most recent LTS is 21.8 and the previous one is 21.3. A bug fix merged to master will most likely go into 21.8 and probably into 21.3 as well. The next monthly release will also include it. The process of branching for a release happens about a month ahead of the release date, so if you want to know exactly which version, your best option is to ask the ClickHouse team directly. They’re eager to answer questions like that.
Robert: And as I mentioned, at Altinity we actually support Altinity Stable Builds longer than the community window. So if something’s back-ported into the LTS branch, Altinity will continue to ship it in Altinity Stable for up to three years after the original LTS release.
[49:00] — Q&A: Backport Criteria
Vasily: The question is: who backports a fix, and what’s the criterion for determining whether something needs to be back-ported?
The criteria are somewhat arbitrary. Most bug fixes that are easy to back-port are done automatically. If the thing is critical, it’s done by the maintainers. If you absolutely want something back-ported and the maintainers don’t think it’s critical, you can do the back-port yourself and ask them to include it in the designated release.
[50:18] — Q&A: Testing Frameworks in ClickHouse
Vasily: Raj Kumar asks: is TestFlows or Google Test used for testing ClickHouse?
Both are used, along with a lot of other testing systems. Google Test powers the C++ unit tests. ClickHouse also utilizes its own custom test harness for stateless tests, performance checks, stateful tests, and integration tests. TestFlows is used as well. TestFlows is something we developed at Altinity.
Robert: TestFlows gives very accurate diagnostics about what functionality you’re testing. It’s an effective technique for testing interfaces, but there’s really a plethora of test frameworks in ClickHouse.
[51:21] — Q&A: Future Webinar Topic — Tracing a SELECT Through the Codebase
Robert: There’s a suggestion here that it would be absolutely amazing to have an advanced session that traces a SELECT query through the codebase, from the client via the query processor through the query pipeline and the transformation out to results.
Vasily: Yes, this is exactly a topic for a future webinar. Robert and I were discussing it and if you really like it we can probably go ahead and make it so.
[53:30] — Q&A: Resources for Learning the ClickHouse Codebase
Robert: Are there other things you could point to that would help people learn about ClickHouse’s code architecture?
Vasily: The first resource is the great talks by Alexey Milovidov and other ClickHouse team members, which you can find on the ClickHouse YouTube channel. Unfortunately most of those are in Russian, but there are some English talks as well. Another really valuable thing is to look at how other people implement things. If you want to add a feature, go find whether something similar already exists in ClickHouse, dig up the pull request that added it, and mimic the approach as closely as possible.
Robert: Alexey did an amazing talk on C++ performance about a year and a half ago. Since ClickHouse is so focused on performance, looking at his talks, where you can find them in English, is very helpful. A lot of the time when PRs get rejected or sent back for more work, it has to do with performance. That’s a pretty strong criterion for reviewing and accepting code.
Also, building a better ClickHouse is something we at Altinity have spent years on. We encourage people who want to understand the contribution process to look at the history of Altinity’s contributions to ClickHouse as well.
[55:02] — Hiring and Conference Announcement
Robert: I’m not a C++ coder; my background is much more Java and Python. But the C++ code inside ClickHouse is the most readable C++ I’ve ever seen, and that’s saying a lot because it’s not a language known for readability. It’s really fun to work on. We are hiring. If this is something you’re deeply interested in and you want to learn how to work on the best analytic database on the planet, give us a call.
We’re also organizing a conference called OSA Con. You can check it out on the Altinity website. We just opened signups. It covers not just ClickHouse but all major low-latency databases. We’ll have presenters from Pinot, Druid, Imply, and others, along with discussions of visualization. We’d love to have you join.
Vasily: Thank you, and all the slides including the bonus materials with exact commands for running every type of test will be distributed after the webinar.
Robert: Watch your emails. Contact us at info@altinity.com if you have further questions. Good luck with ClickHouse.
FAQ
How long does it take to build ClickHouse from source?
Build time varies significantly with hardware. Using the Docker-based build, expect 35 minutes on a powerful AWS instance and up to 90 to 120 minutes on typical developer hardware. Building on-host with Ninja and Clang is roughly twice as fast as Docker but has more setup requirements. Hardware requirements are substantial: at least 32 GB of RAM is recommended and 60 GB is better, along with multiple CPU cores. Disk usage runs about 20 GB for a binary-only on-host build.
What is the simplest type of contribution a ClickHouse newcomer can make?
Documentation fixes and new tests are the simplest entry points. Finding a typo or a missing section in the ClickHouse documentation, fixing it in a pull request, and signing the CLA is enough to appear in system.contributors and potentially receive a contributor badge. Adding a stateless SQL test for a corner case you care about is also straightforward: you create one .sql file with the query and one .reference file with the expected output, then run clickhouse-test to verify it passes before submitting.
What are the different test types in ClickHouse and how long do they take to run?
Unit tests are written in C++ using Google Test, with about 40,000 test cases taking roughly 4 minutes. Stateless tests are SQL, bash, or Python files that test ClickHouse end-to-end by sending queries to a live server and comparing output against reference files; there are about 3,500 of them taking around 45 minutes. Integration tests verify more complex scenarios involving distributed queries, Kafka, MySQL, PostgreSQL, and other external systems. Performance tests compare query speed against predefined baselines from prior versions.
What happens after I push a pull request to the ClickHouse repository?
First you sign the Contributor License Agreement, which is a one-time process handled by a GitHub bot. A ClickHouse maintainer then reviews the PR to confirm it’s safe for CI and adds a tag that triggers the automated CI pipeline, which runs approximately 50 checks covering style verification, a fast build and small test subset, full builds across multiple compiler and sanitizer configurations, stateless test suites, and integration tests. If checks fail, you can inspect the log for each check and reproduce failures locally. Once all checks are green and a maintainer approves the PR, the ClickHouse team merges it.
What is the difference between a monthly ClickHouse release and an LTS release, and how does Altinity Stable relate to them?
Monthly releases include all merged work and have about one month of community support. LTS releases happen twice a year, roughly in March and August, and receive up to a year of community support with bug fixes back-ported into the branch. Altinity Stable Builds for ClickHouse are based on LTS releases but go further: they are production-certified with up to three years of maintenance, include additional back-ported bug fixes and security patches beyond the community window, and are 100 percent open source and compatible with official ClickHouse builds.
© 2021 Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc. Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.