The Altinity Way to Building Enterprise QA Process in ClickHouse – Part 1

Enterprise QA Process in ClickHouse

ClickHouse is a column-oriented open source OLAP database that promises to be blazing fast, linearly scalable, feature-rich, hardware-efficient, fault-tolerant, and highly reliable. To compete against cloud databases like Snowflake and Redshift, we need to meet the high bar of enterprise quality. Analytic systems are the eyes that see and react to the vast amounts of data coming from the real world. To be part of that world, ClickHouse must simply work.

This article is the first in a series introducing the methods and tools we use in ClickHouse QA. We will start by discussing how we work with requirements and then move on to test implementation and reporting in the follow-on articles. We will introduce the notions of requirements-as-code, self-documenting tests, and requirements-focused reporting that fits into modern CI/CD pipelines. We will describe how we use the TestFlows open source testing framework to implement each step of our QA process and how our approach fits into an open source project with widely distributed contributors. These notions and tools work just as well for proprietary software like our Altinity.Cloud managed database platform. I hope you will find them useful for any development you do.

Now, let’s get started!

Technologies and Tools

Modern test automation requires QA engineers to have more and more development skills. In fact, all engineers in our QA group are Software Developers in Test. Developers write code to implement features, and QA engineers write code to verify the implementation. Both spend most of their working time writing code, so it is natural for the QA process to adopt as many technologies from development as possible. That is why we use Markdown to write all our QA documentation. We write, organize, and execute our tests using Python 3, and we store everything in Git repositories.

While the choice of core technologies will not surprise anyone, we have chosen to carve our own path and use and support the TestFlows open source test framework for all our QA needs. TestFlows helps us work with requirements, enables us to write and manage tests in Python, makes our tests readable, provides test specifications, integrates with ClickHouse for test result analytics, and overall brings our QA process to life. You can find more detailed information about the framework on the TestFlows website.

Currently, TestFlows officially supports only Ubuntu 18.04 but can also run on macOS. It requires at least Python 3.6. You can easily install it using pip3.

pip3 install testflows

If you want to follow along, give it a try and install it. Also make sure you have a ClickHouse installation on your system with clickhouse-client available, as we will use it to demonstrate how we write tests to verify ClickHouse features.
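
A quick way to confirm that the prerequisites are in place is to check the Python version and run a trivial query through clickhouse-client (assuming it is on your PATH and the server is running locally).

python3 --version                              # should report 3.6 or newer
clickhouse-client --query "SELECT version()"   # confirms the client can reach the server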

Approach

Our QA group has a straightforward definition of quality that shapes our approach to testing. For us, quality means meeting requirements, and requirements are at the center of our QA process. Our three-step QA process consists of discovering and formalizing requirements, developing tests to verify them, and reporting results and requirements coverage. Because we center our testing around requirements, the process may seem burdensome and hard to implement in practice. Even large companies with sizable teams sometimes fail to follow through on implementing such a relatively simple workflow. It is then natural to ask: how can a smaller company do it? The answer is simple: we see it as the only way to approach testing in a structured manner that lets us move fast and confidently as a team. TestFlows helps us at each step of the process.

Picking Something to Test

For this article we will demonstrate our practical approach to testing by writing tests for the CREATE USER statement that was added to ClickHouse as part of support for Role-Based Access Control (RBAC). The syntax for this command has many options, as can be seen below.

CREATE USER [IF NOT EXISTS | OR REPLACE] name [ON CLUSTER cluster_name]
    [IDENTIFIED [WITH {NO_PASSWORD|PLAINTEXT_PASSWORD|SHA256_PASSWORD|SHA256_HASH|DOUBLE_SHA1_PASSWORD|DOUBLE_SHA1_HASH}] BY {'password'|'hash'}] 
    [HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE]
    [DEFAULT ROLE role [,...]] 
    [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | PROFILE 'profile_name'] [,...]
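
To make the syntax more concrete, here is a hypothetical statement built from the options above; the user name john and the password are made up for illustration only.

CREATE USER IF NOT EXISTS john IDENTIFIED WITH SHA256_PASSWORD BY 'secret' HOST LOCAL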

The documentation is concise, but as you can imagine testing this feature is not a trivial task. There are many things that we need to check. How can we manage this task? One way is to start writing tests straightaway, but how do we know when we have written enough of them? How can we make sure that we do not forget about some essential aspects of the behavior and that once we are done we can identify what we have tested? Most importantly, how would we know what we did not test at all?

Working With Requirements

The answer to all the questions above lies in the explicit definition of requirements. With a list of requirements we can manage the testing task effectively, as the full scope is known. We will also know the exact point when enough tests have been written: this point is reached when tests verify all the requirements. We will not have to worry about forgetting to test anything, as the list of requirements again guides us. Finally, we can quickly identify what we have tested at any point while working on this task, as well as what was not tested at all.

Because testing without requirements is hard, TestFlows makes it easy to write and work with them. We simply define requirements in a Markdown document that has minimal structure. Here is how we would define a few requirements for the testing task at hand.

cat << EOF > requirements.md
# SRS001 ClickHouse \`CREATE USER\` Statement Software Requirements Specification

## Requirements

### RQ.SRS001.User.Create
version: 1.0

ClickHouse SHALL support creating user accounts using the \`CREATE USER\` statement.

### RQ.SRS001.User.Create.IfNotExists
version: 1.0

ClickHouse SHALL support the \`IF NOT EXISTS\` clause in the \`CREATE USER\` statement to skip raising an exception
if a user with the same name already exists and SHALL raise an exception if the \`IF NOT EXISTS\` clause is not specified
but a user already exists.

### RQ.SRS001.User.Create.Replace
version: 1.0

ClickHouse SHALL support \`OR REPLACE\` clause in the \`CREATE USER\` statement to replace existing
user account if already exists.
EOF

The above document defines three requirements. Each requirement is defined as a heading that starts with the RQ. prefix. At least a version attribute must be specified for each requirement. When tests are linked to requirements, the version helps us know when a test must be updated because material changes to the requirement have been made. The description of the requirement is defined by one or more paragraphs that follow it. The rest of the document can contain any text that is needed. We can include images, define tables, and add links to other documents. One of the few limitations is that the document is limited to six levels of headings, matching the HTML <h1> to <h6> tags, but in most cases that is more than enough to structure even the most complex documents.

Such requirements documents are convenient to write, modify, and store in Git, and they are commonly known as Software Requirements Specifications (SRS). We treat these documents just like code, and both GitHub and GitLab render them directly.

With requirements defined as a Markdown document, TestFlows can generate an HTML version, if needed, and also parse it to create Python objects that we can use to link with our tests.

You can generate a pretty HTML document with the following command.

cat requirements.md | tfs document convert > requirements.html

For large specifications, the HTML document generated by TestFlows is easier to use, as it provides better navigation between requirements and the table of contents, allowing you to jump around without much scrolling.

We generate Python objects with the tfs requirements generate command.

cat requirements.md | tfs requirements generate > requirements.py

If you open the requirements.py file, you will see objects for each of the requirements that we have defined.

# These requirements were auto generated
# from software requirements specification (SRS)
# document by TestFlows v1.6.200905.1143311.
# Do not edit by hand but re-generate instead
# using 'tfs requirements generate' command.
from testflows.core import Requirement

RQ_SRS001_User_Create = Requirement(
        name='RQ.SRS001.User.Create',
        version='1.0',
        priority=None,
        group=None,
        type=None,
        uid=None,
        description=(
        'ClickHouse SHALL support creating user accounts using the `CREATE USER` statement.\n'
        ),
        link=None
    )
...

The requirements above are nothing more than Python objects that we can easily import into modules that define our tests. For example, we can link the requirement RQ_SRS001_User_Create, a generic requirement, to a Feature that will execute one or more scenarios.

from testflows.core import *
from requirements import RQ_SRS001_User_Create

@TestFeature
@Name("create user")
@Requirements(RQ_SRS001_User_Create("1.0"))
def feature(self):
    # execute every Scenario defined in this module
    for scenario in loads(current_module(), Scenario):
        scenario()
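
Note that loads(current_module(), Scenario) collects all the Scenario tests defined in the same module, so any new scenario added to the module is automatically picked up by the Feature run.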

Note that most IDEs will auto-complete requirement names when you type the import statement. This is convenient and extremely useful when working with a large number of requirements. It is an excellent example of the benefit we get by approaching testing from the developer’s point of view and using the same tools.

Writing Requirements

There is no magic to writing requirements. Just write them. Johns Hopkins University provides an excellent checklist that will help you write better requirements. However, you do have to write requirements in the first place. There is no way around it.

In the software industry, it is generally accepted that requirements are needed even before development begins. However, this is almost never the case in practice, for various reasons. There is a perception that this process belongs to something that only NASA does. Indeed, the NASA Systems Engineering Handbook does include a section about technical requirements definition as part of the system design process.

Practical Points

Here are four practical points from our experience of working with requirements.

  1. Just as you don’t have to implement a feature in one go, you should not expect to write all the requirements all at once. You can and should write requirements as you go. The smallest unit of work for development and testing should always be implementing or testing a single requirement.
  2. Requirements define the quality of the product. Before shipping a feature you need to ensure quality by verifying that the requirements are met. This naturally implies that when you define and write requirements, you always need to approach them from the testability point of view. There is no point in defining unverifiable requirements. Always write requirements from the testing point of view and make sure they are all verifiable.
  3. It is normal to have generic and specific requirements. Generic requirements are verified by a suite of tests, such as one defined by a Feature, and specific requirements are verified by a single test defined by a Scenario (see the sketch after this list).
  4. If you treat requirements just like code then once in a while they need to be refactored and cleaned up. Refactoring and cleaning up requirements is just a natural part of the process.
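
To illustrate the generic versus specific distinction, here is a minimal sketch of a Scenario linked to the specific RQ.SRS001.User.Create.IfNotExists requirement. This is not the actual test code from our suites: the client helper, the user name, and the assertions are assumptions made only to keep the sketch self-contained, driving clickhouse-client on the local host.

from subprocess import run, PIPE

from testflows.core import *
from requirements import RQ_SRS001_User_Create_IfNotExists

def client(query):
    """Execute a query using the local clickhouse-client."""
    return run(["clickhouse-client", "--query", query],
               stdout=PIPE, stderr=PIPE, universal_newlines=True)

@TestScenario
@Requirements(RQ_SRS001_User_Create_IfNotExists("1.0"))
def create_user_if_not_exists(self):
    """Check that the IF NOT EXISTS clause skips raising an exception
    when a user with the same name already exists."""
    try:
        with Given("a user that already exists"):
            client("CREATE USER IF NOT EXISTS user0")
        with When("I create the same user again with IF NOT EXISTS"):
            r = client("CREATE USER IF NOT EXISTS user0")
        with Then("no exception is raised"):
            assert r.returncode == 0, r.stderr
    finally:
        with Finally("I drop the user"):
            client("DROP USER IF EXISTS user0")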

Conclusion

In this article we started to look at how Altinity tests ClickHouse and contributes to making ClickHouse better for everyone. We introduced the TestFlows open source test framework, which we use in our QA process. We highlighted that we start testing by defining requirements, even when they were not specified during the development process, and showed how TestFlows helps us work with requirements.

Stay tuned for the next article, in which we will describe how we author our tests, explicitly define our test procedures, and link our tests with requirements, and how all these steps enable us to clearly monitor test coverage throughout the QA process.

Subscribe to our newsletter, or follow us on Twitter/LinkedIn, so you do not miss the next update!
