Blog

Decimals vs. Floats in ClickHouse®

“Nothing brings fear to my heart more than a floating point number.”
Gerald Jay Sussman

When we delve into the realm of databases, we soon discover a fundamental truth: machines interpret and process numbers quite differently from humans. That’s because at their core, computers operate solely on bits, the most basic unit of data that is represented by 0 (zero) or 1 (one), which essentially translates to the absence or presence of an electrical signal in the computer’s circuitry.

In ClickHouse, a powerful column-oriented database management system, understanding the distinction between floats and decimals is vital. The way ClickHouse stores, retrieves, and processes these datatypes can significantly affect the performance and accuracy of your data operations. For instance, when dealing with financial data, opting for decimals over floats in ClickHouse ensures you won’t lose a cent due to rounding errors.

💡 Did you know? The first versions of ClickHouse did not have Decimal support at all. Developers had to do special tricks to handle them manually. But let’s dig into some details.

We all can easily read 110110112 in a binary format (base-2), which translates to 219:

(11011011)2 = 1 x 27 + 1 x 26 + 0 x 25 + 1 x 24 + 1 x 23 + 0 x 22 + 1 x 21 + 1 x 20
            = 128 + 64 + 0 + 16 + 8 + 0 + 2 + 1
            = 219

This is straightforward and easy to understand because each binary digit is assigned a value based on its position. But what happens if we want to save a floating point number like 64.32 or save a number larger than 255 in 8 bits? This leads us to the concept of approximation and sacrificing precision to fit the data within limited storage space. For example, a 32-bit float in IEEE754 format has a range of from -3.4 x 1038 to 3.4 x 1038. It’s a huge number, so usually it is kept in scientific notation for brevity and readability. The exact numerical representation looks like this:

340,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

Compared to an Int32 range of -231 to 231-1 that’s a significant difference! But wait, there’s even more. Floating point numbers can represent abstract mathematical values like infinity (1/0 for positive infinity and -1/0 for negative infinity) or values that are a result of undefined operations (0/0 = NaN, which translates to “Not a Number”).

☝ Comparing Floats for equality can be problematic due to rounding errors and it can lead to unexpected results. It’s often safer to compare them within a tolerance range. But don’t forget Gerald Jay Sussman’s words from the beginning of the article. The complexity of Floats becomes evident when conducting simple arithmetic operations, as they might not behave as expected:

SELECT toBool( (0.1 + 0.2) = 0.3 )
┌─toBool(equals(plus(0.1, 0.2), 0.3))─┐
│ false                               │
└─────────────────────────────────────┘

The above equation should logically return true, but it doesn’t, illustrating the limitations of floating point numbers. Numbers like 0.1 cannot be precisely represented in binary floating-point arithmetic, leading to these seemingly counterintuitive results. Let’s dive into Floats to see how and when we should use them and when we should use exact representations like Decimals.

☝ In ClickHouse, a numeric literal with a period is interpreted as an approximate (Float64) instead of exact (Decimal) number in comparison to i.e. PostgreSQL where they are interpreted as Decimal by default.

The first standardized representation of floating-point arithmetic was introduced by IEEE (the Institute of Electrical and Electronics Engineers) in 1985 with a standard named IEEE 754-1985. It was a significant milestone in the computing world, since it was a unifying approach for computer systems around the world, ensuring that floating-point arithmetic was handled consistently across different platforms. Before this standardization, various systems had their own methods of handling floating-point numbers, leading to inconsistencies and compatibility issues. This meant that a floating-point calculation performed on one system could yield different results when run on another, hindering cross-system collaboration and portability of software.

The formula for calculating the IEEE754 float number is as follows:

(-1)S x (1 + Mantissa) x 2(Exponent - Bias)

Let’s skip Bias for now (it’s just a safety valve to avoid negative Exponent values). The S is just a flag, signaling a positive or a negative number (0 is for positive and 1 is for negative). Mantissa is the actual number that is stored inside the equation and the Exponent is like a “zoom” level, determining how big or small the number is. For a 32-bit floating point number, 1 bit is for the sign, 8 bits for the exponent, and 23 for the mantissa (fraction):

Source: https://en.wikipedia.org/wiki/Floating-point_arithmetic (CC BY-SA 3.0)

(If you’d like to explore the format in more detail, there’s a fantastic tool at float.exposed that makes it easy to visualize any number you’d like.)

Now, let’s experiment by adjusting the Exponent — think of it as zooming out of the value. You’ll notice that with each increment, the number effectively doubles, illustrating the ‘zooming out’ effect. This is a key aspect of how floating-point numbers dynamically adjust their scale, allowing for a wide range of values to be represented in a compact format.

Thanks to this representation, Floats:

  • are compact (a single-precision float is 32-bits and a double-precision float is 64-bits)
  • represent a wide range of numbers (incredibly small and astronomically large numbers) all within a modest space
  • can store abstract values (NaN/Inf)
  • handle arithmetic very well on modern CPUs (thanks to FPUs)

Unfortunately, it comes with cons, too:

  • Floats come with rounding errors (they are an approximation of real numbers, not the exact values)
  • addition and multiplication are not necessarily associative (changing the order of Floats can yield a different result; see the article Operations on Floats are not associative in the Altinity Knowledge Base)
  • the same number can have different representations, which can cause problems with exact comparisons
  • some numbers have no accurate Float representation
  • compression can be worse

💡 A rule of thumb: If you need strict values, use Decimal; if you do not need strict values, use Floats. The important thing is that casting from text to Float requires parsing algorithms to make sure the text will be converted to the closest machine representable number possible. Altinity contributed the precise_float_parsing setting to ClickHouse; that allows a user to choose between faster or more precise parsing algorithms.

Have you ever wondered how to interpret a binary float in ClickHouse? Here’s a spell to reveal its secrets:

SELECT reinterpretAsFloat32(reverse(unbin('00111110001000000000000000000000'))) AS num
Query id: 1789900d-b80e-4053-b813-47a3ba050de0
┌─────num─┐
│ 0.15625 │
└─────────┘
1 row in set. Elapsed: 0.001 sec.

While there are several drawbacks, Floats are suitable for many use cases like audio processing, graphics, scientific computing, and physics, but there are other cases where you need to be accurate (like in handling money). Understanding when to employ their unique talents is key in the vast world of programming.

Decimals

Unlike floating-point numbers that are approximate representation of real numbers, Decimals offer the exact representation using base-10 fractions. When defining a Decimal, two key parameters come into play: precision and scale; that is, the total number of digits that can be stored (precision) and the number of digits that can be stored to the right of the decimal point (scale). Decimals are represented as Integer values, with the position of the decimal point embedded in the datatype.

SELECT toDecimal32(22, 8) AS x;
Code: 407. DB::Exception: Received from localhost:9000. DB::Exception: Decimal convert overflow: 2200000000 is not in range (-2147483648, 2147483647): While processing toDecimal32(22, 8) AS x. (DECIMAL_OVERFLOW)

In ClickHouse, there are 4 Decimal types: Decimal32, Decimal64, and also emulated Decimal128 and Decimal256 with main differences in maximum scale and precision bounded by their underlying Integer type. Emulated means that these larger decimal types are not directly supported by hardware-level operations in the CPU, and therefore, their operations are implemented in software resulting in significantly slower performance compared to their hardware supported equivalents.

Decimals, being precise and exact, cannot represent abstract mathematical concepts like NaN (Not a Number) or Inf (Infinity). Also, dividing a decimal number by zero will yield an exception. Furthermore, what is important in this datatype is handling of overflow. In Integer types you cannot control this behavior (this datatype will always overflow, the same as in C++), but with the Decimal datatype decimal_check_overflow setting is turned on by default, ensuring that any operation resulting in a value exceeding the Decimal’s capacity triggers an error rather than silently producing an incorrect result (although it comes with some performance impact).

⚠️ Emulated Decimals (Decimal128 and Decimal256) do not have overflow decimal check implemented. It’s a known ClickHouse issue. There are also rules for scale; that means how result scale of a Decimal should be calculated based on the operation. Multiplication causes a Decimal‘s scale to change by adding scales of multiplied Decimals; dividing result scale is the scale of the left number, and addition or subtraction will result in a scale of the max scale of the numbers.

SELECT toDecimal32(7, 7) * toDecimal32(7, 7);
Received exception from server (version 23.12.1):
Code: 69. DB::Exception: Received from localhost:9000. DB::Exception: Scale 14 is out of bounds (max scale: 9): While processing toDecimal32(7, 7) * toDecimal32(7, 7). (ARGUMENT_OUT_OF_BOUND)

It can be very frustrating at first when simple operations on those types are causing Decimal math overflow or Scale out of bounds exceptions, so you can use the multiplyDecimal() and divideDecimal() functions developed by Altinity. Those variants work significantly slower than usual operations, but they give control over the result precision.

☝ If you need precise computations and you have no time limits, then you should use divideDecimal() and multiplyDecimal(). Otherwise it is recommended to use the `/` and `*` operations respectively. As with floats, you should be aware of casting and overflow issues. Let’s look at this example:

SELECT
    CAST('9.2', 'Decimal64(2)') AS postgresql_cast,
    toDecimal64(9.2, 2) AS to_function,
    CAST(9.2, 'Decimal64(2)') AS cast_float_literal,
    CAST('9.2', 'Decimal64(2)') AS cast_string_literal
Query id: 8b9fe7ef-1990-4b59-a46b-a6790d9677bc
┌─postgresql_cast─┬─to_function─┬─cast_float_literal─┬─cast_string_literal─┐
│             9.2 │        9.19 │               9.19 │                 9.2 │
└─────────────────┴─────────────┴────────────────────┴─────────────────────┘

When ClickHouse sees a number with a decimal separator it interprets it as a Float64 literal and since 9.2 has no accurate representation (see float.exposed/0x41133333) you actually get something like 9.19999999999999. Later, this Float is cast to Decimal by removing the extra precision. The workaround is very simple; wrap the number in quotes and it will be considered as a string literal by query parser, and will be transformed to Decimal directly. Another major thing that troubles ClickHouse users is dividing Decimals. Let’s look at simple example:

SELECT CAST(200, 'Decimal32(4)') / CAST(3, 'Decimal32(4)')
Received exception from server (version 23.11.1):
Code: 407. DB::Exception: Received from localhost:9000. DB::Exception: Decimal math overflow: While processing CAST(200, 'Decimal32(4)') / CAST(3, 'Decimal32(4)'). (DECIMAL_OVERFLOW)

At a first glance there should be no overflow here, yet the ClickHouse throws a DECIMAL_OVERFLOW exception. The overflow occurs during this intermediate step, before the actual division. To understand what’s going on, we need to know what is done under the hood. Since Decimal‘s underlying type is Integer, to divide we need to first multiply the number by the total precision and a scale of both numbers using Integers; internally ClickHouse does it on UInt256 type. Then, before dividing the Integer, we need to cast it back to the underlying Decimal type (the integer_overflow column). Here is the step when integer overflows happen. If we turn on decimal_check_overflow we can see what exactly is happening:

SET decimal_check_overflow = 0;
SELECT CAST(200, 'Decimal32(4)') / CAST(3, 'Decimal32(4)') decimal_overflow,
       -- almost the same as --
       CAST(200_0000, 'UInt64') * CAST(1_0000, 'UInt256') preresult,
       preresult::Int32 integer_overflow,
       intDiv(integer_overflow, 3_0000) division,
       division / pow(10,4) as bad_result
┌─decimal_overflow─┬───preresult─┬─integer_overflow─┬─division─┬─bad_result─┐
│          -4.9161 │ 20000000000 │      -1474836480 │   -49161 │    -4.9161 │
└──────────────────┴─────────────┴──────────────────┴──────────┴────────────┘

Conclusion

At a high level, choosing between Decimals and Floats seems straightforward: use Decimals if you need accuracy, and use Floats if you need very large numbers or if you need to represent values like infinity or NaN. However, understanding how these datatypes work inside ClickHouse gives you the insight you need to understand any issues that might arise and how to work around them. With the details we’ve covered here, there’s no need to share Gerald Jay Sussman’s fears.

Share

ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.

Table of Contents:

Related: