Visualizing Collatz Sequences with ClickHouse® and Grafana

You know more of a road by having traveled it
than by all the conjectures and descriptions in the world.
—William Hazlitt
Math games with ClickHouse continue! Inspired by a beautiful visualization of the Collatz conjecture (see here, for example), I decided to do it with ClickHouse. ClickHouse has powerful tools to represent sequences via arrays, and it can be used for a lot of useful applications that process vectors, as shown by articles on arrays in our blog. The recent addition of recursive CTEs extended those capabilities to the next level.
The Collatz Conjecture
The Collatz Conjecture is one of the unresolved problems in number theory. It has a very simple formulation but it is extremely hard, if even possible, to provide a formal proof. Consider there is a function:
For every natural number we can build a sequence repeatedly applying a function above: n, f(n), f(f(n)) …
For example, let’s take number 3. 3 is odd, so f(3) = 3*3 + 1 = 10. 10 is even, so f(10) = 5, etc. If we continue the process we will get a sequence: 3,10,5,16,8,4,2,1. Once the sequence reaches 1, it enters a cycle: 1,4,2,1,4,2,1, so we stop at 1.
So the Collatz Conjecture is: for any natural number the sequence above always reaches 1. Sounds extremely simple, doesn’t it? It’s as “simple” as many other number theory conjectures that are extremely hard to prove formally.
The function above generates sequences for different starting points. It may be used in order to produce nice visualizations. Let’s show how to generate such sequences in ClickHouse and visualize with Grafana.
Generating Collatz Sequences
In order to generate a Collatz sequence, we need to start with some number, apply transformations using rules above, and stop after reaching 1. This is very easy to code in any programming language, but may seem non-trivial in SQL. We will apply recursive CTEs for that purpose. Recursive CTEs were introduced in ClickHouse a year ago in the 24.4 release.
Recursive CTEs are among the most subtle tools of the SQL language. They can be used for such non-trivial tasks as tree traversal, graph analysis and others where there is recursion with unknown depth. The Collatz sequence is an example of such a case – we do not know how many iterations are required, so recursion is a natural choice here. Here is a query that generates Collatz sequences for the first 10 numbers:
WITH RECURSIVE collatz as (
SELECT [number] as seq FROM numbers(1,10)
UNION ALL
SELECT arrayPushBack(seq, if(seq[-1] % 2 == 0,
intDiv(seq[-1],2),
seq[-1]*3+1)
) as new_seq
FROM collatz
WHERE seq[-1] != 1
)
SELECT * FROM collatz
WHERE seq[-1] = 1
ORDER by seq[1]
Let’s see how it works:
- We need some start state. It is first the part of the query that generates start numbers from 1 to 10 as single element arrays.
- The main body is the second part of the query. It checks the last element of an array (seq[-1] means last element) and adds a new one at the end using Collatz sequence rules
- The end of recursion is defined by WHERE condition WHERE seq[-1] != 1 – we stop when the last element is 1.
- Since CTE is UNION ALL, it generates all steps of recursion. What is left is to filter the resulting sequences using WHERE se1[-1] = 1 condition.
Here is the result of the query above:
┌─seq────────────────────────────────────────────────────┐
│ [1] │
│ [2,1] │
│ [3,10,5,16,8,4,2,1] │
│ [4,2,1] │
│ [5,16,8,4,2,1] │
│ [6,3,10,5,16,8,4,2,1] │
│ [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1] │
│ [8,4,2,1] │
│ [9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1] │
│ [10,5,16,8,4,2,1] │
└────────────────────────────────────────────────────────┘
Note: To get this to work you need to use ClickHouse 24.4 or later with the analyzer enabled.
We have the data, now it’s time to visualize it.
Generating Trace Lines
There are multiple ways to visualize the sequences, but I liked the following: start from (x,y) = (0,0) and some angle phi, and for every step of a sequence draw a line using the following rules:
- If seq[i] is even, change angle phi(i+1) = phi(i) + a1
- If seq[i] is odd, change angle phi(i+1) = phi(i) – a2
- Then move the next point in the direction defined by angle phi, i.e.:
x(i+1) = x(i) + l*sin(phi)
y(i+1) = y(i) + l*cos(phi) - The length of every step may be reduced with a decay rule in order to make the graph more dense
That will produce a number of trace lines all starting at (0,0). Many traces overlap. The density of overlapping defines a color. So, let’s generate traces first using a sequence of some specific number, for example 7. We will use another cool ClickHouse function for that – arrayFold – it allows us to process an array with an accumulator. We will use an accumulator in order to keep the current point and angle while traversing the sequence:
WITH [7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1] as seq,
0.1 as a1,
0.1 as a2,
x -> 1 as decay,
arrayFold( acc, i, x -> arrayPushBack(acc,
(acc[-1].1 + sin(acc[-1].3)*decay(i),
acc[-1].2 + cos(acc[-1].3)*decay(i),
if( x%2 == 0, acc[-1].3 + a1, acc[-1].3 - a2)
)),
arrayEnumerate(seq), seq, [(0.0, 0.0, 0.0)]) as trace_w_angle,
arrayMap( x -> (x.1, x.2), trace_w_angle) as trace
SELECT arrayJoin(trace)
How does arrayFold work? The function signature is not SQL-ish:
arrayFold(lambda_function, arr1, arr2, ..., accumulator)
It has three or more parameters that are easiest to explain in reverse order:
- The initial state, which is usually named accumulator: [(0.0, 0.0, 0.0)] – in our case it is an array of a single tuple element that defines coordinates x, y, and a current angle.
- One or more arrays to be iterated through, in our case those are the array of indices (arrayEnumerate(seq)) and the sequence itself
- A transformation rule that takes the current state, and array elements, and generates a new state
So for every element of an input Collatz sequence, it will take the current accumulator and transform it to the new state adding the next visualization point. At the end we will get an array of points that we can map to the graph:
┌─arrayJoin(trace)──────────────────────────┐
│ (0,0) │
│ (0,1) │
│ (-0.09983341664682815,1.9950041652780257) │
│ (-0.09983341664682815,2.9950041652780257) │
│ (-0.1996668332936563,3.9900083305560514) │
│ (-0.1996668332936563,4.990008330556051) │
│ (-0.29950024994048446,5.985012495834077) │
│ (-0.29950024994048446,6.985012495834077) │
│ (-0.1996668332936563,7.980016661112103) │
│ (-0.1996668332936563,8.980016661112103) │
│ (-0.09983341664682815,9.975020826390129) │
│ (0.09883591414823306,10.95508740423137) │
│ (0.39435612080957266,11.910423893356977) │
│ (0.5930254516046339,12.89049047119822) │
│ (0.8885456582659735,13.845826960323826) │
│ (1.277964000574624,14.766887954326712) │
│ (1.757389539178827,15.644470516217085) │
│ (2.3220320125738625,16.469806131126763) │
└───────────────────────────────────────────┘
Visualization with Grafana XY Chart
ClickHouse and Grafana work great together. Altinity is a proud maintainer of the most popular Altinity Grafana data source for ClickHouse with more than 23 million downloads. In order to visualize traces we will use XY Chart. It requires two columns with XY coordinates, so final SELECT needs to be translated as:
SELECT arrayJoin(trace).1 x, arrayJoin(trace).2 y
If we run it for this Collatz sequence of 7 we will get a fancy line like the following:
Now let’s try to plot multiple traces together. We will have to combine RECURSIVE CTE and traces together. We will also generate 100 random traces from different start points below 1000000:
WITH RECURSIVE collatz as (
SELECT [rand() % 1000000] as seq FROM numbers(1,100)
UNION ALL
SELECT arrayPushBack(seq, if(seq[-1] % 2 == 0, intDiv(seq[-1],2), seq[-1]*3+1)) as new_seq
FROM collatz
WHERE seq[-1] != 1
),
seqs as (
SELECT seq FROM collatz
WHERE seq[-1] = 1
ORDER by seq[1]
),
0.1 as left,
0.1 as right,
x -> 1 as decay,
traces as (
SELECT arrayFold(
acc, i, x -> arrayPushBack(acc,
(acc[-1].1 + sin(acc[-1].3)*decay(i),
acc[-1].2 + cos(acc[-1].3)*decay(i),
if( x%2 == 0, acc[-1].3 + left, acc[-1].3 - right)
)),
arrayEnumerate(seq), seq, [(0.0, 0.0, 0.0)]) trace
FROM seqs
)
SELECT arrayJoin(trace).1 x, arrayJoin(trace).2 y
FROM traces
The result does not look very pretty yet:
This is because Grafana can not plot multiple traces. So we will do a trick: we will reverse every trace back to the starting point!
arrayConcat(trace, arrayReverse(trace))
Now it looks a bit better:
It still does not look pretty enough, so you have to experiment with parameters. After some tuning I got this nice looking tree:
It used the following parameters:
0.1 as left,
0.2 as right,
x -> 1/pow(1.1,x) as decay
Grafana XY Chart claims to support coloring. Unfortunately, I could not get it to work. Perhaps you’ll see it in a future article.
Final Words
ClickHouse is probably the most versatile and fun to use database in the world. It can do everything from real-time ingest, scanning terabytes of data, or playing with math puzzles. In this article I showed how it can be used in order to generate and visualize Collatz sequences. Along the way I illustrated two powerful ClickHouse features – recursive CTEs and the arrayFold function. Please join the joy and share your fun visualizations using ClickHouse!
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.