# Into the Deep End: Learning SQL by Taking On Project Euler Puzzles (Part 1)

**Contents:**

Young man, in mathematics you don’t understand things. You just get used to them.

John von Neumann

inThe dancing Wu Li Mastersby Gary Zukav

Mathematics teaches us that the intuition of a new idea is easier to grasp after you’ve used it for a while and seen it tie together with everything else. It’s pointless to flail with analogies and book learning without getting your hands dirty, so to speak. So when tasked with learning SQL in ClickHouse and solving a few problems from the Project Euler archives, I got a brief overview of SQL syntax and went straight to work on the problems. I wanted to learn SQL by jumping into the deep end with a project, starting first with an application and letting the understanding happen along the way. Here are the details of how I did it and what I learned from each one; I hope it makes writing SQL in ClickHouse feel more accessible and fun for programmers of all levels and persuasions!

I worked on a MacBook Pro with 2 GHz Quad-Core Intel Core i5 processor with hyper-threading enabled and 16GB memory. I used ClickHouse version 24.6.1.2493 to run my SQL queries on my computer and edited my queries in DBeaver. For each of my algorithms I give execution time as the average time of five trials. This article is inspired by a similar one by Alexander Zaitsev at Altinity: “Math Games in ClickHouse®– Solving Project Euler Problems”.

## Problem 8: Largest Product in a Series

My first challenge was Problem 8, which reads as follows: “Find the thirteen adjacent digits in the 1000-digit number [shown below] that have the greatest product. What is the value of this product?” For example, we can consider the first two sets of 13 adjacent digits in the big number: {7,3,1,6,7,1,7,6,5,3,1,3,3} and {3,1,6,7,1,7,6,5,3,1,3,3,0}. The elements of the first set multiply out to 5,000,940, and the elements of the second set multiply out to 0. Clearly, the former has the greatest product, but the object of the problem is to repeat this for all other sets of 13 adjacent digits.

### Algorithm 1: splitting the whole number into 13-grams

This was my first real project in ClickHouse; rated at 5% difficulty on Project Euler, it was a warm-up for the tougher puzzles ahead. After picking the problem, I read through all of the official documentation to find the tools I could put in my pockets. That was how I found the nifty `ngrams()`

function, which takes in a string and returns an array of all substrings of length n. Here’s a simple example with n = 2.

```
SELECT ngrams('abcd', 2)
┌─ngrams('abcd', 2)─┐
│ ['ab','bc','cd'] │
└───────────────────┘
```

There’s nothing like `ngrams()`

to use on an integer, which is why I had to enter `big_num`

as a string, split that up into 13-grams, and turn each element of the 13-gram back into an integer before I could start multiplying. At this point, I also learned about `arrayMap()`

and `arrayFilter()`

functions, which work well here with `ngrams()`

. In this problem and the others that I worked on, arrays and array functions were essential to my queries. `arrayMap()`

and `arrayFilter()`

are similar functions; both take in a lambda function `func`

and an array `arr`

to work over. Then `arrayMap()`

applies `func`

to each element of `arr`

and returns this new array, while `arrayFilter()`

returns an array containing only those elements x in `arr `

such that `func(x)`

0.

This simple query demonstrates how they work on `range(1,10)`

, which is the array of integers 1 through 9.

```
SELECT arrayFilter(x -> x > 5, range(1,10)) AS arr_filt
┌─arr_filt──┐
│ [6,7,8,9] │
└───────────┘
SELECT arrayMap(x -> x * 2, range(1,10)) AS arr_map
┌─arr_map──────────────────┐
│ [2,4,6,8,10,12,14,16,18] │
└──────────────────────────┘
```

In the above example, the `arrayFilter()`

function keeps only those numbers x in the array `range(1,10)`

where x > 5. The `arrayMap()`

function takes all numbers x in the array and multiplies them by 2. With all three functions in hand, I was ready to write my first query solving Problem 8.

```
WITH '7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450' AS big_num,
ngrams(big_num, 13) AS big_num_as_13grams,
length(big_num_as_13grams) AS num_of_13grams,
arrayMap(a -> ngrams(a, 1), big_num_as_13grams) AS `13grams_to_1grams`,
arrayMap(x -> arrayMap(y -> toUInt256(y), x), `13grams_to_1grams`) AS str_arr_to_int_arr,
arrayMap(w -> arrayProduct(w), str_arr_to_int_arr) AS arr_product
SELECT
arrayMax(arr_product) AS max_arr_product,
argMaxArray(str_arr_to_int_arr, arr_product) AS array_w_max_product
```

The result is, which is the same as the Project Euler solution:

```
┌─max_arr_product─┬─array_w_max_product─────────┐
│ 23514624000 │ [5,5,7,6,6,8,9,6,6,4,8,9,5] │
└─────────────────┴─────────────────────────────┘
```

### How Algorithm 1 works

The query may look complicated, but it’s easy to understand if we break it up into small pieces.

- I put most of the lines in the
`WITH`

clause to print a shorter result. This was just a formatting choice and had no effect on computational efficiency. `ngrams()`

breaks the string`big_num`

into all possible adjacent groups of 13, and puts these in an array of strings named`big_num_as_13grams`

.- To compare with the next algorithm, count the number of ngrams in this array – the result is 988.
- In
`big_num_as_13grams`

, for each string of 13 digits in the array, turn that string into an array of 13 single digits. Now we have`13grams_to_1grams`

, which is an array of arrays of strings. - For each array in
`13grams_to_1grams`

, turn each string (a single digit) into an integer. - For each array of integers in
`13grams_to_1grams`

, take the product of the 13 elements. - The maximum of these products comes out to 23,514,624,000.
- The final step is not necessary for the problem, but it shows us which array of integers has the max product. It turns out to be [5,5,7,6,6,8,9,6,6,4,8,9,5]. That’s nice for checking the math.

ClickHouse solved this in 18.8 milliseconds and used 4.20 MiB of memory. Not bad, but since this isn’t too complex of a problem, I figured this could be reduced.

### Algorithm 2: removing all zeroes

The big number has lots of zeros. When considering the maximum product of adjacent digits, any 13-gram containing a zero will automatically be unnecessary. I improved the query by splitting up the big number into an array of numbers that don’t contain 0’s, which removes many 13-grams from later calculations in the query.

```
WITH '7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450' AS big_num,
splitByChar('0', big_num) AS split_big_num,
arrayFilter(x -> (length(x) >= 13), split_big_num) AS filtered_arr,
arrayMap(x -> ngrams(x, 13), filtered_arr) AS arr_of_arr_of_13grams,
arraySum(x -> length(x), arr_of_arr_of_13grams) AS num_of_13grams,
arrayMap(x -> arrayMap(y -> arrayProduct(z -> toUInt8(z), ngrams(y, 1)), x), arr_of_arr_of_13grams) AS arr_of_arr_product,
arrayMap(x -> arrayMax(x), arr_of_arr_product) AS arr_of_arr_max
SELECT arrayMax(arr_of_arr_max) AS final_arr_max
```

The result is:

```
┌─final_arr_max─┐
│ 23514624000 │
└───────────────┘
```

### How Algorithm 2 works

This algorithm is similar to Algorithm 1, with a few differences that make it run faster. I found it a little tricky to follow which “level” of the array I was in. Let’s look at what each line does while carefully tracking the data types.

- Again starting with the big number as a string, we cut up that string anywhere that it contains a zero. Then we get an array of strings called
`split_big_num`

, where none of these strings contains a zero. - Of all of the strings in the array
`split_big_num`

, we remove the ones that have length <13, and name this new array of strings`filtered_arr`

. - For each string in
`filtered_arr`

, we use the`ngrams()`

function that gives us an array of all 13-grams out of that string. We have an array of arrays of strings named`arr_of_arr_of_13grams`

. - Count the number of 13-grams in each array of 13-grams, then sum these to get the total number of 13-grams – the result is 263, which is much fewer than in Algorithm 1.
- Take each 13-gram and use the
`ngrams()`

function to turn it into an array of 1-grams, one digit each. Turn these 13 1-grams into integers, then take their product. We have an array of arrays of Float64 named`arr_of_arr_product`

. - For each of the arrays of Float64 (each float is a product), we take the maximum value. We have an array of Float64 named
`arr_of_arr_max`

. - Finally, we take the max of
`arr_of_arr_max`

, that is, the maximum of those previously calculated products. Our final answer is again 23,514,624,000.

ClickHouse computed this in 6.4 milliseconds and used 3.99 MiB of memory, so the optimizations noticeably decreased the execution time. Nice! That one wasn’t too hard, which gave me a pep in my step as I started the next problem.

## Problem 100: Arranged Probability

I liked playing with array functions and found another application in Euler Problem 100. This problem is harder than the first, rated 30% difficulty; on the plus side, we get to play with colors. The problem statement reads: “If a box contains twenty-one coloured discs, composed of fifteen blue discs and six red discs, and two discs were taken at random, it can be seen that the probability of taking two blue discs, P(BB) = . The next such arrangement, for which there is exactly 50% chance of taking two blue discs at random, is a box containing eighty-five blue discs and thirty-five red discs. By finding the first arrangement to contain over 1,000,000,000,000 = discs in total, determine the number of blue discs that the box would contain.”

### Algorithm 1: solution on a smaller scale

There’s a great saying that weeks of coding can save you hours of planning. In an effort to avoid this trap, I tried to optimize this problem by first considering the math. Let b be the number of blue discs in the pot before drawing and let r be the number of red discs in the pot. The desired b and r satisfy . For the very large b and r that I’m looking for in this problem, note that and will be incredibly close together. Therefore, as b and r increase, and both get closer and closer to . Now, this told me , which gave a better picture of what kind of r I needed for a given b.

To illustrate this, I wrote an elementary and inefficient query, just to see what the first few answers look like, and if my mathematical assumptions would be useful:

```
┌─────b─┬─r──────┬─b/(b+r)──────────────┬────────────1/sqrt2─┬──────(sqrt2 - 1)*b─┐
│ 3 │ [1] │ [0.75] │ 0.7071067811865475 │ 1.2426406871192854 │
│ 15 │ [6] │ [0.7142857142857143] │ 0.7071067811865475 │ 6.213203435596427 │
│ 85 │ [35] │ [0.7083333333333334] │ 0.7071067811865475 │ 35.20815280171309 │
│ 493 │ [204] │ [0.7073170731707317] │ 0.7071067811865475 │ 204.20728624993592 │
│ 2871 │ [1189] │ [0.7071428571428572] │ 0.7071067811865475 │ 1189.2071375731562 │
│ 16731 │ [6930] │ [0.7071129707112971] │ 0.7071067811865475 │ 6930.207112064255 │
└───────┴────────┴──────────────────────┴────────────────────┴────────────────────┘
```

In this table we see that for the increasing pairs (b, r), we get very close to , and r gets very close to . I suspected that this trend would continue for larger solutions. Since this was an approximation, I needed to leave a small range in which to find r. For this algorithm, I defined this range as `range_for_num_red = `

[), where the lower end of this range rounds *down* at the 32,000th decimal point, and the upper end of the range rounds *up* at the 32,000th decimal point. 32,000 is as high a parameter as the `floor()`

and `ceil()`

functions will take, so I made this window as tiny as possible to optimize my calculations.

Here’s another mathematical optimization that I found: if I have , and I want only those (b, r) where , then . This gave me an approximate minimum value for b, but I still checked values slightly below this.

This algorithm relies on the `numbers(a, b)`

function, which returns a table with a `number`

column listing all integers from a through a + b – 1, where a and b are integers. A major obstacle to solving this problem was that the `numbers()`

function takes a lot of time and storage to read rows. Since I didn’t know specifically where to find the target pair (b, r), I tried to run `numbers()`

over the range from `min_blue`

to `min_blue`

+ . After 6 minutes of ClickHouse running the query, I cut it off – maybe it would have completed eventually, but not within a desirable time period. However, when I replaced with in the code, it completed, as shown below.

```
WITH
sqrt(2) AS sqrt_2,
toUInt64(floor(1e9 / sqrt_2)) - 1e4 AS min_blue,
floor(sqrt_2, 32000) - 1 AS round_down,
ceil(sqrt_2, 32000) - 1 AS round_up
SELECT
b,
range_for_num_red,
arrayFilter(r -> r + b >= 1e9, range_for_num_red) AS filt_by_sum,
arrayMap(r -> toDecimal64((b / (b + r)) * ((b - 1) / (b + r - 1)), 18), filt_by_sum) AS arr_map_to_product,
arrayFilter(x -> (abs(x - 0.5) < toDecimal64(1e-18, 18)), arr_map_to_product) AS arr_filt_by_product
FROM
(
SELECT
number AS b,
range(toUInt64(floor(round_down * b)), toUInt64(ceil(round_up * b))) AS range_for_num_red
FROM numbers(min_blue, 1e10)
) AS range_for_num_red_subquery
WHERE notEmpty(arr_filt_by_product)
LIMIT 1
```

The result is:

```
┌─────────b─┬─range_for_num_red─┬─filt_by_sum─┬─arr_map_to_product─┬─arr_filt_by_product─┐
│ 710477454 │ [294289397] │ [294289397] │ [0.5] │ [0.5] │
└───────────┴───────────────────┴─────────────┴────────────────────┴─────────────────────┘
```

### How Algorithm 1 works

As I mentioned before, this query doesn’t solve the actual Problem 100, but it does solve the same problem for a pot containing discs total. Algorithm 1 was a step in the right direction that I knew wouldn’t get me all the way there. I’ll explain my thoughts behind this query.

- The
`WITH`

clause contains Common Table Expressions (CTE’s) for numbers that are complex to calculate. By defining`sqrt_2, round_down,`

and`round_up`

as CTE’s, ClickHouse only has to calculate these numbers once and call on them later, as opposed to calculating them again for each b value. - In the subquery
`range_for_num_red_subquery`

, we consider values of b in the range [). For each b, define`range_for_num_red`

as values of . - In the main query, for each b, use the
`arrayFilter()`

function on`range_for_num_red`

to keep only those r values satisfying . - Apply the
`arrayMap()`

function to send each r value in`filt_by_sum`

to the product , which is the probability of drawing two blue in a pot containing b + r discs. I found that turning this into a decimal instead of the default float makes the next step more accurate, so I turn this product into a Decimal64 with accuracy up to the 18th decimal point, which is the maximum accuracy for Decimal64. - Apply
`arrayFilter()`

again to filter out anything from`arr_map_to_product`

that isn’t equal to 0.5. ClickHouse easily confuses something like 0.49999999917781 for exactly 0.5, so we have to be very careful here. We require that the absolute difference between a probability in`arr_map_to_product`

and 0.5 is less than to consider them equal. - Most b candidates in the range [) don’t have an r value satisfying both these conditions, so we get a lot of b values that give us empty arrays for
`arr_filt_by_product`

. Use`WHERE`

and`LIMIT`

clauses to show only the first (b, r) pairing that solves the problem.

This takes 27.21 MiB of memory and 0.103 seconds to run when solving the problem up to . This query surely had room for improvement, but I could see that `numbers()`

would hold the whole query back from reaching . As a student of pure math, sometimes it’s amazing to see what computers can do that my mind and pencil can’t; there are also times like this where I have to use math to rewrite the problem before I can get anything useful from the computer. At this point, I realized I had overlooked a simpler and more mathematical path to the solution.

### Algorithm 2: using the arrayFold() function

Let’s look again at the table of the first solutions that I showed earlier.

```
┌─────b─┬─r──────┬─b/(b+r)──────────────┬────────────1/sqrt2─┬──────(sqrt2 - 1)*b─┐
│ 3 │ [1] │ [0.75] │ 0.7071067811865475 │ 1.2426406871192854 │
│ 15 │ [6] │ [0.7142857142857143] │ 0.7071067811865475 │ 6.213203435596427 │
│ 85 │ [35] │ [0.7083333333333334] │ 0.7071067811865475 │ 35.20815280171309 │
│ 493 │ [204] │ [0.7073170731707317] │ 0.7071067811865475 │ 204.20728624993592 │
│ 2871 │ [1189] │ [0.7071428571428572] │ 0.7071067811865475 │ 1189.2071375731562 │
│ 16731 │ [6930] │ [0.7071129707112971] │ 0.7071067811865475 │ 6930.207112064255 │
└───────┴────────┴──────────────────────┴────────────────────┴────────────────────┘
```

When I first started this problem, my hunch was that there might be a recursive pattern between the pairs of b and r values, and then I moved on. When nothing else was working later, I came back to this hunch. By staring for a long time at the values for b and r, I noticed a pattern: and . This is a much better way to solve this problem that doesn’t require ClickHouse to read 10 trillion rows through the `numbers()`

function.

While ClickHouse doesn’t exactly support recursive functions, the closest thing is the `arrayFold()`

function, and it’s perfect for this problem. `arrayFold()`

takes in a lambda function, a range to work over, and an accumulator that the function modifies. The accumulator is an n-tuple `(acc.1, acc.2, …, acc.n)`

. Here’s a simple example of how `arrayFold()`

works in ClickHouse:

```
SELECT arrayFold(
(acc, x) ->(acc.1 + x, acc.2, acc.3 * x, acc.1),
range(1,5),
(3::Int64, 2::Int64, 10::Int64, 0::Int64)
) AS final_result
```

`arrayFold()`

starts with the accumulator (3,2,10,0) and it iterates for x in `range(1,5)`

. The lambda function is `(acc, x) -> (acc.1 + x, acc.2, acc.3 * x, acc.1)`

. So, the consecutive iterations of the program look like this:

x | (acc.1,acc.2,acc.3,acc.4) |

Initial state of the accumulator | (3,2,10,0) |

1 | (4,2,10,3) |

2 | (6,2,20,4) |

3 | (9,2,60,6) |

4 | (13,2,240,9) |

That’s what’s happening under the hood. The final answer that ClickHouse returns is:

```
┌─final_result─┐
│ (13,2,240,9) │
└──────────────┘
```

I hope this shows how `arrayFold()`

can be useful for recursive functions. It lets users modify, save, or move each element of the n-tuple while iterating many times. For Problem 100, I needed an `arrayFold()`

that would use the (n-1)-th step to calculate the n-th step using the recursive relation I found.

```
SELECT
arrayFold(acc, x -> if((acc.3 + acc.4 <= 1e12), (acc.3, acc.4, 5*acc.3 + 2*acc.4 - 2, 2*acc.3 + acc.4 - 1, acc.5 + 1),
(acc.1,acc.2,acc.3,acc.4,acc.5)),
range(20),
(3::Int64, 1::Int64, 15::Int64, 6::Int64, 2::Int64))
AS "[blue(n-1), red(n-1), blue(n), red(n), iteration]"
```

This gives the result:

```
┌─[blue(n-1), red(n-1), blue(n), red(n), iteration]───────┐
│ (129858761425,53789260175,756872327473,313506783024,16) │
└─────────────────────────────────────────────────────────┘
```

### How Algorithm 2 works

This algorithm is short and powerful, returning the final answer in just a single call to `arrayFold()`

. It’s a bit more complex than the `arrayFold()`

example I gave above but it works the same, with an accumulator, lambda function, and range.

- We have an accumulator of the form:
`(acc.1, acc.2, acc.3, acc.4, acc.5)`

= (), where n is the number of solutions found so far. - The initial state of the accumulator holds the first and second solutions to the equation , which are (b,r) = (3, 1) and (15, 6), respectively. The solution counter starts at 2, because there are already 2 solutions stored in the accumulator. So for the n-th iteration, we have
`acc.5`

= n + 2. The initial accumulator looks like (3, 1, 15, 6, 2). - For the range, I somewhat arbitrarily started with
`range(20)`

, which will repeat the lambda function on the accumulator 20 times. The final solution count will show that 20 is large enough. - The lambda function takes the current accumulator and applies the recursive functions for b and r. In the n-th iteration of the function, we move from the spot to the spot, move from the spot to the spot, set the new as , and set the new as . Then we increase
`acc.5`

by one to mark that we found another solution. - Repeat this 20 times, or while
`acc.3 + acc.4`

= . Once , then the lambda function leaves the accumulator the same for the remaining iterations. - The solution counter stops when and this tells us how many pairs of solutions there are in total: 16, which means there were 14 iterations where
`arrayFold()`

modified the accumulator (since we started with two solutions). - The answer to Problem 100 is the first satisfying the criteria. ClickHouse returns this under , and we get a final answer of 756,872,327,473.

Hooray! This uses 4.0 MiB of memory and only 8.2 ms to run, which is orders of magnitude more efficient. I was sorry to part with my first algorithm, but the takeaway from this problem is to step back and look for an easier (and at all feasible) way to solve it. Rethinking the problem allowed me to play to the strengths of ClickHouse – I was shocked by how quickly it solved this complex problem. And just like that, Problem 100 is complete.

### A note on correctness

Using this recursive function to solve the problem is a bit cheap unless I prove that it describes all solutions to the problem without including anything extra. Of course, the Project Euler website confirmed that I arrived at the correct solution with this method, so if that’s proof enough for you, feel free to skip this section. But I’m ashamed to admit, I can’t offer you a complete proof. Here’s what I would need to prove for this solution to be watertight for all values: A pair () is a solution to Problem 100 if and only if it satisfies the recursive functions. In symbols, that is

.

A constructive proof that modifies the last ClickHouse query can prove one side of this if-and-only-if statement. Suppose for some pair (). To show that , I added a line to the query that will alert us if it finds a pair by recursion that does not satisfy .

```
SELECT
arrayFold((acc, x) -> multiIf(
round((acc.3 / (acc.3 + acc.4))*((acc.3 - 1) / (acc.3 + acc.4 - 1)), 10) != 0.5, (-100,-100,-100,-100,-100),
(acc.3 + acc.4 <= 1e12), (acc.3, acc.4, 5*acc.3 + 2*acc.4 - 2, 2*acc.3 + acc.4 - 1, acc.5 + 1),
(acc.1, acc.2, acc.3, acc.4, acc.5)),
range(20),
(3::Int64, 1::Int64, 15::Int64, 6::Int64, 2::Int64)
) AS "[blue(n-1), red(n-1), blue(n), red(n), iteration]"
```

What does this mess mean? While `if()`

has the form `if(condition, then do x, else do y)`

, `multiIf()`

has the form `multiIf(condition 1, then do x, else if condition 2, then do y, … , else do z)`

, for some actions x, y, z. Here, my first condition says to check that = `acc.3`

and = `acc.4`

satisfy ; if they don’t, then `multiIf()`

sets every element of the accumulator to -100. If a pair () sneaks in that shouldn’t be included, this red flag will return negative numbers for any subsequent iterations. The rest of the `multiIf()`

statement is the same as the `if()`

statement I showed earlier, so this query also returns the final answer. I ran this and got the same answer as in my query above – most importantly, no negatives.

```
┌─[blue(n-1), red(n-1), blue(n), red(n), iteration]───────┐
│ (129858761425,53789260175,756872327473,313506783024,16) │
└─────────────────────────────────────────────────────────┘
```

This proves that the recursive function doesn’t include any pairs that aren’t solutions to – that’s half of the proof. Now we need to prove the other half: that the recursive function doesn’t skip any solutions to . Then we would be able to conclude that the recursive function includes all the solutions to and no extras. For the second half, a proof by induction is necessary to prove that any solution to has the described recursive relation to the steps before and after it. This would require very tricky separation of variables to get and in terms of and . I leave this as an exercise for unusually motivated readers. If you get to the point where it starts to look a little *too* tricky, I invite you to throw mathematical rigor to the wind and accept the hand-wavy recursive relation that “looks right,” because it fits the first few solutions and gives the correct final answer.

## Conclusion

I never had a brilliant “aha” moment about SQL while playing with these problems. To circle back to the quote from John von Neumann, this project was all about “getting used to” the mechanics of SQL in ClickHouse. The puzzles were an opportunity to look at ClickHouse from various angles and learn how to use features I liked, especially array functions, and how to work around ClickHouse weaknesses, like the limitations of `ngrams()`

.

Years of studying math have instilled the famous mathematician’s laziness (some prefer to call it “efficiency”), which taught me to use logic to find better paths to the same result. This approach applied well to breaking down complex problems into fast queries in ClickHouse, proving that even abstract math can have applications in computer science.

Math will continue to be useful for my final Project Euler problem, covered in Part 2 of this series, where I added another interesting feature to my understanding of ClickHouse: C++ user-defined functions. It will come out shortly, so watch this space.

## Acknowledgements

Thank you to Alsu Giliazova, Alexander Zaitsev, Arthur Passos, and Robert Hodges at Altinity for your help with writing programs and putting this article together.