Taking a Closer Look at ClickHouse Live View Tables

By Vitaliy Zakaznikov on December 5th, 2019

Taking a Closer Look at ClickHouse Live View Tables

Vitaliy Zakaznikov ClickHouseLive Views

 

Dec 5, 2019

In the couple of previous blog posts, I have introduced Live Views tables and covered basic usage. Now, in this post, we will take a closer look at Live View tables.
Specifically, we will look at the options available for the WATCH query, then
introduce temporary Live Views, as well
as look at the new JSONEachRowWithProgress format.

Setting Things Up

As before we will be using the 19.16.3 release and will use two instances
of clickhouse-client. Don’t forget that we will need to enable the
allow_experimental_live_view setting in both sessions.

ClickHouse client version 19.16.3.6 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.16.3 revision 54427.

:) SET allow_experimental_live_view = 1

To keep the setup to the minimum, we will again use a very simple
source table

CREATE TABLE myints (a Int32) Engine=Memory

and pre-populate it with some initial data

INSERT INTO myints VALUES (1),(2),(3)

and use it in a simple Live View table.

CREATE LIVE VIEW lv AS SELECT sum(a) FROM myints

WATCH Query Options

The WATCH query enables us to watch a live stream of query results.
Without any options, the query returns the current query result
and infinitely waits for changes to the query result.

:) WATCH lv

WATCH lv

┌─sum(a)─┬─_version─┐
│      6 │        1 │
└────────┴──────────┘
↙ Progress: 1.00 rows, 16.00 B (8.93 rows/s., 142.95 B/s.)

To abort the WATCH query you can use Ctrl-C.

WATCH query LIMIT

However, an infinite stream of query result updates in not always necessary.
Sometimes you just want to get a fixed number of changes, one of the most
common cases being just the need to get the next update. To control the number of
query result updates that you would like to receive you can use the LIMIT
option where the value of the LIMIT can be 0 to N. The value of 0
indicates that the WATCH query should not wait for any new query results
and therefore will return immediately once query is evaluated.
One way to think of LIMIT 0 is as if you are setting the non-blocking flag.

Let’s give it a try and see that the WATCH query with the LIMIT 0
returns immediately.

:) WATCH lv LIMIT 0

WATCH lv
LIMIT 0

┌─sum(a)─┬─_version─┐
│      6 │        1 │
└────────┴──────────┘

1 rows in set. Elapsed: 0.002 sec. 

user-node :)

However, using the WATCH query with the LIMIT set to 0 is not very useful
as you might as well use the SELECT instead.

The most common value of LIMIT is just 1. In many use cases
you do not want to receive an infinite number of query result changes
but only the next one. With the LIMIT set to 1 you will receive
the current query result and will be notified only once when
the query result changes.

Again, let’s give a try. Execute WATCH query in one client and
insert one more row into our source table for the WATCH query to complete.

Once you have the WATCH query running you can insert more data as follows.

INSERT INTO myints VALUES (4)

The client that has the WATCH query running should show the following.

:) WATCH lv LIMIT 1

WATCH lv
LIMIT 1

┌─sum(a)─┬─_version─┐
│      6 │        1 │
└────────┴──────────┘
┌─sum(a)─┬─_version─┐
│     10 │        2 │
└────────┴──────────┘

2 rows in set. Elapsed: 20.346 sec.

As you can see, the WATCH query only provided one update as expected.
You might ask why is LIMIT 1 very common? The answer is simple:
it allows you to get notified the next time the query result has changed.
One such use case is watching a table that stores log messages.
In this case we can use count() for our Live View. A
separate SELECT query can be issued once we detect that the count() has changed,
indicating that there are new log messages available to read from our source table.

WATCH query EVENTS

A short form of the WATCH query can be constructed if you
specify EVENTS after the table name. In this case,
instead of the query result, you will just get the latest query
result version.

Note that the stored query will still be evaluated
every time source table data changes.

Let’s try it and confirm that with the EVENTS we only
get the current query result version without the query result itself.

:) WATCH lv EVENTS LIMIT 0

WATCH lv EVENTS
LIMIT 0

┌─version─┐
│       2 │
└─────────┘

The EVENTS is useful when you only want to
get short notifications that you can use as triggers
about changes to the query result without
looking at the actual result.

Temporary Live Views

In some use cases, for example in a dynamic web application,
Live Views can be created dynamically to display specific
live data to the user. However, it is common in such cases
that the application can exit unexpectedly without
cleaning up its Live Views. For such cases,
a TEMPORARY Live View can be used.

A TEMPORARY Live View can be created using
the CREATE TEMPORARY LIVE VIEW statement. This statement will create
a Live View which will be automatically deleted when no
users are watching the table after the timeout specified by the
temporary_live_view_timeout setting. By default
the value of temporary_live_view_timeout setting is set
to five seconds.

It is important to note that compared to a regular temporary
table, a TEMPORARY Live View table lives outside
the scope of a single session. This is done because sometimes clients
that create Live Views can disappear and reappear due to various
factors such as unstable network connection but as long as the client comes back within the timeout period the Live View table does not have to be recreated.

Because a TEMPORARY Live View table lives outside the scope
of a single session, it means that multiple clients can use the same
temporary Live View. The Live View table will automatically be deleted
after the timeout once all clients stop watching the table.

Let’s see TEMPORARY Live View in action by creating one

CREATE TEMPORARY LIVE VIEW temp_lv AS SELECT sum(a) FROM myints

and immediately running WATCH temp_lv.

:) WATCH temp_lv

WATCH temp_lv

┌─sum(a)─┬─_version─┐
│     10 │        1 │
└────────┴──────────┘

Note that the table will be deleted in five seconds so you need to be quick.

Now, abort the WATCH query using Ctrl-C and try to re-execute WATCH query after
five seconds. You should see the following error.

:) WATCH temp_lv

WATCH temp_lv

Received exception from server (version 19.16.3):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.temp_lv doesn't exist..

As you can see, the Live View table has been deleted because the timeout after the last active user has elapsed. The fact that temporary Live Views are automatically deleted when no more clients are watching it helps with reducing resource consumption on the server. Remember, Live Views store their partial results in memory to always have the latest result ready for any client who might ask for it but it does come at a cost. First, similarly to the Materialized Views, Live View’s stored query is executed on each insert block and therefore inserts into the source table are blocked until processing completes. Second,
mergeable blocks for the query result are kept in memory and therefore each active Live View will consume memory. The amount of memory depends on the query result. Therefore, keeping the number of Live Views to a minimum will help speed up inserts and will reduce memory
consumption on the server.

Naming Temporary Live Views

In a use case where temporary Live Views are created dynamically and are shared between
different clients the question of naming temporary Live Views can arise.
The recommended solution to this problem is to use the SHA1 of the query
that is used to create the Live View table as a part of the table name.

For example, a temporary Live View for the SELECT 1 query can be created using
the lv_<SHA1> convention with the addition of the IF NOT EXISTS clause.

CREATE TEMPORARY LIVE VIEW IF NOT EXISTS lv_42364a017b73ef516a0eca9827e6fa00623257ee AS SELECT 1

Using this convention will allow other clients to easily reuse this Live View
or create a new one if it does not already exist.

Using Live Views Over HTTP Interface

The most common way that clients connect to Live View tables is via an HTTP interface.
The HTTP interface provides ease of use and portability across many systems and programming
languages and it is especially useful for web applications.

Let’s see how we can use an HTTP interface to watch our previously created lv
table using the curl utility.
Note, that for the WATCH query to work over an HTTP interface we must not
forget to set the allow_experimental_live_view setting for the connection.

echo 'WATCH lv LIMIT 0' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10    2

Now, let’s re-run the same query but with the LIMIT set to 1.

echo 'WATCH lv LIMIT 1' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10    2

As you can see the connection remains open and curl does not return. Instead,
the connection will block until the query result changes.
So let’s insert another row into our source table so that a new query result is pushed
to our HTTP client.

INSERT INTO myints VALUES (5)

You should observe that your curl command exits and the result should be as below.

$ echo 'WATCH lv LIMIT 1' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
10    2
15    3
$

Wait, did you see that? ClickHouse has just pushed a notification to the outside
world! No longer do you have to re-send the same query over and over again to your
database server and bother it with unnecessary requests. With Live View tables, ClickHouse
can now notify you that query result has changed and provide you with the new result
at the same time.

Keeping Long-Lived Connections Alive

Because watching Live View tables requires a long-lived connection, Live View
tables support heartbeats to keep the connection alive.
The interval between heartbeats is set using the live_view_heartbeat_interval
setting. By default, the live_view_heartbeat_interval is set to fifteen seconds.
Every heartbeat interval the WATCH query will provide a progress notification
that you can observe when using clickhouse-client. Let’s start a WATCH query
again in the clickhouse-client and this time we will pay attention to the
progress notifications that are displayed below the query result.

:) WATCH lv

WATCH lv

┌─sum(a)─┬─_version─┐
│     15 │        3 │
└────────┴──────────┘
↗ Progress: 1.00 rows, 16.00 B (9.70 rows/s., 155.27 B/s.)

Do you see that arrow ? before the Progress message? Pay attention and you should
see that the arrow keeps turning once in a while. How often does it turn?
You guessed it, every live_view_heartbeat_interval seconds or whenever a new query result
is returned. These periodic progress events are what the Live View table uses to keep
the long-lived connection alive.

New JSONEachRowWithProgress Format

When Live View tables were added to ClickHouse a new output format was added as well.
The new format is JSONEachRowWithProgress and is used to facilitate using
Live View tables over the HTTP interface using the HTTP long polling technique.

Let’s again use curl command to watch our Live View. This
time lets use the JSONEachRowWithProgress format.

First, execute the WATCH query with the LIMIT 0.

$ echo 'WATCH lv LIMIT 0 FORMAT JSONEachRowWithProgress' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"row":{"sum(a)":"15","_version":"3"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
$

Now re-execute the same query but using the old JSONEachRow format so that we can
compare them.

$ echo 'WATCH lv LIMIT 0 FORMAT JSONEachRow' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"sum(a)":"15","_version":"3"}
$

If you compare the two outputs there should be no surprises. The JSONEachRowWithProgress differ
from the old JSONEachRow by having extra progress messages added to the output.
But why do we need this new output format? The answer is simple, to keep the long-lived HTTP connections
alive.

Let’s see how it works when our WATCH query is blocked waiting for new query results.
Now we will use the LIMIT 1 to block and wait for the new result.

We should see the following

$ echo 'WATCH lv LIMIT 1 FORMAT JSONEachRowWithProgress' | curl 'http://localhost:8123/?allow_experimental_live_view=1' --data-binary @-
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"row":{"sum(a)":"15","_version":"3"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}

and then every live_view_heartbeat_interval seconds we should see progress messages
added to the output.

{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
{"progress":{"read_rows":"1","read_bytes":"16","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}}
...

These progress messages are what keeps the long-lived HTTP connection alive until
the query result changes. Without them, the HTTP connection would just timeout.

Conclusion

We have taken a closer look at Live View tables. We looked at the WATCH query options,
introduced temporary Live Views, and looked at the new JSONEachRowWithProgress format.
All these pieces should enable you to start putting Live View tables into practice.
As always until the next time, happy viewing using Live View tables!

Leave a Reply