---
url: 'https://altinity.com/blog/2018-9-28-progress-reports-for-long-running-queries-via-http-protocol'
title: 'ClickHouse&#x00AE; HTTP protocol advanced features'
author:
  name: Vladislav Klimenko
  url: 'https://altinity.com/author/vladkaltinity/'
date: '2018-10-03T16:50:00-07:00'
modified: '2018-10-03T16:50:00-07:00'
type: post
summary: 'ClickHouse HTTP protocol is good and reliable, it is a base for official JDBC, ODBC and many 3rd party drivers and integrations. It can also be used directly with http client libraries. In this article we describe two advanced features...'
categories:
  - Blog
tags:
  - ClickHouse
  - HTTP
  - progress
image: 'https://altinity.com/wp-content/uploads/2020/02/13129-blur-1853262_1920.jpg'
published: true
---

# ClickHouse&#x00AE; HTTP protocol advanced features

ClickHouse HTTP protocol is good and reliable, it is a base for official JDBC, ODBC and many 3rd party drivers and integrations. It can also be used directly with http client libraries. In this article we describe two advanced features of HTTP protocol: execution progress and sessions.

# Introduction

There is an option to receive query execution progress data with the help of `X-ClickHouse-Progress` headers.  

In order to receive progress information we need to enable `send_progress_in_http_headers` setting. This can be done either for each query independently or for the whole session and thus – for all queries within this session.

# Session

Let’s clarify term **session** in ClickHouse HTTP protocol. Session provides possibility to group queries by a set of features. For example, we may specify special settings for the session and all requests within this session would be affected. Sometimes this may be more convenient than to specify those settings to each query. We can group queries into sessions by specifying unique `session_id` – which is an arbitrary string, chosen by the client. Client has to provide `session_id` to the server for the query, which would be bound to the session. To do this, we need to add the `session_id` **GET** parameter to the request. By default, the session is terminated after 60 seconds of inactivity, but we can change this timeout, either in `default_session_timeout` setting in the server configuration xml file, or to add `session_timeout` **GET** parameter to the request. Only one query at a time can be executed within a session, so we can get the following error:

```
HTTP/1.1 500 Internal Server Error
Session is locked by a concurrent client.
```

in case of an attempt to run multiple queries concurrently within one session.

Now, let’s return back to `X-ClickHouse-Progress` headers. In order to understand progress report in full details, we need to cover one more topic.

# Response Buffering

We can enable response buffering on the server side. There are two parameters: `buffer_size` and `wait_end_of_query` provided for this purpose. We can add these parameters as **GET** parameters to the query’s URL.

`buffer_size` determines the number of bytes in the result to buffer in the server memory. If the result body is larger than this threshold, the buffer is written to the HTTP channel as it gets full, and the remaining data is sent directly to the HTTP channel.

To ensure that the entire response is buffered, we can specify `wait_end_of_query=1` parameter. In this case, the data that is not stored in memory will be buffered in a temporary server file.

Buffering is most useful in the following cases:

- To avoid situations where a query processing error occurred after the response code and HTTP headers were sent to the client. HTTP headers are sent at the very beginning of the data transfer to the client, and in case an error is met during long-running query, there is no way no specify HTTP error code (HTTP header already sent), and in this situation, an error message is written at the end of the response body. On the client side, the error can only be detected at the parsing stage, which may be inconvenient. Getting HTTP error code is much more straightforward

- To report query progress. Since query progress is being reported via `X-ClickHouse-Progress` headers, there is a restriction, that all query progress headers have to be reported before actual response body data to be sent. This is general HTTP protocol requirement – all headers have to be sent before response body. So, in order to receive reasonable progress report, we need to ask server to buffer the whole output, to send progress reports with HTTP headers and to send response body afterward.

In case no `wait_end_of_query=1` parameter specified, progress report headers would be sent until `buffer_size` response body size is reached (buffering in progress) and as body data delivery is started to be sent directly to the HTTP channel, there is no possibility to send additional HTTP headers – all headers already sent. Thus we can receive partial progress information only and response body afterward.

Now let’s take a look on examples.

Raw HTTP exchange. Let’s laucnh long-running query on a big data set: `SELECT` sequential numbers from 0 to 50000000 (not including 50000000) in one request. Pay attention to `wait_end_of_query=1` and `send_progress_in_http_headers=1` parameters

**HTTP REQUEST**

```
POST /?wait_end_of_query=1&send_progress_in_http_headers=1 HTTP/1.1
Host: localhost:8123
Content-Length: 44
Content-Type: text/plain

SELECT * from system.numbers limit 50000000
```

**HTTP RESPONSE**

```
HTTP/1.1 200 OK
Date: Fri, 28 Sep 2018 10:14:06 GMT
Connection: Keep-Alive
Content-Type: text/tab-separated-values; charset=UTF-8
Transfer-Encoding: chunked
Keep-Alive: timeout=3
X-ClickHouse-Progress: {"read_rows":"3211264","read_bytes":"25690112","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"7077888","read_bytes":"56623104","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"10420224","read_bytes":"83361792","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"14024704","read_bytes":"112197632","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"17891328","read_bytes":"143130624","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"21692416","read_bytes":"173539328","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"25427968","read_bytes":"203423744","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"29294592","read_bytes":"234356736","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"33161216","read_bytes":"265289728","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"37027840","read_bytes":"296222720","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"40894464","read_bytes":"327155712","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"44695552","read_bytes":"357564416","total_rows":"0"}
X-ClickHouse-Progress: {"read_rows":"48234496","read_bytes":"385875968","total_rows":"0"}

1000
0
1
2
--- CUT HERE ---
49999998
49999999

0
```

As we can see, there are multiple `X-ClickHouse-Progress` headers responded by the server during data buffering procedure.  

Server sends these headers line-by-line, so client can receive and interpret them. After all, response is buffered on server side, the response body output starts. No headers are possible to be sent from this point, because response body is being delivered to the client. Pay attention to the last item `"total_rows":"0"` – since ClickHouse does not know how many items it has to scan because `system.numbers` is a special case of endless system table, there is no `total_rows` reported. For tables stored with more typical storage engines, such as **MergeTree**, `total_rows` would be reported as well, and we’ll have possibility to see real progress, as **read_rows/total_rows**

**Example:**

**HTTP REQUEST**

```
POST /?wait_end_of_query=1&send_progress_in_http_headers=1 HTTP/1.1
Host: localhost:8123
Content-Length: 92
Content-Type: text/plain

SELECT AVG(toInt64OrZero(MD5(toString(eventTime)))) FROM starexp1 GROUP BY prodTime limit 1
```

**HTTP RESPONSE**

```
HTTP/1.1 200 OK
Date: Tue, 02 Oct 2018 16:34:31 GMT
Connection: Keep-Alive
Content-Type: text/tab-separated-values; charset=UTF-8
X-ClickHouse-Server-Display-Name: ip-172-31-8-99.eu-west-2.compute.internal
Transfer-Encoding: chunked
Keep-Alive: timeout=3
X-ClickHouse-Progress: {"read_rows":"0","read_bytes":"0","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"196608","read_bytes":"3145728","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"327680","read_bytes":"5242880","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"393216","read_bytes":"6291456","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"524288","read_bytes":"8388608","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"655360","read_bytes":"10485760","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"786432","read_bytes":"12582912","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"851968","read_bytes":"13631488","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"917504","read_bytes":"14680064","total_rows":"2000060416"}
X-ClickHouse-Progress: {"read_rows":"1048576","read_bytes":"16777216","total_rows":"2000060416"}
--- CUT HERE ---
```

As we can see here, ClickHouse has to scan about 2 billions rows in order to get a result.

Let’s take a look on some more examples, with `curl` tool now, for convenience

- Setup session and set `send_progress_in_http_headers=1` for the whole session. Pay attention to `wait_end_of_query=1` specified for the query

```
SESSION_ID=$(date +%s)
echo $SESSION_ID
echo 'SET send_progress_in_http_headers=1' | curl -v "http://localhost:8123/?wait_end_of_query=1&session_id=$SESSION_ID" --data-binary @-
echo 'SELECT * from system.numbers limit 50000000' | curl -v "http://localhost:8123/?wait_end_of_query=1&session_id=$SESSION_ID" --data-binary @- > /dev/null
```

- Setup session and set `send_progress_in_http_headers=1` for the whole session. Pay attention NO `wait_end_of_query=1` is specified for the query, so there possibliy will be some progress report before that response body delivery started

```
SESSION_ID=$(date +%s)
echo $SESSION_ID
echo 'SET send_progress_in_http_headers=1' | curl -v "http://localhost:8123/?session_id=$SESSION_ID" --data-binary @-
echo 'SELECT sleep(3), * from system.numbers limit 500000' | curl -v "http://localhost:8123/?session_id=$SESSION_ID" --data-binary @- > /dev/null
```

- No session specified, one query

```
echo 'SELECT sleep(3), * from system.numbers limit 1000000' | curl -v "http://localhost:8123/?wait_end_of_query=1&send_progress_in_http_headers=1" --data-binary @- > /dev/null
```

# Conclusion

So, as we can see, that ClickHouse provides tools for buffering, progress reports and session management with HTTP protocol.

