ClickHouse Altinity Stable Release™ 20.8.7.15


Several months ago we certified the 20.3 ClickHouse release as Altinity Stable. Since then we have worked on newer releases and run them in-house. All recent articles and benchmarks in our blog, including benchmarks against Redshift and S3 integration articles, have been powered by ClickHouse 20.8 or newer versions. We also tested new functionality to make sure the quality of user-requested ClickHouse features is top notch. We are confident in certifying ClickHouse 20.8.7.15 as an Altinity Stable release.

This release is a significant step forward since the previous Altinity Stable release. It includes 1584 pull requests from 235 contributors with many new features and performance improvements! Please look below for detailed release notes.

Major new features since the previous stable release 20.3.19.4

The new release introduces a lot of changes and new features. These are organized by category in the Appendix, so refer to that section for more detail. The following new features are worth mentioning on the front page:

  • Security features:
    • RBAC and SQL management for users, roles, grants, etc. (see the sketch after this list)
    • LDAP authentication
    • Kerberos authentication in Kafka Engine
  • Dictionary features: 
    • direct and ssd_cache layouts
    • Redis, MongoDB, Cassandra sources
    • Automatic query rewrite for joins (see an example here)
    • Experimental polygon dictionaries
  • MergeTree features:
    • S3 disk for MergeTree tables! It is still an experimental feature; see our article for more detail.
    • New TTL types: TTL DELETE WHERE and TTL GROUP BY (TTL GROUP BY is illustrated after this list).
    • In-memory MergeTree parts
  • Integrations:
    • PostgreSQL wire protocol
    • RabbitMQ and MongoDB storage engines
    • Tencent Cloud Object Storage (COS)
    • Highly experimental MaterializeMySQL engine that implements a MySQL replica in ClickHouse
  • SQL compatibility:
    • New Int128, (U)Int256, Decimal256 extended precision data types
    • Aliases for standard SQL types
    • EXPLAIN statement!
    • Merge join improvements
  • Custom HTTP handlers
  • clickhouse-copier underwent extensive updates and improvements
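For illustration, here is a minimal sketch of the new SQL-driven access management and of a TTL GROUP BY rollup. All object names (analytics, jane, analyst, sensor_metrics) are hypothetical, and SQL-driven access management has to be enabled for the administrative user first (access_management in users.xml):

    -- RBAC: roles, grants and users can now be managed entirely in SQL
    CREATE ROLE analyst;
    GRANT SELECT ON analytics.* TO analyst;
    CREATE USER jane IDENTIFIED WITH sha256_password BY 'secret';
    GRANT analyst TO jane;
    SET DEFAULT ROLE analyst TO jane;

    -- TTL GROUP BY: roll old rows up in place, keeping hourly averages per sensor after 30 days
    CREATE TABLE sensor_metrics
    (
        ts     DateTime,
        sensor UInt32,
        value  Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (sensor, toStartOfHour(ts), ts)
    TTL ts + INTERVAL 30 DAY
        GROUP BY sensor, toStartOfHour(ts)
        SET value = avg(value);

Note that the TTL GROUP BY keys must form a prefix of the table's primary key, as in the sketch above.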

As usual with ClickHouse, there are many performance and operational improvements in different server components.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Aggregate function states with Nullable arguments may produce different / incompatible types.
  • Gorilla, Delta and DoubleDelta codecs can no longer be used on data types of variable size (like strings).
  • System tables (e.g. system.query_log, system.trace_log, system.metric_log) are using compact data part format for parts smaller than 10 MiB in size (this is almost always the case). Compact data part format is supported since version 20.3. 
    • WARNING: If you have to downgrade from 20.8 to a version prior to 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
  • Special printing of zero Date/DateTime values as ‘0000-00-00’ and ‘0000-00-00 00:00:00’ is deprecated. They are now printed as ‘1970-01-01’ and ‘1970-01-01 00:00:00’ respectively.
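For example, with the new behavior (shown here for the UTC timezone):

    SELECT toDate(0) AS zero_date, toDateTime(0) AS zero_datetime;
    -- 1970-01-01    1970-01-01 00:00:00   (previously printed as 0000-00-00 / 0000-00-00 00:00:00)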

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:

  • 20.3 two-level aggregation is not compatible with 20.4+.
    Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. If this is a risk in your environment, disable the algorithm before upgrading. See the group_by_two_level_threshold and group_by_two_level_threshold_bytes settings.
  • zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
  • The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not: 
    • optimize_move_functions_out_of_any
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_injective_functions_inside_uniq
    • optimize_if_transform_strings_to_enum
  • When pre-20.5 and 20.5+ versions run as replicas, “Part … intersects previous part” errors are possible due to a change in the leader election protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version cannot become a leader. This can be configured via the replicated_can_become_leader MergeTree setting, globally or at the table level (see the note after this list).
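On the nodes that must not take leadership (typically the ones still on the old version), replicated_can_become_leader can be set to 0 in the merge_tree section of the server configuration. Current leadership can then be verified from SQL:

    -- 1 in is_leader means the replica may assign merges for that table
    SELECT database, table, is_leader FROM system.replicas;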

Other Important Changes

  • All replicas are now ‘leaders’. This allows multiple replicas to assign merges, mutations, partition drops, moves and replaces concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
  • The new setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see an exception ‘Memory limit (total) exceeded’. Increasing the limit requires a server restart.
  • The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
  • Several new optimizations are enabled by default. While they typically improve performance, regressions are possible in corner cases (a way to disable one of them is sketched after this list):
    • optimize_aggregators_of_group_by_keys
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_duplicate_order_by_and_distinct
    • optimize_group_by_function_keys
    • optimize_injective_functions_inside_uniq
    • optimize_move_functions_out_of_any
    • optimize_monotonous_functions_in_order_by
    • optimize_redundant_functions_in_order_by
    • optimize_trivial_insert_select
    • partial_merge_join_optimizations
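If one of these rewrites causes a regression for a particular workload, it can be switched off per session or in a settings profile. A minimal sketch:

    -- Disable a single optimization for the current session
    SET optimize_duplicate_order_by_and_distinct = 0;

    -- Review the current values of the new optimize_* settings
    SELECT name, value FROM system.settings WHERE name LIKE 'optimize_%';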

The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

——————

Appendix

New data types

  • All standard SQL data types, like BIGINT, VARCHAR, etc., are mapped to ClickHouse data types. See the system.data_type_families table.
  • Experimental data types Point, Ring, Polygon, MultiPolygon
  • Int128, Int256, UInt256, Decimal256 extended precision data types

New formats

  • Arrow
  • ArrowStream
  • JSONAsString — allows storing the full JSON row unparsed and processing it later using ClickHouse JSONExtract functions (see the sketch after this list)
  • Markdown
  • MsgPack
  • PostgreSQLWire
  • Regexp – allows parsing non-standard text formats (for example, logs) by applying a regular expression to every line.
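As a sketch of the JSONAsString workflow (table and field names are hypothetical): the table has a single String column, raw JSON rows are inserted as-is, and JSON functions extract fields at query time.

    CREATE TABLE raw_events (json String) ENGINE = MergeTree ORDER BY tuple();

    INSERT INTO raw_events FORMAT JSONAsString {"user": "alice", "value": 42}

    SELECT
        JSONExtractString(json, 'user') AS user,
        JSONExtractInt(json, 'value')   AS value
    FROM raw_events;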

New functions

  • DateTime functions:
    • fromUnixTimestamp / FROM_UNIXTIME
    • dateTrunc / date_trunc
    • fromUnixTimestamp64Micro
    • fromUnixTimestamp64Milli
    • fromUnixTimestamp64Nano
    • toUnixTimestamp64Micro
    • toUnixTimestamp64Milli
    • toUnixTimestamp64Nano
  • String functions:
    • extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
    • ilike, notILike (also a new SQL operator ILIKE has been added; see the example after this list)
  • Array functions:
    • arrayReduceInRanges
    • hasSubstr
  • Machine-learning and statistics:
    • arrayAUC — Area Under the ROC Curve
    • bayesAB — Bayesian A/B testing calculator
    • medianExactHigh, medianExactLow
    • quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
  • JSON
    • JSONExtractKeysAndValuesRaw
  • URL functions
    • port
    • netloc
  • Aggregate functions
    • -Distinct — new aggregate function combinator
    • initializeAggregation — allow initialization of AggregateFunction state
  • Functions to work with key-value pairs (aka maps):
    • mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap).
    • SimpleAggregateFunction now supports minMap, maxMap, sumMap
    • A new Map data type is currently in development and should make working with key-value pairs even more intuitive.
  • MySQL integration
    • DATABASE (alias for currentDatabase)
    • globalVariable (stub)
  • Related to the new extended-precision types:
    • toDecimal256, toDecimal256OrNull, toDecimal256OrZero
    • toInt128, toInt128OrNull, toInt128OrZero
    • toInt256, toInt256OrNull, toInt256OrZero
    • toUInt256, toUInt256OrNull, toUInt256OrZero
  • Join engine:
    • joinGetOrNull
  • Random generators:
    • fuzzBits (used for CI)
    • rand32 (alias for rand)
    • randomFixedString, randomString, randomStringUTF8
  • Serialize columns to a text format:
    • formatRow, formatRowNoNewline
  • Settings & custom settings:
    • getSetting
  • Check types
    • isDecimalOverflow
    • defaultValueOfTypeName
    • isZeroOrNull
  • Helper functions to analyze system.query_log
    • normalizeQuery, normalizedQueryHash
  • Other:
    • countDigits
    • mod (alias for modulo)
  • Special (used in CI / by ClickHouse developers):
    • hasThreadFuzzer
    • buildId
  • Related to internals of execution of the IN / GLOBAL IN operator (also see the transform_null_in setting)
    • notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
    • globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
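A quick, non-exhaustive illustration of a few of the new functions (values are made up):

    SELECT
        fromUnixTimestamp(1600000000)   AS ts,            -- alias: FROM_UNIXTIME
        dateTrunc('month', now())       AS month_start,
        'ClickHouse' ILIKE 'click%'     AS ilike_match,   -- case-insensitive LIKE
        hasSubstr([1, 2, 3, 4], [2, 3]) AS subarray_found,
        normalizeQuery('SELECT 1 + 2')  AS normalized;    -- replaces literals with placeholders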

New table functions

  • cosn — integration with Tencent Cloud Object Storage (COS)

New table engines

  • MongoDB (see the sketch after this list)
  • RabbitMQ
  • COSN
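For example, the new MongoDB engine exposes a collection as a read-only table. The connection details, columns and names below are hypothetical:

    CREATE TABLE mongo_orders
    (
        order_id UInt64,
        status   String
    )
    ENGINE = MongoDB('mongo1:27017', 'shop', 'orders', 'reader', 'secret');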

New metrics and events

system.metrics

  • BackgroundBufferFlushSchedulePoolTask
  • BackgroundDistributedSchedulePoolTask
  • MemoryTrackingInBackgroundBufferFlushSchedulePool
  • MemoryTrackingInBackgroundDistributedSchedulePool
  • PostgreSQLConnection

system.events:

  • OSCPUVirtualTimeMicroseconds
  • OSCPUWaitMicroseconds
  • OSReadBytes, OSReadChars
  • OSWriteBytes, OSWriteChars
  • QueryTimeMicroseconds

New system tables (mostly for RBAC introspection)

  • current_roles
  • distribution_queue 
  • enabled_roles
  • grants 
  • licenses
  • privileges
  • quota_limits
  • quotas_usage
  • row_policies
  • role_grants
  • roles
  • settings_profile_elements
  • settings_profiles
  • time_zones
  • user_directories
  • users              
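These tables can be queried directly for access-control introspection, for example:

    -- Who is granted what
    SELECT user_name, access_type, database, table FROM system.grants;

    -- Which roles are granted to which users and roles
    SELECT * FROM system.role_grants;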

New columns in system tables

  • columns.position, parts_columns.position
  • databases.uuid, dictionaries.uuid, tables.uuid
  • disks.type
  • merge_tree_settings.type, settings.type
  • parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
  • query_log.current_database
  • storage_policies.volume_type
  • tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
  • trace_log: trace_type, size, timestamp_ns

system.merge_tree_settings added/changed

An empty Old value means the setting is new in this release.

Setting | Old value | New value | Description
allow_nullable_key |  | 0 | Allow Nullable types as primary keys.
always_fetch_merged_part |  | 0 | If true, replicas never merge parts and always download merged parts from other replicas.
disable_background_merges | 0 |  | REMOVED
enable_mixed_granularity_parts | 0 | 1 | Enable parts with adaptive and non-adaptive granularity at the same time
in_memory_parts_enable_wal |  | 1 | Whether to write blocks in Native format to write-ahead-log before creation of in-memory part
lock_acquire_timeout_for_background_operations |  | 120 | For background operations like merges, mutations etc. How many seconds before failing to acquire table locks.
max_part_loading_threads |  | auto | The number of threads used to load data parts at startup.
max_part_removal_threads |  | auto | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
max_replicated_logs_to_keep | 10000 | 1000 | How many records may be in log, if there is an inactive replica.
min_replicated_logs_to_keep | 100 | 10 | Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning.
min_bytes_for_compact_part |  | 0 | Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero, enables in-memory parts.
min_rows_for_compact_part |  | 0 | Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero, enables in-memory parts.
min_index_granularity_bytes |  | 1024 | Minimum amount of bytes in single granule.
min_relative_delay_to_measure |  | 120 | Calculate relative replica delay only if absolute delay is not less than this value.
write_ahead_log_max_bytes |  | 1073741824 | Rotate WAL, if it exceeds that amount of bytes

system.settings added/changed

Setting | Old value | New value | Description
allow_experimental_bigint_types |  | 0 | Allow Int128, Int256, UInt256 and Decimal256 types
allow_experimental_database_materialize_mysql |  | 1 | Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet)
allow_experimental_geo_types |  | 0 | Allow geo data types such as Point, Ring, Polygon, MultiPolygon
allow_non_metadata_alters |  | 1 | Allow to execute alters which affects table’s metadata and data on disk.
allow_push_predicate_when_subquery_contains_with |  | 1 | Allows push predicate when subquery contains WITH clause
allow_suspicious_codecs |  | 0 | If it is set to true, allow specifying meaningless compression codecs.
alter_partition_verbose_result |  | 0 | Output information about affected parts. Currently works only for FREEZE and ATTACH commands.
background_buffer_flush_schedule_pool_size |  | 16 | Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.
background_distributed_schedule_pool_size |  | 16 | Number of threads performing background tasks for distributed sends. Only has meaning at server startup.
cast_keep_nullable |  | 0 | CAST operator keep Nullable for result data type
data_type_default_nullable |  | 0 | Data types without NULL or NOT NULL will make Nullable
default_database_engine |  | Ordinary | Default database engine
distributed_replica_max_ignored_errors |  | 0 | Number of errors that will be ignored while choosing replicas
force_optimize_skip_unused_shards_nesting |  | 0 | Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work
format_regexp |  |  | Regular expression (for Regexp format)
format_regexp_escaping_rule |  | Escaped | Field escaping rule (for Regexp format)
format_regexp_skip_unmatched |  | 0 | Skip lines unmatched by regular expression (for Regexp format)
function_implementation |  |  | Choose function implementation for specific target or variant (experimental). If empty, enable all of them.
input_format_avro_allow_missing_fields |  | 0 | For Avro/AvroConfluent format: when field is not found in schema use default value instead of error
input_format_with_names_use_header | 0 | 1 | For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header.
insert_in_memory_parts_timeout |  | 600000 | 
join_on_disk_max_files_to_merge |  | 64 | For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2.
lock_acquire_timeout |  | 120 | How long locking request should wait before failing
log_queries_min_type |  | QUERY_START | Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.
materialize_ttl_after_modify |  | 1 | Apply TTL for old data, after ALTER MODIFY TTL query
max_block_size | 65536 | 65505 | Maximum block size for reading
max_final_threads |  | 16 | The maximum number of threads to read from the table with FINAL.
max_insert_block_size | 1048576 | 1048545 | The maximum block size for insertion, if we control the creation of blocks for insertion
max_joined_block_size_rows | 65536 | 65505 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.
max_untracked_memory |  | 4194304 | Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’.
memory_profiler_sample_probability |  | 0 | Collect random allocations and deallocations and write them into system.trace_log with ‘MemorySample’ trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling.
metrics_perf_events_enabled |  | 0 | If enabled, some of the perf events will be measured throughout queries’ execution.
metrics_perf_events_list |  |  | Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events.
min_chunk_bytes_for_parallel_parsing | 1048576 | 10485760 | The minimum chunk size in bytes, which each thread will parse in parallel.
min_insert_block_size_bytes | 268435456 | 268427520 | Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.
min_insert_block_size_bytes_for_materialized_views |  | 0 | Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)
min_insert_block_size_rows | 1048576 | 1048545 | Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.
min_insert_block_size_rows_for_materialized_views |  | 0 | Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows)
multiple_joins_rewriter_version |  | 2 | 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is.
optimize_aggregation_in_order |  | 0 | Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables.
optimize_aggregators_of_group_by_keys |  | 1 | Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section
optimize_arithmetic_operations_in_aggregate_functions |  | 1 | Move arithmetic operations out of aggregation functions
optimize_distributed_group_by_sharding_key |  | 0 | Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server)
optimize_duplicate_order_by_and_distinct |  | 1 | Remove duplicate ORDER BY and DISTINCT if it’s possible
optimize_group_by_function_keys |  | 1 | Eliminates functions of other keys in GROUP BY section
optimize_if_chain_to_multiif |  | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types.
optimize_if_transform_strings_to_enum |  | 0 | Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail
optimize_injective_functions_inside_uniq |  | 1 | Delete injective functions of one argument inside uniq*() functions
optimize_monotonous_functions_in_order_by |  | 1 | Replace monotonous function with its argument in ORDER BY
optimize_move_functions_out_of_any |  | 1 | Move functions out of aggregate functions ‘any’, ‘anyLast’
optimize_redundant_functions_in_order_by |  | 1 | Remove functions from ORDER BY if its argument is also in ORDER BY
optimize_skip_unused_shards_nesting |  | 0 | Same as optimize_skip_unused_shards, but accept nesting level until which it will work
optimize_trivial_insert_select |  | 1 | Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query
output_format_enable_streaming |  | 0 | Enable streaming in output formats that supports it
output_format_pretty_grid_charset |  | UTF-8 | Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default)
output_format_pretty_max_value_width |  | 10000 | Maximum width of value to display in Pretty formats. If greater – it will be cut.
parallel_distributed_insert_select |  | 0 | Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard
partial_merge_join_left_table_buffer_bytes |  | 32000000 | If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’.
partial_merge_join_optimizations | 0 | 1 | Enable optimizations in partial merge join
partial_merge_join_rows_in_right_blocks | 10000 | 65536 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
rabbitmq_max_wait_ms |  | 5000 | The wait time for reading from RabbitMQ before retry.
read_in_order_two_level_merge_threshold |  | 100 | Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key.
send_logs_level | none | fatal | Send server text logs with specified minimum level to client. Valid values: ‘trace’, ‘debug’, ‘information’, ‘warning’, ‘error’, ‘fatal’, ‘none’
show_table_uuid_in_table_create_query_if_not_nil |  | 0 | For tables in databases with Engine=Atomic show UUID of the table in its CREATE query.
temporary_files_codec |  | LZ4 | Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE.
transform_null_in |  | 0 | If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal.
validate_polygons |  | 1 | Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.

———-

Also, please refer to the release notes from the development team available at the following URLs:
