ClickHouse® Altinity Stable Release™ 20.8.7.15
Several months ago we certified the 20.3 ClickHouse release as Altinity Stable. Since then we have worked on newer releases and run them in-house. All recent articles and benchmarks in our blog, including benchmarks against Redshift and S3 integration articles, have been powered by ClickHouse 20.8 or newer versions. We have tested the new functionality to make sure the quality of user-requested ClickHouse features is top notch. We are now confident in certifying ClickHouse 20.8.7.15 as an Altinity Stable release.
This release is a significant step forward since the previous Altinity Stable release. It includes 1584 pull requests from 235 contributors with many new features and performance improvements! Please look below for detailed release notes.
Major new features since the previous stable release 20.3.19.4
A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to that section for more detail. The following new features are worth mentioning on the front page:
- Security features:
  - RBAC and SQL management for users, roles, grants, etc. (see the sketch after this list)
  - LDAP authentication
  - Kerberos authentication in the Kafka engine
- Dictionary features:
  - New direct and ssd_cache layouts
  - Redis, MongoDB, Cassandra sources
  - Automatic query rewrite for joins (see an example here)
  - Experimental polygon dictionaries
- MergeTree features:
  - S3 disk for MergeTree tables! It is still an experimental feature; see our article for more detail.
  - New TTL types: TTL DELETE WHERE and TTL GROUP BY (see the sketch after this list)
  - In-memory MergeTree parts
- Integrations:
  - PostgreSQL wire protocol
  - RabbitMQ and MongoDB storage engines
  - Tencent Cloud Object Storage (COS)
  - Highly experimental MaterializeMySQL engine that implements a MySQL replica in ClickHouse
- SQL compatibility:
  - New Int128, (U)Int256, Decimal256 extended precision data types
  - Aliases for standard SQL types
  - New EXPLAIN statement!
  - Merge join improvements
- Custom HTTP handlers
- clickhouse-copier underwent extensive updates and improvements
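A short illustrative sketch of two of the features above: SQL-driven access management and the new TTL GROUP BY clause. All object names, passwords and thresholds here are hypothetical, not part of the release itself.

```sql
-- RBAC: users, roles and grants can now be managed with SQL
-- (requires SQL-driven access management to be enabled for the user running these statements).
CREATE ROLE analyst;
GRANT SELECT ON reports.* TO analyst;
CREATE USER alice IDENTIFIED WITH sha256_password BY 'alice_password';
GRANT analyst TO alice;

-- TTL GROUP BY: roll old rows up into aggregates instead of simply deleting them.
-- The GROUP BY keys must be a prefix of the table primary key.
CREATE TABLE metrics
(
    ts    DateTime,
    key   UInt32,
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (key, toStartOfDay(ts))
TTL ts + INTERVAL 30 DAY GROUP BY key, toStartOfDay(ts) SET value = avg(value);
```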
As usual with ClickHouse, there are many performance and operational improvements in different server components.
Backward Incompatible Changes
The following changes are backward incompatible and require user attention during an upgrade:
- Aggregate function states with Nullable arguments may produce different / incompatible types.
- Gorilla, Delta and DoubleDelta codecs can no longer be used on data types of variable size (like strings).
- System tables (e.g. system.query_log, system.trace_log, system.metric_log) use the compact data part format for parts smaller than 10 MiB in size (which is almost always the case). The compact data part format is supported since version 20.3.
- WARNING: If you have to downgrade from 20.8 to a version prior to 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
- Special printing of zero Date/DateTime values as ‘0000-00-00’ and ‘0000-00-00 00:00:00’ is deprecated. They are now printed as ‘1970-01-01’ and ‘1970-01-01 00:00:00’ respectively.
Upgrade Notes
There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:
- 20.3 two-level aggregation is not compatible with 20.4+. Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. This algorithm should be disabled before upgrading if the risk is high in your environment. See the group_by_two_level_threshold and group_by_two_level_threshold_bytes settings.
- The zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
- The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not:
  - optimize_move_functions_out_of_any
  - optimize_arithmetic_operations_in_aggregate_functions
  - optimize_injective_functions_inside_uniq
  - optimize_if_transform_strings_to_enum
- When pre-20.5 and 20.5+ versions run as replicas, “Part … intersects previous part” errors are possible due to a change in the leader election protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version cannot become a leader. This can be configured via the replicated_can_become_leader merge tree setting, globally or on a table level (see the sketch after this list).
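A hedged sketch of keeping an old replica from becoming a leader during a mixed-version rollout. It assumes ALTER TABLE … MODIFY SETTING is available on your starting version for ReplicatedMergeTree tables; the table name is a placeholder. Globally, the same setting can be placed in the merge_tree section of the server configuration on the old nodes.

```sql
-- On the nodes still running the pre-20.5 version, per table:
ALTER TABLE db.replicated_table MODIFY SETTING replicated_can_become_leader = 0;
```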
Other Important Changes
- All replicas are now ‘leaders’. This allows multiple replicas to assign merges, mutations, partition drops, moves and replaces concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
- The new setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see a ‘Memory limit (total) exceeded‘ exception. Increasing the limit requires a restart.
- The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
- Several new optimizations are enabled by default. While they typically improve performance, regressions are possible in corner cases (see the example after this list for disabling them selectively):
  - optimize_aggregators_of_group_by_keys
  - optimize_arithmetic_operations_in_aggregate_functions
  - optimize_duplicate_order_by_and_distinct
  - optimize_group_by_function_keys
  - optimize_injective_functions_inside_uniq
  - optimize_move_functions_out_of_any
  - optimize_monotonous_functions_in_order_by
  - optimize_redundant_functions_in_order_by
  - optimize_trivial_insert_select
  - partial_merge_join_optimizations
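If one of these optimizations (or default query logging) causes trouble, it can be turned off per session, per query, or in a settings profile. A minimal sketch; the table and column names are illustrative:

```sql
-- Session level
SET optimize_duplicate_order_by_and_distinct = 0;

-- Query level
SELECT DISTINCT user_id
FROM events
ORDER BY user_id
SETTINGS optimize_monotonous_functions_in_order_by = 0, log_queries = 0;
```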
The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.
Please contact us at info@altinity.com if you experience any issues with the upgrade.
——————
Appendix
New data types
- All standard SQL data types, like BIGINT, VARCHAR, etc., are mapped to ClickHouse data types. See the system.data_type_families table.
- Experimental data types: Point, Ring, Polygon, MultiPolygon
- Int128, Int256, UInt256, Decimal256 extended precision data types (see the sketch after this list)
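A minimal sketch of the new experimental types. Both families are gated by settings (allow_experimental_bigint_types and allow_experimental_geo_types, listed in the settings table below); the table and column names are illustrative.

```sql
SET allow_experimental_bigint_types = 1;
SET allow_experimental_geo_types = 1;

CREATE TABLE type_demo
(
    big      Int256,
    amount   Decimal256(40),
    location Point
) ENGINE = Memory;

INSERT INTO type_demo VALUES (2, 3.14, (55.75, 37.61));
```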
New formats
Arrow
ArrowStream
JSONAsString — allows storing the full JSON row unparsed and processing it later using ClickHouse JSONExtract functions (see the example after this list)
Markdown
MsgPack
PostgreSQLWire
Regexp — allows parsing non-standard text formats (for example, logs) by applying a regular expression to every line
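An illustrative use of JSONAsString: each input line is stored as a raw JSON string and parsed later with the JSONExtract family of functions. The table and field names are placeholders.

```sql
-- A table for JSONAsString must have exactly one String column.
CREATE TABLE raw_events (json String) ENGINE = MergeTree ORDER BY tuple();

-- Ingest raw JSON lines, e.g.:
-- INSERT INTO raw_events FORMAT JSONAsString
-- {"user_id": 42, "event": "click"}

-- Parse later, on read:
SELECT
    JSONExtractUInt(json, 'user_id') AS user_id,
    JSONExtractString(json, 'event') AS event
FROM raw_events;
```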
New functions
- DateTime functions:
  - fromUnixTimestamp / FROM_UNIXTIME
  - dateTrunc / date_trunc
  - fromUnixTimestamp64Micro
  - fromUnixTimestamp64Milli
  - fromUnixTimestamp64Nano
  - toUnixTimestamp64Micro
  - toUnixTimestamp64Milli
  - toUnixTimestamp64Nano
- String functions:
  - extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
  - ilike, notILike (also the new SQL operator ILIKE has been added)
- Array functions:
  - arrayReduceInRanges
  - hasSubstr
- Machine learning and statistics:
  - arrayAUC — Area Under the ROC Curve
  - bayesAB — Bayesian A/B testing calculator
  - medianExactHigh, medianExactLow
  - quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
- JSON:
  - JSONExtractKeysAndValuesRaw
- URL functions:
  - port
  - netloc
- Aggregate functions:
  - -Distinct — new aggregate function combinator
  - initializeAggregation — allows initialization of an AggregateFunction state
- Functions to work with key-value pairs (aka maps):
  - mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap); see the example after this list
  - SimpleAggregateFunction now supports minMap, maxMap, sumMap.
  - The new data type Map is currently in development and should make working with maps even more intuitive.
- MySQL integration:
  - DATABASE (alias for currentDatabase)
  - globalVariable (stub)
- Functions related to the new types:
  - toDecimal256, toDecimal256OrNull, toDecimal256OrZero
  - toInt128, toInt128OrNull, toInt128OrZero
  - toInt256, toInt256OrNull, toInt256OrZero
  - toUInt256, toUInt256OrNull, toUInt256OrZero
- Join engine:
  - joinGetOrNull
- Random generators:
  - fuzzBits (used for CI)
  - rand32 (alias for rand)
  - randomFixedString, randomString, randomStringUTF8
- Serialize columns to some text format:
  - formatRow, formatRowNoNewline
- Settings & custom settings:
  - getSetting
- Check types:
  - isDecimalOverflow
  - defaultValueOfTypeName
  - isZeroOrNull
- Helper functions to analyze system.query_log (see the example after this list):
  - normalizeQuery, normalizedQueryHash
- Other:
  - countDigits
  - mod (alias for modulo)
- Special (used in CI / by ClickHouse developers):
  - hasThreadFuzzer
  - buildId
- Related to internals of execution of the IN / GLOBAL IN operators (also see the transform_null_in setting):
  - notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
  - globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
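Two illustrative examples of the new functions. The query_log example assumes query logging is enabled (log_queries is on by default in this release); the mapAdd call mirrors the documented tuple-of-arrays form.

```sql
-- Group similar queries in system.query_log by their normalized form.
SELECT
    normalizedQueryHash(query) AS h,
    any(normalizeQuery(query)) AS normalized,
    count()                    AS runs
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY h
ORDER BY runs DESC
LIMIT 10;

-- mapAdd sums values of key-value pairs with matching keys.
SELECT mapAdd(([toUInt8(1), 2], [10, 10]), ([toUInt8(1), 3], [5, 5]));
-- expected result: ([1, 2, 3], [15, 10, 5])
```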
New table functions
cosn — integration with Tencent Cloud Object Storage (COS)
New table engines
MongoDB
RabbitMQ
COSN
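A hedged sketch of the new MongoDB table engine listed above. The host, database, collection and credentials are placeholders; the engine exposes an existing MongoDB collection as a read-only ClickHouse table.

```sql
CREATE TABLE mongo_events
(
    user_id UInt64,
    event   String
)
ENGINE = MongoDB('mongo1:27017', 'appdb', 'events', 'reader', 'secret');

SELECT count() FROM mongo_events;
```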
New metrics and events
system.metrics:
BackgroundBufferFlushSchedulePoolTask
BackgroundDistributedSchedulePoolTask
MemoryTrackingInBackgroundBufferFlushSchedulePool
MemoryTrackingInBackgroundDistributedSchedulePool
PostgreSQLConnection
system.events:
OSCPUVirtualTimeMicroseconds
OSCPUWaitMicroseconds
OSReadBytes, OSReadChars
OSWriteBytes, OSWriteChars
QueryTimeMicroseconds
New system tables (mostly for RBAC introspection)
current_roles
distribution_queue
enabled_roles
grants
licenses
privileges
quota_limits
quotas_usage
row_policies
role_grants
roles
settings_profile_elements
settings_profiles
time_zones
user_directories
users
New columns in system tables
columns.position, parts_columns.position
databases.uuid, dictionaries.uuid, tables.uuid
disks.type
merge_tree_settings.type, settings.type
parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
query_log.current_database
storage_policies.volume_type
tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
trace_log: trace_type, size, timestamp_ns
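An illustrative query over some of the new columns: approximate table sizes can now be read straight from system.tables instead of aggregating system.parts.

```sql
SELECT
    database,
    name,
    total_rows,
    formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE total_bytes > 0
ORDER BY total_bytes DESC
LIMIT 10;
```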
system.merge_tree_settings added/changed
| Setting | Old value | New value | Description |
| --- | --- | --- | --- |
| allow_nullable_key | | 0 | Allow Nullable types as primary keys. |
| always_fetch_merged_part | | 0 | If true, replicas never merge parts and always download merged parts from other replicas. |
| disable_background_merges | 0 | REMOVED | |
| enable_mixed_granularity_parts | 0 | 1 | Enable parts with adaptive and non-adaptive granularity at the same time |
| in_memory_parts_enable_wal | | 1 | Whether to write blocks in Native format to write-ahead-log before creation in-memory part |
| lock_acquire_timeout_for_background_operations | | 120 | For background operations like merges, mutations etc. How many seconds before failing to acquire table locks. |
| max_part_loading_threads | | auto | |
| max_part_removal_threads | | auto | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16). |
| max_replicated_logs_to_keep | 10000 | 1000 | How many records may be in log, if there is an inactive replica. |
| min_replicated_logs_to_keep | 100 | 10 | Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning. |
| min_bytes_for_compact_part | | 0 | Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. |
| min_rows_for_compact_part | | 0 | Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. |
| min_index_granularity_bytes | | 1024 | Minimum amount of bytes in single granule. |
| min_relative_delay_to_measure | | 120 | Calculate relative replica delay only if absolute delay is not less than this value. |
| write_ahead_log_max_bytes | | 1073741824 | Rotate WAL, if it exceeds that amount of bytes |
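A hedged sketch of enabling the experimental in-memory parts using the settings from the table above: parts below the given thresholds are kept in RAM (backed by the write-ahead log) instead of being written as compact or wide parts. The table name and thresholds are illustrative.

```sql
CREATE TABLE events_inmem
(
    ts  DateTime,
    msg String
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS min_rows_for_compact_part = 1000000,
         min_bytes_for_compact_part = 10485760,
         in_memory_parts_enable_wal = 1;
```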
system.settings added/changed
| Setting | Old value | New value | Description |
| --- | --- | --- | --- |
| allow_experimental_bigint_types | | 0 | Allow Int128, Int256, UInt256 and Decimal256 types |
| allow_experimental_database_materialize_mysql | | 1 | Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet) |
| allow_experimental_geo_types | | 0 | Allow geo data types such as Point, Ring, Polygon, MultiPolygon |
| allow_non_metadata_alters | | 1 | Allow to execute alters which affects table’s metadata and data on disk. |
| allow_push_predicate_when_subquery_contains_with | | 1 | Allows push predicate when subquery contains WITH clause |
| allow_suspicious_codecs | | 0 | If it is set to true, allow specifying meaningless compression codecs. |
| alter_partition_verbose_result | | 0 | Output information about affected parts. Currently works only for FREEZE and ATTACH commands. |
| background_buffer_flush_schedule_pool_size | | 16 | Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup. |
| background_distributed_schedule_pool_size | | 16 | Number of threads performing background tasks for distributed sends. Only has meaning at server startup. |
| cast_keep_nullable | | 0 | CAST operator keep Nullable for result data type |
| data_type_default_nullable | | 0 | Data types without NULL or NOT NULL will make Nullable |
| default_database_engine | | Ordinary | Default database engine |
| distributed_replica_max_ignored_errors | | 0 | Number of errors that will be ignored while choosing replicas |
| force_optimize_skip_unused_shards_nesting | | 0 | Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work |
| format_regexp | | | Regular expression (for Regexp format) |
| format_regexp_escaping_rule | | Escaped | Field escaping rule (for Regexp format) |
| format_regexp_skip_unmatched | | 0 | Skip lines unmatched by regular expression (for Regexp format) |
| function_implementation | | | Choose function implementation for specific target or variant (experimental). If empty, enable all of them. |
| input_format_avro_allow_missing_fields | | 0 | For Avro/AvroConfluent format: when field is not found in schema use default value instead of error |
| input_format_with_names_use_header | 0 | 1 | For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header. |
| insert_in_memory_parts_timeout | | 600000 | |
| join_on_disk_max_files_to_merge | | 64 | For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2. |
| lock_acquire_timeout | | 120 | How long locking request should wait before failing |
| log_queries_min_type | | QUERY_START | Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING. |
| materialize_ttl_after_modify | | 1 | Apply TTL for old data, after ALTER MODIFY TTL query |
| max_block_size | 65536 | 65505 | Maximum block size for reading |
| max_final_threads | | 16 | The maximum number of threads to read from the table with FINAL. |
| max_insert_block_size | 1048576 | 1048545 | The maximum block size for insertion, if we control the creation of blocks for insertion |
| max_joined_block_size_rows | 65536 | 65505 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited. |
| max_untracked_memory | | 4194304 | Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’. |
| memory_profiler_sample_probability | | 0 | Collect random allocations and deallocations and write them into system.trace_log with ‘MemorySample’ trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling. |
| metrics_perf_events_enabled | | 0 | If enabled, some of the perf events will be measured throughout queries’ execution. |
| metrics_perf_events_list | | | Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events. |
| min_chunk_bytes_for_parallel_parsing | 1048576 | 10485760 | The minimum chunk size in bytes, which each thread will parse in parallel. |
| min_insert_block_size_bytes | 268435456 | 268427520 | Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough. |
| min_insert_block_size_bytes_for_materialized_views | | 0 | Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes) |
| min_insert_block_size_rows | 1048576 | 1048545 | Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough. |
| min_insert_block_size_rows_for_materialized_views | | 0 | Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows) |
| multiple_joins_rewriter_version | | 2 | 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is. |
| optimize_aggregation_in_order | | 0 | Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables. |
| optimize_aggregators_of_group_by_keys | | 1 | Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section |
| optimize_arithmetic_operations_in_aggregate_functions | | 1 | Move arithmetic operations out of aggregation functions |
| optimize_distributed_group_by_sharding_key | | 0 | Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server) |
| optimize_duplicate_order_by_and_distinct | | 1 | Remove duplicate ORDER BY and DISTINCT if it’s possible |
| optimize_group_by_function_keys | | 1 | Eliminates functions of other keys in GROUP BY section |
| optimize_if_chain_to_multiif | | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types. |
| optimize_if_transform_strings_to_enum | | 0 | Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail |
| optimize_injective_functions_inside_uniq | | 1 | Delete injective functions of one argument inside uniq*() functions |
| optimize_monotonous_functions_in_order_by | | 1 | Replace monotonous function with its argument in ORDER BY |
| optimize_move_functions_out_of_any | | 1 | Move functions out of aggregate functions ‘any’, ‘anyLast’ |
| optimize_redundant_functions_in_order_by | | 1 | Remove functions from ORDER BY if its argument is also in ORDER BY |
| optimize_skip_unused_shards_nesting | | 0 | Same as optimize_skip_unused_shards, but accept nesting level until which it will work |
| optimize_trivial_insert_select | | 1 | Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query |
| output_format_enable_streaming | | 0 | Enable streaming in output formats that supports it |
| output_format_pretty_grid_charset | | UTF-8 | Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default) |
| output_format_pretty_max_value_width | | 10000 | Maximum width of value to display in Pretty formats. If greater – it will be cut. |
| parallel_distributed_insert_select | | 0 | Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard |
| partial_merge_join_left_table_buffer_bytes | | 32000000 | If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’. |
| partial_merge_join_optimizations | 0 | 1 | Enable optimizations in partial merge join |
| partial_merge_join_rows_in_right_blocks | 10000 | 65536 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk. |
| rabbitmq_max_wait_ms | | 5000 | The wait time for reading from RabbitMQ before retry. |
| read_in_order_two_level_merge_threshold | | 100 | Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key. |
| send_logs_level | none | fatal | Send server text logs with specified minimum level to client. Valid values: ‘trace’, ‘debug’, ‘information’, ‘warning’, ‘error’, ‘fatal’, ‘none’ |
| show_table_uuid_in_table_create_query_if_not_nil | | 0 | For tables in databases with Engine=Atomic show UUID of the table in its CREATE query. |
| temporary_files_codec | | LZ4 | Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE. |
| transform_null_in | | 0 | If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal. |
| validate_polygons | | 1 | Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result. |
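An illustrative example of the transform_null_in setting from the table above: by default NULL values are never matched by the IN operator, while with the setting enabled they compare as equal.

```sql
-- Default behavior: the NULL row is not matched.
SELECT arrayJoin([1, NULL]) AS x WHERE x IN (1, NULL);

-- With transform_null_in enabled, NULL matches NULL as well.
SELECT arrayJoin([1, NULL]) AS x WHERE x IN (1, NULL)
SETTINGS transform_null_in = 1;
```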
———-
Also, please refer to the release notes from the development team available at the following URLs:
- https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v20-4-2-9-2020-05-12
- https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v20-5-2-7-stable-2020-07-02
- https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v20-6-3-28-stable
- https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v20-7-2-30-stable-2020-08-31
- https://clickhouse.tech/docs/en/whats-new/changelog/#clickhouse-release-v20-8-2-3-stable-2020-09-08