ClickHouse Altinity Stable Release™ 20.8.7.15


Several months ago we certified the 20.3 ClickHouse release as Altinity Stable. Since then we have worked on newer releases and run them in-house. All recent articles and benchmarks in our blog, including benchmarks against Redshift and S3 integration articles, have been powered by ClickHouse 20.8 or newer versions. We also tested new functionality to make sure the quality of user-requested ClickHouse features is top notch. We are confident in certifying ClickHouse 20.8.7.15 as an Altinity Stable release.

This release is a significant step forward since the previous Altinity Stable release. It includes 1584 pull requests from 235 contributors with many new features and performance improvements! Please look below for detailed release notes.

Major new features since the previous stable release 20.3.19.4

The new release introduces a lot of changes and new features. These are organized by category in the Appendix, so refer to that section for more detail. The following new features are worth mentioning on the front page:

  • Security features:
    • RBAC and SQL management for users, roles, grants, etc. (see the sketch after this list)
    • LDAP authentication
    • Kerberos authentication in Kafka Engine
  • Dictionary features: 
    • direct and ssd_cache layouts
    • Redis, MongoDB, Cassandra sources
    • Automatic query rewrite for joins (see an example here)
    • Experimental polygon dictionaries
  • MergeTree features:
    • S3 disk for MergeTree tables! It is still an experimental feature; see our article for more detail.
    • New TTL types: TTL DELETE WHERE and TTL GROUP BY (TTL GROUP BY is illustrated after this list).
    • In-memory MergeTree parts
  • Integrations:
    • PostgreSQL wire protocol
    • RabbitMQ and MongoDB storage engines
    • Tencent Cloud Object Storage (COS)
    • Highly experimental MaterializeMySQL engine that implements a MySQL replica in ClickHouse
  • SQL compatibility:
    • New Int128, (U)Int256, Decimal256 extended precision data types
    • Aliases for standard SQL types
    • EXPLAIN statement!
    • Merge join improvements
  • Custom HTTP handlers
  • clickhouse-copier underwent extensive updates and improvements
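For illustration, here is a minimal sketch of the new SQL-driven access management and of a TTL GROUP BY rollup. All object names (analytics, jane, analyst, sensor_metrics) are hypothetical, and SQL-driven access management has to be enabled for the administrative user first (access_management in users.xml):

    -- RBAC: roles, grants and users can now be managed entirely in SQL
    CREATE ROLE analyst;
    GRANT SELECT ON analytics.* TO analyst;
    CREATE USER jane IDENTIFIED WITH sha256_password BY 'secret';
    GRANT analyst TO jane;
    SET DEFAULT ROLE analyst TO jane;

    -- TTL GROUP BY: roll old rows up in place, keeping hourly averages per sensor after 30 days
    CREATE TABLE sensor_metrics
    (
        ts     DateTime,
        sensor UInt32,
        value  Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (sensor, toStartOfHour(ts), ts)
    TTL ts + INTERVAL 30 DAY
        GROUP BY sensor, toStartOfHour(ts)
        SET value = avg(value);

Note that the TTL GROUP BY keys must form a prefix of the table's primary key, as in the sketch above.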

As usual with ClickHouse, there are many performance and operational improvements in different server components.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Aggregate function states with Nullable arguments may produce different / incompatible types.
  • Gorilla, Delta and DoubleDelta codecs can no longer be used on data types of variable size (like strings).
  • System tables (e.g. system.query_log, system.trace_log, system.metric_log) are using compact data part format for parts smaller than 10 MiB in size (this is almost always the case). Compact data part format is supported since version 20.3. 
    • WARNING: If you have to downgrade from 20.8 to a version prior to 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
  • Special printing of zero Date/DateTime values as ‘0000-00-00’ and ‘0000-00-00 00:00:00’ is deprecated. They are now printed as ‘1970-01-01’ and ‘1970-01-01 00:00:00’ respectively.
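For example, with the new behavior (shown here for the UTC timezone):

    SELECT toDate(0) AS zero_date, toDateTime(0) AS zero_datetime;
    -- 1970-01-01    1970-01-01 00:00:00   (previously printed as 0000-00-00 / 0000-00-00 00:00:00)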

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:

  • 20.3 two-level aggregation is not compatible with 20.4+.
    Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. If this is a risk in your environment, disable the algorithm before upgrading. See the group_by_two_level_threshold and group_by_two_level_threshold_bytes settings.
  • zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
  • The following settings lead to incompatibility in distributed queries when only a subset of shards are upgraded and others are not: 
    • optimize_move_functions_out_of_any
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_injective_functions_inside_uniq
    • optimize_if_transform_strings_to_enum
  • When pre-20.5 and 20.5+ versions run as replicas, “Part … intersects previous part” errors are possible due to a change in the leader election protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version cannot become a leader. This can be configured via the replicated_can_become_leader MergeTree setting, globally or at the table level (see the note after this list).
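On the nodes that must not take leadership (typically the ones still on the old version), replicated_can_become_leader can be set to 0 in the merge_tree section of the server configuration. Current leadership can then be verified from SQL:

    -- 1 in is_leader means the replica may assign merges for that table
    SELECT database, table, is_leader FROM system.replicas;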

Other Important Changes

  • All replicas are now ‘leaders’. This allows multiple replicas to assign merges, mutations, partition drops, moves and replaces concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
  • The new setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see an exception ‘Memory limit (total) exceeded’. Increasing the limit requires a server restart.
  • The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
  • Several new optimizations are enabled by default. While they typically improve performance, regressions are possible in corner cases (a way to disable one of them is sketched after this list):
    • optimize_aggregators_of_group_by_keys
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_duplicate_order_by_and_distinct
    • optimize_group_by_function_keys
    • optimize_injective_functions_inside_uniq
    • optimize_move_functions_out_of_any
    • optimize_monotonous_functions_in_order_by
    • optimize_redundant_functions_in_order_by
    • optimize_trivial_insert_select
    • partial_merge_join_optimizations
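If one of these rewrites causes a regression for a particular workload, it can be switched off per session or in a settings profile. A minimal sketch:

    -- Disable a single optimization for the current session
    SET optimize_duplicate_order_by_and_distinct = 0;

    -- Review the current values of the new optimize_* settings
    SELECT name, value FROM system.settings WHERE name LIKE 'optimize_%';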

The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Please contact us at info@altinity.com if you experience any issues with the upgrade.

——————

Appendix

New data types

  • All standard SQL data types, like BIGINT, VARCHAR, etc., are mapped to ClickHouse data types. See the system.data_type_families table.
  • Experimental data types Point, Ring, Polygon, MultiPolygon
  • Int128, Int256, UInt256, Decimal256 extended precision data types

New formats

  • Arrow
  • ArrowStream
  • JSONAsString — allows storing the full JSON row unparsed and processing it later using ClickHouse JSONExtract functions (see the sketch after this list)
  • Markdown
  • MsgPack
  • PostgreSQLWire
  • Regexp – allows parsing non-standard text formats (for example, logs) by applying a regular expression to every line.
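As a sketch of the JSONAsString workflow (table and field names are hypothetical): the table has a single String column, raw JSON rows are inserted as-is, and JSON functions extract fields at query time.

    CREATE TABLE raw_events (json String) ENGINE = MergeTree ORDER BY tuple();

    INSERT INTO raw_events FORMAT JSONAsString {"user": "alice", "value": 42}

    SELECT
        JSONExtractString(json, 'user') AS user,
        JSONExtractInt(json, 'value')   AS value
    FROM raw_events;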

New functions

  • DateTime functions:
    • fromUnixTimestamp / FROM_UNIXTIME
    • dateTrunc / date_trunc
    • fromUnixTimestamp64Micro
    • fromUnixTimestamp64Milli
    • fromUnixTimestamp64Nano
    • toUnixTimestamp64Micro
    • toUnixTimestamp64Milli
    • toUnixTimestamp64Nano
  • String functions:
    • extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
    • ilike, notILike (also a new SQL operator ILIKE has been added; see the example after this list)
  • Array functions:
    • arrayReduceInRanges
    • hasSubstr
  • Machine-learning and statistics:
    • arrayAUC — Area Under the ROC Curve
    • bayesAB — Bayesian A/B testing calculator
    • medianExactHigh, medianExactLow
    • quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
  • JSON
    • JSONExtractKeysAndValuesRaw
  • URL functions
    • port
    • netloc
  • Aggregate functions
    • -Distinct — new aggregate function combinator
    • initializeAggregation — allow initialization of AggregateFunction state
  • Functions to work with key-value pairs (aka maps):
    • mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap).
    • SimpleAggregateFunction now supports minMap, maxMap, sumMap
    • A new Map data type is currently in development and should make working with key-value pairs even more intuitive.
  • MySQL integration
    • DATABASE (alias for currentDatabase)
    • globalVariable (stub)
  • Related to the new extended-precision types:
    • toDecimal256, toDecimal256OrNull, toDecimal256OrZero
    • toInt128, toInt128OrNull, toInt128OrZero
    • toInt256, toInt256OrNull, toInt256OrZero
    • toUInt256, toUInt256OrNull, toUInt256OrZero
  • Join engine:
    • joinGetOrNull
  • Random generators:
    • fuzzBits (used for CI)
    • rand32 (alias for rand)
    • randomFixedString, randomString, randomStringUTF8
  • Serialize columns to a text format:
    • formatRow, formatRowNoNewline
  • Settings & custom settings:
    • getSetting
  • Check types
    • isDecimalOverflow
    • defaultValueOfTypeName
    • isZeroOrNull
  • Helper functions to analyze system.query_log
    • normalizeQuery, normalizedQueryHash
  • Other:
    • countDigits
    • mod (alias for modulo)
  • Special (used in CI / by ClickHouse developers):
    • hasThreadFuzzer
    • buildId
  • Related to internals of execution of the IN / GLOBAL IN operator (also see the transform_null_in setting)
    • notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
    • globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
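A quick, non-exhaustive illustration of a few of the new functions (values are made up):

    SELECT
        fromUnixTimestamp(1600000000)   AS ts,            -- alias: FROM_UNIXTIME
        dateTrunc('month', now())       AS month_start,
        'ClickHouse' ILIKE 'click%'     AS ilike_match,   -- case-insensitive LIKE
        hasSubstr([1, 2, 3, 4], [2, 3]) AS subarray_found,
        normalizeQuery('SELECT 1 + 2')  AS normalized;    -- replaces literals with placeholders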

New table functions

  • cosn — integration with Tencent Cloud Object Storage (COS)

New table engines

  • MongoDB (see the sketch after this list)
  • RabbitMQ
  • COSN
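For example, the new MongoDB engine exposes a collection as a read-only table. The connection details, columns and names below are hypothetical:

    CREATE TABLE mongo_orders
    (
        order_id UInt64,
        status   String
    )
    ENGINE = MongoDB('mongo1:27017', 'shop', 'orders', 'reader', 'secret');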

New metrics and events

system.metrics

  • BackgroundBufferFlushSchedulePoolTask
  • BackgroundDistributedSchedulePoolTask
  • MemoryTrackingInBackgroundBufferFlushSchedulePool
  • MemoryTrackingInBackgroundDistributedSchedulePool
  • PostgreSQLConnection

system.events:

  • OSCPUVirtualTimeMicroseconds
  • OSCPUWaitMicroseconds
  • OSReadBytes, OSReadChars
  • OSWriteBytes, OSWriteChars
  • QueryTimeMicroseconds

New system tables (mostly for RBAC introspection)

  • current_roles
  • distribution_queue 
  • enabled_roles
  • grants 
  • licenses
  • privileges
  • quota_limits
  • quotas_usage
  • row_policies
  • role_grants
  • roles
  • settings_profile_elements
  • settings_profiles
  • time_zones
  • user_directories
  • users              
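These tables can be queried directly for access-control introspection, for example:

    -- Who is granted what
    SELECT user_name, access_type, database, table FROM system.grants;

    -- Which roles are granted to which users and roles
    SELECT * FROM system.role_grants;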

New columns in system tables

  • columns.position, parts_columns.position
  • databases.uuid, dictionaries.uuid, tables.uuid
  • disks.type
  • merge_tree_settings.type, settings.type
  • parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
  • query_log.current_database
  • storage_policies.volume_type
  • tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
  • trace_log: trace_type, size, timestamp_ns

system.merge_tree_settings added/changed

An empty Old value means the setting is new in this release.

Setting | Old value | New value | Description
allow_nullable_key |  | 0 | Allow Nullable types as primary keys.
always_fetch_merged_part |  | 0 | If true, replicas never merge parts and always download merged parts from other replicas.
disable_background_merges | 0 |  | REMOVED
enable_mixed_granularity_parts | 0 | 1 | Enable parts with adaptive and non-adaptive granularity at the same time
in_memory_parts_enable_wal |  | 1 | Whether to write blocks in Native format to write-ahead-log before creation of in-memory part
lock_acquire_timeout_for_background_operations |  | 120 | For background operations like merges, mutations etc. How many seconds before failing to acquire table locks.
max_part_loading_threads |  | auto | The number of threads used to load data parts at startup.
max_part_removal_threads |  | auto | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16).
max_replicated_logs_to_keep | 10000 | 1000 | How many records may be in log, if there is an inactive replica.
min_replicated_logs_to_keep | 100 | 10 | Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning.
min_bytes_for_compact_part |  | 0 | Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero, enables in-memory parts.
min_rows_for_compact_part |  | 0 | Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero, enables in-memory parts.
min_index_granularity_bytes |  | 1024 | Minimum amount of bytes in single granule.
min_relative_delay_to_measure |  | 120 | Calculate relative replica delay only if absolute delay is not less than this value.
write_ahead_log_max_bytes |  | 1073741824 | Rotate WAL, if it exceeds that amount of bytes

system.settings added/changed

Setting | Old value | New value | Description
allow_experimental_bigint_types |  | 0 | Allow Int128, Int256, UInt256 and Decimal256 types
allow_experimental_database_materialize_mysql |  | 1 | Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet)
allow_experimental_geo_types |  | 0 | Allow geo data types such as Point, Ring, Polygon, MultiPolygon
allow_non_metadata_alters |  | 1 | Allow to execute alters which affects table’s metadata and data on disk.
allow_push_predicate_when_subquery_contains_with |  | 1 | Allows push predicate when subquery contains WITH clause
allow_suspicious_codecs |  | 0 | If it is set to true, allow specifying meaningless compression codecs.
alter_partition_verbose_result |  | 0 | Output information about affected parts. Currently works only for FREEZE and ATTACH commands.
background_buffer_flush_schedule_pool_size |  | 16 | Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.
background_distributed_schedule_pool_size |  | 16 | Number of threads performing background tasks for distributed sends. Only has meaning at server startup.
cast_keep_nullable |  | 0 | CAST operator keep Nullable for result data type
data_type_default_nullable |  | 0 | Data types without NULL or NOT NULL will make Nullable
default_database_engine |  | Ordinary | Default database engine
distributed_replica_max_ignored_errors |  | 0 | Number of errors that will be ignored while choosing replicas
force_optimize_skip_unused_shards_nesting |  | 0 | Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work
format_regexp |  |  | Regular expression (for Regexp format)
format_regexp_escaping_rule |  | Escaped | Field escaping rule (for Regexp format)
format_regexp_skip_unmatched |  | 0 | Skip lines unmatched by regular expression (for Regexp format)
function_implementation |  |  | Choose function implementation for specific target or variant (experimental). If empty, enable all of them.
input_format_avro_allow_missing_fields |  | 0 | For Avro/AvroConfluent format: when field is not found in schema use default value instead of error
input_format_with_names_use_header | 0 | 1 | For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header.
insert_in_memory_parts_timeout |  | 600000 | 
join_on_disk_max_files_to_merge |  | 64 | For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2.
lock_acquire_timeout |  | 120 | How long locking request should wait before failing
log_queries_min_type |  | QUERY_START | Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING.
materialize_ttl_after_modify |  | 1 | Apply TTL for old data, after ALTER MODIFY TTL query
max_block_size | 65536 | 65505 | Maximum block size for reading
max_final_threads |  | 16 | The maximum number of threads to read from the table with FINAL.
max_insert_block_size | 1048576 | 1048545 | The maximum block size for insertion, if we control the creation of blocks for insertion
max_joined_block_size_rows | 65536 | 65505 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.
max_untracked_memory |  | 4194304 | Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’.
memory_profiler_sample_probability |  | 0 | Collect random allocations and deallocations and write them into system.trace_log with ‘MemorySample’ trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling.
metrics_perf_events_enabled |  | 0 | If enabled, some of the perf events will be measured throughout queries’ execution.
metrics_perf_events_list |  |  | Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events.
min_chunk_bytes_for_parallel_parsing | 1048576 | 10485760 | The minimum chunk size in bytes, which each thread will parse in parallel.
min_insert_block_size_bytes | 268435456 | 268427520 | Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.
min_insert_block_size_bytes_for_materialized_views |  | 0 | Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes)
min_insert_block_size_rows | 1048576 | 1048545 | Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.
min_insert_block_size_rows_for_materialized_views |  | 0 | Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows)
multiple_joins_rewriter_version |  | 2 | 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is.
optimize_aggregation_in_order |  | 0 | Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables.
optimize_aggregators_of_group_by_keys |  | 1 | Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section
optimize_arithmetic_operations_in_aggregate_functions |  | 1 | Move arithmetic operations out of aggregation functions
optimize_distributed_group_by_sharding_key |  | 0 | Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server)
optimize_duplicate_order_by_and_distinct |  | 1 | Remove duplicate ORDER BY and DISTINCT if it’s possible
optimize_group_by_function_keys |  | 1 | Eliminates functions of other keys in GROUP BY section
optimize_if_chain_to_multiif |  | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types.
optimize_if_transform_strings_to_enum |  | 0 | Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail
optimize_injective_functions_inside_uniq |  | 1 | Delete injective functions of one argument inside uniq*() functions
optimize_monotonous_functions_in_order_by |  | 1 | Replace monotonous function with its argument in ORDER BY
optimize_move_functions_out_of_any |  | 1 | Move functions out of aggregate functions ‘any’, ‘anyLast’
optimize_redundant_functions_in_order_by |  | 1 | Remove functions from ORDER BY if its argument is also in ORDER BY
optimize_skip_unused_shards_nesting |  | 0 | Same as optimize_skip_unused_shards, but accept nesting level until which it will work
optimize_trivial_insert_select |  | 1 | Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query
output_format_enable_streaming |  | 0 | Enable streaming in output formats that supports it
output_format_pretty_grid_charset |  | UTF-8 | Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default)
output_format_pretty_max_value_width |  | 10000 | Maximum width of value to display in Pretty formats. If greater – it will be cut.
parallel_distributed_insert_select |  | 0 | Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard
partial_merge_join_left_table_buffer_bytes |  | 32000000 | If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’.
partial_merge_join_optimizations | 0 | 1 | Enable optimizations in partial merge join
partial_merge_join_rows_in_right_blocks | 10000 | 65536 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk.
rabbitmq_max_wait_ms |  | 5000 | The wait time for reading from RabbitMQ before retry.
read_in_order_two_level_merge_threshold |  | 100 | Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key.
send_logs_level | none | fatal | Send server text logs with specified minimum level to client. Valid values: ‘trace’, ‘debug’, ‘information’, ‘warning’, ‘error’, ‘fatal’, ‘none’
show_table_uuid_in_table_create_query_if_not_nil |  | 0 | For tables in databases with Engine=Atomic show UUID of the table in its CREATE query.
temporary_files_codec |  | LZ4 | Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE.
transform_null_in |  | 0 | If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal.
validate_polygons |  | 1 | Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result.

———-

Also, please refer to the release notes from the development team available at the following URLs:
