ClickHouse Altinity Stable Release™ 20.8.7.15

By Altinity Team on December 2nd, 2020


Several months ago we certified the 20.3 ClickHouse release as Altinity Stable. Since then we have worked on newer releases and run them in-house. All recent articles and benchmarks in our blog, including benchmarks against Redshift and S3 integration articles, have been powered by ClickHouse 20.8 or newer versions. We tested new functionality to make sure the quality of user-requested ClickHouse features is top-notch. We are confident in certifying ClickHouse 20.8.7.15 as an Altinity Stable release.

This release is a significant step forward since the previous Altinity Stable release. It includes 1584 pull requests from 235 contributors with many new features and performance improvements! Please look below for detailed release notes.

Major new features since the previous stable release 20.3.19.4

A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to the Appendix for more detail. The following new features are worth mentioning on the front page:

  • Security features:
    • RBAC and SQL management for users, roles, grants etc.
    • LDAP authentication
    • Kerberos authentication in Kafka Engine
  • Dictionary features: 
    • direct and ssd_cache layouts
    • Redis, MongoDB, Cassandra sources
    • Automatic query rewrite for joins (see an example here)
    • Experimental polygon dictionaries
  • MergeTree features:
    • S3 disk for MergeTree tables! It is still an experimental feature, see our article for more detail.
    • New TTL types: TTL DELETE WHERE and TTL GROUP BY.
    • In-memory MergeTree parts
  • Integrations:
    • PostgreSQL wire protocol
    • RabbitMQ and MongoDB storage engines
    • Tencent Cloud Object Storage (COS)
    • Highly experimental MaterializeMySQL engine that implements MySQL replica in ClickHouse
  • SQL compatibility:
    • New Int128, (U)Int256, Decimal256 extended precision data types
    • Aliases for standard SQL types
    • EXPLAIN statement!
    • Merge join improvements
  • Custom HTTP handlers
  • clickhouse-copier underwent extensive updates and improvements

As usual with ClickHouse, there are many performance and operational improvements in different server components.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • Aggregate functions states with Nullable arguments may produce different / incompatible types.
  • Gorilla, Delta and DoubleDelta codecs can no longer be used on data types of variable size (like strings).
  • System tables (e.g. system.query_log, system.trace_log, system.metric_log) now use the compact data part format for parts smaller than 10 MiB in size (this is almost always the case). The compact data part format has been supported since version 20.3.
    • WARNING: If you have to downgrade from 20.8 to a version prior to 20.3, you should manually delete table data for system logs in /var/lib/clickhouse/data/system/.
  • Special printing of zero Date/DateTime values as ‘0000-00-00’ and ‘0000-00-00 00:00:00’ is deprecated. They are now printed as ‘1970-01-01’ and ‘1970-01-01 00:00:00’ respectively.
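
The zero-date change is easy to reason about once you remember that a zero value is simply the start of the Unix epoch. The sketch below is only an illustration in Python, not ClickHouse code: the helper names render_date and render_datetime are made up for this example, and it assumes the server time zone is UTC (with another time zone the printed DateTime would be shifted accordingly).

```python
from datetime import date, datetime, timedelta, timezone


def render_date(days_since_epoch: int) -> str:
    """Format a day counter (0 = 1970-01-01) as YYYY-MM-DD."""
    return (date(1970, 1, 1) + timedelta(days=days_since_epoch)).isoformat()


def render_datetime(unix_seconds: int) -> str:
    """Format a Unix timestamp as YYYY-MM-DD hh:mm:ss, assuming UTC."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Zero values now show up as the epoch start instead of '0000-00-00'.
    print(render_date(0))
    print(render_datetime(0))
```

If you have client-side scripts that compare query output against ‘0000-00-00’, they will need to check for the epoch strings instead.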

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Note the following:

  • 20.3 two-level aggregation is not compatible with 20.4+.
    Data will not be fully aggregated for queries that are processed using the two-level aggregation algorithm. This algorithm should be disabled before upgrading if the risk is high in your environment. See group_by_two_level_threshold and group_by_two_level_threshold_bytes settings.
  • zstd library has been updated in 20.5. While it is not a problem for ClickHouse in general, it may result in inconsistent parts when several replicas merge parts independently, and will force ClickHouse to download merged parts to make sure they are byte-identical (which will lead to extra traffic between nodes). The first node to complete the merge will register the part in ZooKeeper, and the other nodes will download the part if their checksums are different. There will be no data loss; conflicts will disappear once all replicas are upgraded.
  • The following settings lead to incompatibility in distributed queries when only a subset of shards is upgraded and others are not:
    • optimize_move_functions_out_of_any
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_injective_functions_inside_uniq
    • optimize_if_transform_strings_to_enum
  • When pre-20.5 and 20.5+ versions run as replicas, “Part … intersects previous part” errors are possible due to a change in the leadership selection protocol. If you need to run pre-20.5 and 20.5+ versions in the same cluster, make sure the old version cannot become a leader. This can be configured via the replicated_can_become_leader merge tree setting, globally or at the table level.
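
To make the version boundaries in the notes above concrete, here is a small Python sketch that takes the versions currently running in a cluster and lists which of the mixed-version risks apply. It is a hypothetical helper written for this article, not a ClickHouse or Altinity tool. It only looks at the major.minor part of each version string, and it does not cover the optimizer settings listed above, because the notes do not say in which version each of them appeared.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '20.8.7.15'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def mixed_version_risks(versions: List[str]) -> List[str]:
    """List the upgrade-note risks that apply to a cluster running these versions."""
    parsed = {parse_version(v) for v in versions}
    risks = []

    # 20.3 two-level aggregation is not compatible with 20.4+.
    if (20, 3) in parsed and any(v >= (20, 4) for v in parsed):
        risks.append("two-level aggregation: results may not be fully aggregated")

    # Both the zstd update and the leader selection change landed in 20.5.
    has_old = any(v < (20, 5) for v in parsed)
    has_new = any(v >= (20, 5) for v in parsed)
    if has_old and has_new:
        risks.append("zstd: replicas may re-download merged parts")
        risks.append("leadership: set replicated_can_become_leader=0 on old replicas")

    return risks


if __name__ == "__main__":
    for risk in mixed_version_risks(["20.3.19.4", "20.8.7.15"]):
        print(risk)
```

A fully upgraded cluster returns an empty list, which matches the recommendation not to leave a cluster partially upgraded.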

Other Important Changes

  • All replicas are now ‘leaders’. This allows multiple replicas to assign merges, mutations, partition drop, move and replace concurrently. Now system.replicas.is_leader is 1 for all tables on all nodes. If you rely on this value for some operations, your processes must be revised. The LeaderElection and LeaderReplica metrics were removed.
  • New setting max_server_memory_usage limits total memory usage of the server. The setting max_memory_usage_for_all_queries is now obsolete and does nothing. You might see an exception ‘Memory limit (total) exceeded’. Increasing the limit requires a restart.
  • The log_queries setting is now enabled by default. You might want to disable this setting for some profiles if you don’t want their queries logged into the system.query_log table.
  • Several new optimizations are enabled by default. While they typically improve performance, regressions are sometimes possible in corner cases:
    • optimize_aggregators_of_group_by_keys
    • optimize_arithmetic_operations_in_aggregate_functions
    • optimize_duplicate_order_by_and_distinct
    • optimize_group_by_function_keys
    • optimize_injective_functions_inside_uniq
    • optimize_move_functions_out_of_any
    • optimize_monotonous_functions_in_order_by
    • optimize_redundant_functions_in_order_by
    • optimize_trivial_insert_select
    • partial_merge_join_optimizations
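
If you suspect that one of these optimizations causes a regression, one way to check is to re-run the query with the optimization switched off using a query-level SETTINGS clause. The Python sketch below just builds such a query string. The function names are made up for this example, and it assumes the query is a plain SELECT without a trailing FORMAT clause.

```python
from typing import List

# Optimizations that are enabled by default in 20.8 (from the list above).
NEW_DEFAULT_OPTIMIZATIONS = [
    "optimize_aggregators_of_group_by_keys",
    "optimize_arithmetic_operations_in_aggregate_functions",
    "optimize_duplicate_order_by_and_distinct",
    "optimize_group_by_function_keys",
    "optimize_injective_functions_inside_uniq",
    "optimize_move_functions_out_of_any",
    "optimize_monotonous_functions_in_order_by",
    "optimize_redundant_functions_in_order_by",
    "optimize_trivial_insert_select",
    "partial_merge_join_optimizations",
]


def settings_clause(disabled: List[str]) -> str:
    """Build a SETTINGS clause that turns the given optimizations off."""
    unknown = sorted(set(disabled) - set(NEW_DEFAULT_OPTIMIZATIONS))
    if unknown:
        raise ValueError(f"not in the 20.8 list: {unknown}")
    return "SETTINGS " + ", ".join(f"{name} = 0" for name in disabled)


def with_disabled(query: str, disabled: List[str]) -> str:
    """Append the SETTINGS clause to a query, dropping a trailing semicolon."""
    return f"{query.rstrip().rstrip(';')} {settings_clause(disabled)}"


if __name__ == "__main__":
    print(with_disabled("SELECT any(x + 1) FROM t;",
                        ["optimize_move_functions_out_of_any"]))
```

Disabling the settings one at a time narrows down which optimization is responsible. Once it is found, the same setting can be turned off in a settings profile.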

The ClickHouse Altinity Stable release is based on the community version. It can be downloaded from repo.clickhouse.tech, and RPM packages are available from the Altinity Stable Repository.

Please contact us at [email protected] if you experience any issues with the upgrade.

——————

Appendix

New data types

  • All standard SQL data types, like BIGINT, VARCHAR, etc. are mapped to ClickHouse data types.  See the system.data_type_families table.
  • Experimental data types Point, Ring, Polygon, MultiPolygon
  • Int128, Int256, UInt256, Decimal256 extended precision data types

New formats

  • Arrow
  • ArrowStream
  • JSONAsString: stores the full JSON row unparsed so that it can be processed later using ClickHouse JSONExtract functions
  • Markdown
  • MsgPack
  • PostgreSQLWire
  • Regexp: parses non-standard text formats (for example, logs) by applying a regular expression to every line.
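
The idea behind the Regexp format can be sketched in a few lines of Python: every input line is matched against one regular expression, and each capture group becomes a column value. This is only an illustration of the concept. It uses Python's re module rather than the engine ClickHouse uses, so matching details may differ. The function name, the sample log lines and the pattern are invented for the example, and the skip_unmatched flag mirrors the format_regexp_skip_unmatched setting listed further down.

```python
import re
from typing import Iterable, List, Tuple


def parse_regexp_lines(lines: Iterable[str], pattern: str,
                       skip_unmatched: bool = False) -> List[Tuple[str, ...]]:
    """Turn each line into a row; every capture group is one column."""
    regex = re.compile(pattern)
    rows = []
    for line in lines:
        match = regex.match(line)
        if match is None:
            if skip_unmatched:
                continue  # silently drop lines that do not fit the pattern
            raise ValueError(f"line does not match the pattern: {line!r}")
        rows.append(match.groups())
    return rows


if __name__ == "__main__":
    log = ["2020-12-02 ERROR disk full", "garbage"]
    for row in parse_regexp_lines(log, r"(\S+) (\w+) (.*)", skip_unmatched=True):
        print(row)
```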

New functions

  • DateTime functions:
    • fromUnixTimestamp / FROM_UNIXTIME
    • dateTrunc / date_trunc
    • fromUnixTimestamp64Micro
    • fromUnixTimestamp64Milli
    • fromUnixTimestamp64Nano
    • toUnixTimestamp64Micro
    • toUnixTimestamp64Milli
    • toUnixTimestamp64Nano
  • String functions:
    • extractGroups, extractAllGroupsHorizontal, extractAllGroupsVertical (alias to extractAllGroups)
    • ilike, notILike (the new SQL operator ILIKE has also been added)
  • Array functions:
    • arrayReduceInRanges
    • hasSubstr
  • Machine-learning and statistics:
    • arrayAUC — area Under the ROC Curve
    • bayesAB — bayesian A/B Testing Calculator
    • medianExactHigh, medianExactLow
    • quantileExactHigh, quantileExactLow, quantilesExactHigh, quantilesExactLow
  • JSON
    • JSONExtractKeysAndValuesRaw
  • URL functions
    • port
    • netloc
  • Aggregate functions
    • -Distinct — new aggregate function combinator
    • initializeAggregation — allow initialization of AggregateFunction state
  • Functions to work with key-value pairs (aka maps):
    • mapAdd, mapSubtract, maxMap, minMap (extending the API of sumMap).
    • SimpleAggregateFunction now supports minMap, maxMap, sumMap
    • A new Map data type is currently in development and should make working with key-value pairs even more intuitive.
  • MySQL integration
    • DATABASE (alias for currentDatabase)
    • globalVariable (stub)
  • New types related:
    • toDecimal256, toDecimal256OrNull, toDecimal256OrZero
    • toInt128, toInt128OrNull, toInt128OrZero
    • toInt256, toInt256OrNull, toInt256OrZero
    • toUInt256, toUInt256OrNull, toUInt256OrZero
  • Join engine:
    • joinGetOrNull
  • Random generators:
    • fuzzBits (used for CI)
    • rand32 (alias for rand)
    • randomFixedString, randomString, randomStringUTF8
  • Serialize columns to some text format
    • formatRow, formatRowNoNewline
  • Settings & custom settings:
    • getSetting
  • Check types
    • isDecimalOverflow
    • defaultValueOfTypeName
    • isZeroOrNull
  • Helper functions to analyze system.query_log
    • normalizeQuery, normalizedQueryHash
  • Other:
    • countDigits
    • mod (alias for modulo)
  • Special (used in CI / by ClickHouse developers):
    • hasThreadFuzzer
    • buildId
  • Related to the internals of execution of the IN / GLOBAL IN operators (also see the transform_null_in setting)
    • notNullIn, notNullInIgnoreSet, nullIn, nullInIgnoreSet, inIgnoreSet, notInIgnoreSet
    • globalNotNullIn, globalNotNullInIgnoreSet, globalNullIn, globalNullInIgnoreSet, globalInIgnoreSet, globalNotInIgnoreSet
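
For readers who have not met the metric behind arrayAUC, here is a plain-Python reference for the area under the ROC curve. It is a sketch of the textbook definition (the share of positive/negative pairs that the scores rank correctly, with ties counted as half), not ClickHouse's implementation, so edge cases such as tie handling may differ. The function name roc_auc is made up for this example.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In the sample above three of the four positive/negative pairs are ordered correctly, so the result is 0.75.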

New table functions

  • cosn — integration with Tencent Cloud Object Storage (COS)

New table engines

  • MongoDB
  • RabbitMQ
  • COSN

New metrics and events

system.metrics:

  • BackgroundBufferFlushSchedulePoolTask
  • BackgroundDistributedSchedulePoolTask
  • MemoryTrackingInBackgroundBufferFlushSchedulePool
  • MemoryTrackingInBackgroundDistributedSchedulePool
  • PostgreSQLConnection

system.events:

  • OSCPUVirtualTimeMicroseconds
  • OSCPUWaitMicroseconds
  • OSReadBytes, OSReadChars
  • OSWriteBytes, OSWriteChars
  • QueryTimeMicroseconds

New system tables (mostly for RBAC introspection)

  • current_roles
  • distribution_queue 
  • enabled_roles
  • grants 
  • licenses
  • privileges
  • quota_limits
  • quotas_usage
  • row_policies
  • role_grants
  • roles
  • settings_profile_elements
  • settings_profiles
  • time_zones
  • user_directories
  • users              

New columns in system tables

  • columns.position, parts_columns.position
  • databases.uuid, dictionaries.uuid, tables.uuid
  • disks.type
  • merge_tree_settings.type, settings.type
  • parts: delete_ttl_info_min, delete_ttl_info_max, move_ttl_info_min, move_ttl_info_max, move_ttl_info.expression
  • query_log.current_database
  • storage_policies.volume_type
  • tables: total_rows, total_bytes, lifetime_rows, lifetime_bytes
  • trace_log: trace_type, size, timestamp_ns

system.merge_tree_settings added/changed

| Setting | Old value | New value | Description |
| allow_nullable_key | | 0 | Allow Nullable types as primary keys. |
| always_fetch_merged_part | | 0 | If true, replicas never merge parts and always download merged parts from other replicas. |
| disable_background_merges | 0 | | REMOVED |
| enable_mixed_granularity_parts | 0 | 1 | Enable parts with adaptive and non-adaptive granularity at the same time |
| in_memory_parts_enable_wal | | 1 | Whether to write blocks in Native format to write-ahead-log before creation in-memory part |
| lock_acquire_timeout_for_background_operations | | 120 | For background operations like merges, mutations etc. How many seconds before failing to acquire table locks. |
| max_part_loading_threads | | auto | max_part_loading_threads |
| max_part_removal_threads | | auto | The number of threads for concurrent removal of inactive data parts. One is usually enough, but in ‘Google Compute Environment SSD Persistent Disks’ file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (recommended is up to 16). |
| max_replicated_logs_to_keep | 10000 | 1000 | How many records may be in log, if there is an inactive replica. |
| min_replicated_logs_to_keep | 100 | 10 | Keep about this number of last records in ZooKeeper log, even if they are obsolete. It doesn’t affect work of tables: used only to diagnose ZooKeeper log before cleaning. |
| min_bytes_for_compact_part | | 0 | Minimal uncompressed size in bytes to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. |
| min_rows_for_compact_part | | 0 | Minimal number of rows to create part in compact format instead of saving it in RAM. If non-zero enables in-memory parts. |
| min_index_granularity_bytes | | 1024 | Minimum amount of bytes in single granule. |
| min_relative_delay_to_measure | | 120 | Calculate relative replica delay only if absolute delay is not less than this value. |
| write_ahead_log_max_bytes | | 1073741824 | Rotate WAL, if it exceeds that amount of bytes |

system.settings added/changed

| Setting | Old value | New value | Description |
| allow_experimental_bigint_types | | 0 | Allow Int128, Int256, UInt256 and Decimal256 types |
| allow_experimental_database_materialize_mysql | | 1 | Allow database creation with Engine=MaterializeMySQL(…) (Highly experimental yet) |
| allow_experimental_geo_types | | 0 | Allow geo data types such as Point, Ring, Polygon, MultiPolygon |
| allow_non_metadata_alters | | 1 | Allow to execute alters which affects table’s metadata and data on disk. |
| allow_push_predicate_when_subquery_contains_with | | 1 | Allows push predicate when subquery contains WITH clause |
| allow_suspicious_codecs | | 0 | If it is set to true, allow specifying meaningless compression codecs. |
| alter_partition_verbose_result | | 0 | Output information about affected parts. Currently works only for FREEZE and ATTACH commands. |
| background_buffer_flush_schedule_pool_size | | 16 | Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup. |
| background_distributed_schedule_pool_size | | 16 | Number of threads performing background tasks for distributed sends. Only has meaning at server startup. |
| cast_keep_nullable | | 0 | CAST operator keep Nullable for result data type |
| data_type_default_nullable | | 0 | Data types without NULL or NOT NULL will make Nullable |
| default_database_engine | | Ordinary | Default database engine |
| distributed_replica_max_ignored_errors | | 0 | Number of errors that will be ignored while choosing replicas |
| force_optimize_skip_unused_shards_nesting | | 0 | Same as force_optimize_skip_unused_shards, but accept nesting level until which it will work |
| format_regexp | | | Regular expression (for Regexp format) |
| format_regexp_escaping_rule | | Escaped | Field escaping rule (for Regexp format) |
| format_regexp_skip_unmatched | | 0 | Skip lines unmatched by regular expression (for Regexp format) |
| function_implementation | | | Choose function implementation for specific target or variant (experimental). If empty, enable all of them. |
| input_format_avro_allow_missing_fields | | 0 | For Avro/AvroConfluent format: when field is not found in schema use default value instead of error |
| input_format_with_names_use_header | 0 | 1 | For TSVWithNames and CSVWithNames input formats this controls whether the format parser is to assume that column data appear in the input exactly as they are specified in the header. |
| insert_in_memory_parts_timeout | | 600000 | |
| join_on_disk_max_files_to_merge | | 64 | For MergeJoin on disk, set how many files are allowed to sort simultaneously. The larger the value the more memory is used and less disk I/O needed. Minimum is 2. |
| lock_acquire_timeout | | 120 | How long locking request should wait before failing |
| log_queries_min_type | | QUERY_START | Minimal type in query_log to log, possible values (from low to high): QUERY_START, QUERY_FINISH, EXCEPTION_BEFORE_START, EXCEPTION_WHILE_PROCESSING. |
| materialize_ttl_after_modify | | 1 | Apply TTL for old data, after ALTER MODIFY TTL query |
| max_block_size | 65536 | 65505 | Maximum block size for reading |
| max_final_threads | | 16 | The maximum number of threads to read from the table with FINAL. |
| max_insert_block_size | 1048576 | 1048545 | The maximum block size for insertion, if we control the creation of blocks for insertion |
| max_joined_block_size_rows | 65536 | 65505 | Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited. |
| max_untracked_memory | | 4194304 | Small allocations and deallocations are grouped in thread local variables and tracked or profiled only when the amount (in absolute value) becomes larger than specified value. If the value is higher than ‘memory_profiler_step’ it will be effectively lowered to ‘memory_profiler_step’. |
| memory_profiler_sample_probability | | 0 | Collect random allocations and deallocations and write them into system.trace_log with ‘MemorySample’ trace_type. The probability is for every alloc/free regardless to the size of the allocation. Note that sampling happens only when the amount of untracked memory exceeds ‘max_untracked_memory’. You may want to set ‘max_untracked_memory’ to 0 for extra fine grained sampling. |
| metrics_perf_events_enabled | | 0 | If enabled, some of the perf events will be measured throughout queries’ execution. |
| metrics_perf_events_list | | | Comma separated list of perf metrics that will be measured throughout queries’ execution. Empty means all events. |
| min_chunk_bytes_for_parallel_parsing | 1048576 | 10485760 | The minimum chunk size in bytes, which each thread will parse in parallel. |
| min_insert_block_size_bytes | 268435456 | 268427520 | Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough. |
| min_insert_block_size_bytes_for_materialized_views | | 0 | Like min_insert_block_size_bytes, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_bytes) |
| min_insert_block_size_rows | 1048576 | 1048545 | Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough. |
| min_insert_block_size_rows_for_materialized_views | | 0 | Like min_insert_block_size_rows, but applied only during pushing to MATERIALIZED VIEW (default: min_insert_block_size_rows) |
| multiple_joins_rewriter_version | | 2 | 1 or 2. Second rewriter version knows about table columns and keeps not clashing names as is. |
| optimize_aggregation_in_order | | 0 | Enable GROUP BY optimization for aggregating data in corresponding order in MergeTree tables. |
| optimize_aggregators_of_group_by_keys | | 1 | Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section |
| optimize_arithmetic_operations_in_aggregate_functions | | 1 | Move arithmetic operations out of aggregation functions |
| optimize_distributed_group_by_sharding_key | | 0 | Optimize GROUP BY sharding_key queries (by avoiding costly aggregation on the initiator server) |
| optimize_duplicate_order_by_and_distinct | | 1 | Remove duplicate ORDER BY and DISTINCT if it’s possible |
| optimize_group_by_function_keys | | 1 | Eliminates functions of other keys in GROUP BY section |
| optimize_if_chain_to_multiif | | 0 | Replace if(cond1, then1, if(cond2, …)) chains to multiIf. Currently it’s not beneficial for numeric types. |
| optimize_if_transform_strings_to_enum | | 0 | Replaces string-type arguments in If and Transform to enum. Disabled by default cause it could make inconsistent change in distributed query that would lead to its fail |
| optimize_injective_functions_inside_uniq | | 1 | Delete injective functions of one argument inside uniq*() functions |
| optimize_monotonous_functions_in_order_by | | 1 | Replace monotonous function with its argument in ORDER BY |
| optimize_move_functions_out_of_any | | 1 | Move functions out of aggregate functions ‘any’, ‘anyLast’ |
| optimize_redundant_functions_in_order_by | | 1 | Remove functions from ORDER BY if its argument is also in ORDER BY |
| optimize_skip_unused_shards_nesting | | 0 | Same as optimize_skip_unused_shards, but accept nesting level until which it will work |
| optimize_trivial_insert_select | | 1 | Optimize trivial ‘INSERT INTO table SELECT … FROM TABLES’ query |
| output_format_enable_streaming | | 0 | Enable streaming in output formats that supports it |
| output_format_pretty_grid_charset | | UTF-8 | Charset for printing grid borders. Available charsets: ASCII, UTF-8 (default) |
| output_format_pretty_max_value_width | | 10000 | Maximum width of value to display in Pretty formats. If greater – it will be cut. |
| parallel_distributed_insert_select | | 0 | Process distributed INSERT SELECT query in the same cluster on local tables on every shard, if 1 SELECT is executed on each shard, if 2 SELECT and INSERT is executed on each shard |
| partial_merge_join_left_table_buffer_bytes | | 32000000 | If not 0, group left table blocks in bigger ones for the left-side table in partial merge join. It uses up to 2x of specified memory per joining thread. The current version works only with ‘partial_merge_join_optimizations = 1’. |
| partial_merge_join_optimizations | 0 | 1 | Enable optimizations in partial merge join |
| partial_merge_join_rows_in_right_blocks | 10000 | 65536 | Split right-hand joining data in blocks of specified size. It’s a portion of data indexed by min-max values and possibly unloaded on disk. |
| rabbitmq_max_wait_ms | | 5000 | The wait time for reading from RabbitMQ before retry. |
| read_in_order_two_level_merge_threshold | | 100 | Minimal number of parts to read to run preliminary merge step during multithread reading in order of primary key. |
| send_logs_level | none | fatal | Send server text logs with specified minimum level to client. Valid values: ‘trace’, ‘debug’, ‘information’, ‘warning’, ‘error’, ‘fatal’, ‘none’ |
| show_table_uuid_in_table_create_query_if_not_nil | | 0 | For tables in databases with Engine=Atomic show UUID of the table in its CREATE query. |
| temporary_files_codec | | LZ4 | Set compression codec for temporary files (sort and join on disk). I.e. LZ4, NONE. |
| transform_null_in | | 0 | If enabled, NULL values will be matched with ‘IN’ operator as if they are considered equal. |
| validate_polygons | | 1 | Throw exception if polygon is invalid in function pointInPolygon (e.g. self-tangent, self-intersecting). If the setting is false, the function will accept invalid polygons but may silently return wrong result. |

———-

Also, please refer to the release notes from the development team available at the following URLs:

