ClickHouse Altinity Stable Release 21.3

A few months ago we certified ClickHouse 21.1 as an Altinity Stable release. Since then, we have worked on newer releases and run them in-house. We completed several new features, and even more have been added by community contributors. We were running ClickHouse 21.3 to power our public datasets at Altinity.Cloud instance and testing it in our environments. As of 21.3.13.9 we are confident in certifying 21.3 as an Altinity Stable release.

This release is a relatively small upgrade since the previous Altinity Stable release. But even a small upgrade includes 742 pull requests from 133 contributors; these mostly focused on performance and stability improvements. There are several important new features as well! Please look below for detailed release notes.

Major new features since the previous stable release 21.1.x

A new release introduces a lot of changes and new features. These are organized by feature in the Appendix, so refer to this section for more detail. The following new features are worth mentioning on the front page:

  • SQL features:
    • Window functions! See our blog article for an introduction.
    • Support multiple levels nesting in the Nested datatype.
  • Security features:
    • LDAP roles mapping a).
    • Apply row-level security as a separate PREWHERE step a).
    • Server side keys support for S3 table function a).
  • Cluster improvements:
    • Hedged requests! That allows to reduce tail latencies on large clusters by running the same query at different replicas. This is controlled by use_hedged_requests and max_parallel_replicas settings.
    • Allow inserts into a specific shard via a distributed table. This is controlled by the insert_shard_id setting.
    • Allow inserts into cluster() table function and specify a sharding key.
    • Replicated database engine (experimental). Allows to replicate DDL statements across the cluster. It is not production ready in this release.
  • MergeTree features:
    • Table level concurrency control. Can be controlled by the max_concurrent_queries merge tree setting.
  • Integrations:
    • PostgreSQL table engine, table function and dictionary source.

As usual with ClickHouse, there are many performance and operational improvements in different server components.

a)– contributed by Altinity developers.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention during an upgrade:

  • It is no longer allowed to create MergeTree tables in the old syntax with table TTL; this syntax is now ignored. Attaching old tables is still possible.
  • Floating point columns and values are disallowed in:
    • Partitioning key – these can be turned back on with the allow_floating_point_partition_key merge tree setting.
    • Bitwise functions.
  • Excessive parenthesis in type definitions are no longer supported, example: Array((UInt8)).

Upgrade Notes

There were several changes between versions that may affect the rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended.

  • Now replicas that are processing the ALTER TABLE ATTACH PART[ITION] command search their detached/ folders before fetching the data from other replicas. As an implementation detail, a new command ATTACH_PART is introduced in the replicated log. Parts are searched and compared by their checksums. ATTACH PART[ITION] queries may not work during cluster upgrade.

Notice

It’s not possible to rollback to an older ClickHouse version after executing ALTER ... ATTACH query in the new version as the old servers would fail to pass the ATTACH_PART entry in the replicated log.

Other Important Changes

Some new ClickHouse features are now enabled by default. It may lead to a change in behaviour, so review those carefully and disable features that may affect your system:

  • distributed_aggregation_memory_efficient
  • enable_global_with_statement
  • optimize_normalize_count_variants
  • optimize_respect_aliases
  • optimize_rewrite_sum_if_to_count_if

Known issues in 21.3.13.9

The development team continues to improve the quality of the 21.3 release. The following issues still exist in the 21.3.13.9 version and may affect ClickHouse operation. Please inspect them carefully to decide if those are applicable to your applications:

ClickHouse Altinity Stable releases are based on the community versions. Community builds can be downloaded from repo.clickhouse.tech.

ClickHouse Altinity builds may be found at:

Please contact us at info@altinity.com if you experience any issues with the upgrade.

Appendix

New functions

  • DateTime functions:
    • Compatibility with MySQL: DATE, DAY, DAYOFMONTH, DAYOFWEEK, DAYOFYEAR, HOUR, MINUTE, MONTH, QUARTER, SECOND, YEAR
    • parseDateTimeBestEffortUSOrNull
    • parseDateTimeBestEffortUSOrZero
    • timezoneOffset
  • String functions:
    • decodeXMLComponent, extractTextFromHTML
  • Map functions:
    • mapContains, mapKeys, mapValues
  • Aggregate functions
    • New combinator -SimpleState
    • deltaSum
    • runningConcurrency
  • Functions to work with new Map data type:
    • map
  • Math/Statistics:
    • sign
  • Type related:
    • reinterpret, reinterpretAsInt128, reinterpretAsInt256, reinterpretAsUInt256
  • Internal ClickHouse:
    • connectionID, connection_id,
    • normalizeQueryKeepNames, normalizedQueryHashKeepNames

New table functions

  • postgresql
  • file

New table engines

  • PostgreSQL

New metrics and events

system.metrics.

  • PartsCommitted
  • PartsCompact
  • PartsDeleteOnDestroy
  • PartsDeleting
  • PartsInMemory
  • PartsOutdated
  • PartsPreCommitted
  • PartsTemporary
  • PartsWide

system.events:

  • DistributedDelayedInserts
  • DistributedDelayedInsertsMilliseconds
  • DistributedRejectedInserts
  • HedgedRequestsChangeReplica

New system tables

  • distributed_ddl_queue

New columns in system tables

  • metric_log.
    • ProfileEvent_DistributedDelayedInserts, ProfileEvent_DistributedRejectedInserts, ProfileEvent_DistributedDelayedInsertsMilliseconds, ProfileEvent_HedgedRequestsChangeReplica, CurrentMetric_PartsTemporary, CurrentMetric_PartsPreCommitted, CurrentMetric_PartsCommitted, CurrentMetric_PartsOutdated, CurrentMetric_PartsDeleting, CurrentMetric_PartsDeleteOnDestroy, CurrentMetric_PartsWide, CurrentMetric_PartsCompact, CurrentMetric_PartsInMemory
  • Parts:
    • group_by_ttl_info.expression, group_by_ttl_info.min, group_by_ttl_info.max, rows_where_ttl_info.expression, rows_where_ttl_info.min, rows_where_ttl_info.max
  • processes.http_referer
  • query_log:
    • http_referer, log_comment, used_aggregate_functions, used_aggregate_function_combinators, used_database_engines, used_data_type_families, used_dictionaries, used_formats, used_functions, used_storages, used_table_functions
  • query_thread_log.http_referer
  • quota_limits:
    • max_query_selects, max_query_inserts
  • quota_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts
  • quotas_usage:
    • query_selects, max_query_selects, query_inserts, max_query_inserts

System.merge_tree_settings added/changed

NameOld valueNew valueDescription
allow_floating_point_partition_key0Allow floating point as partition key.
inactive_parts_to_delay_insert0If table contains at least that many inactive parts in single partition, artificially slow down insert into table.
inactive_parts_to_throw_insert0If more than this number of inactive parts are in a single partition, throw the ‘Too many inactive parts …’ exception.
max_concurrent_queries0Max number of concurrently executed queries related to the MergeTree table (0 – disabled). Queries will still be limited by other max_concurrent_queries settings.
min_marks_to_honor_max_concurrent_queries0Minimal number of marks to honor the MergeTree-level’s max_concurrent_queries (0 – disabled). Queries will still be limited by other max_concurrent_queries settings.

system.settings added/changed

NameOld valueNew valueDescription
allow_changing_replica_until_first_data_packet0Allow HedgedConnections to change the replica until receiving the first data packet.
allow_experimental_database_replicated0Allows creating databases with the Replicated engine.
allow_experimental_query_deduplication0Allows sending parts’ UUIDs for a query in order to deduplicate data parts, if any.
async_socket_for_remote10Asynchronously read from socket executing remote query.
background_fetches_pool_size38Number of threads performing background fetches for replicated tables. Only has meaning at server startup.
checksum_on_read1Validates checksums on reading. It is enabled by default and should be always enabled in production. Please do not expect any benefits in disabling this setting. It may only be used for experiments and benchmarks. These settings are only applicable for tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network.
database_replicated_ddl_output1Returns table with query execution status as a result of a DDL query.
database_replicated_initial_query_timeout_sec300How long an initial DDL query should wait for a Replicated database to process previous DDL queue entries.
distributed_aggregation_memory_efficient01The memory-saving mode of distributed aggregation is enabled.
enable_global_with_statement01Propagate WITH statements to UNION queries and all subqueries.
engine_file_empty_if_not_exists0Allows selecting data from a file engine table without a file.
engine_file_truncate_on_insert0Enables or disables truncate before insert in file engine tables.
flatten_nested1If true, columns of type Nested will be flatten to separate array columns instead of one array of tuples.
hedged_connection_timeout_ms100Connection timeout for establishing connection with replica for Hedged requests.
insert_in_memory_parts_timeout600000REMOVED.
insert_shard_id0If non zero, when inserting into a distributed table, the data will be inserted into the shard insert_shard_id synchronously. Possible values range from 1 to shards_number of the corresponding distributed table.
log_commentLog comment into system.query_log table and server log. It can be set to an arbitrary string no longer than max_query_size.
normalize_function_names0Normalize function names to their canonical names.
optimize_normalize_count_variants1Rewrite aggregate functions that semantically equals count() as count().
optimize_respect_aliases1If it is set to true, it will respect aliases in WHERE/GROUP BY/ORDER BY, that will help with partition pruning/secondary indexes/optimize_aggregation_in_order/optimize_read_in_order/optimize_trivial_count.
optimize_rewrite_sum_if_to_count_if1Rewrite sumIf() and sum(if()) function countIf() function when logically equivalent.
periodic_live_view_refresh60Interval after which a periodically refreshed live view is forced to refresh.
query_plan_max_optimizations_to_apply10000Limit the total number of optimizations applied to the query plan. If zero, ignored. If the limit is reached, throw an exception.
receive_data_timeout_ms2000Connection timeout for receiving the first packet of data or packet with positive progress from replica.
s3_max_connections1024The maximum number of connections per server.
sleep_in_send_data0Time to sleep in sending data in TCPHandler.
sleep_in_send_tables_status0Time to sleep in sending tables the status response in TCPHandler.
unknown_packet_in_send_data0Sends unknown packet instead of N-th data packet.
use_hedged_requests0Use hedged requests for distributed queries.

Also, please refer to the release notes from the development team available at the following URLs:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.