ClickHouse Birthday Altinity Stable Release 20.3.12.112

Four years ago, Alexey Milovidov published “Yandex Opens ClickHouse”, a historic blog post that opened up a new era in analytical databases. Since then, we have considered June 15 the birthday of ClickHouse. ClickHouse is our beloved child that is growing up and maturing very rapidly. Like any normal kid, it has some bad habits and a lot of “character.” We teach it and are happy to see how it progresses. Birthdays are a time for presents. For ClickHouse’s 4th birthday we have prepared several presents for ClickHouse users and for ClickHouse itself: new releases of the ClickHouse operator for Kubernetes, the Grafana ClickHouse datasource and, last but not least, the new Altinity Stable ClickHouse release 20.3.12.112.

This release is a significant step forward from the previous Altinity Stable release 19.16.19.85. It includes 1203 pull requests from 171 contributors! Correspondingly, the new features and contributions put a lot of load on QA and testing, so we spent several months polishing the release together with the Yandex team in order to stabilize new functionality and ensure there are no regressions. While a few minor issues remain, the release is finally of good quality and we can recommend it for production use.

Major Changes Since the Previous Stable Release 19.16.19.85

The new release introduces a lot of changes and new features. We have tried to organize them conveniently in the Appendix, so refer to that section for more detail. The following changes are worth mentioning on the front page:

  • TTL moves. See the “Putting Things Where They Belong Using New TTL Moves” article. While we consider this feature to be production ready, we have a roadmap to improve it further (example after this list). TTL moves were developed by Altinity.

  • Dictionary DDLs that many ClickHouse users have dreamed of! See the “ClickHouse Dictionaries: Reloaded!” article that highlights this feature (example below).

  • The new DateTime64 datatype with configurable precision up to nanoseconds (example below). The feature is in a beta state, and more functions for usability will follow in upcoming releases. It was also developed by Altinity.

  • Joins have been improved a lot, including SEMI/ANTI JOIN, an experimental merge join and other changes (example below). See the join_algorithm setting below for more info on merge joins.

  • A new compact format for MergeTree tables that stores all columns in one file. It improves the performance of small inserts. The usual format, where every column is stored separately, is now called “wide.” The compact format is disabled by default; see the min_bytes_for_wide_part and min_rows_for_wide_part settings and the example below.

  • A built-in Prometheus exporter endpoint for ClickHouse monitoring statistics.

  • Porting of some functionality of the H3 library, a Hexagonal Hierarchical Geospatial Indexing System.

  • ALTER MODIFY/DROP queries are now implemented as mutations for the ReplicatedMergeTree* engine family. ALTER commands now block only at the metadata update stage and do not block after that.

  • Import/export of gzip-compressed files directly for file-based storage engines and table functions: File, URL, HDFS and S3 (example below).
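
Here is a minimal sketch of a TTL move. The table name, the ‘cold’ volume and the ‘tiered’ storage policy are hypothetical and must match volumes defined in your server storage configuration:

```sql
-- Rows older than 30 days are moved to the 'cold' volume in the background.
CREATE TABLE events
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
TTL event_date + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'tiered';
```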
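
And a sketch of the new dictionary DDL; the MySQL source and all connection parameters are hypothetical:

```sql
-- Hypothetical dictionary backed by a MySQL table.
CREATE DICTIONARY countries
(
    id UInt64,
    name String
)
PRIMARY KEY id
SOURCE(MYSQL(host 'localhost' port 3306 user 'default' password '' db 'geo' table 'countries'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- DDL dictionaries are addressed by a database-qualified name.
SELECT dictGet('default.countries', 'name', toUInt64(42));
```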
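
A quick taste of DateTime64 with millisecond precision (table and column names are illustrative):

```sql
CREATE TABLE api_log
(
    ts DateTime64(3),   -- precision: 3 decimal digits, i.e. milliseconds
    route String
)
ENGINE = MergeTree
ORDER BY ts;

SELECT now64(3), toDateTime64('2020-06-15 12:00:00.123', 3);
```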
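
A sketch of the improved joins (table names are hypothetical):

```sql
-- SEMI LEFT JOIN keeps left-side rows that have at least one match on the
-- right, without multiplying rows when there are several matches.
SELECT u.user_id
FROM users AS u
SEMI LEFT JOIN orders AS o ON u.user_id = o.user_id;

-- The experimental merge join can be enabled when the right-side table
-- does not fit in memory.
SET join_algorithm = 'partial_merge';
```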
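
Compact parts are enabled per table; the 10 MB threshold below is purely illustrative:

```sql
-- Parts below ~10 MB of uncompressed data are written in the compact
-- format; larger parts use the usual "wide" format.
CREATE TABLE small_inserts
(
    k UInt64,
    v String
)
ENGINE = MergeTree
ORDER BY k
SETTINGS min_bytes_for_wide_part = 10485760;
```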
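
Finally, a sketch of reading a gzip-compressed file through the file table function. The path and structure are hypothetical, and compression should be detected from the file extension:

```sql
-- The file must reside in the server's user_files directory.
SELECT count()
FROM file('events.csv.gz', 'CSV', 'event_date Date, user_id UInt64');
```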

As usual with ClickHouse, there are also a lot of performance improvements in different components of the server. For example, the performance of the Kafka engine has been improved, and parallel INSERT is now possible (see the max_insert_threads setting), among other improvements.

Backward Incompatible Changes

The following changes are backward incompatible and require user attention.

  • ALTER on ReplicatedMergeTree is not compatible with previous versions. 20.3 creates a different metadata structure in ZooKeeper for ALTERs; earlier versions do not understand the format and cannot process their replication queues.

  • The format of replication log entries for mutation commands has changed. You have to wait for old mutations to complete before installing the new version.

  • 20.3 requires an alias for every subquery participating in a join by default (example below). Set ‘joined_subquery_requires_alias=0’ in order to keep the previous behavior.

  • ANY JOIN logic has been changed. To upgrade without changes in behavior, you need to add ‘SETTINGS any_join_distinct_right_table_keys = 1’ to the metadata of Engine Join tables, or recreate these tables after the upgrade.

  • The functions indexHint, findClusterValue and findClusterIndex have been removed.

  • The settings merge_tree_uniform_read_distribution, allow_experimental_cross_to_join_conversion and allow_experimental_multiple_joins_emulation are deprecated and ignored.
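
For example, with the new default joined_subquery_requires_alias = 1 (hypothetical tables):

```sql
-- Fails in 20.3 with default settings: the joined subquery has no alias.
-- SELECT count() FROM t1 JOIN (SELECT id FROM t2) USING (id);

-- Works: the subquery is aliased.
SELECT count()
FROM t1
JOIN (SELECT id FROM t2) AS s USING (id);
```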

Upgrade Notes

There were several changes between versions that may affect a rolling upgrade of big clusters. Upgrading only part of the cluster is not recommended. Beware of the following (a settings sketch follows the list):

  • The replication fetch protocol has changed in this release. If only one replica in a shard is upgraded, replication may get stuck with an ‘unknown protocol version’ error until all replicas are upgraded. 19.16.17.80 and 19.16.19.85 contain a special compatibility fix that allows a smooth upgrade without replication downtime.

  • If enable_optimize_predicate_expression is turned on, it may result in incorrect data when only part of the cluster is upgraded. Please turn it off before the upgrade, and turn it back on afterwards.

  • There were some optimizations that are incompatible between versions, so we recommend disabling enable_scalar_subquery_optimization before the upgrade. This setting turns off new scalar subquery optimizations and was backported specifically in order to facilitate smooth upgrades.

  • There were some optimizations around distributed query execution that may result in incorrect data if only part of the cluster is upgraded. We recommend disabling distributed_aggregation_memory_efficient while upgrading, and turning it back on afterwards.
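
A sketch of the settings involved, shown per session for brevity; for a real upgrade they should be set in the default user profile on every server:

```sql
-- Disable optimizations that misbehave on a partially upgraded cluster.
SET enable_optimize_predicate_expression = 0;
SET enable_scalar_subquery_optimization = 0;
SET distributed_aggregation_memory_efficient = 0;
```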

Other Important Notes

  • If you encounter ‘Argument at index 0 for function dictGetDateTime must be a constant’ or a similar error message after upgrading, set ‘enable_early_constant_folding=0’.

  • The new release adds parallel parsing of input formats. While it should improve the performance of data loading in text formats, sometimes it may result in a slower load. Set ‘input_format_parallel_parsing=0’ if you experience insert performance degradation.

  • A check that queries are performed fast enough is now enabled by default. Queries can fail with an exception like this: ‘DB::Exception: Estimated query execution time (N seconds) is too long. Maximum: 12334.’. Set ‘timeout_before_checking_execution_speed=0’ to fix this problem.

  • Mutations containing non-deterministic functions, e.g. dictGet or joinGet, are disabled by default. Set ‘allow_nondeterministic_mutations=1’ to enable them.

Also note that this release has a new query pipeline enabled by default (codename: Processors). This was a significant internal refactoring, and we believe all issues have been fixed. However, in the rare case that you face some weird query behavior or performance degradation, try disabling ‘experimental_use_processors’ and check if the problem goes away.
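
The pipeline can be toggled per session to check:

```sql
-- Fall back to the old query pipeline for the current session only.
SET experimental_use_processors = 0;
```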

As usual, RPM packages for Altinity Stable Release can be found in the Altinity Stable Repository. The ClickHouse repository can be used as well but check version numbers carefully. Remember that you can also find Altinity release descriptions in our blog using the ‘Releases’ tag.

Please contact us at info@altinity.com if you experience any issues with the upgrade.


Appendix

New formats

  • Avro

  • AvroConfluent

  • JSONCompactEachRow

  • JSONCompactEachRowWithNamesAndTypes

New or Improved Functions

  • CRC32IEEE, CRC64

  • New JSON functions:

    • isValidJSON

    • JSONExtractArrayRaw — very useful for parsing nested JSON structures (example after this list)

  • New array functions:

    • arrayCompact

    • arrayFill

    • arrayReverseFill

    • arrayZip

  • New geo functions:

    • geoDistance

    • greatCircleDistance

  • H3 functions:

    • h3EdgeAngle

    • h3EdgeLengthM

    • h3GetBaseCell

    • h3GetResolution

    • h3HexAreaM2

    • h3IndexesAreNeighbors

    • h3IsValid

    • h3ToChildren

    • h3ToParent

    • h3ToString

    • h3kRing

    • stringToH3

  • Functions for DateTime64:

    • now64

    • parseDateTime64BestEffort, parseDateTime64BestEffortOrNull, parseDateTime64BestEffortOrZero

    • toDateTime64, toDateTime64OrNull, toDateTime64OrZero

  • New aggregate functions:

    • avgWeighted

    • categoricalInformationValue

    • groupArraySample

  • Other:

    • bitCount

    • ifNotFinite 

    • isConstant

    • javaHashUTF16LE

    • moduloOrZero

    • randomPrintableASCII

    • roundBankers
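
As an illustration of JSONExtractArrayRaw (the JSON document is made up):

```sql
-- Extract array elements as raw JSON strings, then parse each element.
SELECT arrayMap(x -> JSONExtractString(x, 'name'),
                JSONExtractArrayRaw('{"users":[{"name":"a"},{"name":"b"}]}', 'users')) AS names;
```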

New table functions

  • clusterAllReplicas — similar to the cluster function, but queries all replicas in the cluster

  • generateRandom — allows users to generate a random dataset with a defined structure (example below)

  • zeros, zeros_mt
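
For example, generateRandom takes a structure string plus an optional random seed, maximum string length and maximum array length (the values below are illustrative):

```sql
SELECT *
FROM generateRandom('id UInt64, name String, tags Array(String)', 1, 10, 2)
LIMIT 5;
```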

New metrics and events

system.metrics:

  • BackgroundMovePoolTask

  • CacheDictionaryUpdateQueueBatches

  • CacheDictionaryUpdateQueueKeys

  • MemoryTrackingInBackgroundMoveProcessingPool

  • MySQLConnection

  • SendScalars

New system tables

  • metric_log — stores the history of system.metrics and system.events (example below)

  • quotas, quota_usage — introspection into quotas

  • row_policies — introspection into row security policies

  • trace_log, stack_trace — for low-level debugging purposes

  • zeros, zeros_mt — zero ‘generators’: like system.numbers, but faster and returning zeros
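
For instance, memory usage history can now be pulled from system.metric_log, where CurrentMetric_* columns snapshot system.metrics at each collection interval:

```sql
SELECT event_time, CurrentMetric_MemoryTracking
FROM system.metric_log
ORDER BY event_time DESC
LIMIT 10;
```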

New columns in system tables

  • mutations: parts_to_do_names

  • parts, parts_columns: part_type

  • processes: thread_ids replaces old columns thread_numbers and os_thread_ids

  • query_log: exception_code, thread_ids replaces old columns thread_numbers and os_thread_ids

  • replicas: zookeeper_exception

  • settings: min, max, readonly

  • table_engines: supports_settings, supports_skipping_indices, supports_sort_order, supports_ttl, supports_replication, supports_deduplication

system.merge_tree_settings added/changed

| Setting | Old value | New value | Description |
|---|---|---|---|
| disable_background_merges | | 0 | Disable background merges |
| merge_max_block_size | | 8192 | How many rows in blocks should be formed for merge operations |
| min_bytes_for_wide_part | | 0 | Minimal uncompressed size in bytes to create a part in wide format instead of compact |
| min_rows_for_wide_part | | 0 | Minimal number of rows to create a part in wide format instead of compact |
| use_minimalistic_part_header_in_zookeeper | 0 | 1 | Store the part header (checksums and columns) in a compact format and a single part znode instead of separate znodes. This setting has been available for a year already, and many users have it enabled |

system.settings added/changed

| Setting | Old value | New value | Description |
|---|---|---|---|
| allow_experimental_alter_materialized_view_structure | | 0 | Allow atomic ALTER on materialized views. Work in progress |
| allow_experimental_data_skipping_indices | 0 | 1 | Data skipping indices are now enabled by default |
| allow_nondeterministic_mutations | | 0 | Allow non-deterministic functions in ALTER UPDATE/ALTER DELETE statements, e.g. with dictGet() functions |
| background_move_pool_size | | 8 | Number of threads performing background moves for tables. Only has meaning at server startup |
| default_max_bytes_in_join | 100000000 | 1000000000 | Maximum size of the right-side table if a limit is required but max_bytes_in_join is not set |
| enable_early_constant_folding | 0 | 1 | Enable query optimization where we analyze function and subquery results and rewrite the query if there are constants there |
| enable_scalar_subquery_optimization | 0 | 1 | If set to true, prevent scalar subqueries from (de)serializing large scalar values and possibly avoid running the same subquery more than once |
| experimental_use_processors | 0 | 1 | The Processors pipeline is now enabled by default |
| force_optimize_skip_unused_shards | | 1 | Throw an exception if unused shards cannot be skipped (1 – throw only if the table has a sharding key, 2 – always throw) |
| format_avro_schema_registry_url | | | For the AvroConfluent format: Confluent Schema Registry URL |
| input_format_parallel_parsing | | 1 | Enable parallel parsing for some data formats |
| input_format_values_deduce_templates_of_expressions | 0 | 1 | For the Values format: if a field could not be parsed by the streaming parser, run the SQL parser, deduce a template of the SQL expression, try to parse all rows using the template, and then interpret the expression for all rows |
| join_algorithm | | hash | Specify the join algorithm: ‘auto’, ‘hash’, ‘partial_merge’, ‘prefer_partial_merge’. ‘auto’ tries to change HashJoin to MergeJoin on the fly to avoid running out of memory |
| joined_subquery_requires_alias | 0 | 1 | Force joined subqueries and table functions to have aliases for correct name qualification |
| max_insert_threads | | 0 | The maximum number of threads to execute the INSERT SELECT query. Values 0 or 1 mean that INSERT SELECT is not run in parallel. Higher values will lead to higher memory usage. Parallel INSERT SELECT has an effect only if the SELECT part is run in parallel; see the ‘max_threads’ setting |
| max_joined_block_size_rows | | 65536 | Maximum block size for a JOIN result (if the join algorithm supports it). 0 means unlimited |
| max_parser_depth | | 1000 | Maximum parser depth |
| memory_profiler_step | | 0 | Every this number of bytes the memory profiler will collect the allocating stack trace. The minimal effective step is 4 MiB (smaller values are clamped to 4 MiB). Zero means the memory profiler is disabled |
| min_bytes_to_use_mmap_io | | 0 | The minimum number of bytes for reading data with the mmap option during SELECT query execution. 0 – disabled |
| min_chunk_bytes_for_parallel_parsing | | 1048576 | The minimum chunk size in bytes that each thread will parse in parallel |
| mutations_sync | | 0 | Wait for synchronous execution of ALTER TABLE UPDATE/DELETE queries (mutations). 0 – execute asynchronously. 1 – wait for the current server. 2 – wait for all replicas if they exist |
| optimize_if_chain_to_miltiif | | 0 | Replace if(cond1, then1, if(cond2, …)) chains with multiIf |
| optimize_trivial_count_query | | 1 | Process a trivial ‘SELECT count() FROM table’ query from metadata |
| output_format_avro_codec | | | Compression codec used for output. Possible values: ‘null’, ‘deflate’, ‘snappy’ |
| output_format_avro_sync_interval | | 16384 | Sync interval in bytes |
| output_format_csv_crlf_end_of_line | | | If set to true, the end of line in CSV format will be \r\n instead of \n |
| output_format_tsv_crlf_end_of_line | | | If set to true, the end of line in TSV format will be \r\n instead of \n |
| timeout_before_checking_execution_speed | 0 | 10 | Check that the speed is not too low after the specified time has elapsed |
| use_compact_format_in_distributed_parts_names | | 0 | Changes the format of directory names for distributed table INSERT parts |
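
For example, mutations_sync makes ALTER UPDATE/DELETE behave synchronously (the table and predicate are hypothetical):

```sql
-- Wait until the mutation completes on all replicas before returning.
SET mutations_sync = 2;
ALTER TABLE events DELETE WHERE event_date < '2019-01-01';
```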

Also, please refer to the release notes from the development team, which are available at the following URLs:
