Support for ClickHouse®
Stuck debugging ClickHouse issues?
We help teams debug slow queries, memory errors, replication lag, upgrades, backups, and production incidents across self-hosted, Kubernetes, BYOC, cloud, and bare-metal deployments.
Maintainers of clickhouse-backup (⭐ 1.6K) and creators of Altinity Kubernetes Operator (⭐ 2.5K). Contributing upstream since 2017.
What are you stuck on?
Paste the error, query, or short version of your issue.
I spent a ton of time trying to figure out a JSON path issue, and the [Altinity engineer] figured it out immediately. That almost never happens. Normally, you get a canned response.
tatari
ClickHouse Errors We See Regularly
DB::Exception: Too many parts (300). Merges are processing significantly slower than inserts.
Usually this is an imbalance between insert rate and merge throughput, not a hardware problem. We look at merge_tree settings, partition key design, and insert batch sizes. The fix typically doesn’t require downtime.
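The imbalance can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes plain synchronous inserts into a MergeTree table, where each INSERT creates at least one new part per partition it touches. The function name and the sample numbers are hypothetical. The query string reads the standard system.parts table to show which partitions hold the most active parts.

```python
# Illustrative sketch only; names and sample numbers are hypothetical.
# Assumption: synchronous inserts into a MergeTree table, where each INSERT
# creates at least one new part per partition it touches.

# Query to run in clickhouse-client to see where active parts pile up.
ACTIVE_PARTS_SQL = """
SELECT database, table, partition, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 10
"""


def new_parts_per_minute(rows_per_second, rows_per_insert, partitions_per_insert=1):
    """Lower bound on parts created per minute by a steady insert stream."""
    inserts_per_minute = rows_per_second * 60 / rows_per_insert
    return inserts_per_minute * partitions_per_insert


if __name__ == "__main__":
    # Same 10,000 rows/s workload, two batch sizes.
    print(new_parts_per_minute(10_000, 100))      # small batches
    print(new_parts_per_minute(10_000, 100_000))  # large batches
```

At the same row rate, moving from 100-row to 100,000-row inserts cuts the number of new parts that merges must absorb by a factor of 1,000, which is why batch size is one of the first things worth checking.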
ALERT: replication_delay_sec{replica="replica-02"} = 247 - exceeded threshold 60s
Replication lag shows up in monitoring, not in ClickHouse logs. Could be a heavy background merge on the replica, a network blip, or a schema mismatch. We pull the queue state from system.replication_queue and identify what’s stalling. We’ve traced this on Kubernetes setups where pod scheduling was the actual culprit.
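As a rough illustration of that queue inspection, the sketch below pairs a query against system.replication_queue with a small helper that groups entries by type and flags the ones that look stuck. The helper name, the retry threshold, and the sample rows are hypothetical. It is a starting point under those assumptions, not a complete diagnostic.

```python
# Illustrative sketch only; the helper name, threshold, and sample rows
# are hypothetical.
from collections import Counter

# Query to run on the lagging replica.
REPLICATION_QUEUE_SQL = """
SELECT database, table, type, create_time, num_tries,
       last_exception, postpone_reason
FROM system.replication_queue
ORDER BY create_time
"""


def summarize_queue(rows, max_tries=10):
    """Count queue entries by type and pick out entries that look stuck.

    An entry is treated as stuck if it has been retried more than
    max_tries times or carries a non-empty last_exception.
    """
    by_type = Counter(row["type"] for row in rows)
    stuck = [
        row for row in rows
        if row["num_tries"] > max_tries or row["last_exception"]
    ]
    return by_type, stuck


if __name__ == "__main__":
    sample = [  # made-up rows, shaped like the query result
        {"type": "GET_PART", "num_tries": 1, "last_exception": ""},
        {"type": "MERGE_PARTS", "num_tries": 42, "last_exception": ""},
        {"type": "GET_PART", "num_tries": 3,
         "last_exception": "No active replica has part"},
    ]
    by_type, stuck = summarize_queue(sample)
    print(dict(by_type), len(stuck))
```

A queue dominated by one entry type, or a handful of entries with high retry counts, usually narrows the search to merges, fetches, or a specific failing operation.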
DB::Exception: Memory limit (total) exceeded: 5.37 GiB (attempt to allocate chunk of 1.20 GiB)
We identify which queries are triggering this, whether the memory is going to a JOIN or an aggregation, and whether max_memory_usage and max_bytes_before_external_group_by are configured correctly. Usually fixable without adding RAM.
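A first triage step can be sketched in a few lines. The example below is a minimal sketch with a hypothetical helper and hint table. It assumes the scope in parentheses in the error message indicates which limit was hit, and it includes a query against system.query_log to list today’s most memory-hungry queries.

```python
# Illustrative sketch only; the helper and hint table are hypothetical.
import re

SCOPE_PATTERN = re.compile(r"Memory limit \(([^)]+)\) exceeded")

# Assumption: the scope in parentheses indicates which limit was hit.
LIMIT_HINTS = {
    "total": "server-wide limit (max_server_memory_usage)",
    "for query": "per-query limit (max_memory_usage)",
    "for user": "per-user limit (max_memory_usage_for_user)",
}

# Query to list today's queries by peak memory, including failed ones.
TOP_MEMORY_QUERIES_SQL = """
SELECT query_id, formatReadableSize(memory_usage) AS peak_memory, query
FROM system.query_log
WHERE event_date = today()
  AND type IN ('QueryFinish', 'ExceptionWhileProcessing')
ORDER BY memory_usage DESC
LIMIT 10
"""


def which_limit(message):
    """Return a hint about which memory limit an error message refers to.

    Returns None if the message is not a memory-limit error; unknown
    scopes are returned unchanged.
    """
    match = SCOPE_PATTERN.search(message)
    if match is None:
        return None
    scope = match.group(1)
    return LIMIT_HINTS.get(scope, scope)


if __name__ == "__main__":
    error = ("DB::Exception: Memory limit (total) exceeded: 5.37 GiB "
             "(attempt to allocate chunk of 1.20 GiB)")
    print(which_limit(error))
```

For the error quoted above, the "(total)" scope points at the server-wide limit, so raising max_memory_usage for a single query would likely not help on its own.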
What Useful ClickHouse Help Looks Like
When it’s urgent, use SLA-backed options
We provide SLAs for user systems with guaranteed response times based on severity and tier: 4 hours for Enterprise, 30-60 minutes for Premium. We’ll confirm response time and escalation path once we understand the situation.
You talk to people who know the internals
Support includes ClickHouse committers and engineers who run large-scale clusters in production and recognize common failure modes when you describe them.
We try to explain why it happened
The goal is not just to make the symptom go away. We look for the query, schema, setting, workload, configuration, or deployment behavior behind the issue, so your team knows what to change next.