AI-Powered Alert Analysis: Uncovering Critical Patterns in ClickHouse® Databases

Recorded: November 25 @ 08:00 am PST
Presenters: Alkin Tezuysal & Boris Tyshkevich
In this webinar, Altinity Director of Services Alen and Support Engineer Boris Tyshkevich walk through how Altinity built an AI-powered alert analysis system on top of ClickHouse®. The session opens with the business problem: Altinity’s managed platform receives thousands of alerts per month, and manual triage by on-call and follow-the-sun teams is both inefficient and error-prone.
The presenters explain why ClickHouse is the right analytics backend for alerting data, covering its scalable architecture, tiered storage model, and support for materialized views. They then describe the full ingestion pipeline: a Python application extracts alert data from Atlassian OpsGenie on an incremental basis and loads it into ClickHouse, where materialized views filter and shape the data for LLM consumption.
The core technical contribution is the Altinity MCP server, a high-performance Go binary that exposes ClickHouse to AI chat interfaces via the Model Context Protocol (MCP). Boris demonstrates how to connect the MCP server to Claude and walk through live queries, with the model constructing and executing SQL against the alert database, interpreting results, and generating interactive HTML dashboards, all without writing a single line of code manually.
The Q&A section covers key practical topics, including the current limitations of root cause analysis with LLMs, the importance of context window management for large datasets, schema design best practices, and how to extend the pattern to a PostgreSQL-sourced ClickHouse data warehouse.
Here are the slides:
Key Moments (Timestamps)
Key moments generated with AI assistance.
- 00:00 – Welcome and introductions
- 01:44 – Goals: categorizing alerts by urgency, defining actionable responses, filtering false positives
- 03:24 – Why this matters: high alert volume, inefficient triage, impact on on-call teams
- 05:04 – Why ClickHouse for alerting analytics
- 06:31 – Workflow overview: OpsGenie to Python to ClickHouse to MCP server to LLM
- 08:14 – Transition to Boris: LLM-powered data analytics
- 09:47 – Why traditional BI tools fall short: rigidity, slow iteration, limited flexibility
- 11:31 – The new paradigm: chat interfaces, MCP, and LLM-driven exploration
- 14:02 – Altinity MCP server: design, Go implementation, transport modes, tools
- 19:29 – Connecting the MCP server to a chat interface
- 20:54 – Schema design and agent prompt construction
- 24:01 – Live demo: running analytical queries through Claude
- 26:12 – Generating interactive HTML dashboards with LLM-built artifacts
- 31:10 – Real-time demo walkthrough: full query and response cycle
- 37:52 – Q&A: root cause analysis, token costs, schema design, and PostgreSQL integration
Webinar Transcript
[00:00] – Welcome and Introductions
Alen: Hello, everyone. Today, we are going to have a webinar about AI-powered alert analysis with ClickHouse® cloud databases, with Boris and me. My name is Alen. You can connect with us on LinkedIn and other places. We also have an open Slack channel where you can reach out. I am the Director of Services at Altinity. We run the support and services organization together with my colleague Boris. Boris is a support engineer at Altinity, a ClickHouse expert, and a well-known person in the community.
We have prepared a talk about AI-based alert analysis. Our agenda for today is going to be: understanding the requirements and the data, connecting to the data, then building an environment that helps us use AI to analyze alerts, and then we will discuss key takeaways. We have also prepared a few demos to show you how to build your own environment using ClickHouse and the provided tools, including an MCP server. We will wrap up this webinar along the lines of this subject.
[01:44] – Goals: Categorizing Alerts and Enabling Actionable Responses
Alen: Let us move forward with the goal. The goal we have is to categorize all the alerts based on their urgency and importance, basically their severity. Then we need to determine which alerts require immediate attention. Priority-one tickets, or level-zero tickets, are the most important ones. But they might be flipping back and forth, they might be false positives, and we need to analyze whether they require immediate action, like waking up at 3:00 a.m. to look at something.
Of course, we need to define actionable responses from the alert definitions. As a support team or ClickHouse experts, we know what actions to take. But if alerts are repeating and recurring at different times of day, different days of the month, and with varying durations, and then they go away on their own, they might be indicators of other underlying issues: long-running backups, IOPS limitations, or other resource restrictions. This system will help us surface some of that.
The ultimate goal, which may be a bit of a utopia with current implementations, is to implement a system that will automatically handle routine alerts, or at least filter out the alerts that do not need immediate human attention at the moment they fire.
[03:24] – Why We Are Doing This: High Alert Volume and Inefficient Triage
Alen: Every organization has a high volume of alerts coming in. Our managed platform receives thousands of alerts per month. Among those, there are critical alerts that sometimes go unnoticed until they repeatedly increase in number. That causes inefficient manual triage, which becomes very difficult when too many alerts are firing without a clear underlying cause.
In the meantime, this also impacts our support team, our on-call team, and our follow-the-sun team. These alerts can sometimes feel unstoppable, and they are not properly handled because the underlying problem of that system is not understood. We need smarter alerts: better definitions, better thresholds, automatic threshold management and snoozing, and the ability to leverage automation and tooling.
Getting there is going to take time, but having this at least gives you a broader view of what is happening in your system, rather than just looking at a static Grafana dashboard. You can actually interact with this system and explore it.
[05:04] – Why ClickHouse for Alerting Analytics
Alen: Moving on: why are we choosing ClickHouse®? ClickHouse is an excellent fit for analytics, and alerting data is basically analytics data that does not require any transactions. You can have a very complex system or an observability tool connected to a ClickHouse database that handles thousands of metric profiles and alerts.
It is also a scalable architecture. If you really need to put it into production, you can have a sharded replica set enabled environment. It also has a very good tiered storage model: hot, warm, and cold. You can even use object storage in your environment to allow historical data to be retained, or TTL’d out at the end of its useful life. Most critical alerting systems use the last two weeks, 14 days, 21 days, or up to a month. With this ecosystem you can also use summary data to compare month-over-month, quarter-over-quarter, or year-over-year numbers of alerts. It is a really good fit combined with a really powerful target database.
[06:31] – Workflow Overview: From OpsGenie to ClickHouse to LLM
Alen: Let us look at the workflow in our system. Many organizations also use OpsGenie, which is part of the Atlassian suite. It has thresholds, and from the monitoring system it gets alerts triggered. The way that we extract it could be anything: other monitoring tools, PagerDuty, it does not matter. The way we do it is with a Python application that extracts that data and then imports it into ClickHouse on an incremental basis. We do a baseline update and then update incrementally.
We also use materialized views to filter the combination of data that we want to select. Then we use an MCP server and combine it with our RAG automation, which we will be demonstrating in the next section.
That is the whole idea of this demo. We will move on to the next section, where we use LLMs for data analytics based on the example of our alerting analysis. I am going to hand it over to Boris.
[09:47] – Why Traditional BI Tools Fall Short
Boris: Let us discuss first about why we should use LLMs for data analytics. We have a lot of traditional BI platforms and sometimes we are happy with them. So why do we need another tool?
The problem with traditional BI tools is that the whole business process is a bit rigid. Building a dashboard consumes a lot of time, or you need to be comfortable with SQL for freestyle research, and that is not always easy. Iteration is slow. To check your hypothesis, you need to redesign charts, make filters, and write complicated SQL, and it all takes a lot of time. Flexibility is limited too. If you create a dashboard and it is not quite what you want, you have to recreate it and invest even more time to make it better.
[11:31] – The New Paradigm: Chat Interfaces and MCP
Boris: How should it be in the future? We are all moving toward using different chat interfaces from leading vendors, or custom-built solutions, to explore data. If Google Search used to be our best instrument, now the best instrument is ChatGPT, Claude, or whatever you use, and more and more functionality is moving into this environment.
You already have access to web search inside chat interfaces, access to corporate data like knowledge bases or Notion, and you definitely want access to data in corporate databases. Our data lives in ClickHouse, and we want to explore it.
The path is quite clear. We need an intermediary. A chat UI works with large models. The LLM should have the ability to access external data. That is done through the MCP protocol, the Model Context Protocol, and the main instrument here is called tools. During inference, the LLM gets a tool request, executes that tool request, gets data from the database, and processes it in some meaningful way. So the idea is: you use a chat-like interface instead of a BI dashboard. The prompt is sent to the large model. The large model executes a tool call to the MCP server, and the MCP server executes a direct connection to the database.
[14:02] – Altinity MCP Server: Design and Features
Boris: Let us look at how it should be set up and explored. Here at Altinity we built our own MCP server. There are a lot of different MCP servers for different purposes, and there are even MCP servers for ClickHouse. But we were not happy with those other efforts and built our own.
First of all, we need performance, which is why we selected Go as the programming language. And it is not just performance: it is easy to install. For typical Python or Node.js MCP servers, you have to install a lot of dependencies. For Go, it is just a single binary.
Our MCP server supports all different MCP access modes: stdio for local access (the old style), networking (also older style), and the modern standard, which is HTTP streaming. We have several approaches to tools. We have one great query tool called execute_query. The model can send any SQL query to ClickHouse through the MCP server and get a result back.
We do not create special tools like “get table schema” or “get database schema” and so on, the way many other vendors do, because the model can do all of that easily with just the single execute_query tool, the same way a human would: just run SHOW CREATE TABLE, SHOW DATABASES, and so on.
We also added a specific instrument for dynamic tool creation. You can write a parameterized view in the ClickHouse database and expose that view as a tool. That allows you to create quite complicated business rules directly inside the database and expose those instruments to the model. It is a quite interesting solution. I have not heard of anything like it from other vendors.
As for authorization, we support both old-style password authorization and a modern token-based approach. You can create a token with an encrypted password, host, and username, and provide it inside the URL. It is a bit more secure than passing passwords to the models directly, though still not as robust as full OAuth standards. We are working on supporting OAuth standards and expect to have that completed by the end of the year.
Additionally, we support an OpenAPI REST endpoint. This is important for old-style access via ChatGPT’s custom GPTs, or Agents as they are now called in the ChatGPT interface, and for different other tools. You can easily access the ClickHouse server with the same authorization as MCP but through a typical REST API.
[19:29] – Connecting the MCP Server to a Chat Interface
Boris: How do you connect to the LLM? The main artifact here is the specially constructed URL. You run the server, construct the URL with the token authorization inside, and pass that URL to the UI of your particular chat interface.
It is the same for Claude AI, ChatGPT, Codex, Gemini, and many different chat interfaces that mirror the ChatGPT experience. The typical screenshot I am showing here is from the Claude AI interface. You add a custom connector. Every company provides a different name for this. For ChatGPT it is now called Applications, though they also use the word connector. You provide a name for the connector, provide the URL, and that is all. Add the connector and your chat interface now has access to all the tools exposed by the MCP server.
[20:54] – Schema Design and Agent Prompt Construction
Boris: Let us get closer to exploring today. We decided to explore our own data: our alert database. The idea is to get data from Atlassian into the ClickHouse database and process it to be explored by large models.
First, we need to work with the schema. You need a schema clear enough for both humans and models. You should consider the model like a new employee in your company who needs to understand all the functionality and schema very easily and very fast. The schema should be clear with correct column names and comments that help the human, or the model, understand how to work with the data.
We went through the transformation process and built a pipeline for transferring data from Atlassian to ClickHouse, and now we have it.
Next, you should create a special instruction, the agent prompt, that helps the model behave the way you want. The model should not invent something new each time. You provide strict requirements, rules for discussion, output formats, and so on. You should think of the agent prompt like a set of instructions for a new employee in your company: an explanation of what to do and what not to do.
Here is a typical instruction. It starts like many others: “You are the alert analyst.” It is very important to set a clear role for the agent and explain what it should do, along with instructions about what not to do. I also decided to give direct instructions about the table name, schema, and column names. It is not strictly necessary, but sometimes it makes the discussion a bit faster, because if you do not provide enough instructions, the model will spend time and tokens executing additional queries to infer the schema, column names, and comments on its own. You could provide that information in the agent prompt, or allow the model to navigate on its own, either approach can work. I also specify the recommended tool as the Altinity MCP server.
[24:01] – Live Demo: Running Analytical Queries Through Claude
Boris: Now it is time to run a query. You go to a new chat and provide the question. Here the question is a typical analytical question: a breakdown of top-priority alerts by organization for the last four months. The model should understand the question, construct a SQL query, send it to ClickHouse, and explore.
Here is the list of different tools the model has access to. I specifically removed tools that are not needed in order to keep the context window small and usage more economical.
Here is the first reply from the model. The model provides some understanding, some numbers, and finds critical patterns. That information is already valuable for working with the team and understanding what is good and what is bad.
Let us make something more interesting. This is the second question, more complicated. We run it to the model, and the model shows us a response.
[26:12] – Generating Interactive HTML Dashboards
Boris: Now, how do we make the information clearer and more visually appealing? We still need to build a dashboard. That is the same instrument we use a lot in typical BI tools: something visual.
Nowadays, LLMs can create artifacts. An artifact is a document. Such a document could be a text document like a PDF that you could send to your colleagues, a website, or HTML with JavaScript code.
However, there is a problem. The model makes a random design each time you ask it to create such artifacts. That is funny but not great for a consistent corporate style. So we should use advanced prompting techniques like Claude Skills, which can teach the model to produce a stable design. You provide exact examples of the CSS, JavaScript, and HTML code to produce a good, consistent dashboard output.
Another problem with building dashboards by generating code each time is that it can spend a lot of tokens. So it is better to create a dashboard intended for multiple uses, not just a single disposable answer. We can go different ways: we could embed data directly into an HTML dashboard and send it as a set of files to colleagues, or we could allow the dashboard to load data directly from ClickHouse through our MCP server’s REST API.
This is one of the examples of dashboards that Claude AI generated for me. It uses typical approaches for dashboard builders: schemas, charts, and so on, all done directly inside the chat. No programming, no specialists, no design requirements. Just do it yourself in a few minutes, spending a handful of tokens.
[31:10] – Real-Time Demo Walkthrough
Boris: Let me show you a more realistic discussion of how this works. I do not want to run it live from scratch because it would take too long for the model to construct everything from the beginning. So let me walk through a real example.
We ask a question and get a response from the model, including a research plan. The model starts executing queries, does some analysis, provides the SQL it created for that analysis, and suggests building interactive dashboards. For Claude, we can run artifacts such as dashboards directly inside the Claude interface.
Notice an important behavior here. When using an MCP server, the model always asks for confirmation before running a tool call. This is a bit of a safety net, since MCP could in theory send your data somewhere else. The model is correct to ask for that confirmation, or to continue with continuous confirmation.
You can see the model navigating through the database, executing queries, first constructing them and then executing them. Note that the model provides a LIMIT clause. It is very important not to pull too much data into the model, because the context window of the model is quite limited. Claude’s context window is around 200,000 tokens, and queries from a database could easily overflow it and spend your money without producing a useful result. That is why all queries should compress data inside ClickHouse first. They should be GROUP BY queries, JOIN queries, or top-N queries with an ORDER BY and LIMIT, and all of that is better to state clearly in the agent prompt.
Here is a model response. It is a typical answer in JSON format. We get a response from the query with some explanation about top-priority alerts, counts, affected organizations, and so on. That result is already usable. You can explore it in different directions, clarify details, explore new ideas, or create a dashboard.
Note also that the model was specifically requested in the agent prompt to provide the SQL queries it used. You can then use those queries in traditional dashboard tools like Grafana, or verify the queries yourself in a tool like DataGrip.
[37:52] – Q&A: Root Cause Analysis, Costs, Schema Design, and PostgreSQL Integration
Alen: While Boris is finishing the demo, I will take a question. One question is: how about root cause analysis? Can the agent do root cause analysis without human intervention?
The short answer is no, not yet. The agents are still not at the level of giving detailed root cause analysis. What I showed in the first part of the presentation is that that would be the ultimate goal. But basically it gives hints and pointers. If you put those together and apply a template, you can get a nice root cause analysis output. We have tested a lot of those internally. However, we would not give any recommendations that have not been reviewed by a ClickHouse expert, and we would not use them ourselves either. We are not there yet in my opinion.
Boris: To do proper root cause analysis, we need access to more data: all the metrics for a considerable amount of time, access to logs, and maybe more. The model could work with that. However, the amount of unstructured data is quite large and mostly not ready to be processed. The whole instrument would need to include an ETL process that grabs data from logs, metrics, and other sources, does some preprocessing, and places it into the database. We are already working on that, but it is a longer process than simply giving the model access to the logs.
Alen: Just one more thing to add: this whole thing is still not totally free of cost. The more data you analyze, the more tokens you will use. You will have to reserve credits or funds to do large operations in this sense.
Boris: The key point here is to have structured, well-documented data, the same way we have been talking about for years and decades in data warehousing, documenting everything the way DBT loves to do for you. The model is already here just to help you make some inferences and surface ideas from that structured data.
Alen: Ankit has a question: how do we create interactive dashboards and report generation using data in ClickHouse, where in their use case ClickHouse is used as a data warehouse system that syncs from PostgreSQL?
Boris: The sync from PostgreSQL is an additional procedure. You need a special tool for that. But after the data is synced from PostgreSQL to ClickHouse, you can easily navigate the data in ClickHouse with the methods I described, and yes you can get text information, create dashboards, and even create dashboards that communicate directly with ClickHouse. Let me show you one of my favorite dashboard examples.
This is one of my dashboards created by Claude. It is not a static dashboard with embedded data. It has a data loader. It provides the query, creates a data loader, a place for the token, and a fetch button. I can reload it. This is the starting point with no data loaded. When I load the data, the simple HTML code with JavaScript inside connects directly to ClickHouse through the MCP server REST API, requests data, and draws the dashboards. Here is the actual data. All the code was written by the model. No programmers were harmed.
Alen: The short answer is: get the data into ClickHouse already, then you can wire it like we did, and then you can ask the agent to build an interactive dashboard. If you connect the chat UI to the ClickHouse database with a token, you will be able to do that. You already have the data coming from PostgreSQL to ClickHouse.
Boris: Yes. It is very important to have the schema and database created in a proper way, designed with the proper ordering, partitioning, and so on. In that situation, the model can explore data freely with different types of queries and not spend too much computing resources on grouping, aggregating, joining, and so on.
Alen: All right, thanks everyone for joining today’s webinar. If you have any questions, we have an Altinity Slack space where you can ask, or you can connect with us on LinkedIn. Thank you so much everyone.
FAQ Section
Q: Why is ClickHouse a good database for storing and analyzing alert data?
A: ClickHouse is well-suited for alerting workloads because alerting data is analytics data. It does not require transactions, but it does require fast aggregation over large volumes of time-series events. ClickHouse’s columnar storage, parallel query execution, and scalable architecture make it excellent for this. Additionally, its tiered storage model (hot, warm, cold, and object storage) lets you retain months or years of alert history cost-efficiently, and you can use materialized views to pre-aggregate common analysis patterns, making LLM-driven queries faster and cheaper.
Q: What is the Altinity MCP server and why was it built in Go?
A: The Altinity MCP server is an open-source tool that exposes ClickHouse to AI chat interfaces using the Model Context Protocol (MCP). It was built in Go primarily for performance and ease of installation. Unlike Python or Node.js MCP servers that require installing many dependencies, the Go version ships as a single binary. It supports all standard MCP transport modes (stdio, TCP, and HTTP streaming) and provides a single execute_query tool through which the LLM can run any SQL against ClickHouse, including schema discovery queries like SHOW CREATE TABLE.
Q: What is an agent prompt, and why does it matter for alert analysis?
A: An agent prompt is a set of instructions that shapes how an LLM behaves when interacting with your data. Think of it as an onboarding document for a new employee. For alert analysis, a good agent prompt sets a clear role (“You are the alert analyst”), defines the database schema, specifies output formats, instructs the model to always include SQL queries in its responses for verification, and tells the model to use compressed, aggregated queries (using GROUP BY, LIMIT, and ORDER BY) so that the context window is not flooded with raw rows. A well-crafted agent prompt is one of the most important factors in getting consistent, useful results.
Q: Can AI agents perform root cause analysis on ClickHouse alerts today?
A: Not fully, not yet. Current LLMs can identify patterns, surface anomalies, and generate useful hints about potential underlying causes. However, genuine root cause analysis requires access to structured logs, metrics across multiple systems, and data over extended time windows, most of which is not yet structured and pipeline-ready. Altinity is working on ETL tooling to bring that data into ClickHouse in a form the model can consume. For now, the agents serve as a powerful first-pass triage tool, surfacing patterns that a ClickHouse expert can then investigate and act on.
Q: How do you manage LLM token costs when querying large alert datasets?
A: The most important technique is to have the LLM always compress data inside ClickHouse before returning it to the model. The agent prompt should explicitly instruct the model to use GROUP BY, ORDER BY with LIMIT, and aggregation queries rather than retrieving raw rows. Claude’s context window is around 200,000 tokens, and uncompressed query results can fill that quickly, wasting money and degrading result quality. Additionally, removing unnecessary MCP tools from the context and providing explicit schema information in the agent prompt reduces the number of exploratory tool calls the model needs to make.
Q: How do I create interactive dashboards from ClickHouse data using an LLM?
A: You ask the LLM to generate an HTML artifact containing JavaScript that connects directly to ClickHouse through the Altinity MCP server’s REST API endpoint. The model can generate a fully functional dashboard with a token input field and a fetch button. When loaded in a browser, the dashboard queries ClickHouse live and renders charts. For a more shareable format, the model can also embed the data directly into the HTML so the dashboard is self-contained. Using Claude Skills (example CSS, JavaScript, and HTML snippets provided in the agent prompt) ensures the model generates dashboards with a consistent design rather than a randomly styled one each time.
© 2026 Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes, MySQL, and PostgreSQL are trademarks and property of their respective owners.
ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc.