The Future of Observability is Open: Combining Iceberg, ClickHouse®, and OpenTelemetry

Recorded: April 14 @ 08:00 am PDT
Presenters: Joshua Lee

In this webinar, Josh Lee, Open Source Advocate at Altinity, makes the case that observability begs to be open and explains why that openness must extend across three distinct layers: instrumentation, pipelines, and storage. Drawing on years of hands-on experience rolling out OpenTelemetry at large organizations, Josh walks through each layer and explains what is already working, what is still missing, and how a new generation of open-source technologies is closing the gap.

Josh then introduces a live proof-of-concept stack that combines OpenTelemetry, Apache Parquet, Apache Iceberg, Apache Arrow, and ClickHouse to create a real-time open lakehouse for observability data. The entire stack runs on commodity hardware and stores telemetry directly on S3 object storage, approximately 12 times cheaper than block storage alternatives. He also previews Project Antalya, Altinity’s open-source initiative to bring hybrid table support to ClickHouse, enabling seamless tiering of hot data in MergeTree and cold data in Iceberg without any application changes.

The webinar closes with a frank look at the pros and cons of this architecture, a roundup of open-source ecosystem companies building compatible tooling, and a live Q&A covering eBPF instrumentation and the future of OpenTelemetry profiles.

Key Moments (Timestamps)

Key moments generated with AI assistance.

  • 0:00 – Introduction and Altinity overview
  • 1:05 – Why observability begs to be open
  • 1:25 – A brief history: Josh’s journey with OpenTelemetry
  • 3:00 – The three forms of openness: instrumentation, pipelines, storage
  • 6:12 – The missing piece: open storage and the multi-access pattern
  • 9:12 – ClickHouse as a unified telemetry data store
  • 11:56 – Scale problem: petabytes of observability data daily
  • 12:31 – Introducing the real-time open lakehouse concept
  • 14:27 – The proof-of-concept stack: OpenTelemetry, Parquet, Iceberg, Arrow, S3
  • 17:38 – Live demo walkthrough
  • 23:36 – Production patterns and scaling this architecture today
  • 26:11 – OpenTelemetry Arrow and the road ahead
  • 28:07 – Real-time data with hybrid tables and Project Antalya
  • 29:02 – Pros, cons, and ecosystem partners
  • 34:49 – Q&A: eBPF, OpenTelemetry profiles, and more

Webinar Transcript

[0:00] – Introduction and Altinity Overview

Josh: Hello everybody. Welcome to our next Altinity webinar. Today we’re going to talk about my vision for the future of observability. We’re going to be combining OpenTelemetry, ClickHouse, and Apache Iceberg and a couple of other Apache projects. Let me just share my screen and we will dive right in.

Okay. So the future of observability is open. This is something that I believe very, very strongly. We’re going to talk about why. I’m Josh Lee. I’m an open source advocate at Altinity. Altinity provides services and support for ClickHouse. We help you run open source ClickHouse better. We do that both on our cloud or really anywhere that you can run a Linux binary. We also offer enterprise support. Altinity is not affiliated with ClickHouse, Incorporated. We are but humble open-source contributors.


[1:05] – Why Observability Begs to Be Open

Josh: This is something that I say in a lot of the talks that I give. I say this often: observability begs to be open. There are a couple of different reasons why, and there are a couple of different forms of openness. We’re going to dive into each one of those as we get into this talk.

So quickly, a brief history of how we got here. First of all, I love OpenTelemetry. I am a huge fan. I’m a little bit obsessed with it and I have been for the last several years. I have helped some pretty large companies roll out OpenTelemetry and have made some small contributions to the codebase, but mostly I’m just a huge fan.

Now, with everything that we love, if we love it enough, we’ll find that we have some criticisms and some complaints. That is also the case for me with OpenTelemetry. Here you’re looking at the title slide from a talk that I’ve given at a number of conferences with Adriana Villela from Dynatrace. In that talk we were digging into one of the problems with OpenTelemetry, not a technical problem but more of a marketing problem.

OpenTelemetry is often sold as a vendor-neutral tool, and that vendor neutrality is sold as a way that you can switch vendors like at the flip of a switch. That’s not really true, but it doesn’t mean that vendor neutrality doesn’t have tremendous value. That talk was inspired by a LinkedIn post I wrote a couple of years ago, digging into this mismarketing, this messaging that does not quite align with the reality of what it’s actually like to use OpenTelemetry and what it’s actually like to switch vendors. If you want to scan the QR code, you can see a video recording of us doing this talk at KCD Warsaw last fall. There are also many other instances you can find on YouTube.


[3:00] – The Three Forms of Openness

Josh: What it really comes down to for me with openness is this key question: can you take it with you? When it comes to observability, there are three different kinds of openness that I want to talk about. We have open instrumentation, we have open pipelines, and we have open storage. All of these are important for different reasons.

So first, open instrumentation. This is the first thing that we got from OpenTelemetry. We have this open specification and this open set of SDKs and APIs so that we can instrument our code in a vendor-neutral way. This gives us portable code artifacts. This is amazing for so many reasons because it means we can take our code and our artifacts with us when we change vendors. We don’t have to go rebuild every single thing we have deployed. It also means that vendors themselves no longer have to do the work of maintaining all of these different instrumentation libraries and tracing libraries and things and compatibility with every open-source framework and library under the sun. That is a benefit to vendors and also to the community, because it means the instrumentation for those things can in many cases be built right in. The vendor providing those tools does not have to choose an observability vendor as their favorite. They can build in OpenTelemetry and it will work with any vendor. So they get to build observability into their tool from the very beginning, or if they are bolting it on afterwards, at least they know the insides better than anyone else in order to build something like an instrumentation library. Open instrumentation: awesome. We have it. Thank you, OpenTelemetry community.

Then we get open pipelines. When companies were first adopting OpenTelemetry three or four years ago, this was still a little bit of a dream. Most vendors were having you ship your OpenTelemetry data to their vendor agent, and the vendor still sort of maintained ownership and control and support for the entire pipeline that ended up in their backend SaaS platform.

This has shifted dramatically in the last two years, where the OpenTelemetry Collector has matured and been deployed at massive production scales by companies like eBay. What that gives us, as people building these systems, is interoperability. We no longer have to choose just one tool to send our data to. We can fork and do fun things with the data. We can layer in different tools. We can build up the observability platform of our dreams from all of these interoperable Lego pieces. That’s really, really beautiful.

Here’s one of the use cases for this: you can use the OpenTelemetry Collector to do a bake-off. If you’re trying to choose between two different vendors, you can just send all of your telemetry to both of them and then choose which one you actually want to keep. You probably don’t want to keep both because that would get really, really expensive, but if you have to do a migration, just being able to do this for a short period of time is huge. It was not something that was in any way really possible before we had OpenTelemetry and open pipelines.
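As an illustration of the bake-off pattern Josh describes, the sketch below builds a Collector-style configuration in which one pipeline per signal fans out to two backends. It is written as a Python dict that mirrors the Collector's usual layout (receivers, processors, exporters, service pipelines). The vendor labels and endpoints are placeholders invented for this example, not real services.

```python
import json


def bakeoff_config(vendor_endpoints):
    """Build a Collector-style config that sends every signal to several backends.

    vendor_endpoints maps a hypothetical vendor label to its OTLP/HTTP endpoint.
    """
    # One named otlphttp exporter per vendor, e.g. "otlphttp/vendor_a".
    exporters = {
        f"otlphttp/{name}": {"endpoint": url}
        for name, url in vendor_endpoints.items()
    }

    def pipeline():
        # Every pipeline receives OTLP, batches, and exports to all vendors.
        return {
            "receivers": ["otlp"],
            "processors": ["batch"],
            "exporters": sorted(exporters),
        }

    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {"batch": {}},
        "exporters": exporters,
        "service": {
            "pipelines": {
                "traces": pipeline(),
                "metrics": pipeline(),
                "logs": pipeline(),
            }
        },
    }


# Placeholder endpoints: replace with the two backends under evaluation.
config = bakeoff_config({
    "vendor_a": "https://otlp.vendor-a.example",
    "vendor_b": "https://otlp.vendor-b.example",
})
print(json.dumps(config, indent=2))
```

Ending the bake-off then amounts to deleting one exporter entry; the instrumented services are untouched.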


[6:12] – The Missing Piece: Open Storage

Josh: The piece that’s still missing, and this is something that we highlighted in that talk, is the storage layer. At the end of the day, you’re going to build this pipeline, but your telemetry is going to end up in some vendor’s database or in some open-source database that you’re managing, and it has sort of its own proprietary format.

What open storage would give us, if we had it, is the ability to take our telemetry with us. It’s like GDPR for your telemetry. Typically, organizations are really only concerned with the last 24 hours of telemetry, and in some cases the last two weeks. But there are cases where you might have seven years’ worth of logs that you’re required to keep for legal reasons. There are a lot of other reasons why you might want to take your telemetry with you when you migrate from one tool to another.

Again, the vendor neutrality is not about the day that you switch tools. It’s about everything working together all the time. Here’s another slide from that talk where I share a vision for what this could look like if vendors did open up their storage layer, maybe provide an API that lets me get my telemetry back out.

First of all, it means I can use additional analysis tools, maybe not provided by the original vendor I’m working with. Maybe I have some teams that want to analyze the data in a certain way. They can do that. They can get those extra capabilities. But what’s even more exciting is: this is my data. This is data that describes all of the operations of my organization. Now I can train my own machine learning models and AI models using that data and have complete ownership. I can also use other tools. Maybe my executives really want their reports in a specific way. With an API, they can keep using that tool when maybe the SRE team decides their analysis tool needs to change. It doesn’t disrupt everybody else’s workflow. So everybody can have what they want all the time. To me, that is the most powerful impact of openness.

So what open storage really gives us is not the ability to move, but it gives us this multi-access pattern where we can store our data once and use it for multiple things without having to pay for multiple expensive replicas of our data. Because storing data on things like EBS is great and very convenient, but it’s a little bit expensive. It also gives us complete and total ownership, which is really cool if we want to do things like train AI on it, or have AI reaching into it and providing analysis directly on the data, which is another emerging trend that I’ve seen a lot.

So just to recap this first section: why openness matters. Open instrumentation gives us the ability to take our code and artifacts with us. Open pipelines give us the ability to layer in interoperable tools. And open storage gives us the ability to use interoperable tools without paying for expensive replicas of all of our data.


[9:12] – ClickHouse as a Unified Telemetry Data Store

Josh: All right. Stepping back, you’re like: “Cool, Josh, but what about all of the fancy visualization and analysis that I get from my vendor? I really like that stuff and I don’t want to give it up.” You’re right. That stuff is really cool. However, I would posit that maybe it’s not as important as we used to think.

Something that I found is that if you can layer in OpenTelemetry, a database, and an MCP server capable of talking to that database, you really have about 80% of what a traditional observability platform would give you. You’re most of the way there. And for some small organizations, or for certain use cases, that might be all you need. You can get really, really far with this. Now with agents, they don’t really complain about running SSH commands or SSHing directly into a host to pull the logs, things that might have been challenging before when that knowledge would have had to be shared across multiple members of an SRE team.

So my first hot take: the visualization layer is quickly becoming the least interesting part of observability. One reason is this MCP access, but also because we all now have the ability to build our own visualization layer that is custom-tailored to exactly what we need and how we want to use it, based on the data, if we have access to the data.

Stepping back again, you’re like: “All right, this is cool, but now I have three databases to secure and scale and manage and learn, and they all have different query languages. Yeah, it’s a little bit of a nightmare.” But I have an answer for this, too. This is another talk that I have given at multiple conferences and online. You can find the recording at Observability Day from last year. It’s all about how I think ClickHouse is really a cool choice as a unified telemetry data store.

Rather than having a data store for each different signal type, and now we would add profiling to this as well, you kind of need something that looks more like this: all of your various sources, and that’s going to be a lot of sources at most organizations even at small scale, all feeding into a single pipeline. We now have OpenTelemetry as an awesome tool for this for most use cases, and Kafka as an additional tool if you really have a lot of volume.

And then finally, that feeds into a single storage layer. We could do this by wrapping APIs around different data stores, but I think it’s kind of cool if you can do it all with one data store. It turns out ClickHouse is a data store that now has pretty good capabilities for all of the signals that we might want to use it for.


[11:56] – The Scale Problem and the Real-Time Open Lakehouse

Josh: Okay, but what about scale? There’s a problem in observability, which is that the datasets are absolutely exploding in size. At Altinity we have some customers who are ingesting over two petabytes a day of observability data, and I am told by some friends who work at other institutions that that is small by their standards. This is just a tremendous amount of data, especially when EBS starts to get really, really expensive once you get into the terabyte and petabyte scales.

This is where the real-time open lakehouse comes into the conversation. There are a lot of words here. We’ll get into what each one of these means. They sort of start with this idea of a data lake and then layer on additional capabilities to make it something that might actually work for observability, and not just a place where data goes to die and become a data swamp.

So first we have data lakes. This basically just means you’re sending your data to something like AWS S3 object storage. You have abstracted away the storage into an API and you can just send whatever you want at it without really any concern for structure, organization, or metadata. That’s really useful. It gives us a place to just dump everything. Maybe after the fact we can use AI or automations to comb through that and derive meaning, but that is a slow and painful process.

Thanks to companies like Databricks, we now have this concept of the lakehouse. As my friend Will from Dremio likes to say: you take a lake, you put a little structure on it, and you’ve got a lakehouse. Really this just means we’re taking that mess of data, applying some schema and structure on it, and turning it into something that resembles a traditional database table that we can query with something like SQL.

And I did say this was going to be a real-time open lakehouse. What about the real-time aspect? The solution to this is something that Altinity has been working hard on for the last year and a bit, that we call Project Antalya. One of the features in Project Antalya is what we call hybrid tables. These are tables that use ClickHouse MergeTree as the first layer of your storage, but then seamlessly export and TTL that data off to something like AWS object storage using tools like Iceberg and Parquet. We’ll get into definitions of all of those for those who are not familiar.
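To make the hot/cold idea concrete, here is a purely conceptual sketch of how a single query could be served from two tiers without returning duplicates. It is not Project Antalya's implementation: the `watermark` parameter, the function name, and the row shape are assumptions made for this illustration. Rows older than the watermark are read from the cold tier, newer rows from the hot tier.

```python
def query_hybrid(hot_rows, cold_rows, watermark, since):
    """Return all rows with ts >= since, reading each time range from one tier only.

    hot_rows  : recent rows (stand-in for a MergeTree table)
    cold_rows : exported rows (stand-in for an Iceberg table)
    watermark : hypothetical cut-over timestamp between the tiers
    """
    # Cold tier answers for everything strictly before the watermark.
    from_cold = [r for r in cold_rows if since <= r["ts"] < watermark]
    # Hot tier answers for everything at or after the watermark.
    from_hot = [r for r in hot_rows if r["ts"] >= max(since, watermark)]
    return from_cold + from_hot


# ts=9 exists in both tiers: it has been exported but not yet expired from hot.
cold = [{"ts": 1, "tier": "cold"}, {"ts": 5, "tier": "cold"}, {"ts": 9, "tier": "cold"}]
hot = [{"ts": 9, "tier": "hot"}, {"ts": 10, "tier": "hot"}, {"ts": 12, "tier": "hot"}]

result = query_hybrid(hot, cold, watermark=10, since=5)
print([(r["ts"], r["tier"]) for r in result])
```

Because the cut-over is a single timestamp, the row that briefly lives in both tiers is returned only once, which is what lets the application keep issuing one query.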


[14:27] – The Stack: OpenTelemetry, Parquet, Iceberg, Arrow, and S3

Josh: So let’s talk about the stack. This stack that I’m going to show you is just a proof of concept, but we’re putting together OpenTelemetry as our source of instrumentation and telemetry, Project Antalya as our method of querying the data, and then we’re combining a bunch of Apache projects, Arrow, Parquet, and Iceberg, as this unified data layer. Finally, all of it lives on S3 object storage, which is about 12 times cheaper than using something like NVMe or EBS.

So just to quickly define each of the elements of this stack that you may not all be familiar with. I’m going to assume everybody’s familiar with AWS S3. OpenTelemetry gives us, as we’ve talked about, vendor-neutral instrumentation specifications and pipelines. Awesome. Yay.

Then we get Apache Parquet. If you haven’t heard of Parquet, it’s just a data format. Think of it like JSON for columnar data. Columnar just means that the data is laid out with all of the values for a single column next to each other. This is really, really useful if we need to do something like an aggregation or math over millions or billions of rows, which for observability we need to do a lot of. Parquet is a great way to store columnar data. It’s open. It can be read by many tools and written by many tools. It is usually stored on S3. Parquet by itself doesn’t really have structure. We can’t query just a folder of files very efficiently and know what’s inside each one of those files. This is where a catalog comes in.
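The difference between row and columnar layout can be shown with plain Python lists. This is only an illustration of the layout idea, not of the Parquet file format itself; the field names are made up, and the helper assumes every row has the same keys.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"service": "checkout", "duration_ms": 120},
    {"service": "cart", "duration_ms": 80},
    {"service": "checkout", "duration_ms": 100},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregation now touches a single contiguous list and never
# looks at the "service" values at all.
average = sum(columns["duration_ms"]) / len(columns["duration_ms"])
print(columns)
print(average)
```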

To me, the most exciting catalog at the moment is Apache Iceberg. It’s not just one implementation. It’s a specification. What it gives us is an open and neutral metadata format that basically takes all of that Parquet data living as blobs on object storage and gives us metadata about it that lets us query it like a traditional table. It also gives us ACID transactions. We basically write to the Parquet optimistically, and then once our writes have completed and we know that there were no errors, we can go and make a commit to the Iceberg catalog and create a new snapshot that shows the new point in time and gives the Iceberg catalog access to the new data. This is awesome. It’s also a little bit slow, and that’s why hybrid tables is going to come back into the conversation soon.
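The write-then-commit flow described here can be sketched with a toy in-memory catalog. This is not the Iceberg API: the class and method names are invented for illustration, and a real catalog persists metadata files rather than Python sets. The point is only the ordering: data files are written first, and they become visible when a new snapshot is committed against the snapshot the writer started from.

```python
import threading


class ToyCatalog:
    """In-memory stand-in for a table catalog: an append-only list of snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshots = [frozenset()]  # snapshot 0 is the empty table

    def current(self):
        """Return (snapshot_id, files) for the latest snapshot."""
        snapshot_id = len(self.snapshots) - 1
        return snapshot_id, self.snapshots[snapshot_id]

    def commit(self, base_id, new_files):
        """Append a snapshot only if nobody has committed since base_id."""
        with self._lock:
            if base_id != len(self.snapshots) - 1:
                return False  # lost the race; caller retries from the new snapshot
            self.snapshots.append(self.snapshots[base_id] | frozenset(new_files))
            return True


catalog = ToyCatalog()
base_id, _ = catalog.current()

# Step 1: write data files optimistically (here we only pretend to).
written = ["logs/part-0001.parquet", "logs/part-0002.parquet"]

# Step 2: publish them by committing a new snapshot.
ok = catalog.commit(base_id, written)

# A second writer still holding the old snapshot id is rejected.
stale = catalog.commit(base_id, ["logs/part-0003.parquet"])
print(ok, stale, catalog.current())
```

Readers that loaded snapshot 0 keep seeing the empty table until they refresh, which is also why new data is not visible the instant it is written.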

And then finally, something I’m really, really excited about is Apache Arrow. This is involved in this stack in a couple of different ways. One of them is just through ADBC, which is a replacement for ODBC or JDBC. It’s basically a more performant way to send data to and from columnar databases because the data doesn’t have to be marshaled and unmarshaled into rows to create JSON or something like gRPC before it can be sent over the wire. It just stays in this columnar format. It is a stateful protocol, which does have a slight performance overhead in that you have to manage some shared state between the client and the server, but that little bit of shared state then gives you huge performance gains in terms of how much data you actually have to send over the wire, because you can dictionary-encode it, which is really, really cool.
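Dictionary encoding, the technique mentioned at the end of this passage, can be sketched in a few lines. This shows the idea only, not Arrow's actual wire format; the function names and sample values are illustrative. The shared state between client and server can be thought of as the dictionary both sides hold, so that only small integer indices need to travel.

```python
def dictionary_encode(values):
    """Replace repeated values with indices into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in dictionary
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


service_names = ["checkout", "cart", "checkout", "checkout", "cart"]
dictionary, indices = dictionary_encode(service_names)
print(dictionary, indices)
```

Telemetry columns such as service names or severity levels repeat heavily, which is why this kind of encoding pays off for observability data.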

And then there’s another thing that we’re going to talk about with Arrow: the OpenTelemetry Arrow project. I’ll get to that.


[17:38] – Live Demo Walkthrough

Josh: So, a quick demo. This is a little proof-of-concept demo that I have put together using most of these tools that we’re talking about. It is available at this GitHub repo. This is very much not a production demo, but you could use these concepts to build something for your production use case. I’ll talk about how after we get through the demo.

To start, let’s just look at the GitHub repo. We have this repo, and here’s a nice Mermaid chart that Claude built for me. Claude is great at working with Markdown and also Mermaid. Here’s the basic architecture of our demo. We have any kind of OTLP data coming into an OpenTelemetry Collector. Really, this could be anything. It doesn’t need to be OTLP. It could be anything that the OpenTelemetry Collector has a receiver for.

We’re using this OpenTelemetry Collector both to provide our OTLP endpoints and to create batches, because when we’re writing to Iceberg and Parquet data, if we were writing for every single log message or metric that came in, we would end up with a ton of tiny little files that would absolutely kill our file system and our processor. So we don’t want to do that. We need to get nice big batches. Just for this proof of concept, I believe I’m using one-second or one-thousand-row batches, which is smaller than you would want for a production use case.
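The size-or-time batching rule can be sketched as follows. This is a simplified illustration, not the Collector's batch processor: the class name and parameters are assumptions, and the age check only runs when a record arrives, whereas a real batcher would also flush on a timer. The clock is passed in explicitly to keep the example deterministic.

```python
class Batcher:
    """Collect records and release them when the batch is full or too old."""

    def __init__(self, max_rows=1000, max_age_seconds=1.0):
        self.max_rows = max_rows
        self.max_age_seconds = max_age_seconds
        self._rows = []
        self._first_seen = None  # arrival time of the oldest buffered record

    def add(self, record, now):
        """Buffer one record; return a full batch if a limit was hit, else None."""
        if not self._rows:
            self._first_seen = now
        self._rows.append(record)
        full = len(self._rows) >= self.max_rows
        old = now - self._first_seen >= self.max_age_seconds
        return self.flush() if (full or old) else None

    def flush(self):
        """Hand over whatever is buffered and start a new batch."""
        batch, self._rows = self._rows, []
        self._first_seen = None
        return batch


batcher = Batcher(max_rows=3, max_age_seconds=1.0)

# Three quick records: the third one fills the batch.
by_size = [batcher.add(f"log-{i}", now=i * 0.1) for i in range(3)]

# Two slow records: the second arrives after the age limit.
first = batcher.add("slow-a", now=3.0)
by_age = batcher.add("slow-b", now=4.5)
print(by_size, first, by_age)
```

Each returned batch would become one Parquet file, so the two limits directly control file size and freshness.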

Then we have this tool made by Clay Smith over at ServiceNow. It’s called OTLP2Parquet. It’s also available on GitHub. It simply takes that OpenTelemetry data and, working essentially like a collector exporter, writes it as Parquet files on S3. For the purposes of this Docker demo, the S3 implementation that I’m using is RustFS, but you could use your own. There are many open-source implementations, or you could just use AWS S3 as the canonical implementation.

Then we have this log sync service, which is what you briefly saw on my Neovim screen before I moved over here. This is actually just a little bash script that runs every 60 seconds and checks the S3 bucket for new files. If there are any new Parquet files, it tells the Iceberg REST catalog about them and basically commits them to the view of the world that our ClickHouse instance is using to understand what exists in the table. And then ClickHouse, based on that metadata, can actually fetch the raw columnar data from S3.
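The core of such a sync loop is a set difference between what is in the bucket and what the catalog already knows about. The demo uses a bash script; the Python sketch below only mirrors the logic described. The function names are hypothetical, and the bucket listing and the commit call are injected as plain callables so that no real S3 or catalog client is assumed.

```python
def find_uncommitted(bucket_keys, committed_keys):
    """Return Parquet object keys that the catalog has not seen yet."""
    return sorted(
        key for key in bucket_keys
        if key.endswith(".parquet") and key not in committed_keys
    )


def sync_once(list_bucket, committed_keys, commit):
    """One polling pass: list the bucket, commit anything new, remember it."""
    new_files = find_uncommitted(list_bucket(), committed_keys)
    if new_files:
        commit(new_files)                  # one catalog commit per pass
        committed_keys.update(new_files)
    return new_files


# Stand-ins for the object store and the catalog.
bucket = ["logs/a.parquet", "logs/b.parquet", "logs/_tmp.partial"]
committed = {"logs/a.parquet"}
commits = []

first_pass = sync_once(lambda: bucket, committed, commits.append)
second_pass = sync_once(lambda: bucket, committed, commits.append)
print(first_pass, second_pass, commits)
```

A real service would call `sync_once` in a loop with a 60-second sleep, which is where the delay before new data becomes queryable comes from.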

This is a lot of components, but amazingly, if you turn off the telemetry generator, which I have tuned pretty high just to push things for this demo, the whole stack will run on 4 GB of RAM on a pretty basic machine. I developed this on my little Framework 12, which is just a wimpy little i3 CPU. So you don’t need a lot to run this.

Then we get into some of the descriptions. The readme explains how this actually all works. We’re describing the OpenTelemetry schema that we get from that OTLP to Parquet tool, very closely matching the schema that you would get if you were just using the ClickHouse exporter itself.

And if we come over here, we can see the actual logs coming in. I have an OTel generator running that’s just generating logs for three different services. These are fake logs. The data here is not really the interesting part. There’s a slight delay, because of the way that the log sync service is making commits to the Iceberg catalog. We only actually get new data every 60 seconds. But even on my little i3, I was able to temporarily turn this up to 10,000 logs per second and this just kept working, even with all of those different layers and different technologies working together. This is really, really cool. This is really powerful with not a lot of computing power behind it.

And of course, because all of this data is stored in an Iceberg catalog but we’re accessing it through ClickHouse, we get all of the SQL semantics that we would expect from ClickHouse to slice and dice this data any way that we want. If we look at this panel, we can see the simple query that’s running to generate this view. We’re just bucketing the logs and then showing the last 100 sorted by timestamp. But we could do a lot more complicated things with this.

I recently wrote a little resource that you can find at altinity.com/useful-observability-queries. It gives you a bunch of more complicated queries that you can run with this ClickHouse schema, not just logs but also traces, metrics, and profiles.


[23:36] – Production Patterns: Scaling the Architecture Today

Josh: All right. So that’s a demo. It’s really cool. I encourage you to play with it, build something else in its image. I think you can really take this very far just with this pattern.

When it comes to Iceberg catalogs, which are one of the key pieces enabling this, you have a couple of options. Probably the canonical option would be the Apache Polaris project. It is by far the most mature implementation of an Iceberg catalog. We also have our own little Iceberg catalog that we wrote to use in our cloud. We call it the Altinity Ice REST catalog. If you’re running ClickHouse on Altinity.Cloud, you can actually turn this on just by clicking a button and you will then have an Iceberg catalog that your ClickHouse instances can talk to. Another implementation that I’m really excited about is LakeKeeper by Vakamo. This is an open-source project that provides a unified control layer and has some very strong features for access controls and things like that to your Iceberg catalog, which is not necessarily built into the Iceberg spec.

Okay, so I said this is just a proof of concept, and you’re like: “Right, but I have production systems that I need to monitor. Can I do that today?” For the most part, yes. You actually could scale this up to production scale today using the currently available technologies. I’ll show a couple of different patterns for how you could do that.

First of all, you could just run the OTLP2Parquet binary in your cloud. You can write things to AWS, and then rather than having a bash script to make your commits to your Iceberg catalog, you could use Apache Spark, which has great support for Iceberg and can do those periodic reads of what the new data is and then commit it to the Iceberg catalog, making it available for your ClickHouse cluster, which could then make it available to things like Grafana or an MCP server, or both.

Another way you could do it is with another tool by Clay Smith called OTLP2 Pipeline. This runs on AWS Lambdas and is designed to talk directly to AWS, and can also commit the changes directly to Iceberg so you don’t need an extra step. I will say with both of these tools, there should actually be an OpenTelemetry Collector right in front of them just to handle batching, because neither one of these tools handles batching. The OpenTelemetry Collector has the batch processor, which is awesome, and you can fine-tune it to get exactly the right size Parquet files for your use case.


[26:11] – The OpenTelemetry Arrow Project and the Road Ahead

Josh: This is not quite possible yet, but this is on the roadmap for the OpenTelemetry Arrow project, which is a very, very exciting project to me, bringing the performance benefits of the Arrow wire protocol to OpenTelemetry components.

Right now what you can actually do today with OpenTelemetry Arrow is you can use it to have OpenTelemetry Collectors talk to each other, which is especially useful if you are, for example, sending data between regions and you don’t want to pay for all of that expensive transit. You can use the OTel Arrow exporter and you’ll be sending the most compact form of that data possible in order to get it to the other side.

But on the roadmap for the OpenTelemetry Arrow project is the ability to write directly to Iceberg, and of course writing the Parquet files to the object storage as part of that. This would reduce the number of components that you have: once your batches are saturated, they get committed directly to Iceberg, where they can be immediately read. You don’t need any extra tools other than the OpenTelemetry Collector that you’re probably already running. I’m really excited for this. We’ll see when it actually becomes available. You can follow that through the CNCF OpenTelemetry Slack.

One thing I want to point out: in all of these different architectures that I showed, the outside of your observability platform doesn’t really have to change at all. At the end of the day, all of these different patterns are taking OpenTelemetry in and reading it with SQL through ClickHouse. So your external tools, your services that are emitting telemetry, and your services that are analyzing and visualizing that telemetry never have to change. You can swap these components as they evolve, as features become more complete or have the features that you want. You can make these changes and build the backend system that performs the way you need it to perform without changing the surface area at all.


[28:07] – Real-Time Data with Hybrid Tables and Project Antalya

Josh: I did say we would talk about real time again. So let’s quickly talk about that. If we wanted this to be truly real time, we would need the data to land immediately in ClickHouse and immediately be available for query. By immediate, I mean in a time frame of like one second or so. This is actually possible with hybrid tables in Project Antalya. They’re in what I would call an alpha state right now. We do have some customers using them in production. It’s fully open source, so if you want to take it for a spin and see how it works for you, absolutely I would encourage you to do that. Please do that. Please reach out to me on Slack and tell me how it goes, whether it goes well or not. I want to hear about it.

And if you want to follow along with the improvements and when this might be ready for less of a proof of concept and more of a battle-tested state, you can follow along. This GitHub issue contains our 2026 roadmap for Project Antalya, including all of the things that will be happening with hybrid tables, making them more feature complete.


[29:02] – Pros, Cons, and Ecosystem Partners

Josh: It’s actually going to be kind of a quick webinar. So I just have a few pros and cons to talk about, and then I will open it up for questions.

Some of the benefits of this architecture. There are more, but these are my favorites. One is you get practically limitless scale. You just don’t have scaling problems anymore because S3 grows to essentially infinity and you can just keep throwing more and more data at it. The Iceberg catalog is going to make sure that your query engines like ClickHouse only need to read the data that they actually care about.
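The reason a query engine can skip most of the data is that table metadata records value ranges for each data file. The sketch below shows that pruning step with a made-up manifest: the dictionary keys, paths, and timestamps are illustrative and do not reproduce Iceberg's actual manifest schema.

```python
def prune_files(manifest, query_start, query_end):
    """Keep only files whose [min_ts, max_ts] range overlaps the query window."""
    return [
        entry["path"]
        for entry in manifest
        if entry["max_ts"] >= query_start and entry["min_ts"] <= query_end
    ]


# Hypothetical per-file statistics, one entry per Parquet file.
manifest = [
    {"path": "logs/00.parquet", "min_ts": 0, "max_ts": 59},
    {"path": "logs/01.parquet", "min_ts": 60, "max_ts": 119},
    {"path": "logs/02.parquet", "min_ts": 120, "max_ts": 179},
]

to_read = prune_files(manifest, query_start=100, query_end=130)
print(to_read)
```

Only the files returned here need to be fetched from object storage, so query cost tracks the time window asked for rather than the total amount of data stored.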

You get completely open access, which means any tool that you want can access the data, whether through ClickHouse or whether you want to attach another query engine like Apache Spark to that Iceberg catalog. You can have multiple processes reading and writing from the same catalog. You could even have Postgres writing data and ClickHouse reading data to do high-speed analytics using that OLAP niceness from ClickHouse. This is all open.

Everything in this stack is open source. As I mentioned, I built this demo on my little i3 Framework 12, so it will really run anywhere. You could run it on a Raspberry Pi, and of course you can scale it up to 192-core clusters. This architecture makes MCP really, really awesome. It’s a great way to interact with well-structured observability data that follows the OpenTelemetry protocol, which is very well known to agents. And finally, it gives you SQL-based observability, which as a SQL fanatic I love.

Not everybody is a SQL fanatic, so one of the first limitations that I will mention is that yes, you’re now using SQL for your observability. If you’re a huge fan of PromQL, you might see that as a problem. If you have tools that depend on PromQL, this might be a problem, asterisk, because PromQL support is now in a very strong beta inside ClickHouse. So this is only going to get better, and you’re going to be able to use other query languages and other query engines to get access to your data in the way that you want. You’re not stuck with SQL.

It’s not quite real time yet, but we talked about how that’s solved through hybrid tables. There is some compute overhead in the pipeline right now, especially if you’re using the components that I used in my proof of concept. However, a lot of that is going to be solved by things like Apache Arrow. And finally, there are a lot of moving parts here, a lot of stuff to maintain. Although if you’re using Altinity.Cloud, you just click a button and we will take care of the hard parts for you.

So really, I think what I would say is: this is really exciting. If you want to be a brave pioneer and start doing this now, you don’t have to go alone. There are companies out there that can help you build on top of these open-source technologies without needing to use anything proprietary in your stack.

Here are some of the companies that I’m a huge fan of. Odigos has an eBPF-powered node instrumentation agent that you can use to get visibility into services that have not been instrumented or that may not have instrumentation libraries. That’ll come out in an OpenTelemetry-compatible format. You can use Coroot, which also has an eBPF-powered agent for that node instrumentation. Coroot also includes a backend for analysis. That backend is powered by ClickHouse, so I’m sure you could tweak it to use something like hybrid tables if you wanted the benefit of that S3 storage and all the cheapness and scalability that comes with it. And Coroot also uses, I believe, VictoriaMetrics under the hood for your metrics, for your time series, which does solve that SQL/PromQL problem if that’s a problem for you.

BindPlane, congratulations to BindPlane. It was just acquired by Dynatrace, or that acquisition was just announced. BindPlane basically handles the pipeline piece. Those OpenTelemetry Collectors, as mature as they are, still can be a little rough around the edges sometimes. The documentation is occasionally a little behind or a little aspirational. There are tools like OpAMP that you can use to actually manage your collectors, you know, a fleet of collectors at scale. BindPlane makes all of that a lot easier for you.

Observability Garden, or OllyGarden, is a brand-new company that I’m very excited about. They are set up to create tools that help you with the pipeline, but specifically with the quality of your telemetry and how well your teams are following the OpenTelemetry semantic conventions, or custom semantic conventions that you may have defined for your organization. The goal is to make sure that your telemetry is usable: that you can actually derive insights from it, and that it describes things in consistent language and uses consistent values.

And then of course I mentioned it a few times, but Altinity.Cloud. We provide a managed service for ClickHouse that you can run anywhere, and we are happy to provide enterprise support for that, especially in Kubernetes-based environments but also on bare metal if that’s your flavor.

Really, it’s only going to get better from here. I’m really, really excited for where this is going to go in the next couple of years.


[33:53] – Closing Remarks and Resources

Josh: So that’s it for now. This QR code will take you to the demo repo. If you have questions about this, if you have thoughts about this, if you want to tell me I’m crazy and this will never work, please join our Slack. I would love to hear from you. We’re hiring. So if you think this stuff is as cool as I do and you want to nerd out with me about it, also please join our Slack and ping me. Head over to altinity.com/careers and take a look at our open roles.

And then finally, I will leave you with some resources. If you’re here, these are also going to get emailed to you in the follow-up. No need to screenshot this page. But yes, these are some resources that I mentioned that you may want to review after this. And now I will go back to questions.


[34:49] – Q&A: eBPF and OpenTelemetry Profiles

Josh: Okay, I see two chat messages. Yes, the slides will definitely be available.

All right. eBPF. Could I talk more about eBPF? Absolutely. eBPF is built into Linux. It emerged, I think, around 2016, but don’t quote me on that. Originally it was used for security. Basically what it does is it lets you safely write little programs using a subset of C that run in kernel space but are provided by a user-space program, and they can observe the kernel operations of other programs. With eBPF probes you can do things like check all of the network connections for other processes, and then if you can see a network connection you can do things like start a trace span based on that information.

How well does ClickHouse do for profiles in terms of performance? I wish I had more thoughts on that. For profiles, I think ClickHouse is probably the best suited of all the tools that I’m familiar with, but I think we need some work defining a schema that actually leverages ClickHouse in the right way to make it as performant as it can be. I’m hoping to explore that with some of our customers. And yes, OpenTelemetry profiles is going stable soon. I think the alpha spec was released a couple of months ago.

All right. Well, if that’s it, then thank you everyone. We’ll email you the slides and the resources. Thanks so much for coming. We’ll also post this on YouTube if you want to share it with anyone who wasn’t able to make it. See you next time.


FAQ Section

What is the “real-time open lakehouse” for observability? It is an architecture that combines ClickHouse as a high-speed query engine, Apache Iceberg as an open metadata catalog, and Apache Parquet files stored on S3 object storage. Hot data lands in ClickHouse MergeTree for immediate query, and older data is tiered automatically to Iceberg/Parquet on S3, which costs roughly 12 times less than block storage. The result is a single SQL-queryable layer that can grow to virtually unlimited scale.
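
To make the 12x figure concrete, here is a minimal Python sketch of the blended storage bill when only a fraction of the data stays on block storage and the rest is tiered to object storage. Only the 12x ratio comes from the talk. The function name, the 25% hot fraction, the 100 TB total, and the 96-per-TB block price are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def monthly_storage_cost(total_tb, hot_fraction, block_price_per_tb, cost_ratio=12.0):
    """Blended monthly storage cost for a hot/cold tiered layout.

    hot_fraction is the share of data kept on block storage (0.0 to 1.0).
    cost_ratio is how many times cheaper object storage is than block
    storage; 12.0 is the figure quoted in the talk.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0.0 and 1.0")
    object_price_per_tb = block_price_per_tb / cost_ratio
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * block_price_per_tb + cold_tb * object_price_per_tb


# Hypothetical inputs: 100 TB of telemetry, block storage at 96 per TB per month.
all_block = monthly_storage_cost(100, 1.0, 96.0)   # everything on block storage
tiered = monthly_storage_cost(100, 0.25, 96.0)     # 25% hot, 75% on object storage
print(all_block, tiered)
```

With these made-up inputs the all-block bill is 9600 and the tiered bill is 3000. The saving depends heavily on how small the hot fraction can be kept, which is why the retention window for the MergeTree tier matters more than the raw price ratio.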

What is Project Antalya and what are hybrid tables? Project Antalya is an open-source branch of ClickHouse developed by Altinity. Its headline feature is the hybrid table engine, which transparently combines a fast MergeTree segment for recent data with one or more Iceberg-backed segments for older data. Queries span all segments automatically. The 2026 roadmap, tracked publicly on GitHub, includes making hybrid tables more feature-complete and production-ready.
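
The idea of one query spanning a hot segment and a cold segment can be sketched in a few lines of Python. This is a conceptual model only, not Antalya's implementation or syntax: the class name `HybridTableSketch`, the single cutoff timestamp (`watermark`), and the in-memory lists standing in for MergeTree and Iceberg are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogRow:
    timestamp: datetime
    service: str
    body: str


class HybridTableSketch:
    """Toy model: rows at or after `watermark` are hot, earlier rows are cold.

    Assumes the caller has already placed each row in the correct list.
    """

    def __init__(self, watermark, hot_rows, cold_rows):
        self.watermark = watermark
        self.hot_rows = list(hot_rows)    # stand-in for the MergeTree segment
        self.cold_rows = list(cold_rows)  # stand-in for the Iceberg segment

    def segments_for(self, start, end):
        """Return which segments a half-open time range [start, end) touches."""
        segments = []
        if start < self.watermark:
            segments.append("cold")
        if end > self.watermark:
            segments.append("hot")
        return segments

    def query(self, start, end):
        """Scan only the segments the range touches, then merge by time."""
        rows = []
        for name in self.segments_for(start, end):
            source = self.cold_rows if name == "cold" else self.hot_rows
            rows.extend(r for r in source if start <= r.timestamp < end)
        return sorted(rows, key=lambda r: r.timestamp)


now = datetime(2026, 4, 14, 12, 0, 0)
table = HybridTableSketch(
    watermark=now - timedelta(hours=24),
    hot_rows=[LogRow(now - timedelta(hours=1), "checkout", "payment ok")],
    cold_rows=[LogRow(now - timedelta(days=3), "checkout", "payment failed")],
)
print(table.segments_for(now - timedelta(hours=2), now))
print([r.body for r in table.query(now - timedelta(days=7), now)])
```

The caller issues one `query` regardless of where the data lives, which is the "no application changes" property described in the webinar. A dashboard looking at the last two hours never touches the cold segment, while a week-long investigation reads both.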

Why does observability data need open storage, not just open instrumentation and open pipelines? Open instrumentation (OpenTelemetry SDKs) lets you take your code with you when you change vendors. Open pipelines (the OpenTelemetry Collector) let you route data to multiple tools simultaneously. But without open storage, your historical telemetry is still locked inside a vendor’s proprietary database. Open storage enables a multi-access pattern where the same data can feed SQL analytics, machine learning pipelines, executive dashboards, and AI agents, all without paying for expensive duplicate copies.

What is the OpenTelemetry Arrow project and why does it matter? OpenTelemetry Arrow is a CNCF project that brings the Apache Arrow wire protocol to OpenTelemetry components. Today it allows OpenTelemetry Collectors to talk to each other using a highly compressed columnar format, reducing cross-region data transit costs significantly. On the roadmap is the ability for the OpenTelemetry Collector to write directly to Apache Iceberg and Parquet, which would eliminate several intermediate components from the observability stack.

Can I use PromQL with this ClickHouse-based architecture? Yes, increasingly so. PromQL support is currently in a strong beta inside ClickHouse. Josh notes that users are not locked into SQL. As PromQL support matures and additional query engines can be attached to the shared Iceberg catalog, teams can choose the query language that fits their existing tooling and workflows.

How much infrastructure do I need to run this proof-of-concept stack? Surprisingly little. Josh developed and ran the demo on a Framework 12 laptop with an Intel Core i3 processor. With the telemetry generator tuned down, the full stack runs comfortably in 4 GB of RAM. It was also tested at 10,000 log events per second on the same hardware. Production deployments would of course use more resources, but the architecture scales horizontally by adding more compute, not by migrating to a different system.
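
For a rough sense of what 10,000 log events per second adds up to, the arithmetic can be sketched in Python as follows. The 500-byte average event size is a hypothetical value, not a number from the demo, and the result is raw volume before any columnar compression.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_log_volume(events_per_second, avg_event_bytes):
    """Return (events per day, uncompressed bytes per day) for a steady rate."""
    events_per_day = events_per_second * SECONDS_PER_DAY
    bytes_per_day = events_per_day * avg_event_bytes
    return events_per_day, bytes_per_day


# 10,000 events/s is the rate from the demo; 500 bytes per event is assumed.
events, raw_bytes = daily_log_volume(10_000, 500)
print(events, raw_bytes / 1e9)  # events per day, raw gigabytes per day
```

Under that assumption the demo rate works out to 864 million events and about 432 GB of raw log data per day, which gives a feel for how quickly a laptop-scale workload turns into a storage question.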


© 2026 Altinity, Inc. All rights reserved. Altinity®, Altinity.Cloud®, and Altinity Stable® are registered trademarks of Altinity, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.; Altinity is not affiliated with or associated with ClickHouse, Inc. Kubernetes®, MySQL®, and PostgreSQL® are trademarks and property of their respective owners.
