Why You’re Rebuilding Your Data Platform Again
This post is a summary of the first episode of Altinity’s Unevenly Distributed podcast, where Open Source Dev Advocate and host Josh Lee chats with Brad Heller, CTO and co-founder of Tower. They cover data architecture evolution, the open lakehouse movement, and why you probably will rebuild your data platform again (and why that’s okay).
Introduction
The data world is shifting again. After years of closed systems and vendor lock-in, the pendulum is swinging back toward open architectures. In this episode, Brad explains what’s driving that change and why the lakehouse model could redefine how we store, query, and manage data.
He describes the growing push for open storage, shared table formats, and portable metadata that let data move freely across tools, not stay trapped inside one platform. The lakehouse sits at the center of this change: a way to store data once and query it anywhere.
From the Hadoop hangover to the rise of AI-driven workloads, Brad explains why this shift feels different, and what it means for teams balancing cost, control, and cloud dependence. Along the way, he touches on emerging standards like Apache Iceberg, the “Deltaberg” convergence, and why on-prem won’t be going anywhere, even by 2035.
Episode Highlights
- [7:34] The 1% Problem: Why Most Data Goes Unused
- [11:27] The Pendulum Swing: Open vs Closed Ecosystems
- [13:51] AI Workloads Don’t Fit Traditional Warehouses
- [15:15] What Is a Lakehouse? (Definition & Benefits)
- [18:22] How Open Formats Break Vendor Lock-in
- [22:20] Build vs Buy: Framework for Decision Making
- [24:16] Why Choose Open Formats Even with Vendors?
- [30:13] Real-World Iceberg Adoption & Cost Savings
- [34:50] What’s Still Broken in the Developer Experience
- [43:31] Spicy Take: Delta & Iceberg Will Converge
- [45:00] On-Prem Will Never Die (Controversial Opinion)
So What Exactly Is a Lakehouse?
A lakehouse stores data in an open object store with consistent metadata, allowing warehouse-like SQL queries across multiple engines. It combines the openness of a data lake with the managed feel of a warehouse, without locking into a vendor‑specific storage format.
The Open Lakehouse Paradigm Shift
Brad identifies a recurring pattern in data architecture: a pendulum swinging between open ecosystems (Hadoop) and closed ones (Snowflake, Databricks). Now it’s swinging back toward open lakehouses, driven not just by cost but by AI and streaming workloads that don’t fit traditional warehouse models.
The 1% Data Problem
Most organizations actively use less than 1% of their stored data to drive their business. Brad calls it a “hangover” from the Hadoop era’s store-now, figure-it-out-later mentality, creating underused data estates that drive up storage costs with little return.
Open Table Formats as Strategic Moats
Modern table formats like Apache Iceberg, Delta Lake, and Apache Hudi pose an existential threat to data warehouse vendors. “The traditional moat was that your data is in our system,” Brad notes. “Now I can use Snowflake for one thing and ClickHouse® for observability without moving data.” Even on managed platforms, he recommends open formats like Delta or Iceberg, because the option to run other engines on top of the same data is crucial.
The Three Generations of Metadata Management
Brad explains how Iceberg’s metadata approach evolved from file-based systems, to metadata stores like DynamoDB or the Hive Metastore, and now to REST catalogs that standardize APIs and handle concurrency. New open-source catalogs, such as Polaris and Lakekeeper, are setting the standard for managing lakehouse metadata.
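To make the metadata discussion concrete, here is a deliberately simplified sketch of the chain an Iceberg-style format maintains: a table metadata document points at snapshots, each snapshot points at a manifest list, and manifests enumerate the actual data files. The field names loosely follow Iceberg’s, but this is an illustration, not the real spec (which uses Avro manifests, schemas, partition specs, and more):

```python
# Hypothetical, simplified model of an Iceberg-style metadata chain.
table_metadata = {
    "format-version": 2,
    "location": "s3://lake/warehouse/events",
    "current-snapshot-id": 1001,
    "snapshots": [
        {"snapshot-id": 1001, "manifest-list": "snap-1001.avro"},
    ],
}

# Manifest list: one entry per snapshot, naming its manifests.
manifest_lists = {"snap-1001.avro": ["manifest-a.avro"]}

# Manifests: each names the concrete data files it tracks.
manifests = {
    "manifest-a.avro": [
        "s3://lake/warehouse/events/data/part-00000.parquet",
        "s3://lake/warehouse/events/data/part-00001.parquet",
    ],
}

def data_files(meta: dict) -> list[str]:
    """Walk from the current snapshot down to the data files."""
    snap = next(
        s for s in meta["snapshots"]
        if s["snapshot-id"] == meta["current-snapshot-id"]
    )
    files: list[str] = []
    for manifest in manifest_lists[snap["manifest-list"]]:
        files.extend(manifests[manifest])
    return files

print(data_files(table_metadata))
```

In this picture, a REST catalog’s main job is to hand out, and atomically update, the pointer to the current table metadata, which is what lets concurrent writers commit safely.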
What’s Still Broken in the Developer Experience
The developer experience still breaks down when working with the cloud from your laptop, often unraveling into Dockerfiles, manifests, and Kubernetes clusters. Brad argues the solution is giving developers the same experience locally and in the cloud.
Why Open Format Adoption Is Accelerating
Adoption of open table formats is accelerating as teams pursue interoperability across tools and engines. “Pick one and focus on it,” Brad advises. “And I would pick Iceberg.” He predicts that, within five years, there will be an open format that everyone has to support and speak in.
“Deltaberg” Convergence
“If I were to put on a tinfoil hat, I think Delta and Iceberg are going to converge at some point. Both are adding features out of the other’s camp.” Whether through merger or convergent evolution, a unified format could reshape the ecosystem.
On-Premise Is Being Redefined
“We’ve been talking about the transition to the cloud since 2012. In 2035, we will still be talking about on-prem data and the transition to the cloud.” While cloud accounts for an increasingly large portion of IT spend, on-premise workloads are being redefined, not eliminated.
The Build vs. Buy Decision Framework
If a capability is core to your business and you’re good at it, build it; if it’s not core or you don’t have the chops, buy it.
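The framework is effectively a two-by-two. As a sketch (the parameter names are ours, not Brad’s):

```python
def build_or_buy(core_to_business: bool, have_expertise: bool) -> str:
    """Build only when a capability is core to the business AND the
    team is actually good at it; otherwise buy."""
    if core_to_business and have_expertise:
        return "build"
    return "buy"

print(build_or_buy(core_to_business=True, have_expertise=True))   # build
print(build_or_buy(core_to_business=True, have_expertise=False))  # buy
```

The asymmetry is the point: three of the four quadrants land on “buy,” because building something non-core, or something you lack the chops for, rarely pays off.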
Cloud Data Transfer Costs
Transfer costs remain a significant barrier. “Even just getting it to those vendors in the first place can be expensive if you’re going to cross clouds.”
Environmental Costs of Data Storage
“There was this Reddit article of some folks in California being encouraged to delete emails from their inbox, as a way of conserving water. Data centers need water for cooling.” The data lake approach of keeping everything forever has real environmental consequences.
The Python Imperative for Data Engineers
With the roles of data scientist and data engineer blurring, and Python at the center of this convergence, Brad emphasizes that Python is a must for newcomers.
Buddhist Data Engineering Philosophy
We’re going through a big AI paradigm shift now, and whatever comes next will cause another. “Everything is temporary. We will throw everything away eventually. So if you are stressed out because you have to build your data platform again, try not to be.”
Listen to the full conversation on the Unevenly Distributed podcast, available on Spotify, Apple Podcasts, and YouTube. Brad is most active on LinkedIn at /in/bradhe and on GitHub as bradhe. To learn more about Tower, visit Tower.dev. For more insights on ClickHouse and real-time data architecture, visit our blog.