THE POWER OF OPEN SOURCE AND ANALYTICS

Open Source Analytics Conference

2 November 2021

OSA CON 2021 – The Open Source Analytics Conference

Open source is enabling a new wave of high-performance analytical systems that explore data from digital businesses. At OSA Con 2021, we connected developers of advanced open source software with developers of the innovative analytic applications that use it.

Missed out on OSA Con 2021? That’s okay. You can see it all on-demand below. Enjoy!

Keynote Panel Discussion

Alexey Milovidov, CTO & Founder
Gian Merlino, Co-Founder

OSA Con 2021

The Future of Open Source Analytic Databases

Low-latency, open source analytic databases like ClickHouse®, Druid, and Pinot are leading a revolution in the way businesses analyze data. Our panel discussion introduces each database and then provides Q&A with lead engineers on each project. We’ll cover topics including the use cases each project handles, technologies for high performance, the role of communities in advancing technology, roadmaps, and more. It’s a great opportunity to understand the differences between projects and where they are headed.
Chinmay Soman, Founding Engineer
Tim Meehan, Software Engineer

The Power of Open Source

What unites us here is our open source communities. In this talk track, you can learn from experts about community management and open source innovation. That innovation, driven by the hundreds and thousands of people who contribute daily, is fueling rapid growth. Learn about the impact a strong community can make.

OSA Con 2021

Surveying the Open Source Analytic Universe: Data and Anecdata

Open source offers a complete replacement for expensive, proprietary analytic software stacks. Not surprisingly, analytic projects on GitHub are extremely popular. This talk presents stats on some of the most popular analytic projects and talks about the creative ways people are using them.
Robert Hodges, CEO

OSA Con 2021

Managing Transactional and Analytical Workloads with Open Source Databases

In the modern enterprise, you often need to run transactional and analytical workloads on the same data. We will look at approaches you may employ, such as ELT/ETL, cross-database replication, and HTAP (Hybrid Transactional/Analytical Processing), then survey how open source technology is evolving to support them.
Peter Zaitsev, CEO

OSA Con 2021

Broadening Community Engagement in Your Open Source Analytics Community

In this session, we will discuss the importance of community and ways to cultivate your open-source analytics community. We will discuss the need to look beyond code and the importance of onboarding, community tools, and creating a sense of belonging for everyone.
Ray Paik, Head of Community

Analytic Applications

Analytic applications turn data into insight. They offer simple interfaces that allow users to pose questions and receive answers, often shown through creative graphics, that lead to productive actions in the real world. The OSA Con talks in this area show you just a few of the many possibilities that open source enables.

OSA Con 2021

Using ClickHouse Open Source Columnar Database for Satellite Communication Data

Meo describes how the ClickHouse columnar database was implemented to store raw satellite communication data for an important production project at Eutelsat. Eutelsat is one of the most innovative operators in the commercial satellite business. The Eutelsat Group offers capacity on 36 satellites that provide premium coverage of Europe, Africa, the Middle East, Asia and the Americas. This talk covers how Eutelsat is using ClickHouse as a core database component for its most recent projects.
Meo Bogliolo, Database Administrator

OSA Con 2021

Bringing Clarity to Cancer with an Open-Source Analytics Platform

In this session, Tanvi and Stephen will discuss how open source software helped build analytics solutions at COTA and how analytics is key to helping COTA’s customers with cancer care and research. They will also demonstrate how open source helped the COTA team deliver higher quality, greater reliability, and more flexibility in its analytics.
Tanvi Pal, Sr. Software Engineer
Stephen Jakubowicz, Product Manager

OSA Con 2021

Analytics vs. Privacy: What Data Are We Allowed to Collect and How?

Lawmakers constantly introduce new laws limiting our ability to collect user behavioral data. GDPR (EU), PECR (UK), and CCPA (California) are the main ones. How can we build an analytics system that complies with all those laws and still collects meaningful amounts of user data? This talk lays out the problem and illustrates how we solved it in Jitsu.
Vladimir Klimontovich, Founder & CEO

OSA Con 2021

Analytic Trends & Data Engineering

In 2017, I wrote two blog posts about data engineering: “The Rise of the Data Engineer” was an attempt at defining the emerging role, and “The Downfall of the Data Engineer” exposed some of the challenges [and opportunities] around the role. Four years later, it’s a good time to revisit all of this and explore what has changed, from the tool landscape to the role & responsibilities.
Maxime Beauchemin, CEO & Co-Founder

OSA Con 2021

Distributed Tracing Using ClickHouse @ EBAY

ClickHouse has become a popular backend of choice at eBay. The telemetry and monitoring team uses a ClickHouse backend to enable OLAP, structured logs, and distributed tracing use cases. In this session, we will talk about how eBay has leveraged Kubernetes to build cloud-native and region-aware solutions like Distributed Tracing on top of a ClickHouse backend. The session will also cover the user experience in Grafana for the different platform telemetry solutions that use ClickHouse backend data.
Sudeep Kumar, Staff Engineer
Amber Vaidya, Director of Engineering

OSA Con 2021

How PostHog Found our EventMansion

The story of how we migrated our Product Analytics Platform from Postgres to ClickHouse.
– Why we decided to call ClickHouse home.
– Our favorite features of ClickHouse.
– Sharp edges that we have cut ourselves on, like mutations.
– Ways we have improved performance as we’ve grown exponentially in the number of events received.
– How we deploy ClickHouse on-prem for our customers (spoiler: we had a lot of help from Altinity), and much more!
James Greenhill, Platform Team Lead
Eric Duong, Software Engineer

Open Source Analytics Projects

Analytics projects tend to handle one of three problems: loading data and possibly transforming it on the way in, providing a database to contain the information and answer questions quickly, and helping users visualize and consume the answers. You’ll find talks on all of these topics in the sessions below and learn what is possible. Enjoy!

OSA Con 2021

Succeeding with Apache Druid and Clickstream Data

The Apache Druid database accelerates OLAP-style analysis, providing awesome front-end experiences for end users. Clickstream data requires special attention, and in this talk Peter Marshall (Technology Evangelist in the Imply community team) shares his study of clickstream use cases built with Druid. From enrichment to funnel analysis, he highlights things to consider and discusses strategies for using Druid in a clickstream use case.
Peter Marshall, Technology Evangelist

OSA Con 2021

How Open Source Enables Innovation: A Case Study with dbt and Lightdash

Our presentation is about the standards of a good open-source analytics tool: standards that create an environment in which developers can build how they want. The story revolves around how dbt created a space that enabled community members to contribute features like the meta field, and how the business intelligence tool Lightdash then used this meta field to build its tool on top of it.
Amy Chen, Partner Engineer, dbt
Katie Hindson, Head of Product and Data, Lightdash

OSA Con 2021

How ClickHouse Inspired Us to Build a High Performance Time Series Database

There is one single truth about monitoring in any organization: the volume of metrics is constantly growing. This growth isn’t always correlated with hardware or human resources. At VictoriaMetrics we strive to make our solution as efficient as possible and ready for any kind of volume growth. The talk covers the internals of the processing pipeline inside the VictoriaMetrics TSDB, the architectural decisions we made and the optimizations we use for getting the highest performance possible.
Aliaksandr Valialkin, Co-Founder & Principal Architect

OSA Con 2021

Do We Still Need People To Write Database Systems?

Database management systems (DBMSs) are complex software. They are typically worked on by an elite few seasoned in writing performant code that requires strict correctness guarantees. The unwashed masses can then use these DBMSs without concerning themselves with the intricacies of their database’s internals. But there is a new trend towards replacing traditional, hand-optimized DBMS components written by elites with “learned” components that rely on machine learning (ML). Such learned components include index data structures, query optimizers, and configuration managers. Proponents of ML methods argue that they reduce the engineering overhead of DBMSs by automatically learning the best strategies, thereby reducing the reliance on elite programmers.

In this talk, I discuss the recent trends in both modern human-devised DBMS optimizations and learned DBMS components. I will cover both academic research and real-world implementations.
Andy Pavlo, Co-Founder

OSA Con 2021

Reverse ETL with Grouparoo

You have invested in your stack and now the warehouse has the data and insights to drive your business. See how open source Grouparoo can put current analytic data to work in the tools your business uses like Salesforce, Mailchimp, and Zendesk.
Brian Leonard, CEO & Co-founder

OSA Con 2021

The Future of Open Source Analytic Databases

Most people ask to consume data in their organization through dashboards, but crafting a dashboard that drives impact is a massive undertaking. In this talk, I’ll showcase my framework for designing and building effective dashboards that actually get used. I will be using Apache Superset as the BI platform of choice, but the lessons generalize to pretty much any other tool.
Srini Kadamati, Data Scientist & Sr. Developer Advocate

OSA Con 2021

Hello Hydrate! From Stream to ClickHouse with Apache Pulsar and Friends

An empty real-time SQL data warehouse is not useful to anyone. How can you load data quickly from diverse data sources? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. We’ll show how to use them to load CDC, logs, events, XML, images, and many other types of data into ClickHouse and similar data warehouses.
Tim Spann, Developer Advocate

OSA Con 2021 Panel Discussion

Open Data Lakes with Presto, Apache Hudi & AWS Glue & S3 – The Next Generation of Analytics

This moderated panel discussion features experts from each layer in the PHAS3 stack – Presto, Apache Hudi, AWS Glue and S3 – discussing next-gen cloud data analytics and how these technologies enable open, flexible, and highly performant analytics in the cloud.

This discussion was moderated by Eric Kavanagh, CEO, The Bloor Group.
Dipti Borkar, Co-founder & CPO, Ahana
Vinoth Chandar, Creator & VP, Apache Hudi
Roy Hasson, WW Analytics Specialist Leader, AWS

OSA Con 2021

Data Rivers: The New Analytics Architecture

We know all about data lakes, swamps, and lake houses, so why not a data river? In this talk, we will take a short trip through the data analytics landscape and learn what a data river is, how the architecture evolved, and when you should consider it for your modern data application.
Rachel Pedreschi, VP of Community

OSA Con 2021

Design Decisions Behind Cube Store

While production datasets often have billions of rows, analytical queries can be satisfied with a much smaller volume of aggregated data. In this talk, I will reveal the design decisions behind Cube Store, an open-source rollup storage layer that allows access to data warehouses with sub-second latency and high concurrency. I will also show how open-source tech such as Apache Arrow, DataFusion, and Parquet help build modern analytical databases.
Pavel Tiunov, Co-founder & CTO

OSA Con 2021

Simplifying Data Analytics Using RudderStack Open-Source Data Pipelines

RudderStack is open-source data pipeline software that helps send data from any source to any destination. In this talk, I will explain the RudderStack architecture and how it transports data from websites to data warehouses such as ClickHouse and Snowflake. I will also cover features like user transformation, which enables the real-time transformation of events. If time permits, I will show a demo of sending data reliably.
Sumanth Puram, VP Engineering

How Can I Find Out More About OSA Con 2022?

Send an email to conference@altinity.com for more information.

Thank You to Our Community Partners

We could not have pulled this together without your help promoting the event and locating speakers.
This is a community of communities and we are so happy to be a part of it all. Here’s to you all!

Imply

Imply is a full stack, multi-cloud data platform. It is built around Apache Druid, a widely-adopted open source real-time analytics database architected to power data-driven applications.
imply.io

Percona

Percona is a leading provider of unbiased open source database solutions that allow organizations to easily, securely and affordably maintain business agility, minimize risks, and stay competitive.
percona.com

Ahana

Ahana, the Presto company, offers the only managed service for Presto on AWS with the vision to simplify open data lake analytics. Presto, the open source project created by Facebook and used at Uber, Twitter and thousands more, is the de facto standard for fast SQL processing on data lakes. 
ahana.io

Preset

Preset empowers teams of all skill sets to be data driven, unlocking valuable insights with beautiful and interactive visualizations and dashboards.
preset.io

DoK Community

DoKC is an openly governed group of curious and experienced practitioners, taking inspiration from the CNCF and Apache Software Foundation.
Our aim is to assist in the emergence and development of techniques for the use of Kubernetes for data.  
dok.community

StreamNative

Founded by the original developers of Apache Pulsar® and Apache BookKeeper®, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams.
streamnative.io

StarTree

Founded by the creators of Apache Pinot and inspired by the user-facing analytics revolution begun by companies like LinkedIn and Uber, the StarTree team is committed to giving every decision-maker the insights they need to make great choices. 
startree.io

dbt Labs

dbt is a data transformation tool that enables data analysts and engineers to transform, test and document data in the cloud data warehouse.
getdbt.com

PostHog

PostHog is an open-source product analytics platform that offers a suite of tools, including funnels, heat maps, session recording and more, all in a single platform.
posthog.com

Cube Dev

Cube Dev is on a mission to provide developers with an analytical data access layer for building modern applications. Cube.js is used by thousands of companies around the world to power customer-facing analytics and internal business intelligence tools.
cube.dev