Podcast: Combining Python And SQL To Build A PyData Warehouse

 

Sept 5, 2019
Robert Hodges on The Python Podcast with Tobias Macey

Combining Python And SQL To Build A PyData Warehouse – Episode 227

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on its own. This is a great introduction to what differentiates a data warehouse from a relational database, and to ways that you can think differently about running your analytical workloads for larger volumes of data.

Interview

 

  • Introductions
  • How did you get introduced to Python?
  • To start with, can you give a quick overview of what a data warehouse is and how it differs from a “regular” database for anyone who isn’t familiar with them?
    • What are the cases where a data warehouse would be preferable and when are they the wrong choice?
  • What capabilities does a data warehouse add to the PyData ecosystem?
  • For someone who doesn’t yet have a warehouse, what are some of the differentiating factors among the systems that are available?
  • Once you have a data warehouse deployed, how does it get populated and how does Python fit into that workflow?
  • For an analyst or data scientist, how might they interact with the data warehouse and what tools would they use to do so?
  • What are some potential bottlenecks when working in Python with the volumes of data that a warehouse can contain?
    • What are some ways that you have found to scale beyond those bottlenecks?
  • How does the data warehouse fit into the workflow for a machine learning or artificial intelligence project?
  • What are some of the limitations of data warehouses in the context of the Python ecosystem?
  • What are some of the trends that you see going forward for the integration of the PyData stack with data warehouses?
    • What are some challenges that you anticipate the industry running into in the process?
  • What are some useful references that you would recommend for anyone who wants to dig deeper into this topic?

 
