Workshop: Introduction to Siuba (PyData NYC 2023)

siuba is a data analysis library that makes data science faster. It provides a simple, consistent interface that handles messy, real-life data.

In this workshop, you’ll explore data on the top music tracks on Spotify. We’ll cover the 5 key siuba functions that allow you to answer many questions about the popularity of artists in different countries, along with different features in their music.

Throughout the workshop, we’ll use plotnine to graph the data.

Who the heck is teaching?

Michael Chow works on python tools for the open source team at Posit, PBC (formerly RStudio). He has a PhD in cognitive psycholgy, and maintains siuba.

Charlie Costanzo works primarily as an analytics engineer on public transportation and other civic technology projects (that also use siuba!)

Outline

(set up): using github codespaces
data wrangling: filter, arrange, mutate
visualization: plotnine basics
grouping and summarizing data
additional plot types

Requirements

You should have some familiarity with Python. Some experience with pandas will be useful, but is not necessary!

Preparation

We’ll walk you through these steps:

Using this site

The content for this workshop is broken into about 12 lessons. Each lesson contains slides, followed by exercises.

You can also view all the slides together.