```python
from siuba import _, group_by, ungroup, filter, mutate, summarize
from siuba.data import mtcars

small_cars = mtcars[["cyl", "gear", "hp"]]
small_cars
```

|  | cyl | gear | hp |
|---|---|---|---|
| 0 | 6 | 4 | 110 |
| 1 | 6 | 4 | 110 |
| ... | ... | ... | ... |
| 30 | 8 | 5 | 335 |
| 31 | 4 | 4 | 109 |

32 rows × 3 columns
The group_by() function specifies groups in your data, so that verbs like mutate(), filter(), and summarize() perform their operations separately within each group.
For example, the cylinders column (cyl) in the mtcars dataset takes 3 possible values: 4, 6, and 8. You could use group_by() to say that operations should be performed separately for each of these 3 groups.
An important complement to group_by() is ungroup(), which removes all current groupings.
The simplest way to use group_by() is to specify your grouping column directly. This is shown below, by grouping small_cars according to its 3 groups of cylinder values (4, 6, or 8 cylinders).
(grouped data frame)
|  | cyl | gear | hp |
|---|---|---|---|
| 0 | 6 | 4 | 110 |
| 1 | 6 | 4 | 110 |
| ... | ... | ... | ... |
| 30 | 8 | 5 | 335 |
| 31 | 4 | 4 | 109 |
32 rows × 3 columns
Note that the result is simply a pandas DataFrameGroupBy object, the same kind of object returned by small_cars.groupby('cyl'). Normally, a DataFrameGroupBy doesn’t print a preview of itself, but siuba modifies it to do so, since this is very handy.
The group_by function is most often used with filter, mutate, and summarize.
In order to group by multiple columns, simply specify them all as arguments to group_by.