filter
filter(__data, *args)
Keep rows where conditions are true.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`__data` | | The data being filtered. | required |
`*args` | | Conditions that must be met to keep a row. | `()` |
Examples:
>>> from siuba import _, filter
>>> from siuba.data import cars
Keep rows where cyl is 4 and mpg is less than 22.
>>> cars >> filter(_.cyl == 4, _.mpg < 22)
cyl mpg hp
20 4 21.5 97
31 4 21.4 109
Use `|` to represent an OR condition. For example, the code below keeps
rows where hp is over 300 or mpg is over 32.
>>> cars >> filter((_.hp > 300) | (_.mpg > 32))
cyl mpg hp
17 4 32.4 66
19 4 33.9 65
30 8 15.0 335
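For readers more used to plain pandas, the two examples above correspond roughly to boolean-mask indexing. The sketch below is not siuba code. It rebuilds only the five rows printed in the outputs above into a small frame (the name `cars_subset` is made up for illustration) and applies the equivalent masks.

```python
import pandas as pd

# Only the five rows that appear in the outputs above, not the full cars data.
cars_subset = pd.DataFrame(
    {
        "cyl": [4, 4, 4, 8, 4],
        "mpg": [32.4, 33.9, 21.5, 15.0, 21.4],
        "hp": [66, 65, 97, 335, 109],
    },
    index=[17, 19, 20, 30, 31],
)

# Several conditions passed to filter are combined with AND.
and_rows = cars_subset[(cars_subset.cyl == 4) & (cars_subset.mpg < 22)]

# A single condition built with | keeps rows matching either side.
or_rows = cars_subset[(cars_subset.hp > 300) | (cars_subset.mpg > 32)]

print(and_rows.index.tolist())  # [20, 31]
print(or_rows.index.tolist())   # [17, 19, 30]
```

The parentheses around each comparison matter in both styles, because `&` and `|` bind more tightly than `==`, `<`, and `>` in Python.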
Source code in siuba/dply/verbs.py
@singledispatch2(pd.DataFrame)
def filter(__data, *args):
    """Keep rows where conditions are true.

    Parameters
    ----------
    __data:
        The data being filtered.
    *args:
        conditions that must be met to keep a row.

    Examples
    --------
    >>> from siuba import _, filter
    >>> from siuba.data import cars

    Keep rows where cyl is 4 *and* mpg is less than 22.

    >>> cars >> filter(_.cyl == 4, _.mpg < 22)
        cyl   mpg   hp
    20    4  21.5   97
    31    4  21.4  109

    Use `|` to represent an OR condition. For example, the code below keeps
    rows where hp is over 300 *or* mpg is over 32.

    >>> cars >> filter((_.hp > 300) | (_.mpg > 32))
        cyl   mpg   hp
    17    4  32.4   66
    19    4  33.9   65
    30    8  15.0  335

    """
    crnt_indx = True
    for arg in args:
        res = arg(__data) if callable(arg) else arg

        if isinstance(res, pd.DataFrame):
            crnt_indx &= res.all(axis=1)
        elif isinstance(res, pd.Series):
            crnt_indx &= res
        else:
            crnt_indx &= res

    # use loc or iloc to subset, depending on crnt_indx ----
    # the main issue here is that loc can't remove all rows using a slice
    # and iloc can't use a boolean series
    if isinstance(crnt_indx, bool) or isinstance(crnt_indx, np.bool_):
        # iloc can do slice, but not a bool series
        result = __data.iloc[slice(None) if crnt_indx else slice(0),:]
    else:
        result = __data.loc[crnt_indx,:]

    return result
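
The two subsetting branches at the end of the function are easiest to see in isolation. The sketch below uses plain pandas and a throwaway frame (`df` and its values are invented for illustration) to show what each branch returns when the accumulated mask is a scalar versus a boolean Series. It mirrors the logic of the listing above but is not part of siuba.

```python
import numpy as np
import pandas as pd

# Toy data, invented for this illustration only.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# No conditions: the mask stays the Python scalar True, so every row is kept.
mask = True
all_rows = df.iloc[slice(None) if mask else slice(0), :]

# A scalar False condition: an empty slice drops every row but keeps the columns.
mask = True
mask &= np.bool_(False)
no_rows = df.iloc[slice(None) if mask else slice(0), :]

# Series conditions: True & Series yields a boolean Series, which loc accepts.
mask = True
for cond in (df.x > 1, df.y < 30):
    mask &= cond
some_rows = df.loc[mask, :]

print(len(all_rows), len(no_rows), some_rows.index.tolist())  # 3 0 [1]
```

Starting the mask at `True` means the loop needs no special case for the first condition, at the cost of the type check before subsetting.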