filter

filter(__data, *args)

Keep rows where conditions are true.

Parameters:

Name     Type   Description                                       Default
__data   -      The data being filtered.                          required
*args    -      Conditions that must all be met to keep a row.    ()

Examples:

>>> from siuba import _, filter
>>> from siuba.data import cars

Keep rows where cyl is 4 and mpg is less than 22.

>>> cars >> filter(_.cyl == 4, _.mpg < 22)
    cyl   mpg   hp
20    4  21.5   97
31    4  21.4  109

Use | to represent an OR condition. For example, the code below keeps rows where hp is over 300 or mpg is over 32.

>>> cars >> filter((_.hp > 300) | (_.mpg > 32))
    cyl   mpg   hp
17    4  32.4   66
19    4  33.9   65
30    8  15.0  335
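
As a rough point of comparison, the two examples above can be sketched in plain pandas. This is only an illustration, assuming a small hypothetical DataFrame `df` with made-up values in place of the `cars` data: comma-separated conditions behave like `&` on boolean Series, and `|` keeps rows where either condition holds.

```python
import pandas as pd

# Hypothetical stand-in for the cars data; the values are illustrative only.
df = pd.DataFrame({
    "cyl": [4, 4, 6, 8],
    "mpg": [21.5, 30.4, 21.0, 15.0],
    "hp": [97, 52, 110, 335],
})

# Comma-separated conditions are combined with AND,
# which corresponds to `&` on boolean Series in plain pandas.
both = df.loc[(df.cyl == 4) & (df.mpg < 22), :]

# `|` keeps rows where at least one of the conditions is True.
either = df.loc[(df.hp > 300) | (df.mpg > 30), :]

print(both)
print(either)
```

The parentheses around each comparison matter, because `&` and `|` bind more tightly than `==`, `<`, and `>` in Python.
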
Source code in siuba/dply/verbs.py
@singledispatch2(pd.DataFrame)
def filter(__data, *args):
    """Keep rows where conditions are true.

    Parameters
    ----------
    __data:
        The data being filtered.
    *args:
        Conditions that must all be met to keep a row.

    Examples
    --------

    >>> from siuba import _, filter
    >>> from siuba.data import cars

    Keep rows where cyl is 4 *and* mpg is less than 22.

    >>> cars >> filter(_.cyl == 4, _.mpg < 22)
        cyl   mpg   hp
    20    4  21.5   97
    31    4  21.4  109

    Use `|` to represent an OR condition. For example, the code below keeps
    rows where hp is over 300 *or* mpg is over 32.

    >>> cars >> filter((_.hp > 300) | (_.mpg > 32))
        cyl   mpg   hp
    17    4  32.4   66
    19    4  33.9   65
    30    8  15.0  335

    """
    crnt_indx = True
    for arg in args:
        res = arg(__data) if callable(arg) else arg

        if isinstance(res, pd.DataFrame):
            # a DataFrame result keeps a row only if every column is True
            crnt_indx &= res.all(axis=1)
        else:
            # a boolean Series or a scalar is combined directly
            crnt_indx &= res

    # use loc or iloc to subset, depending on crnt_indx ----
    # the main issue here is that loc can't remove all rows using a slice
    # and iloc can't use a boolean series
    if isinstance(crnt_indx, (bool, np.bool_)):
        # iloc can do slice, but not a bool series
        result = __data.iloc[slice(None) if crnt_indx else slice(0),:]
    else:
        result = __data.loc[crnt_indx,:]

    return result
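
The scalar branch at the end of the listing is easiest to see with a standalone sketch. The function below is a hypothetical pandas-only restatement of the same logic (the name `filter_rows` and the use of lambdas in place of siuba's `_` are assumptions made for illustration, not part of siuba). It shows that calling it with no conditions keeps every row, while a scalar `False` yields an empty frame that still has its columns.

```python
import numpy as np
import pandas as pd


def filter_rows(data, *conditions):
    """Hypothetical restatement of the filtering logic, without dispatch."""
    keep = True
    for cond in conditions:
        # A condition may be a callable taking the DataFrame, or a precomputed value.
        res = cond(data) if callable(cond) else cond
        if isinstance(res, pd.DataFrame):
            # Collapse a DataFrame of booleans to one boolean per row.
            res = res.all(axis=1)
        keep = keep & res

    if isinstance(keep, (bool, np.bool_)):
        # Scalar mask: slice everything or nothing by position.
        return data.iloc[slice(None) if keep else slice(0), :]
    # Boolean Series mask: label-aligned selection.
    return data.loc[keep, :]


df = pd.DataFrame({"cyl": [4, 6, 8], "mpg": [21.5, 21.0, 15.0]})

all_rows = filter_rows(df)         # no conditions: every row is kept
no_rows = filter_rows(df, False)   # scalar False: no rows, columns preserved
some_rows = filter_rows(df, lambda d: d.cyl > 4, lambda d: d.mpg > 20)
```

Starting the mask at `True` means the loop never needs a special case for the first condition; the cost is the final type check that picks between `iloc` and `loc`.
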