select

select(__data, *args, **kwargs)

Select columns of a table to keep or drop (and optionally rename).

Parameters:

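| Name     | Type | Description                                        | Default  |
|----------|------|----------------------------------------------------|----------|
| __data   |      | The input table.                                   | required |
| *args    |      | An expression specifying columns to keep or drop.  | ()       |
| **kwargs |      | Not implemented. Passing keyword arguments raises NotImplementedError; to rename a column, use _.newname == _.oldname in *args instead. | {} |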

Examples:

>>> from siuba import _, select
>>> from siuba.data import cars
>>> small_cars = cars.head(1)
>>> small_cars
   cyl   mpg   hp
0    6  21.0  110

You can refer to columns by name or by zero-based position.

>>> small_cars >> select(_.cyl, _[2])
   cyl   hp
0    6  110

Use a ~ sign to exclude a column.

>>> small_cars >> select(~_.cyl)
    mpg   hp
0  21.0  110

You can use any methods you'd find on the .columns.str accessor:

>>> small_cars.columns.str.contains("p")
array([False,  True,  True])
>>> small_cars >> select(_.contains("p"))
    mpg   hp
0  21.0  110
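
For comparison, the same columns can be picked in plain pandas by applying the boolean mask directly. This is only a sketch of an equivalent result, not a description of how siuba implements select; the DataFrame is a hand-built stand-in for small_cars.

```python
import pandas as pd

# Hand-built stand-in for small_cars (assumption: the same single row as above).
small_cars = pd.DataFrame({"cyl": [6], "mpg": [21.0], "hp": [110]})

# Boolean mask over the column names, from the .columns.str accessor.
mask = small_cars.columns.str.contains("p")

# Keep only the columns where the mask is True.
selected = small_cars.loc[:, mask]
print(list(selected.columns))  # ['mpg', 'hp']
```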

Use a slice to select a range of columns:

>>> small_cars >> select(_[0:2])
   cyl   mpg
0    6  21.0

Multiple expressions can be combined using _[a, b, c] syntax. This is useful for dropping a complex set of matches.

>>> small_cars >> select(~_[_.startswith("c"), -1])
    mpg
0  21.0

Source code in siuba/dply/verbs.py
@singledispatch2(DataFrame)
def select(__data, *args, **kwargs):
    """Select columns of a table to keep or drop (and optionally rename).

    Parameters
    ----------
    __data:
        The input table.
    *args: 
        An expression specifying columns to keep or drop. 
    **kwargs:
        Not implemented.

    Examples
    --------
    >>> from siuba import _, select
    >>> from siuba.data import cars

    >>> small_cars = cars.head(1)
    >>> small_cars
       cyl   mpg   hp
    0    6  21.0  110

    You can refer to columns by name or position.

    >>> small_cars >> select(_.cyl, _[2])
       cyl   hp
    0    6  110

    Use a `~` sign to exclude a column.

    >>> small_cars >> select(~_.cyl)
        mpg   hp
    0  21.0  110

    You can use any methods you'd find on the .columns.str accessor:

    >>> small_cars.columns.str.contains("p")
    array([False,  True,  True])

    >>> small_cars >> select(_.contains("p"))
        mpg   hp
    0  21.0  110

    Use a slice to select a range of columns:

    >>> small_cars >> select(_[0:2])
       cyl   mpg
    0    6  21.0

    Multiple expressions can be combined using _[a, b, c] syntax. This is useful
    for dropping a complex set of matches.

    >>> small_cars >> select(~_[_.startswith("c"), -1])
        mpg
    0  21.0

    """

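    # Keyword arguments are rejected; renaming goes through the == syntax.
    if kwargs:
        raise NotImplementedError(
            "Using kwargs in select not currently supported. "
            "Use _.newname == _.oldname instead"
        )

    # Convert the positional expressions into selection specs.
    var_list = var_create(*args)

    # Map each selected column name to its new name (None if not renamed).
    od = var_select(__data.columns, *var_list, data=__data)

    # Keep only the entries that actually rename a column.
    to_rename = {k: v for k, v in od.items() if v is not None}

    # Subset columns in selection order, then apply any renames.
    return __data[list(od)].rename(columns=to_rename)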
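
The last two statements of the function can be tried in isolation with plain pandas. In the sketch below, od is written by hand as a stand-in for the result of var_select. Its shape is an assumption inferred from how the function uses it: keys are the selected column names, values are the new name or None.

```python
import pandas as pd

df = pd.DataFrame({"cyl": [6], "mpg": [21.0], "hp": [110]})

# Hypothetical selection result: keep cyl (renamed) and hp (unchanged).
od = {"cyl": "cylinders", "hp": None}

# Only entries with a non-None value describe a rename.
to_rename = {k: v for k, v in od.items() if v is not None}

# Subset columns in the order of od's keys, then rename.
result = df[list(od)].rename(columns=to_rename)
print(list(result.columns))  # ['cylinders', 'hp']
```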