select
select(__data, *args, **kwargs)
Select columns of a table to keep or drop (and optionally rename).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
__data | | The input table. | required |
*args | | An expression specifying columns to keep or drop. | () |
**kwargs | | Not implemented. | {} |
Examples:
>>> from siuba import _, select
>>> from siuba.data import cars
>>> small_cars = cars.head(1)
>>> small_cars
   cyl   mpg   hp
0    6  21.0  110
You can refer to columns by name or position.
>>> small_cars >> select(_.cyl, _[2])
   cyl   hp
0    6  110
Use a ~ sign to exclude a column.
>>> small_cars >> select(~_.cyl)
    mpg   hp
0  21.0  110
You can use any methods you'd find on the .columns.str accessor:
>>> small_cars.columns.str.contains("p")
array([False,  True,  True])
>>> small_cars >> select(_.contains("p"))
    mpg   hp
0  21.0  110
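To make the link between the boolean mask and the selected columns concrete, here is a minimal plain-Python sketch of name-based matching. It uses neither siuba nor pandas; `match_columns` is a hypothetical helper, and it assumes the pattern is treated as a regular expression, which is the default behaviour of pandas' `str.contains`.

```python
import re


def match_columns(columns, pattern):
    """Return the column names that contain `pattern` (treated as a regex)."""
    # Hypothetical helper: build a boolean mask like .columns.str.contains,
    # then keep only the names where the mask is True.
    mask = [re.search(pattern, name) is not None for name in columns]
    return [name for name, keep in zip(columns, mask) if keep]


columns = ["cyl", "mpg", "hp"]
selected = match_columns(columns, "p")
print(selected)  # ['mpg', 'hp']
```

The same idea extends to other string methods such as `startswith` or `endswith`: each produces a mask over the column names, and the mask decides which columns are kept.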
Use a slice to select a range of columns:
>>> small_cars >> select(_[0:2])
   cyl   mpg
0    6  21.0
Multiple expressions can be combined using _[a, b, c] syntax. This is useful for dropping a complex set of matches.
>>> small_cars >> select(~_[_.startswith("c"), -1])
    mpg
0  21.0
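The combined exclusion above can be modelled in plain Python as "resolve every sub-expression to a set of names, then drop the union". The sketch below is a simplified illustration and not siuba's implementation: `resolve` and `drop` are hypothetical helpers, and a predicate function stands in for an expression such as `_.startswith("c")`.

```python
def resolve(columns, spec):
    """Resolve one spec (name, position, slice, or predicate) to column names."""
    if isinstance(spec, str):
        if spec not in columns:
            raise KeyError(spec)
        return [spec]
    if isinstance(spec, int):
        return [columns[spec]]  # negative positions count from the end
    if isinstance(spec, slice):
        return columns[spec]
    if callable(spec):
        return [name for name in columns if spec(name)]
    raise TypeError(f"unsupported spec: {spec!r}")


def drop(columns, *specs):
    """Drop every column matched by any spec, keeping the original order."""
    matched = set()
    for spec in specs:
        matched.update(resolve(columns, spec))
    return [name for name in columns if name not in matched]


columns = ["cyl", "mpg", "hp"]
remaining = drop(columns, lambda name: name.startswith("c"), -1)
print(remaining)  # ['mpg']
```

Here the predicate matches `cyl` and the position `-1` matches `hp`, so only `mpg` survives, which agrees with the output shown above.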
Source code in siuba/dply/verbs.py
@singledispatch2(DataFrame)
def select(__data, *args, **kwargs):
    """Select columns of a table to keep or drop (and optionally rename).

    Parameters
    ----------
    __data:
        The input table.
    *args:
        An expression specifying columns to keep or drop.
    **kwargs:
        Not implemented.

    Examples
    --------
    >>> from siuba import _, select
    >>> from siuba.data import cars
    >>> small_cars = cars.head(1)
    >>> small_cars
       cyl   mpg   hp
    0    6  21.0  110

    You can refer to columns by name or position.

    >>> small_cars >> select(_.cyl, _[2])
       cyl   hp
    0    6  110

    Use a `~` sign to exclude a column.

    >>> small_cars >> select(~_.cyl)
        mpg   hp
    0  21.0  110

    You can use any methods you'd find on the .columns.str accessor:

    >>> small_cars.columns.str.contains("p")
    array([False,  True,  True])

    >>> small_cars >> select(_.contains("p"))
        mpg   hp
    0  21.0  110

    Use a slice to select a range of columns:

    >>> small_cars >> select(_[0:2])
       cyl   mpg
    0    6  21.0

    Multiple expressions can be combined using _[a, b, c] syntax. This is useful
    for dropping a complex set of matches.

    >>> small_cars >> select(~_[_.startswith("c"), -1])
        mpg
    0  21.0

    """
    if kwargs:
        raise NotImplementedError(
            "Using kwargs in select not currently supported. "
            "Use _.newname == _.oldname instead"
        )

    var_list = var_create(*args)
    od = var_select(__data.columns, *var_list, data=__data)

    to_rename = {k: v for k,v in od.items() if v is not None}

    return __data[list(od)].rename(columns = to_rename)
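The last two statements of the listing can be illustrated without pandas. The sketch below assumes, as the dict comprehension in the source suggests, that `od` maps each selected column name to either a new name or `None`. The function `apply_selection` and the dict-of-lists table are hypothetical stand-ins for the DataFrame operations, not part of siuba.

```python
def apply_selection(table, od):
    """Keep the columns named in `od`, in order, renaming where a new name is given."""
    # Only entries with a non-None value are renames (mirrors `to_rename`).
    to_rename = {old: new for old, new in od.items() if new is not None}
    # Pick columns in the order of `od`, applying the rename where one exists.
    return {to_rename.get(old, old): table[old] for old in od}


table = {"cyl": [6], "mpg": [21.0], "hp": [110]}
od = {"hp": "horsepower", "cyl": None}  # assumed shape of the selection mapping
result = apply_selection(table, od)
print(result)  # {'horsepower': [110], 'cyl': [6]}
```

Because selection and renaming both come from one ordered mapping, the output column order follows the order of the selection rather than the order of the input table.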