Selectors
Selectors allow for more intuitive selection of columns from DataFrame
or LazyFrame objects based on their name, dtype or other properties.
They unify and build on the related functionality that is available through
the col() expression and can also broadcast expressions over the selected
columns.
Importing
- Selectors are available as functions imported from polars.selectors
- Typical/recommended usage is to import the module as cs and employ selectors from there.

import polars.selectors as cs
import polars as pl

df = pl.DataFrame(
    {
        "w": ["xx", "yy", "xx", "yy", "xx"],
        "x": [1, 2, 1, 4, -2],
        "y": [3.0, 4.5, 1.0, 2.5, -2.0],
        "z": ["a", "b", "a", "b", "b"],
    },
)
df.groupby(by=cs.string()).agg(cs.numeric().sum())
Set operations
Selectors support set operations such as:
- UNION: A | B
- INTERSECTION: A & B
- DIFFERENCE: A - B
- COMPLEMENT: ~A
Examples
import polars.selectors as cs
import polars as pl
# set up an empty dataframe with plenty of columns of various dtypes
df = pl.DataFrame(
    schema={
        "abc": pl.UInt16,
        "bbb": pl.UInt32,
        "cde": pl.Float64,
        "def": pl.Float32,
        "eee": pl.Boolean,
        "fgg": pl.Boolean,
        "ghi": pl.Time,
        "JJK": pl.Date,
        "Lmn": pl.Duration,
        "opp": pl.Datetime("ms"),
        "qqR": pl.Utf8,
    },
)
# Select the UNION of temporal, strings and columns that start with "e"
assert df.select(cs.temporal() | cs.string() | cs.starts_with("e")).schema == {
    "eee": pl.Boolean,
    "ghi": pl.Time,
    "JJK": pl.Date,
    "Lmn": pl.Duration,
    "opp": pl.Datetime("ms"),
    "qqR": pl.Utf8,
}
# Select the INTERSECTION of temporal and column names that match "opp" OR "JJK"
assert df.select(cs.temporal() & cs.matches("opp|JJK")).schema == {
    "JJK": pl.Date,
    "opp": pl.Datetime("ms"),
}
# Select the DIFFERENCE of temporal columns and columns that contain the name "opp" OR "JJK"
assert df.select(cs.temporal() - cs.matches("opp|JJK")).schema == {
    "ghi": pl.Time,
    "Lmn": pl.Duration,
}
# Select the COMPLEMENT of all columns of dtypes Duration and Time
assert df.select(~cs.by_dtype([pl.Duration, pl.Time])).schema == {
    "abc": pl.UInt16,
    "bbb": pl.UInt32,
    "cde": pl.Float64,
    "def": pl.Float32,
    "eee": pl.Boolean,
    "fgg": pl.Boolean,
    "JJK": pl.Date,
    "opp": pl.Datetime("ms"),
    "qqR": pl.Utf8,
}
Note
If you don’t want to use the set operations on the selectors, you can materialize them as expressions
by calling as_expr. This ensures that operators such as OR (|) and AND (&) are dispatched to the
underlying expressions instead.
Functions
Available selector functions:
| all() | Select all columns. |
| by_dtype(*dtypes) | Select all columns matching the given dtypes. |
| by_name(*names) | Select all columns matching the given names. |
| contains(substring) | Select columns that contain the given literal substring(s). |
| datetime([time_unit, time_zone]) | Select all datetime columns, optionally filtering by time unit/zone. |
| duration([time_unit]) | Select all duration columns, optionally filtering by time unit. |
| ends_with(*suffix) | Select columns that end with the given substring(s). |
| expand_selector(target, selector) | Expand a selector to column names with respect to a specific frame or schema target. |
| first() | Select the first column in the current scope. |
| float() | Select all float columns. |
| integer() | Select all integer columns. |
| is_selector(obj) | Indicate whether the given object/expression is a selector. |
| last() | Select the last column in the current scope. |
| matches(pattern) | Select all columns that match the given regex pattern. |
| numeric() | Select all numeric columns. |
| starts_with(*prefix) | Select columns that start with the given substring(s). |
| string([include_categorical]) | Select all Utf8 (and, optionally, Categorical) string columns. |
| temporal() | Select all temporal columns. |

polars.selectors.all() → SelectorType
Select all columns.

Examples

>>> import polars as pl
>>> from datetime import date
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(1999, 12, 31), date(2024, 1, 1)],
...         "value": [1_234_500, 5_000_555],
...     },
...     schema_overrides={"value": pl.Int32},
... )

Select all columns, casting them to string:

>>> df.select(cs.all().cast(pl.Utf8))
shape: (2, 2)
┌────────────┬─────────┐
│ dt         ┆ value   │
│ ---        ┆ ---     │
│ str        ┆ str     │
╞════════════╪═════════╡
│ 1999-12-31 ┆ 1234500 │
│ 2024-01-01 ┆ 5000555 │
└────────────┴─────────┘

Select all columns except for those matching the given dtypes:

>>> df.select(cs.all() - cs.numeric())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 1999-12-31 │
│ 2024-01-01 │
└────────────┘

polars.selectors.by_dtype(*dtypes: PolarsDataType | Collection[PolarsDataType]) → SelectorType
Select all columns matching the given dtypes.

Examples

>>> import polars as pl
>>> from datetime import date
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(1999, 12, 31), date(2024, 1, 1), date(2010, 7, 5)],
...         "value": [1_234_500, 5_000_555, -4_500_000],
...         "other": ["foo", "bar", "foo"],
...     }
... )

Select all columns with date or integer dtypes:

>>> df.select(cs.by_dtype(pl.Date, pl.INTEGER_DTYPES))
shape: (3, 2)
┌────────────┬──────────┐
│ dt         ┆ value    │
│ ---        ┆ ---      │
│ date       ┆ i64      │
╞════════════╪══════════╡
│ 1999-12-31 ┆ 1234500  │
│ 2024-01-01 ┆ 5000555  │
│ 2010-07-05 ┆ -4500000 │
└────────────┴──────────┘

Select all columns that are not of date or integer dtype:

>>> df.select(~cs.by_dtype(pl.Date, pl.INTEGER_DTYPES))
shape: (3, 1)
┌───────┐
│ other │
│ ---   │
│ str   │
╞═══════╡
│ foo   │
│ bar   │
│ foo   │
└───────┘

Group by string columns and sum the numeric columns:

>>> df.groupby(cs.string()).agg(cs.numeric().sum()).sort(by="other")
shape: (2, 2)
┌───────┬──────────┐
│ other ┆ value    │
│ ---   ┆ ---      │
│ str   ┆ i64      │
╞═══════╪══════════╡
│ bar   ┆ 5000555  │
│ foo   ┆ -3265500 │
└───────┴──────────┘

polars.selectors.by_name(*names: str | Collection[str]) → SelectorType
Select all columns matching the given names.

Parameters:
- *names: One or more names of columns to select.

See also
- by_dtype: Select all columns matching the given dtypes.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )

Select columns by name:

>>> df.select(cs.by_name("foo", "bar"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 123 │
│ y   ┆ 456 │
└─────┴─────┘

Match all columns except for those given:

>>> df.select(~cs.by_name("foo", "bar"))
shape: (2, 2)
┌─────┬───────┐
│ baz ┆ zap   │
│ --- ┆ ---   │
│ f64 ┆ bool  │
╞═════╪═══════╡
│ 2.0 ┆ false │
│ 5.5 ┆ true  │
└─────┴───────┘

polars.selectors.contains(substring: str | Collection[str]) → SelectorType
Select columns that contain the given literal substring(s).

Parameters:
- substring: Substring(s) that matching column names should contain.

See also
- matches: Select all columns that match the given regex pattern.
- ends_with: Select columns that end with the given substring(s).
- starts_with: Select columns that start with the given substring(s).

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )

Select columns that contain the substring ‘ba’:

>>> df.select(cs.contains("ba"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘

Select columns that contain the substring ‘ba’ or the letter ‘z’:

>>> df.select(cs.contains(("ba", "z")))
shape: (2, 3)
┌─────┬─────┬───────┐
│ bar ┆ baz ┆ zap   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 123 ┆ 2.0 ┆ false │
│ 456 ┆ 5.5 ┆ true  │
└─────┴─────┴───────┘

Select all columns except for those that contain the substring ‘ba’:

>>> df.select(~cs.contains("ba"))
shape: (2, 2)
┌─────┬───────┐
│ foo ┆ zap   │
│ --- ┆ ---   │
│ str ┆ bool  │
╞═════╪═══════╡
│ x   ┆ false │
│ y   ┆ true  │
└─────┴───────┘

polars.selectors.datetime(time_unit: TimeUnit | Collection[TimeUnit] | None = None, time_zone: str | timezone | Collection[str | timezone | None] | None = ('*', None)) → SelectorType
Select all datetime columns, optionally filtering by time unit/zone.

Parameters:
- time_unit: One (or more) of the allowed time unit precision strings, “ms”, “us”, and “ns”. Omit to select columns with any valid time unit.
- time_zone:
  - One or more timezone strings, as defined in zoneinfo (run import zoneinfo; zoneinfo.available_timezones() for a full list of valid options).
  - Set None to select Datetime columns that do not have a timezone.
  - Set “*” to select Datetime columns that have any timezone.

Examples

>>> import polars as pl
>>> from datetime import datetime, date
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "tstamp_tokyo": [
...             datetime(1999, 7, 20, 20, 20, 16, 987654),
...             datetime(2000, 5, 15, 21, 21, 21, 123465),
...         ],
...         "tstamp_utc": [
...             datetime(2023, 4, 10, 12, 14, 16, 999000),
...             datetime(2025, 8, 25, 14, 18, 22, 666000),
...         ],
...         "tstamp": [
...             datetime(2000, 11, 20, 18, 12, 16, 600000),
...             datetime(2020, 10, 30, 10, 20, 25, 123000),
...         ],
...         "dt": [date(1999, 12, 31), date(2010, 7, 5)],
...     },
...     schema_overrides={
...         "tstamp_tokyo": pl.Datetime("ns", "Asia/Tokyo"),
...         "tstamp_utc": pl.Datetime("us", "UTC"),
...     },
... )

Select all datetime columns:

>>> df.select(cs.datetime())
shape: (2, 3)
┌────────────────────────────────┬─────────────────────────────┬─────────────────────────┐
│ tstamp_tokyo                   ┆ tstamp_utc                  ┆ tstamp                  │
│ ---                            ┆ ---                         ┆ ---                     │
│ datetime[ns, Asia/Tokyo]       ┆ datetime[μs, UTC]           ┆ datetime[μs]            │
╞════════════════════════════════╪═════════════════════════════╪═════════════════════════╡
│ 1999-07-21 05:20:16.987654 JST ┆ 2023-04-10 12:14:16.999 UTC ┆ 2000-11-20 18:12:16.600 │
│ 2000-05-16 06:21:21.123465 JST ┆ 2025-08-25 14:18:22.666 UTC ┆ 2020-10-30 10:20:25.123 │
└────────────────────────────────┴─────────────────────────────┴─────────────────────────┘

Select all datetime columns that have ‘us’ precision:

>>> df.select(cs.datetime("us"))
shape: (2, 2)
┌─────────────────────────────┬─────────────────────────┐
│ tstamp_utc                  ┆ tstamp                  │
│ ---                         ┆ ---                     │
│ datetime[μs, UTC]           ┆ datetime[μs]            │
╞═════════════════════════════╪═════════════════════════╡
│ 2023-04-10 12:14:16.999 UTC ┆ 2000-11-20 18:12:16.600 │
│ 2025-08-25 14:18:22.666 UTC ┆ 2020-10-30 10:20:25.123 │
└─────────────────────────────┴─────────────────────────┘

Select all datetime columns that have any timezone:

>>> df.select(cs.datetime(time_zone="*"))
shape: (2, 2)
┌────────────────────────────────┬─────────────────────────────┐
│ tstamp_tokyo                   ┆ tstamp_utc                  │
│ ---                            ┆ ---                         │
│ datetime[ns, Asia/Tokyo]       ┆ datetime[μs, UTC]           │
╞════════════════════════════════╪═════════════════════════════╡
│ 1999-07-21 05:20:16.987654 JST ┆ 2023-04-10 12:14:16.999 UTC │
│ 2000-05-16 06:21:21.123465 JST ┆ 2025-08-25 14:18:22.666 UTC │
└────────────────────────────────┴─────────────────────────────┘

Select all datetime columns that have a specific timezone:

>>> df.select(cs.datetime(time_zone="UTC"))
shape: (2, 1)
┌─────────────────────────────┐
│ tstamp_utc                  │
│ ---                         │
│ datetime[μs, UTC]           │
╞═════════════════════════════╡
│ 2023-04-10 12:14:16.999 UTC │
│ 2025-08-25 14:18:22.666 UTC │
└─────────────────────────────┘

Select all datetime columns that have NO timezone:

>>> df.select(cs.datetime(time_zone=None))
shape: (2, 1)
┌─────────────────────────┐
│ tstamp                  │
│ ---                     │
│ datetime[μs]            │
╞═════════════════════════╡
│ 2000-11-20 18:12:16.600 │
│ 2020-10-30 10:20:25.123 │
└─────────────────────────┘

Select all columns except for datetime columns:

>>> df.select(~cs.datetime())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 1999-12-31 │
│ 2010-07-05 │
└────────────┘

polars.selectors.duration(time_unit: TimeUnit | Collection[TimeUnit] | None = None) → SelectorType
Select all duration columns, optionally filtering by time unit.

Parameters:
- time_unit: One (or more) of the allowed time unit precision strings, “ms”, “us”, and “ns”. Omit to select columns with any valid time unit.

Examples

>>> import polars as pl
>>> from datetime import date, timedelta
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(2022, 1, 31), date(2025, 7, 5)],
...         "td1": [
...             timedelta(days=1, milliseconds=123456),
...             timedelta(days=1, hours=23, microseconds=987000),
...         ],
...         "td2": [
...             timedelta(days=7, microseconds=456789),
...             timedelta(days=14, minutes=999, seconds=59),
...         ],
...         "td3": [
...             timedelta(weeks=4, days=-10, microseconds=999999),
...             timedelta(weeks=3, milliseconds=123456, microseconds=1),
...         ],
...     },
...     schema_overrides={
...         "td1": pl.Duration("ms"),
...         "td2": pl.Duration("us"),
...         "td3": pl.Duration("ns"),
...     },
... )

Select all duration columns:

>>> df.select(cs.duration())
shape: (2, 3)
┌────────────────┬─────────────────┬────────────────────┐
│ td1            ┆ td2             ┆ td3                │
│ ---            ┆ ---             ┆ ---                │
│ duration[ms]   ┆ duration[μs]    ┆ duration[ns]       │
╞════════════════╪═════════════════╪════════════════════╡
│ 1d 2m 3s 456ms ┆ 7d 456789µs     ┆ 18d 999999µs       │
│ 1d 23h 987ms   ┆ 14d 16h 39m 59s ┆ 21d 2m 3s 456001µs │
└────────────────┴─────────────────┴────────────────────┘

Select all duration columns that have ‘ms’ precision:

>>> df.select(cs.duration("ms"))
shape: (2, 1)
┌────────────────┐
│ td1            │
│ ---            │
│ duration[ms]   │
╞════════════════╡
│ 1d 2m 3s 456ms │
│ 1d 23h 987ms   │
└────────────────┘

Select all duration columns that have ‘ms’ OR ‘ns’ precision:

>>> df.select(cs.duration(["ms", "ns"]))
shape: (2, 2)
┌────────────────┬────────────────────┐
│ td1            ┆ td3                │
│ ---            ┆ ---                │
│ duration[ms]   ┆ duration[ns]       │
╞════════════════╪════════════════════╡
│ 1d 2m 3s 456ms ┆ 18d 999999µs       │
│ 1d 23h 987ms   ┆ 21d 2m 3s 456001µs │
└────────────────┴────────────────────┘

Select all columns except for duration columns:

>>> df.select(~cs.duration())
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 2022-01-31 │
│ 2025-07-05 │
└────────────┘

polars.selectors.ends_with(*suffix: str) → SelectorType
Select columns that end with the given substring(s).

Parameters:
- suffix: Substring(s) that matching column names should end with.

See also
- contains: Select columns that contain the given literal substring(s).
- matches: Select all columns that match the given regex pattern.
- starts_with: Select columns that start with the given substring(s).

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [False, True],
...     }
... )

Select columns that end with the substring ‘z’:

>>> df.select(cs.ends_with("z"))
shape: (2, 1)
┌─────┐
│ baz │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
│ 5.5 │
└─────┘

Select columns that end with either the letter ‘z’ or ‘r’:

>>> df.select(cs.ends_with("z", "r"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘

Select all columns except for those that end with the substring ‘z’:

>>> df.select(~cs.ends_with("z"))
shape: (2, 3)
┌─────┬─────┬───────┐
│ foo ┆ bar ┆ zap   │
│ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ x   ┆ 123 ┆ false │
│ y   ┆ 456 ┆ true  │
└─────┴─────┴───────┘

polars.selectors.expand_selector(target: DataFrame | LazyFrame | Mapping[str, PolarsDataType], selector: SelectorType) → tuple[str, ...]
Expand a selector to column names with respect to a specific frame or schema target.

Parameters:
- target: A polars DataFrame, LazyFrame or schema.
- selector: An arbitrary polars selector (or compound selector).

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "colx": ["a", "b", "c"],
...         "coly": [123, 456, 789],
...         "colz": [2.0, 5.5, 8.0],
...     }
... )

Expand selector with respect to an existing DataFrame:

>>> cs.expand_selector(df, cs.numeric())
('coly', 'colz')
>>> cs.expand_selector(df, cs.first() | cs.last())
('colx', 'colz')

This also works with LazyFrame:

>>> cs.expand_selector(df.lazy(), ~(cs.first() | cs.last()))
('coly',)

Expand selector with respect to a standalone schema:

>>> schema = {
...     "colx": pl.Float32,
...     "coly": pl.Float64,
...     "colz": pl.Date,
... }
>>> cs.expand_selector(schema, cs.float())
('colx', 'coly')

polars.selectors.first() → SelectorType
Select the first column in the current scope.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )

Select the first column:

>>> df.select(cs.first())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ x   │
│ y   │
└─────┘

Select everything except for the first column:

>>> df.select(~cs.first())
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 │
╞═════╪═════╪═════╡
│ 123 ┆ 2.0 ┆ 0   │
│ 456 ┆ 5.5 ┆ 1   │
└─────┴─────┴─────┘

polars.selectors.float() → SelectorType
Select all float columns.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0.0, 1.0],
...     },
...     schema_overrides={"baz": pl.Float32, "zap": pl.Float64},
... )

Select all float columns:

>>> df.select(cs.float())
shape: (2, 2)
┌─────┬─────┐
│ baz ┆ zap │
│ --- ┆ --- │
│ f32 ┆ f64 │
╞═════╪═════╡
│ 2.0 ┆ 0.0 │
│ 5.5 ┆ 1.0 │
└─────┴─────┘

Select all columns except for those that are float:

>>> df.select(~cs.float())
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 123 │
│ y   ┆ 456 │
└─────┴─────┘

polars.selectors.integer() → SelectorType
Select all integer columns.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )

Select all integer columns:

>>> df.select(cs.integer())
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ zap │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 123 ┆ 0   │
│ 456 ┆ 1   │
└─────┴─────┘

Select all columns except for those that are integer:

>>> df.select(~cs.integer())
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ baz │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════╪═════╡
│ x   ┆ 2.0 │
│ y   ┆ 5.5 │
└─────┴─────┘

polars.selectors.is_selector(obj: Any) → bool
Indicate whether the given object/expression is a selector.

Examples

>>> import polars as pl
>>> from polars.selectors import is_selector
>>> import polars.selectors as cs
>>> is_selector(pl.col("colx"))
False
>>> is_selector(cs.first() | cs.last())
True

polars.selectors.last() → SelectorType
Select the last column in the current scope.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )

Select the last column:

>>> df.select(cs.last())
shape: (2, 1)
┌─────┐
│ zap │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 1   │
└─────┘

Select everything except for the last column:

>>> df.select(~cs.last())
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ baz │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ x   ┆ 123 ┆ 2.0 │
│ y   ┆ 456 ┆ 5.5 │
└─────┴─────┴─────┘

polars.selectors.matches(pattern: str) → SelectorType
Select all columns that match the given regex pattern.

Parameters:
- pattern: A valid regular expression pattern, compatible with the regex crate.

See also
- contains: Select all columns that contain the given substring.
- ends_with: Select all columns that end with the given substring(s).
- starts_with: Select all columns that start with the given substring(s).

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 1],
...     }
... )

Match column names containing an ‘a’, preceded by a character that is not ‘z’:

>>> df.select(cs.matches("[^z]a"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 123 ┆ 2.0 │
│ 456 ┆ 5.5 │
└─────┴─────┘

Do not match column names that contain ‘R’ or end in ‘z’ (case-insensitively):

>>> df.select(~cs.matches(r"(?i)R|z$"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ zap │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 0   │
│ y   ┆ 1   │
└─────┴─────┘

polars.selectors.numeric() → SelectorType
Select all numeric columns.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": ["x", "y"],
...         "bar": [123, 456],
...         "baz": [2.0, 5.5],
...         "zap": [0, 0],
...     },
...     schema_overrides={"bar": pl.Int16, "baz": pl.Float32, "zap": pl.UInt8},
... )

Match all numeric columns:

>>> df.select(cs.numeric())
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ i16 ┆ f32 ┆ u8  │
╞═════╪═════╪═════╡
│ 123 ┆ 2.0 ┆ 0   │
│ 456 ┆ 5.5 ┆ 0   │
└─────┴─────┴─────┘

Match all columns except for those that are numeric:

>>> df.select(~cs.numeric())
shape: (2, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ x   │
│ y   │
└─────┘

polars.selectors.starts_with(*prefix: str) → SelectorType
Select columns that start with the given substring(s).

Parameters:
- prefix: Substring(s) that matching column names should start with.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "foo": [1.0, 2.0],
...         "bar": [3.0, 4.0],
...         "baz": [5, 6],
...         "zap": [7, 8],
...     }
... )

Match columns starting with a ‘b’:

>>> df.select(cs.starts_with("b"))
shape: (2, 2)
┌─────┬─────┐
│ bar ┆ baz │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 3.0 ┆ 5   │
│ 4.0 ┆ 6   │
└─────┴─────┘

Match columns starting with either the letter ‘b’ or ‘z’:

>>> df.select(cs.starts_with("b", "z"))
shape: (2, 3)
┌─────┬─────┬─────┐
│ bar ┆ baz ┆ zap │
│ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 3.0 ┆ 5   ┆ 7   │
│ 4.0 ┆ 6   ┆ 8   │
└─────┴─────┴─────┘

Match all columns except for those starting with ‘b’:

>>> df.select(~cs.starts_with("b"))
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ zap │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 7   │
│ 2.0 ┆ 8   │
└─────┴─────┘

polars.selectors.string(include_categorical: bool = False) → SelectorType
Select all Utf8 (and, optionally, Categorical) string columns.

Examples

>>> import polars as pl
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "w": ["xx", "yy", "xx", "yy", "xx"],
...         "x": [1, 2, 1, 4, -2],
...         "y": [3.0, 4.5, 1.0, 2.5, -2.0],
...         "z": ["a", "b", "a", "b", "b"],
...     },
... ).with_columns(
...     z=pl.col("z").cast(pl.Categorical).cat.set_ordering("lexical"),
... )

Group by all string columns, sum the numeric columns, then sort by the string cols:

>>> df.groupby(cs.string()).agg(cs.numeric().sum()).sort(by=cs.string())
shape: (2, 3)
┌─────┬─────┬─────┐
│ w   ┆ x   ┆ y   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ xx  ┆ 0   ┆ 2.0 │
│ yy  ┆ 6   ┆ 7.0 │
└─────┴─────┴─────┘

Group by all string and categorical columns:

>>> df.groupby(cs.string(True)).agg(cs.numeric().sum()).sort(by=cs.string(True))
shape: (3, 4)
┌─────┬─────┬─────┬──────┐
│ w   ┆ z   ┆ x   ┆ y    │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ cat ┆ i64 ┆ f64  │
╞═════╪═════╪═════╪══════╡
│ xx  ┆ a   ┆ 2   ┆ 4.0  │
│ xx  ┆ b   ┆ -2  ┆ -2.0 │
│ yy  ┆ b   ┆ 6   ┆ 7.0  │
└─────┴─────┴─────┴──────┘

polars.selectors.temporal() → SelectorType
Select all temporal columns.

Examples

>>> import polars as pl
>>> from datetime import date, time
>>> import polars.selectors as cs
>>> df = pl.DataFrame(
...     {
...         "dt": [date(2021, 1, 1), date(2021, 1, 2)],
...         "tm": [time(12, 0, 0), time(20, 30, 45)],
...         "value": [1.2345, 2.3456],
...     }
... )

Match all temporal columns:

>>> df.select(cs.temporal())
shape: (2, 2)
┌────────────┬──────────┐
│ dt         ┆ tm       │
│ ---        ┆ ---      │
│ date       ┆ time     │
╞════════════╪══════════╡
│ 2021-01-01 ┆ 12:00:00 │
│ 2021-01-02 ┆ 20:30:45 │
└────────────┴──────────┘

Match all temporal columns except for Time columns:

>>> df.select(cs.temporal() - cs.by_dtype(pl.Time))
shape: (2, 1)
┌────────────┐
│ dt         │
│ ---        │
│ date       │
╞════════════╡
│ 2021-01-01 │
│ 2021-01-02 │
└────────────┘

Match all columns except for temporal columns:

>>> df.select(~cs.temporal())
shape: (2, 1)
┌────────┐
│ value  │
│ ---    │
│ f64    │
╞════════╡
│ 1.2345 │
│ 2.3456 │
└────────┘