Expression expansion

As you've seen in the section about expressions and contexts, expression expansion is a feature that enables you to write a single expression that can expand to multiple different expressions, possibly depending on the schema of the context in which the expression is used.

This feature isn't just decorative or syntactic sugar. It allows for a very powerful application of DRY principles in your code: a single expression that specifies multiple columns expands into a list of expressions, which means you can write one single expression and reuse the computation that it represents.

In this section we will show several forms of expression expansion and we will be using the dataframe that you can see below for that effect:

Python Rust

import polars as pl

df = pl.DataFrame(
    {  # As of 14th October 2024, ~3pm UTC
        "ticker": ["AAPL", "NVDA", "MSFT", "GOOG", "AMZN"],
        "company_name": ["Apple", "NVIDIA", "Microsoft", "Alphabet (Google)", "Amazon"],
        "price": [229.9, 138.93, 420.56, 166.41, 188.4],
        "day_high": [231.31, 139.6, 424.04, 167.62, 189.83],
        "day_low": [228.6, 136.3, 417.52, 164.78, 188.44],
        "year_high": [237.23, 140.76, 468.35, 193.31, 201.2],
        "year_low": [164.08, 39.23, 324.39, 121.46, 118.35],
    }
)

print(df)

use polars::prelude::*;

// Data as of 14th October 2024, ~3pm UTC
let df = df!(
    "ticker" => ["AAPL", "NVDA", "MSFT", "GOOG", "AMZN"],
    "company_name" => ["Apple", "NVIDIA", "Microsoft", "Alphabet (Google)", "Amazon"],
    "price" => [229.9, 138.93, 420.56, 166.41, 188.4],
    "day_high" => [231.31, 139.6, 424.04, 167.62, 189.83],
    "day_low" => [228.6, 136.3, 417.52, 164.78, 188.44],
    "year_high" => [237.23, 140.76, 468.35, 193.31, 201.2],
    "year_low" => [164.08, 39.23, 324.39, 121.46, 118.35],
)?;

println!("{}", df);

shape: (5, 7)
┌────────┬───────────────────┬────────┬──────────┬─────────┬───────────┬──────────┐
│ ticker ┆ company_name      ┆ price  ┆ day_high ┆ day_low ┆ year_high ┆ year_low │
│ ---    ┆ ---               ┆ ---    ┆ ---      ┆ ---     ┆ ---       ┆ ---      │
│ str    ┆ str               ┆ f64    ┆ f64      ┆ f64     ┆ f64       ┆ f64      │
╞════════╪═══════════════════╪════════╪══════════╪═════════╪═══════════╪══════════╡
│ AAPL   ┆ Apple             ┆ 229.9  ┆ 231.31   ┆ 228.6   ┆ 237.23    ┆ 164.08   │
│ NVDA   ┆ NVIDIA            ┆ 138.93 ┆ 139.6    ┆ 136.3   ┆ 140.76    ┆ 39.23    │
│ MSFT   ┆ Microsoft         ┆ 420.56 ┆ 424.04   ┆ 417.52  ┆ 468.35    ┆ 324.39   │
│ GOOG   ┆ Alphabet (Google) ┆ 166.41 ┆ 167.62   ┆ 164.78  ┆ 193.31    ┆ 121.46   │
│ AMZN   ┆ Amazon            ┆ 188.4  ┆ 189.83   ┆ 188.44  ┆ 201.2     ┆ 118.35   │
└────────┴───────────────────┴────────┴──────────┴─────────┴───────────┴──────────┘

Function `col`

The function col is the most common way of making use of expression expansion features in Polars. Typically used to refer to one column of a dataframe, in this section we explore other ways in which you can use col (or its variants, when in Rust).

Explicit expansion by column name

The simplest form of expression expansion happens when you provide multiple column names to the function col.

The example below uses a single function col with multiple column names to convert the values in USD to EUR:

Python Rust

col

eur_usd_rate = 1.09  # As of 14th October 2024

result = df.with_columns(
    (
        pl.col(
            "price",
            "day_high",
            "day_low",
            "year_high",
            "year_low",
        )
        / eur_usd_rate
    ).round(2)
)
print(result)

col

let eur_usd_rate = 1.09; // As of 14th October 2024

let result = df
    .clone()
    .lazy()
    .with_column(
        (cols(["price", "day_high", "day_low", "year_high", "year_low"]) / lit(eur_usd_rate))
            .round(2),
    )
    .collect()?;
println!("{}", result);

shape: (5, 7)
┌────────┬───────────────────┬────────┬──────────┬─────────┬───────────┬──────────┐
│ ticker ┆ company_name      ┆ price  ┆ day_high ┆ day_low ┆ year_high ┆ year_low │
│ ---    ┆ ---               ┆ ---    ┆ ---      ┆ ---     ┆ ---       ┆ ---      │
│ str    ┆ str               ┆ f64    ┆ f64      ┆ f64     ┆ f64       ┆ f64      │
╞════════╪═══════════════════╪════════╪══════════╪═════════╪═══════════╪══════════╡
│ AAPL   ┆ Apple             ┆ 210.92 ┆ 212.21   ┆ 209.72  ┆ 217.64    ┆ 150.53   │
│ NVDA   ┆ NVIDIA            ┆ 127.46 ┆ 128.07   ┆ 125.05  ┆ 129.14    ┆ 35.99    │
│ MSFT   ┆ Microsoft         ┆ 385.83 ┆ 389.03   ┆ 383.05  ┆ 429.68    ┆ 297.61   │
│ GOOG   ┆ Alphabet (Google) ┆ 152.67 ┆ 153.78   ┆ 151.17  ┆ 177.35    ┆ 111.43   │
│ AMZN   ┆ Amazon            ┆ 172.84 ┆ 174.16   ┆ 172.88  ┆ 184.59    ┆ 108.58   │
└────────┴───────────────────┴────────┴──────────┴─────────┴───────────┴──────────┘

When you list the column names you want the expression to expand to, you can predict what the expression will expand to. In this case, the expression that does the currency conversion is expanded to a list of five expressions:

Python Rust

col

exprs = [
    (pl.col("price") / eur_usd_rate).round(2),
    (pl.col("day_high") / eur_usd_rate).round(2),
    (pl.col("day_low") / eur_usd_rate).round(2),
    (pl.col("year_high") / eur_usd_rate).round(2),
    (pl.col("year_low") / eur_usd_rate).round(2),
]

result2 = df.with_columns(exprs)
print(result.equals(result2))

col

let exprs = [
    (col("price") / lit(eur_usd_rate)).round(2),
    (col("day_high") / lit(eur_usd_rate)).round(2),
    (col("day_low") / lit(eur_usd_rate)).round(2),
    (col("year_high") / lit(eur_usd_rate)).round(2),
    (col("year_low") / lit(eur_usd_rate)).round(2),
];

let result2 = df.clone().lazy().with_columns(exprs).collect()?;
println!("{}", result.equals(&result2));

True

Expansion by data type

We had to type five column names in the previous example but the function col can also conveniently accept one or more data types. If you provide data types instead of column names, the expression is expanded to all columns that match one of the data types provided.

The example below performs the exact same computation as before:

Python Rust

col

result = df.with_columns((pl.col(pl.Float64) / eur_usd_rate).round(2))
print(result)

dtype_col

let result = df
    .clone()
    .lazy()
    .with_column((dtype_col(&DataType::Float64) / lit(eur_usd_rate)).round(2))
    .collect()?;
println!("{}", result);

shape: (5, 7)
┌────────┬───────────────────┬────────┬──────────┬─────────┬───────────┬──────────┐
│ ticker ┆ company_name      ┆ price  ┆ day_high ┆ day_low ┆ year_high ┆ year_low │
│ ---    ┆ ---               ┆ ---    ┆ ---      ┆ ---     ┆ ---       ┆ ---      │
│ str    ┆ str               ┆ f64    ┆ f64      ┆ f64     ┆ f64       ┆ f64      │
╞════════╪═══════════════════╪════════╪══════════╪═════════╪═══════════╪══════════╡
│ AAPL   ┆ Apple             ┆ 210.92 ┆ 212.21   ┆ 209.72  ┆ 217.64    ┆ 150.53   │
│ NVDA   ┆ NVIDIA            ┆ 127.46 ┆ 128.07   ┆ 125.05  ┆ 129.14    ┆ 35.99    │
│ MSFT   ┆ Microsoft         ┆ 385.83 ┆ 389.03   ┆ 383.05  ┆ 429.68    ┆ 297.61   │
│ GOOG   ┆ Alphabet (Google) ┆ 152.67 ┆ 153.78   ┆ 151.17  ┆ 177.35    ┆ 111.43   │
│ AMZN   ┆ Amazon            ┆ 172.84 ┆ 174.16   ┆ 172.88  ┆ 184.59    ┆ 108.58   │
└────────┴───────────────────┴────────┴──────────┴─────────┴───────────┴──────────┘

When we use a data type with expression expansion we cannot know, beforehand, how many columns a single expression will expand to. We need the schema of the input dataframe if we want to determine what is the final list of expressions that is to be applied.

If we weren't sure about whether the price columns where of the type Float64 or Float32, we could specify both data types:

Python Rust

col

result2 = df.with_columns(
    (
        pl.col(
            pl.Float32,
            pl.Float64,
        )
        / eur_usd_rate
    ).round(2)
)
print(result.equals(result2))

dtype_cols

let result2 = df
    .clone()
    .lazy()
    .with_column(
        (dtype_cols([DataType::Float32, DataType::Float64]) / lit(eur_usd_rate)).round(2),
    )
    .collect()?;
println!("{}", result.equals(&result2));

True

Expansion by pattern matching

You can also use regular expressions to specify patterns that are used to match the column names. To distinguish between a regular column name and expansion by pattern matching, regular expressions start and end with ^ and $, respectively. This also means that the pattern must match against the whole column name string.

Regular expressions can be mixed with regular column names:

Python Rust

col

result = df.select(pl.col("ticker", "^.*_high$", "^.*_low$"))
print(result)

col

// NOTE: Using regex inside `col`/`cols` requires the feature flag `regex`.
let result = df
    .clone()
    .lazy()
    .select([cols(["ticker", "^.*_high$", "^.*_low$"])])
    .collect()?;
println!("{}", result);

shape: (5, 5)
┌────────┬──────────┬───────────┬─────────┬──────────┐
│ ticker ┆ day_high ┆ year_high ┆ day_low ┆ year_low │
│ ---    ┆ ---      ┆ ---       ┆ ---     ┆ ---      │
│ str    ┆ f64      ┆ f64       ┆ f64     ┆ f64      │
╞════════╪══════════╪═══════════╪═════════╪══════════╡
│ AAPL   ┆ 231.31   ┆ 237.23    ┆ 228.6   ┆ 164.08   │
│ NVDA   ┆ 139.6    ┆ 140.76    ┆ 136.3   ┆ 39.23    │
│ MSFT   ┆ 424.04   ┆ 468.35    ┆ 417.52  ┆ 324.39   │
│ GOOG   ┆ 167.62   ┆ 193.31    ┆ 164.78  ┆ 121.46   │
│ AMZN   ┆ 189.83   ┆ 201.2     ┆ 188.44  ┆ 118.35   │
└────────┴──────────┴───────────┴─────────┴──────────┘

Arguments cannot be of mixed types

In Python, the function col accepts an arbitrary number of strings (as column names or as regular expressions) or an arbitrary number of data types, but you cannot mix both in the same function call:

try:
    df.select(pl.col("ticker", pl.Float64))
except TypeError as err:
    print("TypeError:", err)

TypeError: argument 'names': 'DataTypeClass' object cannot be converted to 'PyString'

Selecting all columns

Polars provides the function all as shorthand notation to refer to all columns of a dataframe:

Python Rust

all

result = df.select(pl.all())
print(result.equals(df))

all

let result = df.clone().lazy().select([all()]).collect()?;
println!("{}", result.equals(&df));

True

Note

The function all is syntactic sugar for col("*"), but since the argument "*" is a special case and all reads more like English, the usage of all is preferred.

Excluding columns

Polars also provides a mechanism to exclude certain columns from expression expansion. For that, you use the function exclude, which accepts exactly the same types of arguments as col:

Python Rust

exclude

result = df.select(pl.all().exclude("^day_.*$"))
print(result)

exclude

let result = df
    .clone()
    .lazy()
    .select([all().exclude(["^day_.*$"])])
    .collect()?;
println!("{}", result);

shape: (5, 5)
┌────────┬───────────────────┬────────┬───────────┬──────────┐
│ ticker ┆ company_name      ┆ price  ┆ year_high ┆ year_low │
│ ---    ┆ ---               ┆ ---    ┆ ---       ┆ ---      │
│ str    ┆ str               ┆ f64    ┆ f64       ┆ f64      │
╞════════╪═══════════════════╪════════╪═══════════╪══════════╡
│ AAPL   ┆ Apple             ┆ 229.9  ┆ 237.23    ┆ 164.08   │
│ NVDA   ┆ NVIDIA            ┆ 138.93 ┆ 140.76    ┆ 39.23    │
│ MSFT   ┆ Microsoft         ┆ 420.56 ┆ 468.35    ┆ 324.39   │
│ GOOG   ┆ Alphabet (Google) ┆ 166.41 ┆ 193.31    ┆ 121.46   │
│ AMZN   ┆ Amazon            ┆ 188.4  ┆ 201.2     ┆ 118.35   │
└────────┴───────────────────┴────────┴───────────┴──────────┘

Naturally, the function exclude can also be used after the function col:

Python Rust

exclude

result = df.select(pl.col(pl.Float64).exclude("^day_.*$"))
print(result)

exclude

let result = df
    .clone()
    .lazy()
    .select([dtype_col(&DataType::Float64).exclude(["^day_.*$"])])
    .collect()?;
println!("{}", result);

shape: (5, 3)
┌────────┬───────────┬──────────┐
│ price  ┆ year_high ┆ year_low │
│ ---    ┆ ---       ┆ ---      │
│ f64    ┆ f64       ┆ f64      │
╞════════╪═══════════╪══════════╡
│ 229.9  ┆ 237.23    ┆ 164.08   │
│ 138.93 ┆ 140.76    ┆ 39.23    │
│ 420.56 ┆ 468.35    ┆ 324.39   │
│ 166.41 ┆ 193.31    ┆ 121.46   │
│ 188.4  ┆ 201.2     ┆ 118.35   │
└────────┴───────────┴──────────┘

Column renaming

By default, when you apply an expression to a column, the result keeps the same name as the original column.

Preserving the column name can be semantically wrong and in certain cases Polars may even raise an error if duplicate names occur:

Python Rust

from polars.exceptions import DuplicateError

gbp_usd_rate = 1.31  # As of 14th October 2024

try:
    df.select(
        pl.col("price") / gbp_usd_rate,  # This would be named "price"...
        pl.col("price") / eur_usd_rate,  # And so would this.
    )
except DuplicateError as err:
    print("DuplicateError:", err)

let gbp_usd_rate = 1.31; // As of 14th October 2024

let result = df
    .clone()
    .lazy()
    .select([
        col("price") / lit(gbp_usd_rate),
        col("price") / lit(eur_usd_rate),
    ])
    .collect();
match result {
    Ok(df) => println!("{}", df),
    Err(e) => println!("{}", e),
};

DuplicateError: the name 'price' is duplicate

It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.

To prevent errors like this, and to allow users to rename their columns when appropriate, Polars provides a series of functions that let you change the name of a column or a group of columns.

Renaming a single column with `alias`

The function alias has been used thoroughly in the documentation already and it lets you rename a single column:

Python Rust

alias

result = df.select(
    (pl.col("price") / gbp_usd_rate).alias("price (GBP)"),
    (pl.col("price") / eur_usd_rate).alias("price (EUR)"),
)

alias

let _result = df
    .clone()
    .lazy()
    .select([
        (col("price") / lit(gbp_usd_rate)).alias("price (GBP)"),
        (col("price") / lit(eur_usd_rate)).alias("price (EUR)"),
    ])
    .collect()?;

Prefixing and suffixing column names

When using expression expansion you cannot use the function alias because the function alias is designed specifically to rename a single column.

When it suffices to add a static prefix or a static suffix to the existing names, we can use the functions prefix and suffix from the namespace name:

Python Rust

name namespace · prefix · suffix

result = df.select(
    (pl.col("^year_.*$") / eur_usd_rate).name.prefix("in_eur_"),
    (pl.col("day_high", "day_low") / gbp_usd_rate).name.suffix("_gbp"),
)
print(result)

name namespace · prefix · suffix · Available on feature lazy

let result = df
    .clone()
    .lazy()
    .select([
        (col("^year_.*$") / lit(eur_usd_rate))
            .name()
            .prefix("in_eur_"),
        (cols(["day_high", "day_low"]) / lit(gbp_usd_rate))
            .name()
            .suffix("_gbp"),
    ])
    .collect()?;
println!("{}", result);

shape: (5, 4)
┌──────────────────┬─────────────────┬──────────────┬─────────────┐
│ in_eur_year_high ┆ in_eur_year_low ┆ day_high_gbp ┆ day_low_gbp │
│ ---              ┆ ---             ┆ ---          ┆ ---         │
│ f64              ┆ f64             ┆ f64          ┆ f64         │
╞══════════════════╪═════════════════╪══════════════╪═════════════╡
│ 217.642202       ┆ 150.53211       ┆ 176.572519   ┆ 174.503817  │
│ 129.137615       ┆ 35.990826       ┆ 106.564885   ┆ 104.045802  │
│ 429.678899       ┆ 297.605505      ┆ 323.694656   ┆ 318.717557  │
│ 177.348624       ┆ 111.431193      ┆ 127.954198   ┆ 125.78626   │
│ 184.587156       ┆ 108.577982      ┆ 144.908397   ┆ 143.847328  │
└──────────────────┴─────────────────┴──────────────┴─────────────┘

Dynamic name replacement

If a static prefix/suffix is not enough, the namespace name also provides the function map that accepts a callable that accepts the old column names and produces the new ones:

Python Rust

name namespace · map

# There is also `.name.to_uppercase`, so this usage of `.map` is moot.
result = df.select(pl.all().name.map(str.upper))
print(result)

name namespace · map · Available on feature lazy

// There is also `name().to_uppercase()`, so this usage of `map` is moot.
let result = df
    .clone()
    .lazy()
    .select([all()
        .name()
        .map(|name| Ok(PlSmallStr::from_string(name.to_ascii_uppercase())))])
    .collect()?;
println!("{}", result);

shape: (5, 7)
┌────────┬───────────────────┬────────┬──────────┬─────────┬───────────┬──────────┐
│ TICKER ┆ COMPANY_NAME      ┆ PRICE  ┆ DAY_HIGH ┆ DAY_LOW ┆ YEAR_HIGH ┆ YEAR_LOW │
│ ---    ┆ ---               ┆ ---    ┆ ---      ┆ ---     ┆ ---       ┆ ---      │
│ str    ┆ str               ┆ f64    ┆ f64      ┆ f64     ┆ f64       ┆ f64      │
╞════════╪═══════════════════╪════════╪══════════╪═════════╪═══════════╪══════════╡
│ AAPL   ┆ Apple             ┆ 229.9  ┆ 231.31   ┆ 228.6   ┆ 237.23    ┆ 164.08   │
│ NVDA   ┆ NVIDIA            ┆ 138.93 ┆ 139.6    ┆ 136.3   ┆ 140.76    ┆ 39.23    │
│ MSFT   ┆ Microsoft         ┆ 420.56 ┆ 424.04   ┆ 417.52  ┆ 468.35    ┆ 324.39   │
│ GOOG   ┆ Alphabet (Google) ┆ 166.41 ┆ 167.62   ┆ 164.78  ┆ 193.31    ┆ 121.46   │
│ AMZN   ┆ Amazon            ┆ 188.4  ┆ 189.83   ┆ 188.44  ┆ 201.2     ┆ 118.35   │
└────────┴───────────────────┴────────┴──────────┴─────────┴───────────┴──────────┘

See the API reference for the full contents of the namespace name.

Programmatically generating expressions

Expression expansion is a very useful feature but it does not solve all of your problems. For example, if we want to compute the day and year amplitude of the prices of the stocks in our dataframe, expression expansion won't help us.

At first, you may think about using a for loop:

Python Rust

result = df
for tp in ["day", "year"]:
    result = result.with_columns(
        (pl.col(f"{tp}_high") - pl.col(f"{tp}_low")).alias(f"{tp}_amplitude")
    )
print(result)

let mut result = df.clone().lazy();
for tp in ["day", "year"] {
    let high = format!("{}_high", tp);
    let low = format!("{}_low", tp);
    let aliased = format!("{}_amplitude", tp);
    result = result.with_column((col(high) - col(low)).alias(aliased))
}
let result = result.collect()?;
println!("{}", result);

shape: (5, 9)
┌────────┬──────────────┬────────┬──────────┬───┬───────────┬──────────┬─────────────┬─────────────┐
│ ticker ┆ company_name ┆ price  ┆ day_high ┆ … ┆ year_high ┆ year_low ┆ day_amplitu ┆ year_amplit │
│ ---    ┆ ---          ┆ ---    ┆ ---      ┆   ┆ ---       ┆ ---      ┆ de          ┆ ude         │
│ str    ┆ str          ┆ f64    ┆ f64      ┆   ┆ f64       ┆ f64      ┆ ---         ┆ ---         │
│        ┆              ┆        ┆          ┆   ┆           ┆          ┆ f64         ┆ f64         │
╞════════╪══════════════╪════════╪══════════╪═══╪═══════════╪══════════╪═════════════╪═════════════╡
│ AAPL   ┆ Apple        ┆ 229.9  ┆ 231.31   ┆ … ┆ 237.23    ┆ 164.08   ┆ 2.71        ┆ 73.15       │
│ NVDA   ┆ NVIDIA       ┆ 138.93 ┆ 139.6    ┆ … ┆ 140.76    ┆ 39.23    ┆ 3.3         ┆ 101.53      │
│ MSFT   ┆ Microsoft    ┆ 420.56 ┆ 424.04   ┆ … ┆ 468.35    ┆ 324.39   ┆ 6.52        ┆ 143.96      │
│ GOOG   ┆ Alphabet     ┆ 166.41 ┆ 167.62   ┆ … ┆ 193.31    ┆ 121.46   ┆ 2.84        ┆ 71.85       │
│        ┆ (Google)     ┆        ┆          ┆   ┆           ┆          ┆             ┆             │
│ AMZN   ┆ Amazon       ┆ 188.4  ┆ 189.83   ┆ … ┆ 201.2     ┆ 118.35   ┆ 1.39        ┆ 82.85       │
└────────┴──────────────┴────────┴──────────┴───┴───────────┴──────────┴─────────────┴─────────────┘

Do not do this. Instead, generate all of the expressions you want to compute programmatically and use them only once in a context. Loosely speaking, you want to swap the for loop with the context with_columns. In practice, you could do something like the following:

Python Rust

def amplitude_expressions(time_periods):
    for tp in time_periods:
        yield (pl.col(f"{tp}_high") - pl.col(f"{tp}_low")).alias(f"{tp}_amplitude")


result = df.with_columns(amplitude_expressions(["day", "year"]))
print(result)

let mut exprs: Vec<Expr> = vec![];
for tp in ["day", "year"] {
    let high = format!("{}_high", tp);
    let low = format!("{}_low", tp);
    let aliased = format!("{}_amplitude", tp);
    exprs.push((col(high) - col(low)).alias(aliased))
}
let result = df.clone().lazy().with_columns(exprs).collect()?;
println!("{}", result);

shape: (5, 9)
┌────────┬──────────────┬────────┬──────────┬───┬───────────┬──────────┬─────────────┬─────────────┐
│ ticker ┆ company_name ┆ price  ┆ day_high ┆ … ┆ year_high ┆ year_low ┆ day_amplitu ┆ year_amplit │
│ ---    ┆ ---          ┆ ---    ┆ ---      ┆   ┆ ---       ┆ ---      ┆ de          ┆ ude         │
│ str    ┆ str          ┆ f64    ┆ f64      ┆   ┆ f64       ┆ f64      ┆ ---         ┆ ---         │
│        ┆              ┆        ┆          ┆   ┆           ┆          ┆ f64         ┆ f64         │
╞════════╪══════════════╪════════╪══════════╪═══╪═══════════╪══════════╪═════════════╪═════════════╡
│ AAPL   ┆ Apple        ┆ 229.9  ┆ 231.31   ┆ … ┆ 237.23    ┆ 164.08   ┆ 2.71        ┆ 73.15       │
│ NVDA   ┆ NVIDIA       ┆ 138.93 ┆ 139.6    ┆ … ┆ 140.76    ┆ 39.23    ┆ 3.3         ┆ 101.53      │
│ MSFT   ┆ Microsoft    ┆ 420.56 ┆ 424.04   ┆ … ┆ 468.35    ┆ 324.39   ┆ 6.52        ┆ 143.96      │
│ GOOG   ┆ Alphabet     ┆ 166.41 ┆ 167.62   ┆ … ┆ 193.31    ┆ 121.46   ┆ 2.84        ┆ 71.85       │
│        ┆ (Google)     ┆        ┆          ┆   ┆           ┆          ┆             ┆             │
│ AMZN   ┆ Amazon       ┆ 188.4  ┆ 189.83   ┆ … ┆ 201.2     ┆ 118.35   ┆ 1.39        ┆ 82.85       │
└────────┴──────────────┴────────┴──────────┴───┴───────────┴──────────┴─────────────┴─────────────┘

This produces the same final result and by specifying all of the expressions in one go we give Polars the opportunity to:

do a better job at optimising the query; and
parallelise the execution of the actual computations.

More flexible column selections

Polars comes with the submodule selectors that provides a number of functions that allow you to write more flexible column selections for expression expansion.

Warning

This functionality is not available in Rust yet. Refer to Polars issue #10594.

As a first example, here is how we can use the functions string and ends_with, and the set operations that the functions from selectors support, to select all string columns and the columns whose names end with "_high":

Python Rust

selectors

import polars.selectors as cs

result = df.select(cs.string() | cs.ends_with("_high"))
print(result)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

shape: (5, 4)
┌────────┬───────────────────┬──────────┬───────────┐
│ ticker ┆ company_name      ┆ day_high ┆ year_high │
│ ---    ┆ ---               ┆ ---      ┆ ---       │
│ str    ┆ str               ┆ f64      ┆ f64       │
╞════════╪═══════════════════╪══════════╪═══════════╡
│ AAPL   ┆ Apple             ┆ 231.31   ┆ 237.23    │
│ NVDA   ┆ NVIDIA            ┆ 139.6    ┆ 140.76    │
│ MSFT   ┆ Microsoft         ┆ 424.04   ┆ 468.35    │
│ GOOG   ┆ Alphabet (Google) ┆ 167.62   ┆ 193.31    │
│ AMZN   ┆ Amazon            ┆ 189.83   ┆ 201.2     │
└────────┴───────────────────┴──────────┴───────────┘

The submodule selectors provides a number of selectors that match based on the data type of the columns, of which the most useful are the functions that match a whole category of types, like cs.numeric for all numeric data types or cs.temporal for all temporal data types.

The submodule selectors also provides a number of selectors that match based on patterns in the column names which make it more convenient to specify common patterns you may want to check for, like the function cs.ends_with that was shown above.

Combining selectors with set operations

We can combine multiple selectors using set operations and the usual Python operators:

Operator	Operation
`A \| B`	Union
`A & B`	Intersection
`A - B`	Difference
`A ^ B`	Symmetric difference
`~A`	Complement

The next example matches all non-string columns that contain an underscore in the name:

Python Rust

selectors

result = df.select(cs.contains("_") - cs.string())
print(result)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

shape: (5, 4)
┌──────────┬─────────┬───────────┬──────────┐
│ day_high ┆ day_low ┆ year_high ┆ year_low │
│ ---      ┆ ---     ┆ ---       ┆ ---      │
│ f64      ┆ f64     ┆ f64       ┆ f64      │
╞══════════╪═════════╪═══════════╪══════════╡
│ 231.31   ┆ 228.6   ┆ 237.23    ┆ 164.08   │
│ 139.6    ┆ 136.3   ┆ 140.76    ┆ 39.23    │
│ 424.04   ┆ 417.52  ┆ 468.35    ┆ 324.39   │
│ 167.62   ┆ 164.78  ┆ 193.31    ┆ 121.46   │
│ 189.83   ┆ 188.44  ┆ 201.2     ┆ 118.35   │
└──────────┴─────────┴───────────┴──────────┘

Resolving operator ambiguity

Expression functions can be chained on top of selectors:

Python Rust

selectors

result = df.select((cs.contains("_") - cs.string()) / eur_usd_rate)
print(result)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

shape: (5, 4)
┌────────────┬────────────┬────────────┬────────────┐
│ day_high   ┆ day_low    ┆ year_high  ┆ year_low   │
│ ---        ┆ ---        ┆ ---        ┆ ---        │
│ f64        ┆ f64        ┆ f64        ┆ f64        │
╞════════════╪════════════╪════════════╪════════════╡
│ 212.211009 ┆ 209.724771 ┆ 217.642202 ┆ 150.53211  │
│ 128.073394 ┆ 125.045872 ┆ 129.137615 ┆ 35.990826  │
│ 389.027523 ┆ 383.045872 ┆ 429.678899 ┆ 297.605505 │
│ 153.779817 ┆ 151.174312 ┆ 177.348624 ┆ 111.431193 │
│ 174.155963 ┆ 172.880734 ┆ 184.587156 ┆ 108.577982 │
└────────────┴────────────┴────────────┴────────────┘

However, some operators have been overloaded to operate both on Polars selectors and on expressions. For example, the operator ~ on a selector represents the set operation “complement” and on an expression represents the Boolean operation of negation.

When you use a selector and then want to use, in the context of an expression, one of the operators that act as set operators for selectors, you can use the function as_expr.

Below, we want to negate the Boolean values in the columns “has_partner”, “has_kids”, and “has_tattoos”. If we are not careful, the combination of the operator ~ and the selector cs.starts_with("has_") will actually select the columns that we do not care about:

Python Rust

people = pl.DataFrame(
    {
        "name": ["Anna", "Bob"],
        "has_partner": [True, False],
        "has_kids": [False, False],
        "has_tattoos": [True, False],
        "is_alive": [True, True],
    }
)

wrong_result = people.select((~cs.starts_with("has_")).name.prefix("not_"))
print(wrong_result)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

shape: (2, 2)
┌──────────┬──────────────┐
│ not_name ┆ not_is_alive │
│ ---      ┆ ---          │
│ str      ┆ bool         │
╞══════════╪══════════════╡
│ Anna     ┆ true         │
│ Bob      ┆ true         │
└──────────┴──────────────┘

The correct solution uses as_expr:

Python Rust

result = people.select((~cs.starts_with("has_").as_expr()).name.prefix("not_"))
print(result)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

shape: (2, 3)
┌─────────────────┬──────────────┬─────────────────┐
│ not_has_partner ┆ not_has_kids ┆ not_has_tattoos │
│ ---             ┆ ---          ┆ ---             │
│ bool            ┆ bool         ┆ bool            │
╞═════════════════╪══════════════╪═════════════════╡
│ false           ┆ true         ┆ false           │
│ true            ┆ true         ┆ true            │
└─────────────────┴──────────────┴─────────────────┘

Debugging selectors

When you are not sure whether you have a Polars selector at hand or not, you can use the function cs.is_selector to check:

Python Rust

is_selector

print(cs.is_selector(~cs.starts_with("has_").as_expr()))

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

False

This should help you avoid any ambiguous situations where you think you are operating with expressions but are in fact operating with selectors.

Another helpful debugging utility is the function expand_selector. Given a target frame or schema, you can check what columns a given selector will expand to:

Python Rust

expand_selector

print(
    cs.expand_selector(
        people,
        cs.starts_with("has_"),
    )
)

// Selectors are not available in Rust yet.
// Refer to https://github.com/pola-rs/polars/issues/10594

('has_partner', 'has_kids', 'has_tattoos')

Complete reference

The tables below group the functions available in the submodule selectors by their type of behaviour.

Selectors for data types

Selectors that match based on the data type of the column:

Selector function	Data type(s) matched
`binary`	`Binary`
`boolean`	`Boolean`
`by_dtype`	Data types specified as arguments
`categorical`	`Categorical`
`date`	`Date`
`datetime`	`Datetime`, optionally filtering by time unit/zone
`decimal`	`Decimal`
`duration`	`Duration`, optionally filtering by time unit
`float`	All float types, regardless of precision
`integer`	All integer types, signed and unsigned, regardless of precision
`numeric`	All numeric types, namely integers, floats, and `Decimal`
`signed_integer`	All signed integer types, regardless of precision
`string`	`String`
`temporal`	All temporal data types, namely `Date`, `Datetime`, and `Duration`
`time`	`Time`
`unsigned_integer`	All unsigned integer types, regardless of precision

Selectors for column name patterns

Selectors that match based on column name patterns:

Selector function	Columns selected
`alpha`	Columns with alphabetical names
`alphanumeric`	Columns with alphanumeric names (letters and the digits 0-9)
`by_name`	Columns with the names specified as arguments
`contains`	Columns whose names contain the substring specified
`digit`	Columns with numeric names (only the digits 0-9)
`ends_with`	Columns whose names end with the given substring
`matches`	Columns whose names match the given regex pattern
`starts_with`	Columns whose names start with the given substring

Positional selectors

Selectors that match based on the position of the columns:

Selector function	Columns selected
`all`	All columns
`by_index`	The columns at the specified indices
`first`	The first column in the context
`last`	The last column in the context

Miscellaneous functions

The submodule selectors also provides the following functions:

Function	Behaviour
`as_expr`*	Convert a selector to an expression
`exclude`	Selects all columns except those matching the given names, data types, or selectors
`expand_selector`	Expand selector to matching columns with respect to a specific frame or target schema
`is_selector`	Check whether the given object/expression is a selector

*as_expr isn't a function defined on the submodule selectors, but rather a method defined on selectors.

Expression expansion

Function col

Explicit expansion by column name

Expansion by data type

Expansion by pattern matching

Arguments cannot be of mixed types

Selecting all columns

Excluding columns

Column renaming

Renaming a single column with alias

Prefixing and suffixing column names

Dynamic name replacement

Programmatically generating expressions

More flexible column selections

Combining selectors with set operations

Resolving operator ambiguity

Debugging selectors

Complete reference

Selectors for data types

Selectors for column name patterns

Positional selectors

Miscellaneous functions

Function `col`

Renaming a single column with `alias`