Expressions

This page gives an overview of all public Polars expressions.

class polars.Expr[source]

Expressions that can be used in various contexts.

Methods:

`abs`	Compute absolute values.
`add`	Method equivalent of addition operator `expr + other`.
`agg_groups`	Get the group indexes of the group by operation.
`alias`	Rename the expression.
`all`	Return whether all values in the column are `True`.
`and_`	Method equivalent of bitwise "and" operator `expr & other & ...`.
`any`	Return whether any of the values in the column are `True`.
`append`	Append expressions.
`approx_n_unique`	Approximate count of unique values.
`arccos`	Compute the element-wise value for the inverse cosine.
`arccosh`	Compute the element-wise value for the inverse hyperbolic cosine.
`arcsin`	Compute the element-wise value for the inverse sine.
`arcsinh`	Compute the element-wise value for the inverse hyperbolic sine.
`arctan`	Compute the element-wise value for the inverse tangent.
`arctanh`	Compute the element-wise value for the inverse hyperbolic tangent.
`arg_max`	Get the index of the maximal value.
`arg_min`	Get the index of the minimal value.
`arg_sort`	Get the index values that would sort this column.
`arg_true`	Return indices where expression evaluates `True`.
`arg_unique`	Get index of first unique value.
`backward_fill`	Fill missing values with the next non-null value.
`bitwise_and`	Perform an aggregation of bitwise ANDs.
`bitwise_count_ones`	Evaluate the number of set bits.
`bitwise_count_zeros`	Evaluate the number of unset bits.
`bitwise_leading_ones`	Evaluate the number of most-significant set bits before seeing an unset bit.
`bitwise_leading_zeros`	Evaluate the number of most-significant unset bits before seeing a set bit.
`bitwise_or`	Perform an aggregation of bitwise ORs.
`bitwise_trailing_ones`	Evaluate the number of least-significant set bits before seeing an unset bit.
`bitwise_trailing_zeros`	Evaluate the number of least-significant unset bits before seeing a set bit.
`bitwise_xor`	Perform an aggregation of bitwise XORs.
`bottom_k`	Return the `k` smallest elements.
`bottom_k_by`	Return the elements corresponding to the `k` smallest elements of the `by` column(s).
`cast`	Cast between data types.
`cbrt`	Compute the cube root of the elements.
`ceil`	Rounds up to the nearest integer value.
`clip`	Set values outside the given boundaries to the boundary value.
`cos`	Compute the element-wise value for the cosine.
`cosh`	Compute the element-wise value for the hyperbolic cosine.
`cot`	Compute the element-wise value for the cotangent.
`count`	Return the number of non-null elements in the column.
`cum_count`	Return the cumulative count of the non-null values in the column.
`cum_max`	Get an array with the cumulative max computed at every element.
`cum_min`	Get an array with the cumulative min computed at every element.
`cum_prod`	Get an array with the cumulative product computed at every element.
`cum_sum`	Get an array with the cumulative sum computed at every element.
`cumulative_eval`	Run an expression over a sliding window that increases `1` slot every iteration.
`cut`	Bin continuous values into discrete categories.
`degrees`	Convert from radians to degrees.
`deserialize`	Read a serialized expression from a file.
`diff`	Calculate the first discrete difference between shifted items.
`dot`	Compute the dot/inner product between two Expressions.
`drop_nans`	Drop all floating point NaN values.
`drop_nulls`	Drop all null values.
`entropy`	Computes the entropy.
`eq`	Method equivalent of equality operator `expr == other`.
`eq_missing`	Method equivalent of equality operator `expr == other` where `None == None`.
`ewm_mean`	Compute exponentially-weighted moving average.
`ewm_mean_by`	Compute time-based exponentially weighted moving average.
`ewm_std`	Compute exponentially-weighted moving standard deviation.
`ewm_var`	Compute exponentially-weighted moving variance.
`exclude`	Exclude columns from a multi-column expression.
`exp`	Compute the exponential, element-wise.
`explode`	Explode a list expression.
`extend_constant`	Extremely fast method for extending the Series with 'n' copies of a value.
`fill_nan`	Fill floating point NaN value with a fill value.
`fill_null`	Fill null values using the specified value or strategy.
`filter`	Filter the expression based on one or more predicate expressions.
`first`	Get the first value.
`flatten`	Flatten a list or string column.
`floor`	Rounds down to the nearest integer value.
`floordiv`	Method equivalent of integer division operator `expr // other`.
`forward_fill`	Fill missing values with the last non-null value.
`from_json`	Read an expression from a JSON encoded string to construct an Expression.
`gather`	Take values by index.
`gather_every`	Take every nth value in the Series and return as a new Series.
`ge`	Method equivalent of "greater than or equal" operator `expr >= other`.
`get`	Return a single value by index.
`gt`	Method equivalent of "greater than" operator `expr > other`.
`has_nulls`	Check whether the expression contains one or more null values.
`hash`	Hash the elements in the selection.
`head`	Get the first `n` rows.
`hist`	Bin values into buckets and count their occurrences.
`implode`	Aggregate values into a list.
`index_of`	Get the index of the first occurrence of a value, or `None` if it's not found.
`inspect`	Print the value that this expression evaluates to and pass on the value.
`interpolate`	Interpolate intermediate values.
`interpolate_by`	Fill null values using interpolation based on another column.
`is_between`	Check if this expression is between the given lower and upper bounds.
`is_close`	Check if this expression is close, i.e. almost equal, to the other expression.
`is_duplicated`	Return a boolean mask indicating duplicated values.
`is_finite`	Returns a boolean Series indicating which values are finite.
`is_first_distinct`	Return a boolean mask indicating the first occurrence of each distinct value.
`is_in`	Check if elements of this expression are present in the other Series.
`is_infinite`	Returns a boolean Series indicating which values are infinite.
`is_last_distinct`	Return a boolean mask indicating the last occurrence of each distinct value.
`is_nan`	Returns a boolean Series indicating which values are NaN.
`is_not_nan`	Returns a boolean Series indicating which values are not NaN.
`is_not_null`	Returns a boolean Series indicating which values are not null.
`is_null`	Returns a boolean Series indicating which values are null.
`is_unique`	Get mask of unique values.
`kurtosis`	Compute the kurtosis (Fisher or Pearson) of a dataset.
`last`	Get the last value.
`le`	Method equivalent of "less than or equal" operator `expr <= other`.
`len`	Return the number of elements in the column.
`limit`	Get the first `n` rows (alias for `Expr.head()`).
`log`	Compute the logarithm to a given base.
`log10`	Compute the base 10 logarithm of the input array, element-wise.
`log1p`	Compute the natural logarithm of each element plus one.
`lower_bound`	Calculate the lower bound.
`lt`	Method equivalent of "less than" operator `expr < other`.
`map_batches`	Apply a custom python function to a whole Series or sequence of Series.
`map_elements`	Map a custom/user-defined function (UDF) to each element of a column.
`max`	Get maximum value.
`mean`	Get mean value.
`median`	Get median value using linear interpolation.
`min`	Get minimum value.
`mod`	Method equivalent of modulus operator `expr % other`.
`mode`	Compute the most occurring value(s).
`mul`	Method equivalent of multiplication operator `expr * other`.
`n_unique`	Count unique values.
`nan_max`	Get maximum value, but propagate/poison encountered NaN values.
`nan_min`	Get minimum value, but propagate/poison encountered NaN values.
`ne`	Method equivalent of inequality operator `expr != other`.
`ne_missing`	Method equivalent of inequality operator `expr != other` where `None == None`.
`neg`	Method equivalent of unary minus operator `-expr`.
`not_`	Negate a boolean expression.
`null_count`	Count null values.
`or_`	Method equivalent of bitwise "or" operator `expr | other | ...`.
`over`	Compute expressions over the given groups.
`pct_change`	Computes percentage change between values.
`peak_max`	Get a boolean mask of the local maximum peaks.
`peak_min`	Get a boolean mask of the local minimum peaks.
`pipe`	Offers a structured way to apply a sequence of user-defined functions (UDFs).
`pow`	Method equivalent of exponentiation operator `expr ** exponent`.
`product`	Compute the product of an expression.
`qcut`	Bin continuous values into discrete categories based on their quantiles.
`quantile`	Get quantile value.
`radians`	Convert from degrees to radians.
`rank`	Assign ranks to data, dealing with ties appropriately.
`rechunk`	Create a single chunk of memory for this Series.
`register_plugin`	Register a plugin function.
`reinterpret`	Reinterpret the underlying bits as a signed/unsigned integer.
`repeat_by`	Repeat the elements in this Series as specified in the given expression.
`replace`	Replace the given values by different values of the same data type.
`replace_strict`	Replace all values by different values.
`reshape`	Reshape this Expr to a flat column or an Array column.
`reverse`	Reverse the selection.
`rle`	Compress the column data using run-length encoding.
`rle_id`	Get a distinct integer ID for each run of identical values.
`rolling`	Create rolling groups based on a temporal or integer column.
`rolling_kurtosis`	Compute a rolling kurtosis.
`rolling_map`	Compute a custom rolling window function.
`rolling_max`	Apply a rolling max (moving max) over the values in this array.
`rolling_max_by`	Apply a rolling max based on another column.
`rolling_mean`	Apply a rolling mean (moving mean) over the values in this array.
`rolling_mean_by`	Apply a rolling mean based on another column.
`rolling_median`	Compute a rolling median.
`rolling_median_by`	Compute a rolling median based on another column.
`rolling_min`	Apply a rolling min (moving min) over the values in this array.
`rolling_min_by`	Apply a rolling min based on another column.
`rolling_quantile`	Compute a rolling quantile.
`rolling_quantile_by`	Compute a rolling quantile based on another column.
`rolling_skew`	Compute a rolling skew.
`rolling_std`	Compute a rolling standard deviation.
`rolling_std_by`	Compute a rolling standard deviation based on another column.
`rolling_sum`	Apply a rolling sum (moving sum) over the values in this array.
`rolling_sum_by`	Apply a rolling sum based on another column.
`rolling_var`	Compute a rolling variance.
`rolling_var_by`	Compute a rolling variance based on another column.
`round`	Round underlying floating point data by `decimals` digits.
`round_sig_figs`	Round to a number of significant figures.
`sample`	Sample from this expression.
`search_sorted`	Find indices where elements should be inserted to maintain order.
`set_sorted`	Flags the expression as 'sorted'.
`shift`	Shift values by the given number of indices.
`shrink_dtype`	Shrink numeric columns to the minimal required datatype.
`shuffle`	Shuffle the contents of this expression.
`sign`	Compute the element-wise sign function on numeric types.
`sin`	Compute the element-wise value for the sine.
`sinh`	Compute the element-wise value for the hyperbolic sine.
`skew`	Compute the sample skewness of a data set.
`slice`	Get a slice of this expression.
`sort`	Sort this column.
`sort_by`	Sort this column by the ordering of other columns.
`sqrt`	Compute the square root of the elements.
`std`	Get standard deviation.
`sub`	Method equivalent of subtraction operator `expr - other`.
`sum`	Get sum value.
`tail`	Get the last `n` rows.
`tan`	Compute the element-wise value for the tangent.
`tanh`	Compute the element-wise value for the hyperbolic tangent.
`to_physical`	Cast to physical representation of the logical dtype.
`top_k`	Return the `k` largest elements.
`top_k_by`	Return the elements corresponding to the `k` largest elements of the `by` column(s).
`truediv`	Method equivalent of float division operator `expr / other`.
`unique`	Get unique values of this expression.
`unique_counts`	Return a count of the unique values in the order of appearance.
`upper_bound`	Calculate the upper bound.
`value_counts`	Count the occurrence of unique values.
`var`	Get variance.
`where`	Filter a single column.
`xor`	Method equivalent of bitwise exclusive-or operator `expr ^ other`.

abs() → Expr[source]

Compute absolute values.

Same as abs(expr).

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [-1.0, 0.0, 1.0, 2.0],
...     }
... )
>>> df.select(pl.col("A").abs())
shape: (4, 1)
┌─────┐
│ A   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
│ 0.0 │
│ 1.0 │
│ 2.0 │
└─────┘

add(other: Any) → Expr[source]

Method equivalent of addition operator expr + other.

Parameters:

other: numeric or string value; accepts expression input.

Examples

>>> df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
>>> df.with_columns(
...     pl.col("x").add(2).alias("x+int"),
...     pl.col("x").add(pl.col("x").cum_prod()).alias("x+expr"),
... )
shape: (5, 3)
┌─────┬───────┬────────┐
│ x   ┆ x+int ┆ x+expr │
│ --- ┆ ---   ┆ ---    │
│ i64 ┆ i64   ┆ i64    │
╞═════╪═══════╪════════╡
│ 1   ┆ 3     ┆ 2      │
│ 2   ┆ 4     ┆ 4      │
│ 3   ┆ 5     ┆ 9      │
│ 4   ┆ 6     ┆ 28     │
│ 5   ┆ 7     ┆ 125    │
└─────┴───────┴────────┘

>>> df = pl.DataFrame(
...     {"x": ["a", "d", "g"], "y": ["b", "e", "h"], "z": ["c", "f", "i"]}
... )
>>> df.with_columns(pl.col("x").add(pl.col("y")).add(pl.col("z")).alias("xyz"))
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ x   ┆ y   ┆ z   ┆ xyz │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═════╪═════╪═════╪═════╡
│ a   ┆ b   ┆ c   ┆ abc │
│ d   ┆ e   ┆ f   ┆ def │
│ g   ┆ h   ┆ i   ┆ ghi │
└─────┴─────┴─────┴─────┘

agg_groups() → Expr[source]

Get the group indexes of the group by operation.

Should be used in aggregation context only.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [94, 95, 96, 97, 97, 99],
...     }
... )
>>> df.group_by("group", maintain_order=True).agg(pl.col("value").agg_groups())
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value     │
│ ---   ┆ ---       │
│ str   ┆ list[u32] │
╞═══════╪═══════════╡
│ one   ┆ [0, 1, 2] │
│ two   ┆ [3, 4, 5] │
└───────┴───────────┘

alias(name: str) → Expr[source]

Rename the expression.

Parameters:

name: The new name.

See also

name.map
name.prefix
name.suffix

Examples

Rename an expression to avoid overwriting an existing column.

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["x", "y", "z"],
...     }
... )
>>> df.with_columns(
...     pl.col("a") + 10,
...     pl.col("b").str.to_uppercase().alias("c"),
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 11  ┆ x   ┆ X   │
│ 12  ┆ y   ┆ Y   │
│ 13  ┆ z   ┆ Z   │
└─────┴─────┴─────┘

Overwrite the default name of literal columns to prevent errors due to duplicate column names.

>>> df.with_columns(
...     pl.lit(True).alias("c"),
...     pl.lit(4.0).alias("d"),
... )
shape: (3, 4)
┌─────┬─────┬──────┬─────┐
│ a   ┆ b   ┆ c    ┆ d   │
│ --- ┆ --- ┆ ---  ┆ --- │
│ i64 ┆ str ┆ bool ┆ f64 │
╞═════╪═════╪══════╪═════╡
│ 1   ┆ x   ┆ true ┆ 4.0 │
│ 2   ┆ y   ┆ true ┆ 4.0 │
│ 3   ┆ z   ┆ true ┆ 4.0 │
└─────┴─────┴──────┴─────┘

all(*, ignore_nulls: bool = True) → Expr[source]

Return whether all values in the column are True.

Only works on columns of data type Boolean.

Note

This method is not to be confused with the function polars.all(), which can be used to select all columns.

Parameters:

ignore_nulls

If set to True (default), null values are ignored. If there are no non-null values, the output is True.
If set to False, Kleene logic is used to deal with nulls: if the column contains any null values and no False values, the output is null.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [True, True],
...         "b": [False, True],
...         "c": [None, True],
...     }
... )
>>> df.select(pl.col("*").all())
shape: (1, 3)
┌──────┬───────┬──────┐
│ a    ┆ b     ┆ c    │
│ ---  ┆ ---   ┆ ---  │
│ bool ┆ bool  ┆ bool │
╞══════╪═══════╪══════╡
│ true ┆ false ┆ true │
└──────┴───────┴──────┘

Enable Kleene logic by setting ignore_nulls=False.

>>> df.select(pl.col("*").all(ignore_nulls=False))
shape: (1, 3)
┌──────┬───────┬──────┐
│ a    ┆ b     ┆ c    │
│ ---  ┆ ---   ┆ ---  │
│ bool ┆ bool  ┆ bool │
╞══════╪═══════╪══════╡
│ true ┆ false ┆ null │
└──────┴───────┴──────┘

and_(*others: Any) → Expr[source]

Method equivalent of bitwise “and” operator expr & other & ....

Parameters:

*others: One or more integer or boolean expressions to evaluate/combine.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [5, 6, 7, 4, 8],
...         "y": [1.5, 2.5, 1.0, 4.0, -5.75],
...         "z": [-9, 2, -1, 4, 8],
...     }
... )
>>> df.select(
...     (pl.col("x") >= pl.col("z"))
...     .and_(
...         pl.col("y") >= pl.col("z"),
...         pl.col("y") == pl.col("y"),
...         pl.col("z") <= pl.col("x"),
...         pl.col("y") != pl.col("x"),
...     )
...     .alias("all")
... )
shape: (5, 1)
┌───────┐
│ all   │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ true  │
│ true  │
│ false │
│ false │
└───────┘

any(*, ignore_nulls: bool = True) → Expr[source]

Return whether any of the values in the column are True.

Only works on columns of data type Boolean.

Parameters:

ignore_nulls

If set to True (default), null values are ignored. If there are no non-null values, the output is False.
If set to False, Kleene logic is used to deal with nulls: if the column contains any null values and no True values, the output is null.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [True, False],
...         "b": [False, False],
...         "c": [None, False],
...     }
... )
>>> df.select(pl.col("*").any())
shape: (1, 3)
┌──────┬───────┬───────┐
│ a    ┆ b     ┆ c     │
│ ---  ┆ ---   ┆ ---   │
│ bool ┆ bool  ┆ bool  │
╞══════╪═══════╪═══════╡
│ true ┆ false ┆ false │
└──────┴───────┴───────┘

Enable Kleene logic by setting ignore_nulls=False.

>>> df.select(pl.col("*").any(ignore_nulls=False))
shape: (1, 3)
┌──────┬───────┬──────┐
│ a    ┆ b     ┆ c    │
│ ---  ┆ ---   ┆ ---  │
│ bool ┆ bool  ┆ bool │
╞══════╪═══════╪══════╡
│ true ┆ false ┆ null │
└──────┴───────┴──────┘

append(other: IntoExpr, *, upcast: bool = True) → Expr[source]

Append expressions.

This is done by adding the chunks of other to this Series.

Parameters:

other: Expression to append.
upcast: Cast both Series to the same supertype.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10],
...         "b": [None, 4, 4],
...     }
... )
>>> df.select(pl.all().head(1).append(pl.all().tail(1)))
shape: (2, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ i64 ┆ i64  │
╞═════╪══════╡
│ 8   ┆ null │
│ 10  ┆ 4    │
└─────┴──────┘

approx_n_unique() → Expr[source]

Approximate count of unique values.

This is done using the HyperLogLog++ algorithm for cardinality estimation.

Examples

>>> df = pl.DataFrame({"n": [1, 1, 2]})
>>> df.select(pl.col("n").approx_n_unique())
shape: (1, 1)
┌─────┐
│ n   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘
>>> df = pl.DataFrame({"n": range(1000)})
>>> df.select(
...     exact=pl.col("n").n_unique(),
...     approx=pl.col("n").approx_n_unique(),
... )  
shape: (1, 2)
┌───────┬────────┐
│ exact ┆ approx │
│ ---   ┆ ---    │
│ u32   ┆ u32    │
╞═══════╪════════╡
│ 1000  ┆ 1005   │
└───────┴────────┘

arccos() → Expr[source]

Compute the element-wise value for the inverse cosine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").arccos())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.570796 │
└──────────┘

arccosh() → Expr[source]

Compute the element-wise value for the inverse hyperbolic cosine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arccosh())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘

arcsin() → Expr[source]

Compute the element-wise value for the inverse sine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arcsin())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.570796 │
└──────────┘

arcsinh() → Expr[source]

Compute the element-wise value for the inverse hyperbolic sine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arcsinh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.881374 │
└──────────┘

arctan() → Expr[source]

Compute the element-wise value for the inverse tangent.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arctan())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.785398 │
└──────────┘

arctanh() → Expr[source]

Compute the element-wise value for the inverse hyperbolic tangent.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arctanh())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ inf │
└─────┘

arg_max() → Expr[source]

Get the index of the maximal value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").arg_max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘

arg_min() → Expr[source]

Get the index of the minimal value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").arg_min())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 1   │
└─────┘

arg_sort( *, descending: bool = False, nulls_last: bool = False, ) → Expr[source]

Get the index values that would sort this column.

Parameters:

descending: Sort in descending order.
nulls_last: Place null values last instead of first.

Returns:

Expr: Expression of data type UInt32.

See also

Expr.gather: Take values by index.
Expr.rank: Get the rank of each row.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...         "b": [1, 2, 3],
...     }
... )
>>> df.select(pl.col("a").arg_sort())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 1   │
│ 0   │
│ 2   │
└─────┘

Use gather to apply the arg sort to other columns.

>>> df.select(pl.col("b").gather(pl.col("a").arg_sort()))
shape: (3, 1)
┌─────┐
│ b   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 1   │
│ 3   │
└─────┘

arg_true() → Expr[source]

Return indices where expression evaluates True.

Warning

This method changes the number of rows returned, so it will fail when combined with other expressions. Use it as the only expression in select / with_columns.

See also

Series.arg_true: Return indices where Series is True
polars.arg_where

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2, 1]})
>>> df.select((pl.col("a") == 1).arg_true())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 0   │
│ 1   │
│ 3   │
└─────┘

arg_unique() → Expr[source]

Get index of first unique value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10],
...         "b": [None, 4, 4],
...     }
... )
>>> df.select(pl.col("a").arg_unique())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 0   │
│ 1   │
│ 2   │
└─────┘
>>> df.select(pl.col("b").arg_unique())
shape: (2, 1)
┌─────┐
│ b   │
│ --- │
│ u32 │
╞═════╡
│ 0   │
│ 1   │
└─────┘

backward_fill(limit: int | None = None) → Expr[source]

Fill missing values with the next non-null value.

This is an alias of .fill_null(strategy="backward").

Parameters:

limit: The number of consecutive null values to backward fill.

See also

fill_null
forward_fill
shift

bitwise_and() → Expr[source]

Perform an aggregation of bitwise ANDs.

Examples

>>> df = pl.DataFrame({"n": [-1, 0, 1]})
>>> df.select(pl.col("n").bitwise_and())
shape: (1, 1)
┌─────┐
│ n   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
└─────┘
>>> df = pl.DataFrame(
...     {"grouper": ["a", "a", "a", "b", "b"], "n": [-1, 0, 1, -1, 1]}
... )
>>> df.group_by("grouper", maintain_order=True).agg(pl.col("n").bitwise_and())
shape: (2, 2)
┌─────────┬─────┐
│ grouper ┆ n   │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ a       ┆ 0   │
│ b       ┆ 1   │
└─────────┴─────┘

bitwise_count_ones() → Expr[source]

Evaluate the number of set bits.

bitwise_count_zeros() → Expr[source]

Evaluate the number of unset bits.

bitwise_leading_ones() → Expr[source]

Evaluate the number of most-significant set bits before seeing an unset bit.

bitwise_leading_zeros() → Expr[source]

Evaluate the number of most-significant unset bits before seeing a set bit.

bitwise_or() → Expr[source]

Perform an aggregation of bitwise ORs.

Examples

>>> df = pl.DataFrame({"n": [-1, 0, 1]})
>>> df.select(pl.col("n").bitwise_or())
shape: (1, 1)
┌─────┐
│ n   │
│ --- │
│ i64 │
╞═════╡
│ -1  │
└─────┘
>>> df = pl.DataFrame(
...     {"grouper": ["a", "a", "a", "b", "b"], "n": [-1, 0, 1, -1, 1]}
... )
>>> df.group_by("grouper", maintain_order=True).agg(pl.col("n").bitwise_or())
shape: (2, 2)
┌─────────┬─────┐
│ grouper ┆ n   │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ a       ┆ -1  │
│ b       ┆ -1  │
└─────────┴─────┘

bitwise_trailing_ones() → Expr[source]

Evaluate the number of least-significant set bits before seeing an unset bit.

bitwise_trailing_zeros() → Expr[source]

Evaluate the number of least-significant unset bits before seeing a set bit.

bitwise_xor() → Expr[source]

Perform an aggregation of bitwise XORs.

Examples

>>> df = pl.DataFrame({"n": [-1, 0, 1]})
>>> df.select(pl.col("n").bitwise_xor())
shape: (1, 1)
┌─────┐
│ n   │
│ --- │
│ i64 │
╞═════╡
│ -2  │
└─────┘
>>> df = pl.DataFrame(
...     {"grouper": ["a", "a", "a", "b", "b"], "n": [-1, 0, 1, -1, 1]}
... )
>>> df.group_by("grouper", maintain_order=True).agg(pl.col("n").bitwise_xor())
shape: (2, 2)
┌─────────┬─────┐
│ grouper ┆ n   │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ a       ┆ -2  │
│ b       ┆ -2  │
└─────────┴─────┘

bottom_k(k: int | IntoExprColumn = 5) → Expr[source]

Return the k smallest elements.

Non-null elements are always preferred over null elements. The output is not guaranteed to be in any particular order; call sort() after this function if you need sorted output.

This has time complexity:

\[O(n)\]

Parameters:

k: Number of elements to return.

See also

top_k
top_k_by
bottom_k_by

Examples

>>> df = pl.DataFrame(
...     {
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.select(
...     pl.col("value").top_k().alias("top_k"),
...     pl.col("value").bottom_k().alias("bottom_k"),
... )
shape: (5, 2)
┌───────┬──────────┐
│ top_k ┆ bottom_k │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 4     ┆ 1        │
│ 98    ┆ 98       │
│ 2     ┆ 2        │
│ 3     ┆ 3        │
│ 99    ┆ 4        │
└───────┴──────────┘

bottom_k_by( by: IntoExpr | Iterable[IntoExpr], k: int | IntoExprColumn = 5, *, reverse: bool | Sequence[bool] = False, ) → Expr[source]

Return the elements corresponding to the k smallest elements of the by column(s).

Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; call sort() after this function if you need sorted output.

This has time complexity:

\[O(n \log{n})\]

Changed in version 1.0.0: The descending parameter was renamed reverse.

Parameters:

by: Column(s) used to determine the smallest elements. Accepts expression input. Strings are parsed as column names.
k: Number of elements to return.
reverse: Consider the k largest elements of the by column(s) (instead of the k smallest). This can be specified per column by passing a sequence of booleans.

See also

top_k
top_k_by
bottom_k

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 4, 5, 6],
...         "b": [6, 5, 4, 3, 2, 1],
...         "c": ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
...     }
... )
>>> df
shape: (6, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ c      │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ str    │
╞═════╪═════╪════════╡
│ 1   ┆ 6   ┆ Apple  │
│ 2   ┆ 5   ┆ Orange │
│ 3   ┆ 4   ┆ Apple  │
│ 4   ┆ 3   ┆ Apple  │
│ 5   ┆ 2   ┆ Banana │
│ 6   ┆ 1   ┆ Banana │
└─────┴─────┴────────┘

Get the bottom 2 rows by column a or b.

>>> df.select(
...     pl.all().bottom_k_by("a", 2).name.suffix("_btm_by_a"),
...     pl.all().bottom_k_by("b", 2).name.suffix("_btm_by_b"),
... )
shape: (2, 6)
┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ a_btm_by_a ┆ b_btm_by_a ┆ c_btm_by_a ┆ a_btm_by_b ┆ b_btm_by_b ┆ c_btm_by_b │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│ i64        ┆ i64        ┆ str        ┆ i64        ┆ i64        ┆ str        │
╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡
│ 1          ┆ 6          ┆ Apple      ┆ 6          ┆ 1          ┆ Banana     │
│ 2          ┆ 5          ┆ Orange     ┆ 5          ┆ 2          ┆ Banana     │
└────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘

Get the bottom 2 rows by multiple columns with given order.

>>> df.select(
...     pl.all()
...     .bottom_k_by(["c", "a"], 2, reverse=[False, True])
...     .name.suffix("_by_ca"),
...     pl.all()
...     .bottom_k_by(["c", "b"], 2, reverse=[False, True])
...     .name.suffix("_by_cb"),
... )
shape: (2, 6)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ a_by_ca ┆ b_by_ca ┆ c_by_ca ┆ a_by_cb ┆ b_by_cb ┆ c_by_cb │
│ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ i64     ┆ i64     ┆ str     ┆ i64     ┆ i64     ┆ str     │
╞═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ 4       ┆ 3       ┆ Apple   ┆ 1       ┆ 6       ┆ Apple   │
│ 3       ┆ 4       ┆ Apple   ┆ 3       ┆ 4       ┆ Apple   │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Get the bottom 2 rows by column a in each group.

>>> (
...     df.group_by("c", maintain_order=True)
...     .agg(pl.all().bottom_k_by("a", 2))
...     .explode(pl.all().exclude("c"))
... )
shape: (5, 3)
┌────────┬─────┬─────┐
│ c      ┆ a   ┆ b   │
│ ---    ┆ --- ┆ --- │
│ str    ┆ i64 ┆ i64 │
╞════════╪═════╪═════╡
│ Apple  ┆ 1   ┆ 6   │
│ Apple  ┆ 3   ┆ 4   │
│ Orange ┆ 2   ┆ 5   │
│ Banana ┆ 5   ┆ 2   │
│ Banana ┆ 6   ┆ 1   │
└────────┴─────┴─────┘

cast( dtype: PolarsDataType | DataTypeExpr | type[Any], *, strict: bool = True, wrap_numerical: bool = False, ) → Expr[source]

Cast between data types.

Parameters:

dtype: DataType to cast to.
strict: Raise if cast is invalid on rows after predicates are pushed down. If False, invalid casts will produce null values.
wrap_numerical: If True, numeric casts wrap overflowing values instead of marking the cast as invalid.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["4", "5", "6"],
...     }
... )
>>> df.with_columns(
...     pl.col("a").cast(pl.Float64),
...     pl.col("b").cast(pl.Int32),
... )
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i32 │
╞═════╪═════╡
│ 1.0 ┆ 4   │
│ 2.0 ┆ 5   │
│ 3.0 ┆ 6   │
└─────┴─────┘

cbrt() → Expr[source]

Compute the cube root of the elements.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").cbrt())
shape: (3, 1)
┌──────────┐
│ values   │
│ ---      │
│ f64      │
╞══════════╡
│ 1.0      │
│ 1.259921 │
│ 1.587401 │
└──────────┘

ceil() → Expr[source]

Rounds up to the nearest integer value.

Only works on floating point Series.

Examples

>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]})
>>> df.select(pl.col("a").ceil())
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
│ 1.0 │
│ 1.0 │
│ 2.0 │
└─────┘

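clip( lower_bound: NumericLiteral | TemporalLiteral | IntoExprColumn | None = None, upper_bound: NumericLiteral | TemporalLiteral | IntoExprColumn | None = None, ) → Expr[source]

Set values outside the given boundaries to the boundary value.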

Parameters:

lower_bound: Lower bound. Accepts expression input. Non-expression inputs are parsed as literals. Strings are parsed as column names.
upper_bound: Upper bound. Accepts expression input. Non-expression inputs are parsed as literals. Strings are parsed as column names.

See also

when

Notes

This method only works for numeric and temporal columns. To clip other data types, consider writing a when-then-otherwise expression. See when().

Examples

Specifying both a lower and upper bound:

>>> df = pl.DataFrame({"a": [-50, 5, 50, None]})
>>> df.with_columns(clip=pl.col("a").clip(1, 10))
shape: (4, 2)
┌──────┬──────┐
│ a    ┆ clip │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ -50  ┆ 1    │
│ 5    ┆ 5    │
│ 50   ┆ 10   │
│ null ┆ null │
└──────┴──────┘

Specifying only a single bound:

>>> df.with_columns(clip=pl.col("a").clip(upper_bound=10))
shape: (4, 2)
┌──────┬──────┐
│ a    ┆ clip │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ -50  ┆ -50  │
│ 5    ┆ 5    │
│ 50   ┆ 10   │
│ null ┆ null │
└──────┴──────┘

Using columns as bounds:

>>> df = pl.DataFrame(
...     {"a": [-50, 5, 50, None], "low": [10, 1, 0, 0], "up": [20, 4, 3, 2]}
... )
>>> df.with_columns(clip=pl.col("a").clip("low", "up"))
shape: (4, 4)
┌──────┬─────┬─────┬──────┐
│ a    ┆ low ┆ up  ┆ clip │
│ ---  ┆ --- ┆ --- ┆ ---  │
│ i64  ┆ i64 ┆ i64 ┆ i64  │
╞══════╪═════╪═════╪══════╡
│ -50  ┆ 10  ┆ 20  ┆ 10   │
│ 5    ┆ 1   ┆ 4   ┆ 4    │
│ 50   ┆ 0   ┆ 3   ┆ 3    │
│ null ┆ 0   ┆ 2   ┆ null │
└──────┴─────┴─────┴──────┘
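
The row-wise rule behind the examples above can be sketched in plain Python. This is a model of the semantics, not Polars code: the function below is a hypothetical helper, and treating a missing bound as "no bound" is an assumption that matches the scalar defaults.

```python
def clip(value, lower_bound=None, upper_bound=None):
    """Clamp one value to [lower_bound, upper_bound]; nulls stay null."""
    if value is None:
        return None
    if lower_bound is not None and value < lower_bound:
        return lower_bound
    if upper_bound is not None and value > upper_bound:
        return upper_bound
    return value


a = [-50, 5, 50, None]
low = [10, 1, 0, 0]
up = [20, 4, 3, 2]

# Per-row bounds, as in the "columns as bounds" example.
print([clip(v, lo, hi) for v, lo, hi in zip(a, low, up)])  # [10, 4, 3, None]
```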

cos() → Expr[source]

Compute the element-wise value for the cosine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").cos())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

cosh() → Expr[source]

Compute the element-wise value for the hyperbolic cosine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").cosh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.543081 │
└──────────┘

cot() → Expr[source]

Compute the element-wise value for the cotangent.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").cot().round(2))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ 0.64 │
└──────┘

count() → Expr[source]

Return the number of non-null elements in the column.

Returns:

Expr: Expression of data type UInt32.

See also

len

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [None, 4, 4]})
>>> df.select(pl.all().count())
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 3   ┆ 2   │
└─────┴─────┘

cum_count(*, reverse: bool = False) → Expr[source]

Return the cumulative count of the non-null values in the column.

Parameters:

reverse: Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": ["x", "k", None, "d"]})
>>> df.with_columns(
...     pl.col("a").cum_count().alias("cum_count"),
...     pl.col("a").cum_count(reverse=True).alias("cum_count_reverse"),
... )
shape: (4, 3)
┌──────┬───────────┬───────────────────┐
│ a    ┆ cum_count ┆ cum_count_reverse │
│ ---  ┆ ---       ┆ ---               │
│ str  ┆ u32       ┆ u32               │
╞══════╪═══════════╪═══════════════════╡
│ x    ┆ 1         ┆ 3                 │
│ k    ┆ 2         ┆ 2                 │
│ null ┆ 2         ┆ 1                 │
│ d    ┆ 3         ┆ 1                 │
└──────┴───────────┴───────────────────┘

cum_max(*, reverse: bool = False) → Expr[source]

Get an array with the cumulative max computed at every element.

Parameters:

reverse: Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": [1, 3, 2]})
>>> df.with_columns(
...     pl.col("a").cum_max().alias("cum_max"),
...     pl.col("a").cum_max(reverse=True).alias("cum_max_reverse"),
... )
shape: (3, 3)
┌─────┬─────────┬─────────────────┐
│ a   ┆ cum_max ┆ cum_max_reverse │
│ --- ┆ ---     ┆ ---             │
│ i64 ┆ i64     ┆ i64             │
╞═════╪═════════╪═════════════════╡
│ 1   ┆ 1       ┆ 3               │
│ 3   ┆ 3       ┆ 3               │
│ 2   ┆ 3       ┆ 2               │
└─────┴─────────┴─────────────────┘

Null values are excluded, but can also be filled by calling fill_null(strategy="forward").

>>> df = pl.DataFrame({"values": [None, 10, None, 8, 9, None, 16, None]})
>>> df.with_columns(
...     pl.col("values").cum_max().alias("cum_max"),
...     pl.col("values")
...     .cum_max()
...     .fill_null(strategy="forward")
...     .alias("cum_max_all_filled"),
... )
shape: (8, 3)
┌────────┬─────────┬────────────────────┐
│ values ┆ cum_max ┆ cum_max_all_filled │
│ ---    ┆ ---     ┆ ---                │
│ i64    ┆ i64     ┆ i64                │
╞════════╪═════════╪════════════════════╡
│ null   ┆ null    ┆ null               │
│ 10     ┆ 10      ┆ 10                 │
│ null   ┆ null    ┆ 10                 │
│ 8      ┆ 10      ┆ 10                 │
│ 9      ┆ 10      ┆ 10                 │
│ null   ┆ null    ┆ 10                 │
│ 16     ┆ 16      ┆ 16                 │
│ null   ┆ null    ┆ 16                 │
└────────┴─────────┴────────────────────┘
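
The null handling shown in the table can be modelled in a few lines of plain Python: a null input yields a null output, but the running maximum carries on across it. Both helper functions are illustrative stand-ins, not the Polars implementation.

```python
def cum_max(values):
    """Running maximum; null inputs produce null outputs."""
    out, best = [], None
    for v in values:
        if v is None:
            out.append(None)
            continue
        best = v if best is None else max(best, v)
        out.append(best)
    return out


def forward_fill(values):
    """Replace each null with the most recent non-null value, if any."""
    out, last = [], None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out


values = [None, 10, None, 8, 9, None, 16, None]
print(cum_max(values))                # [None, 10, None, 10, 10, None, 16, None]
print(forward_fill(cum_max(values)))  # [None, 10, 10, 10, 10, 10, 16, 16]
```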

cum_min(*, reverse: bool = False) → Expr[source]

Get an array with the cumulative min computed at every element.

Parameters:

reverse: Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": [3, 1, 2]})
>>> df.with_columns(
...     pl.col("a").cum_min().alias("cum_min"),
...     pl.col("a").cum_min(reverse=True).alias("cum_min_reverse"),
... )
shape: (3, 3)
┌─────┬─────────┬─────────────────┐
│ a   ┆ cum_min ┆ cum_min_reverse │
│ --- ┆ ---     ┆ ---             │
│ i64 ┆ i64     ┆ i64             │
╞═════╪═════════╪═════════════════╡
│ 3   ┆ 3       ┆ 1               │
│ 1   ┆ 1       ┆ 1               │
│ 2   ┆ 1       ┆ 2               │
└─────┴─────────┴─────────────────┘

cum_prod(*, reverse: bool = False) → Expr[source]

Get an array with the cumulative product computed at every element.

Parameters:

reverse: Reverse the operation.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before multiplying to prevent overflow issues.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.with_columns(
...     pl.col("a").cum_prod().alias("cum_prod"),
...     pl.col("a").cum_prod(reverse=True).alias("cum_prod_reverse"),
... )
shape: (4, 3)
┌─────┬──────────┬──────────────────┐
│ a   ┆ cum_prod ┆ cum_prod_reverse │
│ --- ┆ ---      ┆ ---              │
│ i64 ┆ i64      ┆ i64              │
╞═════╪══════════╪══════════════════╡
│ 1   ┆ 1        ┆ 24               │
│ 2   ┆ 2        ┆ 24               │
│ 3   ┆ 6        ┆ 12               │
│ 4   ┆ 24       ┆ 4                │
└─────┴──────────┴──────────────────┘
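
One way to read reverse=True is "accumulate from the last element towards the first, then report the results in the original order". The sketch below is a plain-Python model of that reading for the example above; the helper is hypothetical and ignores nulls.

```python
from itertools import accumulate
from operator import mul


def cum_prod(values, *, reverse=False):
    """Cumulative product, optionally accumulated from the end."""
    seq = list(reversed(values)) if reverse else list(values)
    out = list(accumulate(seq, mul))
    # Flip back so each result lines up with its original row.
    return out[::-1] if reverse else out


print(cum_prod([1, 2, 3, 4]))                # [1, 2, 6, 24]
print(cum_prod([1, 2, 3, 4], reverse=True))  # [24, 24, 12, 4]
```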

cum_sum(*, reverse: bool = False) → Expr[source]

Get an array with the cumulative sum computed at every element.

Parameters:

reverse: Reverse the operation.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.with_columns(
...     pl.col("a").cum_sum().alias("cum_sum"),
...     pl.col("a").cum_sum(reverse=True).alias("cum_sum_reverse"),
... )
shape: (4, 3)
┌─────┬─────────┬─────────────────┐
│ a   ┆ cum_sum ┆ cum_sum_reverse │
│ --- ┆ ---     ┆ ---             │
│ i64 ┆ i64     ┆ i64             │
╞═════╪═════════╪═════════════════╡
│ 1   ┆ 1       ┆ 10              │
│ 2   ┆ 3       ┆ 9               │
│ 3   ┆ 6       ┆ 7               │
│ 4   ┆ 10      ┆ 4               │
└─────┴─────────┴─────────────────┘

Null values are excluded, but can also be filled by calling fill_null(strategy="forward").

>>> df = pl.DataFrame({"values": [None, 10, None, 8, 9, None, 16, None]})
>>> df.with_columns(
...     pl.col("values").cum_sum().alias("value_cum_sum"),
...     pl.col("values")
...     .cum_sum()
...     .fill_null(strategy="forward")
...     .alias("value_cum_sum_all_filled"),
... )
shape: (8, 3)
┌────────┬───────────────┬──────────────────────────┐
│ values ┆ value_cum_sum ┆ value_cum_sum_all_filled │
│ ---    ┆ ---           ┆ ---                      │
│ i64    ┆ i64           ┆ i64                      │
╞════════╪═══════════════╪══════════════════════════╡
│ null   ┆ null          ┆ null                     │
│ 10     ┆ 10            ┆ 10                       │
│ null   ┆ null          ┆ 10                       │
│ 8      ┆ 18            ┆ 18                       │
│ 9      ┆ 27            ┆ 27                       │
│ null   ┆ null          ┆ 27                       │
│ 16     ┆ 43            ┆ 43                       │
│ null   ┆ null          ┆ 43                       │
└────────┴───────────────┴──────────────────────────┘

cumulative_eval( expr: Expr, *, min_samples: int = 1, ) → Expr[source]

Run an expression over an expanding window that grows by one slot on every iteration.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

expr: Expression to evaluate.
min_samples: Number of valid (non-null) values required in the window before the expression is evaluated, where valid values = length - null_count.

Warning

This can be really slow as it can have O(n^2) complexity. Don’t use this for operations that visit all elements.

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3, 4, 5]})
>>> df.select(
...     [
...         pl.col("values").cumulative_eval(
...             pl.element().first() - pl.element().last() ** 2
...         )
...     ]
... )
shape: (5, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 0      │
│ -3     │
│ -8     │
│ -15    │
│ -24    │
└────────┘
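
A plain-Python model makes both the windowing and the cost visible: the function is re-evaluated on every prefix of the column, so n rows mean n evaluations over windows of length 1 to n. The helper below is a hypothetical sketch and does not model min_samples or nulls.

```python
def cumulative_eval(values, func):
    """Apply func to each growing prefix values[:1], values[:2], ..."""
    return [func(values[: i + 1]) for i in range(len(values))]


# Mirrors pl.element().first() - pl.element().last() ** 2
result = cumulative_eval([1, 2, 3, 4, 5], lambda window: window[0] - window[-1] ** 2)
print(result)  # [0, -3, -8, -15, -24]
```

A function that reads only the first and last element stays cheap per window; one that scans the whole window, such as a sum, is where the quadratic cost comes from.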

cut( breaks: Sequence[float], *, labels: Sequence[str] | None = None, left_closed: bool = False, include_breaks: bool = False, ) → Expr[source]

Bin continuous values into discrete categories.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:

breaks: List of unique cut points.
labels: Names of the categories. The number of labels must be equal to the number of cut points plus one.
left_closed: Set the intervals to be left-closed instead of right-closed.
include_breaks: Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

Returns:

Expr: Expression of data type Categorical if include_breaks is set to False (default), otherwise an expression of data type Struct.

See also

qcut

Examples

Divide a column into three categories.

>>> df = pl.DataFrame({"foo": [-2, -1, 0, 1, 2]})
>>> df.with_columns(
...     pl.col("foo").cut([-1, 1], labels=["a", "b", "c"]).alias("cut")
... )
shape: (5, 2)
┌─────┬─────┐
│ foo ┆ cut │
│ --- ┆ --- │
│ i64 ┆ cat │
╞═════╪═════╡
│ -2  ┆ a   │
│ -1  ┆ a   │
│ 0   ┆ b   │
│ 1   ┆ b   │
│ 2   ┆ c   │
└─────┴─────┘

Add both the category and the breakpoint.

>>> df.with_columns(
...     pl.col("foo").cut([-1, 1], include_breaks=True).alias("cut")
... ).unnest("cut")
shape: (5, 3)
┌─────┬────────────┬────────────┐
│ foo ┆ breakpoint ┆ category   │
│ --- ┆ ---        ┆ ---        │
│ i64 ┆ f64        ┆ cat        │
╞═════╪════════════╪════════════╡
│ -2  ┆ -1.0       ┆ (-inf, -1] │
│ -1  ┆ -1.0       ┆ (-inf, -1] │
│ 0   ┆ 1.0        ┆ (-1, 1]    │
│ 1   ┆ 1.0        ┆ (-1, 1]    │
│ 2   ┆ inf        ┆ (1, inf]   │
└─────┴────────────┴────────────┘
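
How a value is assigned to a bin, and what left_closed changes, can be sketched with the standard-library bisect module. This is a model of the interval rule only; the helper and its labels argument are illustrative, and it returns plain strings rather than a Categorical.

```python
from bisect import bisect_left, bisect_right


def cut(values, breaks, labels, *, left_closed=False):
    """Assign each value the label of the interval it falls in."""
    if len(labels) != len(breaks) + 1:
        raise ValueError("need exactly len(breaks) + 1 labels")
    # Right-closed bins (a, b] put a value equal to a break in the lower bin;
    # left-closed bins [a, b) put it in the upper bin.
    locate = bisect_right if left_closed else bisect_left
    return [None if v is None else labels[locate(breaks, v)] for v in values]


foo = [-2, -1, 0, 1, 2]
print(cut(foo, [-1, 1], ["a", "b", "c"]))                    # ['a', 'a', 'b', 'b', 'c']
print(cut(foo, [-1, 1], ["a", "b", "c"], left_closed=True))  # ['a', 'b', 'b', 'c', 'c']
```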

degrees() → Expr[source]

Convert from radians to degrees.

Returns:

Expr: Expression of data type Float64.

Examples

>>> import math
>>> df = pl.DataFrame({"a": [x * math.pi for x in range(-4, 5)]})
>>> df.select(pl.col("a").degrees())
shape: (9, 1)
┌────────┐
│ a      │
│ ---    │
│ f64    │
╞════════╡
│ -720.0 │
│ -540.0 │
│ -360.0 │
│ -180.0 │
│ 0.0    │
│ 180.0  │
│ 360.0  │
│ 540.0  │
│ 720.0  │
└────────┘

classmethod deserialize( source: str | Path | IOBase | bytes, *, format: SerializationFormat = 'binary', ) → Expr[source]

Read a serialized expression from a file.

Parameters:

source

Path to a file or a file-like object (by file-like object, we refer to objects that have a read() method, such as a file handler (e.g. via builtin open function) or BytesIO).

format

The format with which the Expr was serialized. Options:

"binary": Deserialize from binary format (bytes). This is the default.
"json": Deserialize from JSON format (string).

Warning

This function uses pickle if the logical plan contains Python UDFs, and as such inherits the security implications. Deserializing can execute arbitrary code, so it should only be attempted on trusted data.

See also

Expr.meta.serialize

Notes

Serialization is not stable across Polars versions: an expression serialized in one Polars version may not be deserializable in another Polars version.

Examples

>>> import io
>>> expr = pl.col("foo").sum().over("bar")
>>> data = expr.meta.serialize()  # avoid shadowing the builtin `bytes`
>>> pl.Expr.deserialize(io.BytesIO(data))
<Expr ['col("foo").sum().over([col("ba…'] at ...>

diff(n: int | IntoExpr = 1, null_behavior: NullBehavior = 'ignore') → Expr[source]

Calculate the first discrete difference between shifted items.

Parameters:

n: Number of slots to shift.
null_behavior: {‘ignore’, ‘drop’}. How to handle null values.

Examples

>>> df = pl.DataFrame({"int": [20, 10, 30, 25, 35]})
>>> df.with_columns(change=pl.col("int").diff())
shape: (5, 2)
┌─────┬────────┐
│ int ┆ change │
│ --- ┆ ---    │
│ i64 ┆ i64    │
╞═════╪════════╡
│ 20  ┆ null   │
│ 10  ┆ -10    │
│ 30  ┆ 20     │
│ 25  ┆ -5     │
│ 35  ┆ 10     │
└─────┴────────┘

>>> df.with_columns(change=pl.col("int").diff(n=2))
shape: (5, 2)
┌─────┬────────┐
│ int ┆ change │
│ --- ┆ ---    │
│ i64 ┆ i64    │
╞═════╪════════╡
│ 20  ┆ null   │
│ 10  ┆ null   │
│ 30  ┆ 10     │
│ 25  ┆ 15     │
│ 35  ┆ 5      │
└─────┴────────┘

>>> df.select(pl.col("int").diff(n=2, null_behavior="drop").alias("diff"))
shape: (3, 1)
┌──────┐
│ diff │
│ ---  │
│ i64  │
╞══════╡
│ 10   │
│ 15   │
│ 5    │
└──────┘
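
The three outputs above follow from one rule: each element minus the element n slots earlier, with the first n positions either left null or dropped. A plain-Python sketch of that rule, assuming n >= 1 and no nulls in the input (the helper is illustrative, not the Polars implementation):

```python
def diff(values, n=1, null_behavior="ignore"):
    """Difference between each element and the one n slots before it."""
    out = [None if i < n else values[i] - values[i - n] for i in range(len(values))]
    # "drop" removes the leading positions that have no earlier element.
    return out[n:] if null_behavior == "drop" else out


ints = [20, 10, 30, 25, 35]
print(diff(ints))                             # [None, -10, 20, -5, 10]
print(diff(ints, n=2))                        # [None, None, 10, 15, 5]
print(diff(ints, n=2, null_behavior="drop"))  # [10, 15, 5]
```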

dot(other: Expr | str) → Expr[source]

Compute the dot/inner product between two Expressions.

Parameters:

other: Expression to compute dot product with.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 3, 5],
...         "b": [2, 4, 6],
...     }
... )
>>> df.select(pl.col("a").dot(pl.col("b")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 44  │
└─────┘

drop_nans() → Expr[source]

Drop all floating point NaN values.

The original order of the remaining elements is preserved.

See also

drop_nulls

Notes

A NaN value is not the same as a null value. To drop null values, use drop_nulls().

Examples

>>> df = pl.DataFrame({"a": [1.0, None, 3.0, float("nan")]})
>>> df.select(pl.col("a").drop_nans())
shape: (3, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ 1.0  │
│ null │
│ 3.0  │
└──────┘

drop_nulls() → Expr[source]

Drop all null values.

The original order of the remaining elements is preserved.

See also

drop_nans

Notes

A null value is not the same as a NaN value. To drop NaN values, use drop_nans().

Examples

>>> df = pl.DataFrame({"a": [1.0, None, 3.0, float("nan")]})
>>> df.select(pl.col("a").drop_nulls())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
│ 3.0 │
│ NaN │
└─────┘

entropy( base: float = 2.718281828459045, *, normalize: bool = True, ) → Expr[source]

Computes the entropy.

Uses the formula -sum(pk * log(pk)) where pk are discrete probabilities.

Parameters:

base: Base of the logarithm; defaults to e.
normalize: Normalize pk if it doesn’t sum to 1.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").entropy(base=2))
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.459148 │
└──────────┘
>>> df.select(pl.col("a").entropy(base=2, normalize=False))
shape: (1, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ -6.754888 │
└───────────┘
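
Both results can be reproduced by applying the formula -sum(pk * log(pk)) directly. The sketch below is a plain-Python reading of it, assuming strictly positive inputs; the function is a hypothetical stand-in for the expression.

```python
import math


def entropy(pk, base=math.e, *, normalize=True):
    """-sum(pk * log(pk)), optionally rescaling pk to sum to 1 first."""
    if normalize:
        total = sum(pk)
        pk = [p / total for p in pk]
    return -sum(p * math.log(p, base) for p in pk)


print(round(entropy([1, 2, 3], base=2), 6))                   # 1.459148
print(round(entropy([1, 2, 3], base=2, normalize=False), 6))  # -6.754888
```

Without normalization the inputs 1, 2, 3 are not probabilities, which is why the second result is negative.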

eq(other: Any) → Expr[source]

Method equivalent of equality operator expr == other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [1.0, 2.0, float("nan"), 4.0],
...         "y": [2.0, 2.0, float("nan"), 4.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").eq(pl.col("y")).alias("x == y"),
... )
shape: (4, 3)
┌─────┬─────┬────────┐
│ x   ┆ y   ┆ x == y │
│ --- ┆ --- ┆ ---    │
│ f64 ┆ f64 ┆ bool   │
╞═════╪═════╪════════╡
│ 1.0 ┆ 2.0 ┆ false  │
│ 2.0 ┆ 2.0 ┆ true   │
│ NaN ┆ NaN ┆ true   │
│ 4.0 ┆ 4.0 ┆ true   │
└─────┴─────┴────────┘

eq_missing(other: Any) → Expr[source]

Method equivalent of equality operator expr == other where None == None.

This differs from default eq where null values are propagated.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [1.0, 2.0, float("nan"), 4.0, None, None],
...         "y": [2.0, 2.0, float("nan"), 4.0, 5.0, None],
...     }
... )
>>> df.with_columns(
...     pl.col("x").eq(pl.col("y")).alias("x eq y"),
...     pl.col("x").eq_missing(pl.col("y")).alias("x eq_missing y"),
... )
shape: (6, 4)
┌──────┬──────┬────────┬────────────────┐
│ x    ┆ y    ┆ x eq y ┆ x eq_missing y │
│ ---  ┆ ---  ┆ ---    ┆ ---            │
│ f64  ┆ f64  ┆ bool   ┆ bool           │
╞══════╪══════╪════════╪════════════════╡
│ 1.0  ┆ 2.0  ┆ false  ┆ false          │
│ 2.0  ┆ 2.0  ┆ true   ┆ true           │
│ NaN  ┆ NaN  ┆ true   ┆ true           │
│ 4.0  ┆ 4.0  ┆ true   ┆ true           │
│ null ┆ 5.0  ┆ null   ┆ false          │
│ null ┆ null ┆ null   ┆ true           │
└──────┴──────┴────────┴────────────────┘
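
The difference between the two columns comes down to how a null operand is handled. A plain-Python model of the table above, with None standing in for null (the helper names are illustrative; the NaN rule is taken from the NaN row of the example):

```python
import math


def _same(x, y):
    # The example shows NaN comparing equal to NaN, unlike Python floats.
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


def eq(x, y):
    """Null-propagating equality: any null operand gives a null result."""
    if x is None or y is None:
        return None
    return _same(x, y)


def eq_missing(x, y):
    """Equality where null equals null and differs from every value."""
    if x is None or y is None:
        return x is None and y is None
    return _same(x, y)


print(eq(None, 5.0), eq_missing(None, 5.0))    # None False
print(eq(None, None), eq_missing(None, None))  # None True
```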

ewm_mean( *, com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, min_samples: int = 1, ignore_nulls: bool = False, ) → Expr[source]

Compute exponentially-weighted moving average.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

com

Specify decay in terms of center of mass, $\gamma$, with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]

span

Specify decay in terms of span, $\theta$, with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]

half_life

Specify decay in terms of half-life, $\tau$, with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \tau } \right\} \; \forall \; \tau > 0\]

alpha

Specify smoothing factor alpha directly, $0 < \alpha \leq 1$.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

When adjust=True (the default) the EW function is calculated using weights $w_i = (1 - \alpha)^i$

When adjust=False the EW function is calculated recursively by

\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]

min_samples

Minimum number of observations in window required to have a value (otherwise result is null).

ignore_nulls

Ignore missing values when calculating weights.

When ignore_nulls=False (default), weights are based on absolute positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $(1-\alpha)^2$ and $1$ if adjust=True, and $(1-\alpha)^2$ and $\alpha$ if adjust=False.

When ignore_nulls=True, weights are based on relative positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $1-\alpha$ and $1$ if adjust=True, and $1-\alpha$ and $\alpha$ if adjust=False.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_mean(com=1, ignore_nulls=False))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.0      │
│ 1.666667 │
│ 2.428571 │
└──────────┘
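
The example output can be checked by applying the formulas from the parameter descriptions directly. The following plain-Python sketch covers the com parameterization with both adjust modes; it assumes no nulls and is a reference model, not the Polars implementation.

```python
def ewm_mean(values, *, com, adjust=True):
    """Exponentially weighted mean with alpha = 1 / (1 + com)."""
    if not values:
        return []
    alpha = 1.0 / (1.0 + com)
    if adjust:
        out = []
        for t in range(len(values)):
            # Weight (1 - alpha)^i applies to the observation i steps back.
            weights = [(1 - alpha) ** i for i in range(t + 1)]
            window = [values[t - i] for i in range(t + 1)]
            out.append(sum(w * x for w, x in zip(weights, window)) / sum(weights))
        return out
    # adjust=False: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t
    out = [float(values[0])]
    for x in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


print([round(y, 6) for y in ewm_mean([1, 2, 3], com=1)])  # [1.0, 1.666667, 2.428571]
print(ewm_mean([1, 2, 3], com=1, adjust=False))           # [1.0, 1.5, 2.25]
```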

ewm_mean_by(by: str | IntoExpr, *, half_life: str | timedelta) → Expr[source]

Compute time-based exponentially weighted moving average.

Given observations $x_0, x_1, \ldots, x_{n-1}$ at times $t_0, t_1, \ldots, t_{n-1}$, the EWMA is calculated as

\[ \begin{align}\begin{aligned}y_0 &= x_0\\\alpha_i &= 1 - \exp \left\{ \frac{ -\ln(2)(t_i-t_{i-1}) } { \tau } \right\}\\y_i &= \alpha_i x_i + (1 - \alpha_i) y_{i-1}; \quad i > 0\end{aligned}\end{align} \]

where $\tau$ is the half_life.

Parameters:

by

Times to calculate average by. Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type.

half_life

Unit over which observation decays to half its value.

Can be created either from a timedelta, or by using the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 day)
1w (1 week)
1i (1 index count)

Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds

Note that half_life is treated as a constant duration. Calendar durations such as months (or even days in the time-zone-aware case) are not supported; express your duration in an approximately equivalent number of hours (e.g. ‘370h’ instead of ‘1mo’).

Returns:

Expr: Float32 if input is Float32, otherwise Float64.

Examples

>>> from datetime import date, timedelta
>>> df = pl.DataFrame(
...     {
...         "values": [0, 1, 2, None, 4],
...         "times": [
...             date(2020, 1, 1),
...             date(2020, 1, 3),
...             date(2020, 1, 10),
...             date(2020, 1, 15),
...             date(2020, 1, 17),
...         ],
...     }
... ).sort("times")
>>> df.with_columns(
...     result=pl.col("values").ewm_mean_by("times", half_life="4d"),
... )
shape: (5, 3)
┌────────┬────────────┬──────────┐
│ values ┆ times      ┆ result   │
│ ---    ┆ ---        ┆ ---      │
│ i64    ┆ date       ┆ f64      │
╞════════╪════════════╪══════════╡
│ 0      ┆ 2020-01-01 ┆ 0.0      │
│ 1      ┆ 2020-01-03 ┆ 0.292893 │
│ 2      ┆ 2020-01-10 ┆ 1.492474 │
│ null   ┆ 2020-01-15 ┆ null     │
│ 4      ┆ 2020-01-17 ┆ 3.254508 │
└────────┴────────────┴──────────┘
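
The recurrence above can be traced by hand with a short plain-Python model. One detail is inferred from the example output rather than stated in the text: a null observation produces a null result and is skipped, so the next time difference is measured from the last non-null observation. The function below is a hypothetical reference sketch built on that assumption.

```python
import math
from datetime import date, timedelta


def ewm_mean_by(values, times, half_life):
    """Time-based EWMA; nulls yield null and do not update the state."""
    out = []
    prev_y = prev_t = None
    for x, t in zip(values, times):
        if x is None:
            out.append(None)
            continue
        if prev_y is None:
            y = float(x)  # y_0 = x_0
        else:
            elapsed = (t - prev_t) / half_life  # timedelta / timedelta -> float
            alpha = 1 - math.exp(-math.log(2) * elapsed)
            y = alpha * x + (1 - alpha) * prev_y
        out.append(y)
        prev_y, prev_t = y, t
    return out


times = [date(2020, 1, d) for d in (1, 3, 10, 15, 17)]
result = ewm_mean_by([0, 1, 2, None, 4], times, timedelta(days=4))
print([None if y is None else round(y, 6) for y in result])
# [0.0, 0.292893, 1.492474, None, 3.254508]
```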

ewm_std( *, com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, bias: bool = False, min_samples: int = 1, ignore_nulls: bool = False, ) → Expr[source]

Compute exponentially-weighted moving standard deviation.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

com

Specify decay in terms of center of mass, $\gamma$, with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]

span

Specify decay in terms of span, $\theta$, with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]

half_life

Specify decay in terms of half-life, $\lambda$, with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]

alpha

Specify smoothing factor alpha directly, $0 < \alpha \leq 1$.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

When adjust=True (the default) the EW function is calculated using weights $w_i = (1 - \alpha)^i$

When adjust=False the EW function is calculated recursively by

\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]

bias

When bias=False, apply a correction to make the estimate statistically unbiased.

min_samples

Minimum number of observations in window required to have a value (otherwise result is null).

ignore_nulls

Ignore missing values when calculating weights.

When ignore_nulls=False (default), weights are based on absolute positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $(1-\alpha)^2$ and $1$ if adjust=True, and $(1-\alpha)^2$ and $\alpha$ if adjust=False.

When ignore_nulls=True, weights are based on relative positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $1-\alpha$ and $1$ if adjust=True, and $1-\alpha$ and $\alpha$ if adjust=False.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_std(com=1, ignore_nulls=False))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
│ 0.707107 │
│ 0.963624 │
└──────────┘

ewm_var( *, com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, bias: bool = False, min_samples: int = 1, ignore_nulls: bool = False, ) → Expr[source]

Compute exponentially-weighted moving variance.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

com

Specify decay in terms of center of mass, $\gamma$, with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]

span

Specify decay in terms of span, $\theta$, with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]

half_life

Specify decay in terms of half-life, $\lambda$, with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]

alpha

Specify smoothing factor alpha directly, $0 < \alpha \leq 1$.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

When adjust=True (the default) the EW function is calculated using weights $w_i = (1 - \alpha)^i$

When adjust=False the EW function is calculated recursively by

\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]

bias

When bias=False, apply a correction to make the estimate statistically unbiased.

min_samples

Minimum number of observations in window required to have a value (otherwise result is null).

ignore_nulls

Ignore missing values when calculating weights.

When ignore_nulls=False (default), weights are based on absolute positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $(1-\alpha)^2$ and $1$ if adjust=True, and $(1-\alpha)^2$ and $\alpha$ if adjust=False.

When ignore_nulls=True, weights are based on relative positions. For example, the weights of $x_0$ and $x_2$ used in calculating the final weighted average of [$x_0$, None, $x_2$] are $1-\alpha$ and $1$ if adjust=True, and $1-\alpha$ and $\alpha$ if adjust=False.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_var(com=1, ignore_nulls=False))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
│ 0.5      │
│ 0.928571 │
└──────────┘
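
The documented outputs of ewm_var, and of ewm_std as their square roots, can be reproduced with a weighted-variance model. The bias correction used here, multiplying by (sum w)^2 / ((sum w)^2 - sum w^2), is the standard reliability-weights correction and is an assumption of this sketch; it matches the example but is not taken from the Polars source. Returning 0.0 for a single observation is likewise chosen to match the first output row.

```python
import math


def ewm_var(values, *, com, bias=False):
    """Exponentially weighted variance (adjust=True, no nulls)."""
    alpha = 1.0 / (1.0 + com)
    out = []
    for t in range(len(values)):
        # The most recent observation gets weight 1, older ones decay.
        weights = [(1 - alpha) ** (t - i) for i in range(t + 1)]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, values)) / total
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
        if not bias:
            denom = total**2 - sum(w * w for w in weights)
            var = var * total**2 / denom if denom > 0 else 0.0
        out.append(var)
    return out


variances = ewm_var([1, 2, 3], com=1)
print([round(v, 6) for v in variances])             # [0.0, 0.5, 0.928571]
print([round(math.sqrt(v), 6) for v in variances])  # [0.0, 0.707107, 0.963624]
```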

exclude( columns: str | PolarsDataType | Collection[str] | Collection[PolarsDataType], *more_columns: str | PolarsDataType, ) → Expr[source]

Exclude columns from a multi-column expression.

Only works after a wildcard or regex column selection, and you cannot provide both string column names and dtypes (you may prefer to use selectors instead).

Parameters:

columns: The name or datatype of the column(s) to exclude. Accepts regular expression input. Regular expressions should start with ^ and end with $.
*more_columns: Additional names or datatypes of columns to exclude, specified as positional arguments.

Examples

>>> df = pl.DataFrame(
...     {
...         "aa": [1, 2, 3],
...         "ba": ["a", "b", None],
...         "cc": [None, 2.5, 1.5],
...     }
... )
>>> df
shape: (3, 3)
┌─────┬──────┬──────┐
│ aa  ┆ ba   ┆ cc   │
│ --- ┆ ---  ┆ ---  │
│ i64 ┆ str  ┆ f64  │
╞═════╪══════╪══════╡
│ 1   ┆ a    ┆ null │
│ 2   ┆ b    ┆ 2.5  │
│ 3   ┆ null ┆ 1.5  │
└─────┴──────┴──────┘

Exclude by column name(s):

>>> df.select(pl.all().exclude("ba"))
shape: (3, 2)
┌─────┬──────┐
│ aa  ┆ cc   │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ null │
│ 2   ┆ 2.5  │
│ 3   ┆ 1.5  │
└─────┴──────┘

Exclude by regex, e.g. removing all columns whose names end with the letter “a”:

>>> df.select(pl.all().exclude("^.*a$"))
shape: (3, 1)
┌──────┐
│ cc   │
│ ---  │
│ f64  │
╞══════╡
│ null │
│ 2.5  │
│ 1.5  │
└──────┘

Exclude by dtype(s), e.g. removing all columns of type Int64 or Float64:

>>> df.select(pl.all().exclude([pl.Int64, pl.Float64]))
shape: (3, 1)
┌──────┐
│ ba   │
│ ---  │
│ str  │
╞══════╡
│ a    │
│ b    │
│ null │
└──────┘

exp() → Expr[source]

Compute the exponential, element-wise.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").exp())
shape: (3, 1)
┌──────────┐
│ values   │
│ ---      │
│ f64      │
╞══════════╡
│ 2.718282 │
│ 7.389056 │
│ 54.59815 │
└──────────┘

explode() → Expr[source]

Explode a list expression.

This means that every item is expanded to a new row.

Returns:

Expr: Expression with the data type of the list elements.

See also

Expr.list.explode: Explode a list column.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": ["a", "b"],
...         "values": [
...             [1, 2],
...             [3, 4],
...         ],
...     }
... )
>>> df.select(pl.col("values").explode())
shape: (4, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 1      │
│ 2      │
│ 3      │
│ 4      │
└────────┘

extend_constant(value: IntoExpr, n: int | IntoExprColumn) → Expr[source]

Extremely fast method for extending the Series with ‘n’ copies of a value.

Parameters:

value: A constant literal value or a unit expression with which to extend the expression result Series; can pass None to extend with nulls.
n: The number of additional values that will be added.

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3]})
>>> df.select((pl.col("values") - 1).extend_constant(99, n=2))
shape: (5, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 0      │
│ 1      │
│ 2      │
│ 99     │
│ 99     │
└────────┘

fill_nan( value: int | float | Expr | None, ) → Expr[source]

Fill floating point NaN values with a fill value.

Parameters:

value: Value used to fill NaN values.

See also

fill_null

Notes

A NaN value is not the same as a null value. To fill null values, use fill_null().

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1.0, None, float("nan")],
...         "b": [4.0, float("nan"), 6],
...     }
... )
>>> df.with_columns(pl.col("b").fill_nan(0))
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ f64  ┆ f64 │
╞══════╪═════╡
│ 1.0  ┆ 4.0 │
│ null ┆ 0.0 │
│ NaN  ┆ 6.0 │
└──────┴─────┘

fill_null( value: Any | Expr | None = None, strategy: FillNullStrategy | None = None, limit: int | None = None, ) → Expr[source]

Fill null values using the specified value or strategy.

To interpolate over null values see interpolate. See the examples below to fill nulls with an expression.

Parameters:

value: Value used to fill null values.
strategy: {None, ‘forward’, ‘backward’, ‘min’, ‘max’, ‘mean’, ‘zero’, ‘one’}. Strategy used to fill null values.
limit: Number of consecutive null values to fill when using the ‘forward’ or ‘backward’ strategy.

See also

backward_fill
fill_nan
forward_fill

Notes

A null value is not the same as a NaN value. To fill NaN values, use fill_nan().

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": [4, None, 6],
...     }
... )
>>> df.with_columns(pl.col("b").fill_null(strategy="zero"))
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ 1    ┆ 4   │
│ 2    ┆ 0   │
│ null ┆ 6   │
└──────┴─────┘
>>> df.with_columns(pl.col("b").fill_null(99))
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ 1    ┆ 4   │
│ 2    ┆ 99  │
│ null ┆ 6   │
└──────┴─────┘
>>> df.with_columns(pl.col("b").fill_null(strategy="forward"))
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ 1    ┆ 4   │
│ 2    ┆ 4   │
│ null ┆ 6   │
└──────┴─────┘
>>> df.with_columns(pl.col("b").fill_null(pl.col("b").median()))
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ f64 │
╞══════╪═════╡
│ 1    ┆ 4.0 │
│ 2    ┆ 5.0 │
│ null ┆ 6.0 │
└──────┴─────┘
>>> df.with_columns(pl.all().fill_null(pl.all().median()))
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 4.0 │
│ 2.0 ┆ 5.0 │
│ 1.5 ┆ 6.0 │
└─────┴─────┘

filter( *predicates: IntoExprColumn | Iterable[IntoExprColumn], **constraints: Any, ) → Expr[source]

Filter the expression based on one or more predicate expressions.

The original order of the remaining elements is preserved.

Elements where the filter does not evaluate to True are discarded, including nulls.

Mostly useful in an aggregation context. If you want to filter on a DataFrame level, use LazyFrame.filter.

Parameters:

predicates: Expression(s) that evaluate to a boolean Series.
constraints: Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as pl.col(name).eq(value), and be implicitly joined with the other filter conditions using &.

Examples

>>> df = pl.DataFrame(
...     {
...         "group_col": ["g1", "g1", "g2"],
...         "b": [1, 2, 3],
...     }
... )
>>> df.group_by("group_col").agg(
...     lt=pl.col("b").filter(pl.col("b") < 2).sum(),
...     gte=pl.col("b").filter(pl.col("b") >= 2).sum(),
... ).sort("group_col")
shape: (2, 3)
┌───────────┬─────┬─────┐
│ group_col ┆ lt  ┆ gte │
│ ---       ┆ --- ┆ --- │
│ str       ┆ i64 ┆ i64 │
╞═══════════╪═════╪═════╡
│ g1        ┆ 1   ┆ 2   │
│ g2        ┆ 0   ┆ 3   │
└───────────┴─────┴─────┘

Filter expressions can also take constraints as keyword arguments.

>>> df = pl.DataFrame(
...     {
...         "key": ["a", "a", "a", "a", "b", "b", "b", "b", "b"],
...         "n": [1, 2, 2, 3, 1, 3, 3, 2, 3],
...     },
... )
>>> df.group_by("key").agg(
...     n_1=pl.col("n").filter(n=1).sum(),
...     n_2=pl.col("n").filter(n=2).sum(),
...     n_3=pl.col("n").filter(n=3).sum(),
... ).sort(by="key")
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ key ┆ n_1 ┆ n_2 ┆ n_3 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ a   ┆ 1   ┆ 4   ┆ 3   │
│ b   ┆ 1   ┆ 2   ┆ 9   │
└─────┴─────┴─────┴─────┘

first() → Expr[source]

Get the first value.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").first())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
└─────┘

flatten() → Expr[source]

Flatten a list or string column.

Alias for Expr.list.explode().

Examples

>>> df = pl.DataFrame(
...     {
...         "group": ["a", "b", "b"],
...         "values": [[1, 2], [2, 3], [4]],
...     }
... )
>>> df.group_by("group").agg(pl.col("values").flatten())  
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ values    │
│ ---   ┆ ---       │
│ str   ┆ list[i64] │
╞═══════╪═══════════╡
│ a     ┆ [1, 2]    │
│ b     ┆ [2, 3, 4] │
└───────┴───────────┘

floor() → Expr[source]

Rounds down to the nearest integer value.

Only works on floating point Series.

Examples

>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]})
>>> df.select(pl.col("a").floor())
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
│ 0.0 │
│ 1.0 │
│ 1.0 │
└─────┘

floordiv(other: Any) → Expr[source]

Method equivalent of integer division operator expr // other.

Parameters:

other: Numeric literal or expression value.

See also

truediv

Examples

>>> df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
>>> df.with_columns(
...     pl.col("x").truediv(2).alias("x/2"),
...     pl.col("x").floordiv(2).alias("x//2"),
... )
shape: (5, 3)
┌─────┬─────┬──────┐
│ x   ┆ x/2 ┆ x//2 │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ f64 ┆ i64  │
╞═════╪═════╪══════╡
│ 1   ┆ 0.5 ┆ 0    │
│ 2   ┆ 1.0 ┆ 1    │
│ 3   ┆ 1.5 ┆ 1    │
│ 4   ┆ 2.0 ┆ 2    │
│ 5   ┆ 2.5 ┆ 2    │
└─────┴─────┴──────┘

Note that Polars’ floordiv is subtly different from Python’s floor division. For example, consider 6.0 floor-divided by 0.1. Python gives:

>>> 6.0 // 0.1
59.0

because 0.1 is not represented internally as that exact value, but a slightly larger value. So the result of the division is slightly less than 60, meaning the flooring operation returns 59.0.

Polars instead first does the floating-point division, resulting in a floating-point value of 60.0, and then performs the flooring operation using floor:

>>> df = pl.DataFrame({"x": [6.0, 6.03]})
>>> df.with_columns(
...     pl.col("x").truediv(0.1).alias("x/0.1"),
... ).with_columns(
...     pl.col("x/0.1").floor().alias("x/0.1 floor"),
... )
shape: (2, 3)
┌──────┬───────┬─────────────┐
│ x    ┆ x/0.1 ┆ x/0.1 floor │
│ ---  ┆ ---   ┆ ---         │
│ f64  ┆ f64   ┆ f64         │
╞══════╪═══════╪═════════════╡
│ 6.0  ┆ 60.0  ┆ 60.0        │
│ 6.03 ┆ 60.3  ┆ 60.0        │
└──────┴───────┴─────────────┘

yielding the more intuitive result 60.0. The row with x = 6.03 is included to demonstrate the effect of the flooring operation.

floordiv combines those two steps to give the same result with one expression:

>>> df.with_columns(
...     pl.col("x").floordiv(0.1).alias("x//0.1"),
... )
shape: (2, 2)
┌──────┬────────┐
│ x    ┆ x//0.1 │
│ ---  ┆ ---    │
│ f64  ┆ f64    │
╞══════╪════════╡
│ 6.0  ┆ 60.0   │
│ 6.03 ┆ 60.0   │
└──────┴────────┘
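
The floating-point reasoning above can be reproduced without Polars. This sketch uses only the Python standard library and the same operands as the example; the variable names are illustrative.

```python
import math
from decimal import Decimal

# The double nearest to 0.1 is slightly larger than one tenth.
stored_is_larger = Decimal(0.1) > Decimal("0.1")

# Python's floor division works on the exact stored values, so the stored
# 0.1 fits into 6.0 just under 60 times.
python_floordiv = 6.0 // 0.1

# Dividing first rounds the quotient to the nearest double (60.0), and
# flooring that rounded value leaves it unchanged.
divide_then_floor = float(math.floor(6.0 / 0.1))

print(stored_is_larger, python_floordiv, divide_then_floor)
```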

forward_fill(limit: int | None = None) → Expr[source]

Fill missing values with the last non-null value.

This is an alias of .fill_null(strategy="forward").

Parameters:

limit: The number of consecutive null values to forward fill.

See also

backward_fill
fill_null
shift

classmethod from_json(value: str) → Expr[source]

Read an expression from a JSON encoded string to construct an Expression.

Deprecated since version 0.20.11: This method has been renamed to deserialize(). Note that the new method operates on file-like inputs rather than strings. Enclose your input in io.StringIO to keep the same behavior.

Parameters:

value: JSON encoded string value

gather( indices: int | Sequence[int] | IntoExpr | Series | np.ndarray[Any, Any], ) → Expr[source]

Take values by index.

Parameters:

indices: An expression that leads to a UInt32 dtyped Series.

Returns:

Expr: Expression of the same data type.

See also

Expr.get: Take a single value

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.group_by("group", maintain_order=True).agg(
...     pl.col("value").gather([2, 1])
... )
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value     │
│ ---   ┆ ---       │
│ str   ┆ list[i64] │
╞═══════╪═══════════╡
│ one   ┆ [2, 98]   │
│ two   ┆ [4, 99]   │
└───────┴───────────┘

gather_every(n: int, offset: int = 0) → Expr[source]

Take every nth value in the Series and return as a new Series.

Parameters:

n: Gather every n-th row.
offset: Starting index.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> df.select(pl.col("foo").gather_every(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 4   │
│ 7   │
└─────┘

>>> df.select(pl.col("foo").gather_every(3, offset=1))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 5   │
│ 8   │
└─────┘

ge(other: Any) → Expr[source]

Method equivalent of “greater than or equal” operator expr >= other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [5.0, 4.0, float("nan"), 2.0],
...         "y": [5.0, 3.0, float("nan"), 1.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").ge(pl.col("y")).alias("x >= y"),
... )
shape: (4, 3)
┌─────┬─────┬────────┐
│ x   ┆ y   ┆ x >= y │
│ --- ┆ --- ┆ ---    │
│ f64 ┆ f64 ┆ bool   │
╞═════╪═════╪════════╡
│ 5.0 ┆ 5.0 ┆ true   │
│ 4.0 ┆ 3.0 ┆ true   │
│ NaN ┆ NaN ┆ true   │
│ 2.0 ┆ 1.0 ┆ true   │
└─────┴─────┴────────┘

get(index: int | Expr) → Expr[source]

Return a single value by index.

Parameters:

index: An expression that leads to a UInt32 index.

Returns:

Expr: Expression of the same data type.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.group_by("group", maintain_order=True).agg(pl.col("value").get(1))
shape: (2, 2)
┌───────┬───────┐
│ group ┆ value │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ one   ┆ 98    │
│ two   ┆ 99    │
└───────┴───────┘

gt(other: Any) → Expr[source]

Method equivalent of “greater than” operator expr > other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [5.0, 4.0, float("nan"), 2.0],
...         "y": [5.0, 3.0, float("nan"), 1.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").gt(pl.col("y")).alias("x > y"),
... )
shape: (4, 3)
┌─────┬─────┬───────┐
│ x   ┆ y   ┆ x > y │
│ --- ┆ --- ┆ ---   │
│ f64 ┆ f64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 5.0 ┆ 5.0 ┆ false │
│ 4.0 ┆ 3.0 ┆ true  │
│ NaN ┆ NaN ┆ false │
│ 2.0 ┆ 1.0 ┆ true  │
└─────┴─────┴───────┘

has_nulls() → Expr[source]

Check whether the expression contains one or more null values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [None, 1, None],
...         "b": [10, None, 300],
...         "c": [350, 650, 850],
...     }
... )
>>> df.select(pl.all().has_nulls())
shape: (1, 3)
┌──────┬──────┬───────┐
│ a    ┆ b    ┆ c     │
│ ---  ┆ ---  ┆ ---   │
│ bool ┆ bool ┆ bool  │
╞══════╪══════╪═══════╡
│ true ┆ true ┆ false │
└──────┴──────┴───────┘

hash( seed: int = 0, seed_1: int | None = None, seed_2: int | None = None, seed_3: int | None = None, ) → Expr[source]

Hash the elements in the selection.

The hash value is of type UInt64.

Parameters:

seed: Random seed parameter. Defaults to 0.
seed_1: Random seed parameter. Defaults to seed if not set.
seed_2: Random seed parameter. Defaults to seed if not set.
seed_3: Random seed parameter. Defaults to seed if not set.

Notes

This implementation of hash does not guarantee stable results across different Polars versions. Its stability is only guaranteed within a single version.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": ["x", None, "z"],
...     }
... )
>>> df.with_columns(pl.all().hash(10, 20, 30, 40))  
shape: (3, 2)
┌──────────────────────┬──────────────────────┐
│ a                    ┆ b                    │
│ ---                  ┆ ---                  │
│ u64                  ┆ u64                  │
╞══════════════════════╪══════════════════════╡
│ 9774092659964970114  ┆ 13614470193936745724 │
│ 1101441246220388612  ┆ 11638928888656214026 │
│ 11638928888656214026 ┆ 13382926553367784577 │
└──────────────────────┴──────────────────────┘

head(n: int | Expr = 10) → Expr[source]

Get the first n rows.

Parameters:

n: Number of rows to return.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.select(pl.col("foo").head(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

hist( bins: IntoExpr | None = None, *, bin_count: int | None = None, include_category: bool = False, include_breakpoint: bool = False, ) → Expr[source]

Bin values into buckets and count their occurrences.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:

bins: Bin edges. If None given, we determine the edges based on the data.
bin_count: If bins is not provided, bin_count uniform bins are created that fully encompass the data.
include_breakpoint: Include a column that indicates the upper breakpoint.
include_category: Include a column that shows the intervals as categories.

Returns:

Expr: Expression holding the bin counts (a struct when breakpoints or categories are included).

Examples

>>> df = pl.DataFrame({"a": [1, 3, 8, 8, 2, 1, 3]})
>>> df.select(pl.col("a").hist(bins=[1, 2, 3]))
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 3   │
│ 2   │
└─────┘
>>> df.select(
...     pl.col("a").hist(
...         bins=[1, 2, 3], include_breakpoint=True, include_category=True
...     )
... )
shape: (2, 1)
┌──────────────────────┐
│ a                    │
│ ---                  │
│ struct[3]            │
╞══════════════════════╡
│ {2.0,"[1.0, 2.0]",3} │
│ {3.0,"(2.0, 3.0]",2} │
└──────────────────────┘

implode() → Expr[source]

Aggregate values into a list.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...     }
... )
>>> df.select(pl.all().implode())
shape: (1, 2)
┌───────────┬───────────┐
│ a         ┆ b         │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [4, 5, 6] │
└───────────┴───────────┘

index_of(element: IntoExpr) → Expr[source]

Get the index of the first occurrence of a value, or None if it’s not found.

Parameters:

element: Value to find.

Examples

>>> df = pl.DataFrame({"a": [1, None, 17]})
>>> df.select(
...     [
...         pl.col("a").index_of(17).alias("seventeen"),
...         pl.col("a").index_of(None).alias("null"),
...         pl.col("a").index_of(55).alias("fiftyfive"),
...     ]
... )
shape: (1, 3)
┌───────────┬──────┬───────────┐
│ seventeen ┆ null ┆ fiftyfive │
│ ---       ┆ ---  ┆ ---       │
│ u32       ┆ u32  ┆ u32       │
╞═══════════╪══════╪═══════════╡
│ 2         ┆ 1    ┆ null      │
└───────────┴──────┴───────────┘

inspect(fmt: str = '{}') → Expr[source]

Print the value that this expression evaluates to and pass on the value.

Examples

>>> df = pl.DataFrame({"foo": [1, 1, 2]})
>>> df.select(pl.col("foo").cum_sum().inspect("value is: {}").alias("bar"))
value is: shape: (3,)
Series: 'foo' [i64]
[
    1
    2
    4
]
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 4   │
└─────┘

interpolate(method: InterpolationMethod = 'linear') → Expr[source]

Interpolate intermediate values.

Nulls at the beginning and end of the series remain null.

Parameters:

method ({‘linear’, ‘nearest’}): Interpolation method.

Examples

Fill null values using linear interpolation.

>>> df = pl.DataFrame(
...     {
...         "a": [1, None, 3],
...         "b": [1.0, float("nan"), 3.0],
...     }
... )
>>> df.select(pl.all().interpolate())
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
│ 2.0 ┆ NaN │
│ 3.0 ┆ 3.0 │
└─────┴─────┘

Fill null values using nearest interpolation.

>>> df.select(pl.all().interpolate("nearest"))
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 1.0 │
│ 3   ┆ NaN │
│ 3   ┆ 3.0 │
└─────┴─────┘

Regrid data to a new grid.

>>> df_original_grid = pl.DataFrame(
...     {
...         "grid_points": [1, 3, 10],
...         "values": [2.0, 6.0, 20.0],
...     }
... )  # Interpolate from this to the new grid
>>> df_new_grid = pl.DataFrame({"grid_points": range(1, 11)})
>>> df_new_grid.join(
...     df_original_grid, on="grid_points", how="left", coalesce=True
... ).with_columns(pl.col("values").interpolate())
shape: (10, 2)
┌─────────────┬────────┐
│ grid_points ┆ values │
│ ---         ┆ ---    │
│ i64         ┆ f64    │
╞═════════════╪════════╡
│ 1           ┆ 2.0    │
│ 2           ┆ 4.0    │
│ 3           ┆ 6.0    │
│ 4           ┆ 8.0    │
│ 5           ┆ 10.0   │
│ 6           ┆ 12.0   │
│ 7           ┆ 14.0   │
│ 8           ┆ 16.0   │
│ 9           ┆ 18.0   │
│ 10          ┆ 20.0   │
└─────────────┴────────┘

interpolate_by(by: IntoExpr) → Expr[source]

Fill null values using interpolation based on another column.

Nulls at the beginning and end of the series remain null.

Parameters:

by: Column to interpolate values based on.

Examples

Fill null values using linear interpolation.

>>> df = pl.DataFrame(
...     {
...         "a": [1, None, None, 3],
...         "b": [1, 2, 7, 8],
...     }
... )
>>> df.with_columns(a_interpolated=pl.col("a").interpolate_by("b"))
shape: (4, 3)
┌──────┬─────┬────────────────┐
│ a    ┆ b   ┆ a_interpolated │
│ ---  ┆ --- ┆ ---            │
│ i64  ┆ i64 ┆ f64            │
╞══════╪═════╪════════════════╡
│ 1    ┆ 1   ┆ 1.0            │
│ null ┆ 2   ┆ 1.285714       │
│ null ┆ 7   ┆ 2.714286       │
│ 3    ┆ 8   ┆ 3.0            │
└──────┴─────┴────────────────┘
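
The interpolated values in the table can be reproduced by hand with ordinary linear interpolation between the two known points, (b=1, a=1) and (b=8, a=3). The helper `lerp` below is a hypothetical name used for illustration, not a Polars function.

```python
def lerp(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linearly interpolate y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Known points from the example: (b=1, a=1) and (b=8, a=3).
at_b_2 = lerp(2, 1, 1, 8, 3)
at_b_7 = lerp(7, 1, 1, 8, 3)
print(round(at_b_2, 6), round(at_b_7, 6))
```

Because `b` jumps from 2 to 7, the two interpolated values are far apart even though they sit in adjacent rows.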

is_between( lower_bound: IntoExpr, upper_bound: IntoExpr, closed: ClosedInterval = 'both', ) → Expr[source]

Check if this expression is between the given lower and upper bounds.

Parameters:

lower_bound: Lower bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
upper_bound: Upper bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
closed ({‘both’, ‘left’, ‘right’, ‘none’}): Define which sides of the interval are closed (inclusive).

Returns:

Expr: Expression of data type Boolean.

Notes

If the value of the lower_bound is greater than that of the upper_bound then the result will be False, as no value can satisfy the condition.

Examples

>>> df = pl.DataFrame({"num": [1, 2, 3, 4, 5]})
>>> df.with_columns(pl.col("num").is_between(2, 4).alias("is_between"))
shape: (5, 2)
┌─────┬────────────┐
│ num ┆ is_between │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ false      │
│ 2   ┆ true       │
│ 3   ┆ true       │
│ 4   ┆ true       │
│ 5   ┆ false      │
└─────┴────────────┘

Use the closed argument to include or exclude the values at the bounds:

>>> df.with_columns(
...     pl.col("num").is_between(2, 4, closed="left").alias("is_between")
... )
shape: (5, 2)
┌─────┬────────────┐
│ num ┆ is_between │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ false      │
│ 2   ┆ true       │
│ 3   ┆ true       │
│ 4   ┆ false      │
│ 5   ┆ false      │
└─────┴────────────┘

You can also use strings as well as numeric/temporal values (note: ensure that string literals are wrapped with lit so as not to conflate them with column names):

>>> df = pl.DataFrame({"a": ["a", "b", "c", "d", "e"]})
>>> df.with_columns(
...     pl.col("a")
...     .is_between(pl.lit("a"), pl.lit("c"), closed="both")
...     .alias("is_between")
... )
shape: (5, 2)
┌─────┬────────────┐
│ a   ┆ is_between │
│ --- ┆ ---        │
│ str ┆ bool       │
╞═════╪════════════╡
│ a   ┆ true       │
│ b   ┆ true       │
│ c   ┆ true       │
│ d   ┆ false      │
│ e   ┆ false      │
└─────┴────────────┘

Use column expressions as lower/upper bounds, comparing to a literal value:

>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1]})
>>> df.with_columns(
...     pl.lit(3).is_between(pl.col("a"), pl.col("b")).alias("between_ab")
... )
shape: (5, 3)
┌─────┬─────┬────────────┐
│ a   ┆ b   ┆ between_ab │
│ --- ┆ --- ┆ ---        │
│ i64 ┆ i64 ┆ bool       │
╞═════╪═════╪════════════╡
│ 1   ┆ 5   ┆ true       │
│ 2   ┆ 4   ┆ true       │
│ 3   ┆ 3   ┆ true       │
│ 4   ┆ 2   ┆ false      │
│ 5   ┆ 1   ┆ false      │
└─────┴─────┴────────────┘

is_close( other: IntoExpr, *, abs_tol: float = 0.0, rel_tol: float = 1e-09, nans_equal: bool = False, ) → Expr[source]

Check if this expression is close, i.e. almost equal, to the other expression.

Two values a and b are considered close if the following condition holds:

\[|a-b| \le \max \{ \text{rel\_tol} \cdot \max \{ |a|, |b| \}, \text{abs\_tol} \}\]

Parameters:

abs_tol: Absolute tolerance. This is the maximum allowed absolute difference between two values. Must be non-negative.
rel_tol: Relative tolerance. This is the maximum allowed difference between two values, relative to the larger absolute value. Must be non-negative.
nans_equal: Whether NaN values should be considered equal.

Returns:

Expr: Expression of data type Boolean.

Notes

The implementation of this method is symmetric and mirrors the behavior of math.isclose(). Specifically note that this behavior is different to numpy.isclose().

Examples

>>> df = pl.DataFrame({"a": [1.5, 2.0, 2.5], "b": [1.55, 2.2, 3.0]})
>>> df.with_columns(pl.col("a").is_close("b", abs_tol=0.1).alias("is_close"))
shape: (3, 3)
┌─────┬──────┬──────────┐
│ a   ┆ b    ┆ is_close │
│ --- ┆ ---  ┆ ---      │
│ f64 ┆ f64  ┆ bool     │
╞═════╪══════╪══════════╡
│ 1.5 ┆ 1.55 ┆ true     │
│ 2.0 ┆ 2.2  ┆ false    │
│ 2.5 ┆ 3.0  ┆ false    │
└─────┴──────┴──────────┘
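
The closeness condition can be written out for scalars in a few lines. This is a sketch of the formula only, not the Polars implementation: the function name `is_close_scalar` is hypothetical, and NaN, infinities, and the `nans_equal` option are deliberately ignored. It is checked against `math.isclose()`, which the Notes say this method mirrors.

```python
import math


def is_close_scalar(
    a: float, b: float, *, abs_tol: float = 0.0, rel_tol: float = 1e-09
) -> bool:
    """Scalar form of |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


# The (a, b) pairs from the example above.
pairs = [(1.5, 1.55), (2.0, 2.2), (2.5, 3.0)]
results = [is_close_scalar(a, b, abs_tol=0.1) for a, b in pairs]
reference = [math.isclose(a, b, abs_tol=0.1) for a, b in pairs]
# Swapping the arguments gives the same answer because the condition is symmetric.
swapped = [is_close_scalar(b, a, abs_tol=0.1) for a, b in pairs]
print(results)
```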

is_duplicated() → Expr[source]

Return a boolean mask indicating duplicated values.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").is_duplicated())
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
│ true  │
│ false │
└───────┘

is_finite() → Expr[source]

Returns a boolean Series indicating which values are finite.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1.0, 2],
...         "B": [3.0, float("inf")],
...     }
... )
>>> df.select(pl.all().is_finite())
shape: (2, 2)
┌──────┬───────┐
│ A    ┆ B     │
│ ---  ┆ ---   │
│ bool ┆ bool  │
╞══════╪═══════╡
│ true ┆ true  │
│ true ┆ false │
└──────┴───────┘

is_first_distinct() → Expr[source]

Return a boolean mask indicating the first occurrence of each distinct value.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2, 3, 2]})
>>> df.with_columns(pl.col("a").is_first_distinct().alias("first"))
shape: (5, 2)
┌─────┬───────┐
│ a   ┆ first │
│ --- ┆ ---   │
│ i64 ┆ bool  │
╞═════╪═══════╡
│ 1   ┆ true  │
│ 1   ┆ false │
│ 2   ┆ true  │
│ 3   ┆ true  │
│ 2   ┆ false │
└─────┴───────┘

is_in(other: Expr | Collection[Any] | Series, *, nulls_equal: bool = False) → Expr[source]

Check if elements of this expression are present in the other Series.

Parameters:

other: Series or sequence of primitive type.
nulls_equal (bool, default False): If True, treat null as a distinct value. Null values will not propagate.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame(
...     {"sets": [[1, 2, 3], [1, 2], [9, 10]], "optional_members": [1, 2, 3]}
... )
>>> df.with_columns(contains=pl.col("optional_members").is_in("sets"))
shape: (3, 3)
┌───────────┬──────────────────┬──────────┐
│ sets      ┆ optional_members ┆ contains │
│ ---       ┆ ---              ┆ ---      │
│ list[i64] ┆ i64              ┆ bool     │
╞═══════════╪══════════════════╪══════════╡
│ [1, 2, 3] ┆ 1                ┆ true     │
│ [1, 2]    ┆ 2                ┆ true     │
│ [9, 10]   ┆ 3                ┆ false    │
└───────────┴──────────────────┴──────────┘

is_infinite() → Expr[source]

Returns a boolean Series indicating which values are infinite.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1.0, 2],
...         "B": [3.0, float("inf")],
...     }
... )
>>> df.select(pl.all().is_infinite())
shape: (2, 2)
┌───────┬───────┐
│ A     ┆ B     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ false ┆ false │
│ false ┆ true  │
└───────┴───────┘

is_last_distinct() → Expr[source]

Return a boolean mask indicating the last occurrence of each distinct value.

Returns:

Expr: Expression of data type Boolean.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2, 3, 2]})
>>> df.with_columns(pl.col("a").is_last_distinct().alias("last"))
shape: (5, 2)
┌─────┬───────┐
│ a   ┆ last  │
│ --- ┆ ---   │
│ i64 ┆ bool  │
╞═════╪═══════╡
│ 1   ┆ false │
│ 1   ┆ true  │
│ 2   ┆ false │
│ 3   ┆ true  │
│ 2   ┆ true  │
└─────┴───────┘

is_nan() → Expr[source]

Returns a boolean Series indicating which values are NaN.

Notes

Floating point NaN (Not A Number) should not be confused with missing data represented as Null/None.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_columns(pl.col(pl.Float64).is_nan().name.suffix("_isnan"))
shape: (5, 3)
┌──────┬─────┬─────────┐
│ a    ┆ b   ┆ b_isnan │
│ ---  ┆ --- ┆ ---     │
│ i64  ┆ f64 ┆ bool    │
╞══════╪═════╪═════════╡
│ 1    ┆ 1.0 ┆ false   │
│ 2    ┆ 2.0 ┆ false   │
│ null ┆ NaN ┆ true    │
│ 1    ┆ 1.0 ┆ false   │
│ 5    ┆ 5.0 ┆ false   │
└──────┴─────┴─────────┘

is_not_nan() → Expr[source]

Returns a boolean Series indicating which values are not NaN.

Notes

Floating point NaN (Not A Number) should not be confused with missing data represented as Null/None.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_columns(pl.col(pl.Float64).is_not_nan().name.suffix("_is_not_nan"))
shape: (5, 3)
┌──────┬─────┬──────────────┐
│ a    ┆ b   ┆ b_is_not_nan │
│ ---  ┆ --- ┆ ---          │
│ i64  ┆ f64 ┆ bool         │
╞══════╪═════╪══════════════╡
│ 1    ┆ 1.0 ┆ true         │
│ 2    ┆ 2.0 ┆ true         │
│ null ┆ NaN ┆ false        │
│ 1    ┆ 1.0 ┆ true         │
│ 5    ┆ 5.0 ┆ true         │
└──────┴─────┴──────────────┘

is_not_null() → Expr[source]

Returns a boolean Series indicating which values are not null.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_columns(
...     pl.all().is_not_null().name.suffix("_not_null")  # nan != null
... )
shape: (5, 4)
┌──────┬─────┬────────────┬────────────┐
│ a    ┆ b   ┆ a_not_null ┆ b_not_null │
│ ---  ┆ --- ┆ ---        ┆ ---        │
│ i64  ┆ f64 ┆ bool       ┆ bool       │
╞══════╪═════╪════════════╪════════════╡
│ 1    ┆ 1.0 ┆ true       ┆ true       │
│ 2    ┆ 2.0 ┆ true       ┆ true       │
│ null ┆ NaN ┆ false      ┆ true       │
│ 1    ┆ 1.0 ┆ true       ┆ true       │
│ 5    ┆ 5.0 ┆ true       ┆ true       │
└──────┴─────┴────────────┴────────────┘

is_null() → Expr[source]

Returns a boolean Series indicating which values are null.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_columns(pl.all().is_null().name.suffix("_isnull"))  # nan != null
shape: (5, 4)
┌──────┬─────┬──────────┬──────────┐
│ a    ┆ b   ┆ a_isnull ┆ b_isnull │
│ ---  ┆ --- ┆ ---      ┆ ---      │
│ i64  ┆ f64 ┆ bool     ┆ bool     │
╞══════╪═════╪══════════╪══════════╡
│ 1    ┆ 1.0 ┆ false    ┆ false    │
│ 2    ┆ 2.0 ┆ false    ┆ false    │
│ null ┆ NaN ┆ true     ┆ false    │
│ 1    ┆ 1.0 ┆ false    ┆ false    │
│ 5    ┆ 5.0 ┆ false    ┆ false    │
└──────┴─────┴──────────┴──────────┘

is_unique() → Expr[source]

Get mask of unique values.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").is_unique())
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ false │
│ true  │
└───────┘

kurtosis(*, fisher: bool = True, bias: bool = True) → Expr[source]

Compute the kurtosis (Fisher or Pearson) of a dataset.

Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.

See scipy.stats for more information.

Parameters:

fisher (bool, optional): If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
bias (bool, optional): If False, the calculations are corrected for statistical bias.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").kurtosis())
shape: (1, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ -1.153061 │
└───────────┘
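
The definition above can be followed step by step in plain Python. This sketch covers only the biased estimator (`bias=True`); the function name `kurtosis_biased` is hypothetical and the code is not the Polars implementation.

```python
def kurtosis_biased(values: list[float], *, fisher: bool = True) -> float:
    """Fourth central moment divided by the squared variance (biased)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    pearson = m4 / m2**2
    # Fisher's definition subtracts 3.0 so a normal distribution gives 0.0.
    return pearson - 3.0 if fisher else pearson


fisher_value = kurtosis_biased([1, 2, 3, 2, 1])
pearson_value = kurtosis_biased([1, 2, 3, 2, 1], fisher=False)
print(round(fisher_value, 6), round(pearson_value, 6))
```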

last() → Expr[source]

Get the last value.

Examples

>>> df = pl.DataFrame({"a": [1, 3, 2]})
>>> df.select(pl.col("a").last())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
└─────┘

le(other: Any) → Expr[source]

Method equivalent of “less than or equal” operator expr <= other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [5.0, 4.0, float("nan"), 0.5],
...         "y": [5.0, 3.5, float("nan"), 2.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").le(pl.col("y")).alias("x <= y"),
... )
shape: (4, 3)
┌─────┬─────┬────────┐
│ x   ┆ y   ┆ x <= y │
│ --- ┆ --- ┆ ---    │
│ f64 ┆ f64 ┆ bool   │
╞═════╪═════╪════════╡
│ 5.0 ┆ 5.0 ┆ true   │
│ 4.0 ┆ 3.5 ┆ false  │
│ NaN ┆ NaN ┆ true   │
│ 0.5 ┆ 2.0 ┆ true   │
└─────┴─────┴────────┘

len() → Expr[source]

Return the number of elements in the column.

Null values count towards the total.

Returns:

Expr: Expression of data type UInt32.

See also

count

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [None, 4, 4]})
>>> df.select(pl.all().len())
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 3   ┆ 3   │
└─────┴─────┘

limit(n: int | Expr = 10) → Expr[source]

Get the first n rows (alias for Expr.head()).

Parameters:

n: Number of rows to return.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.select(pl.col("foo").limit(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

log(base: float | IntoExpr = 2.718281828459045) → Expr[source]

Compute the logarithm to a given base.

Parameters:

base: Base of the logarithm. Defaults to e.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").log(base=2))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
│ 1.0      │
│ 1.584963 │
└──────────┘

log10() → Expr[source]

Compute the base 10 logarithm of the input array, element-wise.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").log10())
shape: (3, 1)
┌─────────┐
│ values  │
│ ---     │
│ f64     │
╞═════════╡
│ 0.0     │
│ 0.30103 │
│ 0.60206 │
└─────────┘

log1p() → Expr[source]

Compute the natural logarithm of each element plus one.

This computes log(1 + x) but is more numerically stable for x close to zero.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").log1p())
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.693147 │
│ 1.098612 │
│ 1.386294 │
└──────────┘
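
The stability claim is easiest to see with a very small input. The sketch below uses Python's `math` module rather than Polars, on the assumption that both operate on the same 64-bit floats: adding 1 to `1e-16` rounds to exactly 1.0, so the naive form loses the input entirely.

```python
import math

x = 1e-16

# 1 + x rounds to exactly 1.0 in double precision, so the log is 0.0.
naive = math.log(1 + x)

# log1p avoids forming 1 + x and keeps the small value.
stable = math.log1p(x)

print(naive, stable)
```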

lower_bound() → Expr[source]

Calculate the lower bound.

Returns a unit Series with the lowest value possible for the dtype of this expression.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").lower_bound())
shape: (1, 1)
┌──────────────────────┐
│ a                    │
│ ---                  │
│ i64                  │
╞══════════════════════╡
│ -9223372036854775808 │
└──────────────────────┘

lt(other: Any) → Expr[source]

Method equivalent of “less than” operator expr < other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [1.0, 2.0, float("nan"), 3.0],
...         "y": [2.0, 2.0, float("nan"), 4.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").lt(pl.col("y")).alias("x < y"),
... )
shape: (4, 3)
┌─────┬─────┬───────┐
│ x   ┆ y   ┆ x < y │
│ --- ┆ --- ┆ ---   │
│ f64 ┆ f64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1.0 ┆ 2.0 ┆ true  │
│ 2.0 ┆ 2.0 ┆ false │
│ NaN ┆ NaN ┆ false │
│ 3.0 ┆ 4.0 ┆ true  │
└─────┴─────┴───────┘

map_batches( function: Callable[[Series], Series | Any], return_dtype: PolarsDataType | DataTypeExpr | None = None, *, agg_list: bool = False, is_elementwise: bool = False, returns_scalar: bool = False, ) → Expr[source]

Apply a custom Python function to a whole Series or sequence of Series.

The output of this custom function is presumed to be either a Series, or a NumPy array (in which case it will be automatically converted into a Series), or a scalar that will be converted into a Series. If the result is a scalar and you want it to stay as a scalar, pass in returns_scalar=True. If you want to apply a custom function elementwise over single values, see map_elements(). A reasonable use case for map functions is transforming the values represented by an expression using a third-party library.

Parameters:

function

Lambda/function to apply.

return_dtype

Datatype of the output Series.

It is recommended to set this whenever possible. If this is None, it tries to infer the datatype by calling the function with dummy data and looking at the output.

agg_list

First implode when in a group-by aggregation.

Deprecated since version 1.32.0: Use expr.implode().map_batches(..) instead.

is_elementwise

Set to true if the operation is elementwise, for better performance and optimization.

An elementwise operation has unit or equal length for all inputs and can be run sequentially on slices without the results being affected.

returns_scalar

If the function returns a scalar, by default it will be wrapped in a list in the output, since the assumption is that the function always returns something Series-like. If you want to keep the result as a scalar, set this argument to True.

See also

map_elements
replace

Notes

A UDF passed to map_batches must be pure, meaning that it cannot modify or depend on state other than its arguments. Polars may call the function with arbitrary input data.

Examples

>>> df = pl.DataFrame(
...     {
...         "sine": [0.0, 1.0, 0.0, -1.0],
...         "cosine": [1.0, 0.0, -1.0, 0.0],
...     }
... )
>>> df.select(
...     pl.all().map_batches(
...         lambda x: x.to_numpy().argmax(),
...         returns_scalar=True,
...     )
... )
shape: (1, 2)
┌──────┬────────┐
│ sine ┆ cosine │
│ ---  ┆ ---    │
│ i64  ┆ i64    │
╞══════╪════════╡
│ 1    ┆ 0      │
└──────┴────────┘

Here’s an example of a function that returns a scalar, where we want it to stay as a scalar:

>>> df = pl.DataFrame(
...     {
...         "a": [0, 1, 0, 1],
...         "b": [1, 2, 3, 4],
...     }
... )
>>> df.group_by("a").agg(
...     pl.col("b").map_batches(
...         lambda x: x.max(), returns_scalar=True, return_dtype=pl.self_dtype()
...     )
... )  
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 0   ┆ 3   │
└─────┴─────┘

Call a function that takes multiple arguments by creating a struct and referencing its fields inside the function call.

>>> df = pl.DataFrame(
...     {
...         "a": [5, 1, 0, 3],
...         "b": [4, 2, 3, 4],
...     }
... )
>>> df.with_columns(
...     a_times_b=pl.struct("a", "b").map_batches(
...         lambda x: np.multiply(x.struct.field("a"), x.struct.field("b")),
...         return_dtype=pl.Int64,
...     )
... )
shape: (4, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_times_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 5   ┆ 4   ┆ 20        │
│ 1   ┆ 2   ┆ 2         │
│ 0   ┆ 3   ┆ 0         │
│ 3   ┆ 4   ┆ 12        │
└─────┴─────┴───────────┘
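The is_elementwise contract described under Parameters can be checked without Polars. The following is a minimal pure-Python sketch, not Polars' implementation. The helper same_on_slices and the chunk size are hypothetical names chosen for illustration. It applies a function once to the whole input and once to consecutive slices, then compares the two results.

```python
from typing import Callable, List


def same_on_slices(
    func: Callable[[List[float]], List[float]],
    values: List[float],
    chunk_size: int = 2,
) -> bool:
    """Return True if applying func slice by slice matches applying it once."""
    whole = func(values)
    pieces: List[float] = []
    for start in range(0, len(values), chunk_size):
        pieces.extend(func(values[start : start + chunk_size]))
    return whole == pieces


def double(xs: List[float]) -> List[float]:
    # Elementwise: each output depends only on the matching input.
    return [x * 2 for x in xs]


def running_total(xs: List[float]) -> List[float]:
    # Not elementwise: each output depends on all earlier inputs.
    total, out = 0.0, []
    for x in xs:
        total += x
        out.append(total)
    return out


data = [1.0, 2.0, 3.0, 4.0]
double_is_elementwise = same_on_slices(double, data)
running_total_is_elementwise = same_on_slices(running_total, data)
```

A passing check on one sample does not prove the property, but a failing check is enough to rule out is_elementwise=True for that function.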

map_elements( function: Callable[[Any], Any], return_dtype: PolarsDataType | DataTypeExpr | None = None, *, skip_nulls: bool = True, pass_name: bool = False, strategy: MapElementsStrategy = 'thread_local', returns_scalar: bool = False, ) → Expr[source]

Map a custom/user-defined function (UDF) to each element of a column.

Warning

This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.

Suppose that the function is: x ↦ sqrt(x):

For mapping elements of a series, consider: pl.col("col_name").sqrt().
For mapping inner elements of lists, consider: pl.col("col_name").list.eval(pl.element().sqrt()).
For mapping elements of struct fields, consider: pl.col("col_name").struct.field("field_name").sqrt().

If you want to replace the original column or field, consider .with_columns and .with_fields.

Parameters:

function

Lambda/function to map.

return_dtype

Datatype of the output Series.

It is recommended to set this whenever possible. If this is None, it tries to infer the datatype by calling the function with dummy data and looking at the output.

skip_nulls

Don’t map the function over values that contain nulls (this is faster).

pass_name

Pass the Series name to the custom function (this is more expensive).

returns_scalar

Deprecated since version 1.32.0: Is ignored and will be removed in 2.0.

strategy{‘thread_local’, ‘threading’}

The threading strategy to use.

‘thread_local’: run the Python function on a single thread.
‘threading’: run the Python function on separate threads. Use with care as this can slow performance. This might only speed up your code if the amount of work per element is significant and the Python function releases the GIL (e.g. by calling a C function).

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Notes

Using map_elements is strongly discouraged as you will be effectively running Python “for” loops, which will be very slow. Wherever possible you should prefer the native expression API to achieve the best performance.
If your function is expensive and you don’t want it to be called more than once for a given input, consider applying an @lru_cache decorator to it. If your data is suitable you may achieve significant speedups.
Window function application using over is considered a GroupBy context here, so map_elements can be used to map functions over window groups.
A UDF passed to map_elements must be pure, meaning that it cannot modify or depend on state other than its arguments. Polars may call the function with arbitrary input data.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["a", "b", "c", "c"],
...     }
... )

The function is applied to each element of column 'a':

>>> df.with_columns(  
...     pl.col("a")
...     .map_elements(lambda x: x * 2, return_dtype=pl.self_dtype())
...     .alias("a_times_2"),
... )
shape: (4, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_times_2 │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ str ┆ i64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ a   ┆ 2         │
│ 2   ┆ b   ┆ 4         │
│ 3   ┆ c   ┆ 6         │
│ 1   ┆ c   ┆ 2         │
└─────┴─────┴───────────┘

Tip: it is better to implement this with an expression:

>>> df.with_columns(
...     (pl.col("a") * 2).alias("a_times_2"),
... )  

>>> (
...     df.lazy()
...     .group_by("b")
...     .agg(
...         pl.col("a")
...         .implode()
...         .map_elements(lambda x: x.sum(), return_dtype=pl.Int64)
...     )
...     .collect()
... )  
shape: (3, 2)
┌─────┬─────┐
│ b   ┆ a   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 4   │
└─────┴─────┘

Tip: again, it is better to implement this with an expression:

>>> (
...     df.lazy()
...     .group_by("b", maintain_order=True)
...     .agg(pl.col("a").sum())
...     .collect()
... )  

Window function application using over will behave as a GroupBy context, with your function receiving individual window groups:

>>> df = pl.DataFrame(
...     {
...         "key": ["x", "x", "y", "x", "y", "z"],
...         "val": [1, 1, 1, 1, 1, 1],
...     }
... )
>>> df.with_columns(
...     scaled=pl.col("val")
...     .implode()
...     .map_elements(lambda s: s * len(s), return_dtype=pl.List(pl.Int64))
...     .explode()
...     .over("key"),
... ).sort("key")
shape: (6, 3)
┌─────┬─────┬────────┐
│ key ┆ val ┆ scaled │
│ --- ┆ --- ┆ ---    │
│ str ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ x   ┆ 1   ┆ 3      │
│ x   ┆ 1   ┆ 3      │
│ x   ┆ 1   ┆ 3      │
│ y   ┆ 1   ┆ 2      │
│ y   ┆ 1   ┆ 2      │
│ z   ┆ 1   ┆ 1      │
└─────┴─────┴────────┘

Note that this function would also be better-implemented natively:

>>> df.with_columns(
...     scaled=(pl.col("val") * pl.col("val").count()).over("key"),
... ).sort("key")  
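The @lru_cache suggestion from the Notes can be sketched with the standard library alone. In this sketch the function expensive is a hypothetical stand-in for a costly, pure UDF, and the list comprehension stands in for the per-element calls that map_elements would make.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive(x: int) -> int:
    """Hypothetical stand-in for a costly pure function."""
    return x * x


values = [1, 2, 3, 1, 2, 1]
results = [expensive(v) for v in values]

# cache_info() reports how many calls were computed (misses)
# and how many were answered from the cache (hits).
info = expensive.cache_info()
```

Caching only helps when inputs repeat and are hashable, and it is only safe because the function is pure, as the Notes require.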

max() → Expr[source]

Get maximum value.

Examples

>>> df = pl.DataFrame({"a": [-1.0, float("nan"), 1.0]})
>>> df.select(pl.col("a").max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

mean() → Expr[source]

Get mean value.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").mean())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘

median() → Expr[source]

Get median value using linear interpolation.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").median())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘

min() → Expr[source]

Get minimum value.

Examples

>>> df = pl.DataFrame({"a": [-1.0, float("nan"), 1.0]})
>>> df.select(pl.col("a").min())
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ -1.0 │
└──────┘

mod(other: Any) → Expr[source]

Method equivalent of modulus operator expr % other.

Parameters:

other: Numeric literal or expression value.

Examples

>>> df = pl.DataFrame({"x": [0, 1, 2, 3, 4]})
>>> df.with_columns(pl.col("x").mod(2).alias("x%2"))
shape: (5, 2)
┌─────┬─────┐
│ x   ┆ x%2 │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 0   │
│ 1   ┆ 1   │
│ 2   ┆ 0   │
│ 3   ┆ 1   │
│ 4   ┆ 0   │
└─────┴─────┘

mode() → Expr[source]

Compute the most occurring value(s).

Can return multiple values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 1, 2, 3],
...         "b": [1, 1, 2, 2],
...     }
... )
>>> df.select(pl.all().mode().first())  
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
└─────┴─────┘
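To see why mode can return more than one value, here is a small pure-Python sketch of the idea (not Polars' implementation). The helper name modes is hypothetical, and the order of tied values is a property of this sketch only.

```python
from collections import Counter
from typing import Hashable, List


def modes(values: List[Hashable]) -> List[Hashable]:
    """Return every value that occurs the maximum number of times."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # Counter preserves first-seen order, so ties come out in input order.
    return [value for value, n in counts.items() if n == top]


modes_a = modes([1, 1, 2, 3])  # a single most frequent value
modes_b = modes([1, 1, 2, 2])  # two values tie for most frequent
```

Column "b" in the example above has such a tie, which is why the example reduces the result with first().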

mul(other: Any) → Expr[source]

Method equivalent of multiplication operator expr * other.

Parameters:

other: Numeric literal or expression value.

Examples

>>> df = pl.DataFrame({"x": [1, 2, 4, 8, 16]})
>>> df.with_columns(
...     pl.col("x").mul(2).alias("x*2"),
...     pl.col("x").mul(pl.col("x").log(2)).alias("x * xlog2"),
... )
shape: (5, 3)
┌─────┬─────┬───────────┐
│ x   ┆ x*2 ┆ x * xlog2 │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ f64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ 2   ┆ 0.0       │
│ 2   ┆ 4   ┆ 2.0       │
│ 4   ┆ 8   ┆ 8.0       │
│ 8   ┆ 16  ┆ 24.0      │
│ 16  ┆ 32  ┆ 64.0      │
└─────┴─────┴───────────┘

n_unique() → Expr[source]

Count unique values.

Notes

null is considered to be a unique value for the purposes of this operation.

Examples

>>> df = pl.DataFrame({"x": [1, 1, 2, 2, 3], "y": [1, 1, 1, None, None]})
>>> df.select(
...     x_unique=pl.col("x").n_unique(),
...     y_unique=pl.col("y").n_unique(),
... )
shape: (1, 2)
┌──────────┬──────────┐
│ x_unique ┆ y_unique │
│ ---      ┆ ---      │
│ u32      ┆ u32      │
╞══════════╪══════════╡
│ 3        ┆ 2        │
└──────────┴──────────┘

nan_max() → Expr[source]

Get maximum value, but propagate/poison encountered NaN values.

This differs from max, which ignores NaN values by default. It matches the default behavior of NumPy's max, which propagates NaN values; NumPy's nanmax is the variant that ignores them.

Examples

>>> df = pl.DataFrame({"a": [0.0, float("nan")]})
>>> df.select(pl.col("a").nan_max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ NaN │
└─────┘

nan_min() → Expr[source]

Get minimum value, but propagate/poison encountered NaN values.

This differs from min, which ignores NaN values by default. It matches the default behavior of NumPy's min, which propagates NaN values; NumPy's nanmin is the variant that ignores them.

Examples

>>> df = pl.DataFrame({"a": [0.0, float("nan")]})
>>> df.select(pl.col("a").nan_min())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ NaN │
└─────┘

ne(other: Any) → Expr[source]

Method equivalent of inequality operator expr != other.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [1.0, 2.0, float("nan"), 4.0],
...         "y": [2.0, 2.0, float("nan"), 4.0],
...     }
... )
>>> df.with_columns(
...     pl.col("x").ne(pl.col("y")).alias("x != y"),
... )
shape: (4, 3)
┌─────┬─────┬────────┐
│ x   ┆ y   ┆ x != y │
│ --- ┆ --- ┆ ---    │
│ f64 ┆ f64 ┆ bool   │
╞═════╪═════╪════════╡
│ 1.0 ┆ 2.0 ┆ true   │
│ 2.0 ┆ 2.0 ┆ false  │
│ NaN ┆ NaN ┆ false  │
│ 4.0 ┆ 4.0 ┆ false  │
└─────┴─────┴────────┘

ne_missing(other: Any) → Expr[source]

Method equivalent of inequality operator expr != other where None == None.

This differs from default ne where null values are propagated.

Parameters:

other: A literal or expression value to compare with.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [1.0, 2.0, float("nan"), 4.0, None, None],
...         "y": [2.0, 2.0, float("nan"), 4.0, 5.0, None],
...     }
... )
>>> df.with_columns(
...     pl.col("x").ne(pl.col("y")).alias("x ne y"),
...     pl.col("x").ne_missing(pl.col("y")).alias("x ne_missing y"),
... )
shape: (6, 4)
┌──────┬──────┬────────┬────────────────┐
│ x    ┆ y    ┆ x ne y ┆ x ne_missing y │
│ ---  ┆ ---  ┆ ---    ┆ ---            │
│ f64  ┆ f64  ┆ bool   ┆ bool           │
╞══════╪══════╪════════╪════════════════╡
│ 1.0  ┆ 2.0  ┆ true   ┆ true           │
│ 2.0  ┆ 2.0  ┆ false  ┆ false          │
│ NaN  ┆ NaN  ┆ false  ┆ false          │
│ 4.0  ┆ 4.0  ┆ false  ┆ false          │
│ null ┆ 5.0  ┆ null   ┆ true           │
│ null ┆ null ┆ null   ┆ false          │
└──────┴──────┴────────┴────────────────┘
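The two null-handling rules can be sketched on scalar values in pure Python, with None standing in for null. This is an illustration of the semantics shown in the table, not Polars' implementation. The NaN rows are left out on purpose: Python's own float comparison treats NaN != NaN as True, whereas the output above shows Polars treating two NaN values as equal.

```python
from typing import List, Optional, Tuple


def ne(x: Optional[float], y: Optional[float]) -> Optional[bool]:
    """Inequality that propagates a missing value (None) on either side."""
    if x is None or y is None:
        return None
    return x != y


def ne_missing(x: Optional[float], y: Optional[float]) -> bool:
    """Inequality that treats None as an ordinary value equal to itself."""
    if x is None or y is None:
        return not (x is None and y is None)
    return x != y


pairs: List[Tuple[Optional[float], Optional[float]]] = [
    (1.0, 2.0),
    (2.0, 2.0),
    (None, 5.0),
    (None, None),
]
ne_results = [ne(x, y) for x, y in pairs]
ne_missing_results = [ne_missing(x, y) for x, y in pairs]
```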

neg() → Expr[source]

Method equivalent of unary minus operator -expr.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 2, None]})
>>> df.with_columns(pl.col("a").neg())
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 0    │
│ -2   │
│ null │
└──────┘

not_() → Expr[source]

Negate a boolean expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [True, False, False],
...         "b": ["a", "b", None],
...     }
... )
>>> df
shape: (3, 2)
┌───────┬──────┐
│ a     ┆ b    │
│ ---   ┆ ---  │
│ bool  ┆ str  │
╞═══════╪══════╡
│ true  ┆ a    │
│ false ┆ b    │
│ false ┆ null │
└───────┴──────┘
>>> df.select(pl.col("a").not_())
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ true  │
└───────┘

null_count() → Expr[source]

Count null values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [None, 1, None],
...         "b": [10, None, 300],
...         "c": [350, 650, 850],
...     }
... )
>>> df.select(pl.all().null_count())
shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╡
│ 2   ┆ 1   ┆ 0   │
└─────┴─────┴─────┘

or_(*others: Any) → Expr[source]

Method equivalent of bitwise “or” operator expr | other | ....

Parameters:

*others: One or more integer or boolean expressions to evaluate/combine.

Examples

>>> df = pl.DataFrame(
...     data={
...         "x": [5, 6, 7, 4, 8],
...         "y": [1.5, 2.5, 1.0, 4.0, -5.75],
...         "z": [-9, 2, -1, 4, 8],
...     }
... )
>>> df.select(
...     (pl.col("x") == pl.col("y"))
...     .or_(
...         pl.col("x") == pl.col("y"),
...         pl.col("y") == pl.col("z"),
...         pl.col("y").cast(int) == pl.col("z"),
...     )
...     .alias("any")
... )
shape: (5, 1)
┌───────┐
│ any   │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ false │
│ true  │
│ false │
└───────┘

over( partition_by: IntoExpr | Iterable[IntoExpr] | None = None, *more_exprs: IntoExpr, order_by: IntoExpr | Iterable[IntoExpr] | None = None, descending: bool = False, nulls_last: bool = False, mapping_strategy: WindowMappingStrategy = 'group_to_rows', ) → Expr[source]

Compute expressions over the given groups.

This expression is similar to performing a group by aggregation and joining the result back into the original DataFrame.

The outcome is similar to how window functions work in PostgreSQL.

Parameters:

partition_by

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

*more_exprs

Additional columns to group by, specified as positional arguments.

order_by

Order the window functions/aggregations with the partitioned groups by the result of the expression passed to order_by.

descending

In case ‘order_by’ is given, indicate whether to order in ascending or descending order.

nulls_last

In case ‘order_by’ is given, indicate whether to order the nulls in last position.

mapping_strategy: {‘group_to_rows’, ‘join’, ‘explode’}

group_to_rows
If the aggregation results in multiple values, assign them back to their position in the DataFrame. This can only be done if the group yields the same elements before aggregation as after.
join
Join the groups as ‘List<group_dtype>’ to the row positions. Warning: this can be memory intensive.
explode
Explode the grouped data into new rows, similar to the results of group_by + agg + explode. The given groups must be sorted if they are not part of the window operation itself, otherwise the result would not make sense. This operation changes the number of rows.

Examples

Pass the name of a column to compute the expression over that column.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "a", "b", "b", "b"],
...         "b": [1, 2, 3, 5, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> df.with_columns(c_max=pl.col("c").max().over("a"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 5     │
│ b   ┆ 3   ┆ 3   ┆ 3     │
│ b   ┆ 5   ┆ 2   ┆ 3     │
│ b   ┆ 3   ┆ 1   ┆ 3     │
└─────┴─────┴─────┴───────┘

Expression input is also supported.

>>> df.with_columns(c_max=pl.col("c").max().over(pl.col("b") // 2))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 4     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 4     │
└─────┴─────┴─────┴───────┘

Group by multiple columns by passing multiple column names or expressions.

>>> df.with_columns(c_min=pl.col("c").min().over("a", pl.col("b") % 2))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 1     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘

You can use non-elementwise expressions with over too. By default they are evaluated using row-order, but you can specify a different one using order_by.

>>> from datetime import date
>>> df = pl.DataFrame(
...     {
...         "store_id": ["a", "a", "b", "b"],
...         "date": [
...             date(2024, 9, 18),
...             date(2024, 9, 17),
...             date(2024, 9, 18),
...             date(2024, 9, 16),
...         ],
...         "sales": [7, 9, 8, 10],
...     }
... )
>>> df.with_columns(
...     cumulative_sales=pl.col("sales")
...     .cum_sum()
...     .over("store_id", order_by="date")
... )
shape: (4, 4)
┌──────────┬────────────┬───────┬──────────────────┐
│ store_id ┆ date       ┆ sales ┆ cumulative_sales │
│ ---      ┆ ---        ┆ ---   ┆ ---              │
│ str      ┆ date       ┆ i64   ┆ i64              │
╞══════════╪════════════╪═══════╪══════════════════╡
│ a        ┆ 2024-09-18 ┆ 7     ┆ 16               │
│ a        ┆ 2024-09-17 ┆ 9     ┆ 9                │
│ b        ┆ 2024-09-18 ┆ 8     ┆ 18               │
│ b        ┆ 2024-09-16 ┆ 10    ┆ 10               │
└──────────┴────────────┴───────┴──────────────────┘

If you don’t require that the group order be preserved, then the more performant option is to use mapping_strategy='explode' - be careful however to only ever use this in a select statement, not a with_columns one.

>>> window = {
...     "partition_by": "store_id",
...     "order_by": "date",
...     "mapping_strategy": "explode",
... }
>>> df.select(
...     pl.all().over(**window),
...     cumulative_sales=pl.col("sales").cum_sum().over(**window),
... )
shape: (4, 4)
┌──────────┬────────────┬───────┬──────────────────┐
│ store_id ┆ date       ┆ sales ┆ cumulative_sales │
│ ---      ┆ ---        ┆ ---   ┆ ---              │
│ str      ┆ date       ┆ i64   ┆ i64              │
╞══════════╪════════════╪═══════╪══════════════════╡
│ a        ┆ 2024-09-17 ┆ 9     ┆ 9                │
│ a        ┆ 2024-09-18 ┆ 7     ┆ 16               │
│ b        ┆ 2024-09-16 ┆ 10    ┆ 10               │
│ b        ┆ 2024-09-18 ┆ 8     ┆ 18               │
└──────────┴────────────┴───────┴──────────────────┘
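The description of over as "a group by aggregation joined back into the original DataFrame" can be sketched in pure Python for the first example, pl.col("c").max().over("a"). This only illustrates the idea on plain lists and says nothing about how Polars evaluates window expressions internally.

```python
from collections import defaultdict
from typing import Dict, List

keys = ["a", "a", "b", "b", "b"]  # column "a" from the first example
c = [5, 4, 3, 2, 1]               # column "c" from the first example

# Step 1: group by aggregation (maximum of c per key).
groups: Dict[str, List[int]] = defaultdict(list)
for key, value in zip(keys, c):
    groups[key].append(value)
group_max = {key: max(values) for key, values in groups.items()}

# Step 2: join the aggregate back onto every original row.
c_max = [group_max[key] for key in keys]
```

The result has one value per input row, which is what distinguishes a window expression from a plain group-by aggregation.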

pct_change(n: int | IntoExprColumn = 1) → Expr[source]

Computes percentage change between values.

Percentage change (as fraction) between current element and most-recent non-null element at least n period(s) before the current element.

Computes the change from the previous row by default.

Parameters:

n: Periods to shift for forming percent change.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [10, 11, 12, None, 12],
...     }
... )
>>> df.with_columns(pl.col("a").pct_change().alias("pct_change"))
shape: (5, 2)
┌──────┬────────────┐
│ a    ┆ pct_change │
│ ---  ┆ ---        │
│ i64  ┆ f64        │
╞══════╪════════════╡
│ 10   ┆ null       │
│ 11   ┆ 0.1        │
│ 12   ┆ 0.090909   │
│ null ┆ 0.0        │
│ 12   ┆ 0.0        │
└──────┴────────────┘
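The arithmetic behind the output above can be sketched in pure Python. The sketch assumes that nulls are first replaced by the most recent non-null value and that the change is then computed as current / previous - 1. That assumption reproduces the table, but it is not taken from Polars' source.

```python
from typing import List, Optional


def pct_change(values: List[Optional[float]], n: int = 1) -> List[Optional[float]]:
    """Fractional change against the value n rows earlier, after forward fill.

    Assumes n is positive and that no earlier value is zero.
    """
    # Assumption: nulls are replaced by the most recent non-null value first.
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)

    out: List[Optional[float]] = []
    for i, cur in enumerate(filled):
        prev = filled[i - n] if i >= n else None
        if cur is None or prev is None:
            out.append(None)
        else:
            out.append(cur / prev - 1)
    return out


changes = pct_change([10, 11, 12, None, 12])
```

Under this assumption the null row inherits the value 12, which explains the two 0.0 entries at the end of the table.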

peak_max() → Expr[source]

Get a boolean mask of the local maximum peaks.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.select(pl.col("a").peak_max())
shape: (5, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ false │
│ false │
│ false │
│ true  │
└───────┘

peak_min() → Expr[source]

Get a boolean mask of the local minimum peaks.

Examples

>>> df = pl.DataFrame({"a": [4, 1, 3, 2, 5]})
>>> df.select(pl.col("a").peak_min())
shape: (5, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ false │
│ true  │
│ false │
└───────┘
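A pure-Python sketch of the local-peak test that reproduces the two outputs above. Comparing edge elements against their single neighbour, and using strict comparison so that ties are not peaks, are assumptions of this sketch rather than documented behavior. Nulls are not handled.

```python
from typing import List


def peak_max(values: List[float]) -> List[bool]:
    """Mark elements strictly greater than every neighbour they have."""
    out = []
    for i, v in enumerate(values):
        left_ok = i == 0 or v > values[i - 1]
        right_ok = i == len(values) - 1 or v > values[i + 1]
        out.append(left_ok and right_ok)
    return out


def peak_min(values: List[float]) -> List[bool]:
    """Mark elements strictly smaller than every neighbour they have."""
    # A local minimum of the values is a local maximum of their negation.
    return peak_max([-v for v in values])


max_mask = peak_max([1, 2, 3, 4, 5])
min_mask = peak_min([4, 1, 3, 2, 5])
```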

pipe( function: Callable[Concatenate[Expr, P], T], *args: P.args, **kwargs: P.kwargs, ) → T[source]

Offers a structured way to apply a sequence of user-defined functions (UDFs).

Parameters:

function: Callable; will receive the expression as the first parameter, followed by any given args/kwargs.
*args: Arguments to pass to the UDF.
**kwargs: Keyword arguments to pass to the UDF.

Examples

>>> def extract_number(expr: pl.Expr) -> pl.Expr:
...     """Extract the digits from a string."""
...     return expr.str.extract(r"\d+", 0).cast(pl.Int64)
>>>
>>> def scale_negative_even(expr: pl.Expr, *, n: int = 1) -> pl.Expr:
...     """Set even numbers negative, and scale by a user-supplied value."""
...     expr = pl.when(expr % 2 == 0).then(-expr).otherwise(expr)
...     return expr * n
>>>
>>> df = pl.DataFrame({"val": ["a: 1", "b: 2", "c: 3", "d: 4"]})
>>> df.with_columns(
...     udfs=(
...         pl.col("val").pipe(extract_number).pipe(scale_negative_even, n=5)
...     ),
... )
shape: (4, 2)
┌──────┬──────┐
│ val  ┆ udfs │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ a: 1 ┆ 5    │
│ b: 2 ┆ -10  │
│ c: 3 ┆ 15   │
│ d: 4 ┆ -20  │
└──────┴──────┘
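The mechanism behind pipe is small enough to sketch in pure Python. The class Value and the helpers add_one and scale are hypothetical and not part of Polars. They only show that piping passes the object as the first argument, so a chain of pipe calls reads left to right instead of inside out.

```python
from typing import Any, Callable


class Value:
    """Hypothetical stand-in for an expression object; not part of Polars."""

    def __init__(self, data: int) -> None:
        self.data = data

    def pipe(self, function: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # The object itself becomes the first argument of the function.
        return function(self, *args, **kwargs)


def add_one(v: Value) -> Value:
    return Value(v.data + 1)


def scale(v: Value, *, n: int = 1) -> Value:
    return Value(v.data * n)


piped = Value(2).pipe(add_one).pipe(scale, n=5)
nested = scale(add_one(Value(2)), n=5)  # equivalent nested call
```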

pow(exponent: IntoExprColumn | int | float) → Expr[source]

Method equivalent of exponentiation operator expr ** exponent.

If the exponent is float, the result follows the dtype of exponent. Otherwise, it follows dtype of base.

Parameters:

exponent: Numeric literal or expression exponent value.

Examples

>>> df = pl.DataFrame({"x": [1, 2, 4, 8]})
>>> df.with_columns(
...     pl.col("x").pow(3).alias("cube"),
...     pl.col("x").pow(pl.col("x").log(2)).alias("x ** xlog2"),
... )
shape: (4, 3)
┌─────┬──────┬────────────┐
│ x   ┆ cube ┆ x ** xlog2 │
│ --- ┆ ---  ┆ ---        │
│ i64 ┆ i64  ┆ f64        │
╞═════╪══════╪════════════╡
│ 1   ┆ 1    ┆ 1.0        │
│ 2   ┆ 8    ┆ 2.0        │
│ 4   ┆ 64   ┆ 16.0       │
│ 8   ┆ 512  ┆ 512.0      │
└─────┴──────┴────────────┘

Raising an integer to a positive integer results in an integer - in order to raise to a negative integer, you can cast either the base or the exponent to float first:

>>> df.with_columns(
...     x_squared=pl.col("x").pow(2),
...     x_inverse=pl.col("x").pow(-1.0),
... )
shape: (4, 3)
┌─────┬───────────┬───────────┐
│ x   ┆ x_squared ┆ x_inverse │
│ --- ┆ ---       ┆ ---       │
│ i64 ┆ i64       ┆ f64       │
╞═════╪═══════════╪═══════════╡
│ 1   ┆ 1         ┆ 1.0       │
│ 2   ┆ 4         ┆ 0.5       │
│ 4   ┆ 16        ┆ 0.25      │
│ 8   ┆ 64        ┆ 0.125     │
└─────┴───────────┴───────────┘

product() → Expr[source]

Compute the product of an expression.

Notes

If there are no non-null values, then the output is 1. If you would prefer empty products to return None, you can use pl.when(expr.count()>0).then(expr.product()) instead of expr.product().

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").product())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 6   │
└─────┘
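The empty-product rule from the Notes can be illustrated in pure Python, with None standing in for null. Both helper names are hypothetical. The second mirrors the pl.when(expr.count()>0).then(expr.product()) workaround by returning None when no non-null value is present.

```python
import math
from typing import List, Optional


def product(values: List[Optional[int]]) -> int:
    """Product of the non-null values; the empty product is 1."""
    return math.prod(v for v in values if v is not None)


def product_or_none(values: List[Optional[int]]) -> Optional[int]:
    """Same product, but None when there is no non-null value."""
    non_null = [v for v in values if v is not None]
    return math.prod(non_null) if non_null else None


full = product([1, 2, 3])
all_null = product([None, None])
all_null_guarded = product_or_none([None, None])
```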

qcut( quantiles: Sequence[float] | int, *, labels: Sequence[str] | None = None, left_closed: bool = False, allow_duplicates: bool = False, include_breaks: bool = False, ) → Expr[source]

Bin continuous values into discrete categories based on their quantiles.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:

quantiles: Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.
labels: Names of the categories. The number of labels must be equal to the number of categories.
left_closed: Set the intervals to be left-closed instead of right-closed.
allow_duplicates: If set to True, duplicates in the resulting quantiles are dropped, rather than raising a DuplicateError. This can happen even with unique probabilities, depending on the data.
include_breaks: Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

Returns:

Expr: Expression of data type Categorical if include_breaks is set to False (default), otherwise an expression of data type Struct.

See also

cut

Examples

Divide a column into three categories according to pre-defined quantile probabilities.

>>> df = pl.DataFrame({"foo": [-2, -1, 0, 1, 2]})
>>> df.with_columns(
...     pl.col("foo").qcut([0.25, 0.75], labels=["a", "b", "c"]).alias("qcut")
... )
shape: (5, 2)
┌─────┬──────┐
│ foo ┆ qcut │
│ --- ┆ ---  │
│ i64 ┆ cat  │
╞═════╪══════╡
│ -2  ┆ a    │
│ -1  ┆ a    │
│ 0   ┆ b    │
│ 1   ┆ b    │
│ 2   ┆ c    │
└─────┴──────┘

Divide a column into two categories using uniform quantile probabilities.

>>> df.with_columns(
...     pl.col("foo")
...     .qcut(2, labels=["low", "high"], left_closed=True)
...     .alias("qcut")
... )
shape: (5, 2)
┌─────┬──────┐
│ foo ┆ qcut │
│ --- ┆ ---  │
│ i64 ┆ cat  │
╞═════╪══════╡
│ -2  ┆ low  │
│ -1  ┆ low  │
│ 0   ┆ high │
│ 1   ┆ high │
│ 2   ┆ high │
└─────┴──────┘

Add both the category and the breakpoint.

>>> df.with_columns(
...     pl.col("foo").qcut([0.25, 0.75], include_breaks=True).alias("qcut")
... ).unnest("qcut")
shape: (5, 3)
┌─────┬────────────┬────────────┐
│ foo ┆ breakpoint ┆ category   │
│ --- ┆ ---        ┆ ---        │
│ i64 ┆ f64        ┆ cat        │
╞═════╪════════════╪════════════╡
│ -2  ┆ -1.0       ┆ (-inf, -1] │
│ -1  ┆ -1.0       ┆ (-inf, -1] │
│ 0   ┆ 1.0        ┆ (-1, 1]    │
│ 1   ┆ 1.0        ┆ (-1, 1]    │
│ 2   ┆ inf        ┆ (1, inf]   │
└─────┴────────────┴────────────┘

quantile(quantile: float | Expr, interpolation: QuantileMethod = 'nearest') → Expr[source]

Get quantile value.

Parameters:

quantile: Quantile between 0.0 and 1.0.
interpolation{‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’, ‘equiprobable’}: Interpolation method.

Examples

>>> df = pl.DataFrame({"a": [0, 1, 2, 3, 4, 5]})
>>> df.select(pl.col("a").quantile(0.3))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
└─────┘
>>> df.select(pl.col("a").quantile(0.3, interpolation="higher"))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
└─────┘
>>> df.select(pl.col("a").quantile(0.3, interpolation="lower"))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
>>> df.select(pl.col("a").quantile(0.3, interpolation="midpoint"))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.5 │
└─────┘
>>> df.select(pl.col("a").quantile(0.3, interpolation="linear"))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.5 │
└─────┘
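The five results above can be reproduced with a short pure-Python sketch. It assumes the quantile position in the sorted data is q * (n - 1) and that 'nearest' rounds an exact half upward. Both assumptions match the outputs shown but are not taken from Polars' source. The 'equiprobable' method is not covered.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float, interpolation: str = "nearest") -> float:
    """Quantile of non-empty data using position q * (n - 1) in sorted order."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "lower":
        return float(data[lo])
    if interpolation == "higher":
        return float(data[hi])
    if interpolation == "midpoint":
        return (data[lo] + data[hi]) / 2
    if interpolation == "linear":
        return data[lo] + (pos - lo) * (data[hi] - data[lo])
    if interpolation == "nearest":
        # Assumption: a position exactly halfway between two ranks rounds up.
        return float(data[math.floor(pos + 0.5)])
    raise ValueError(f"unsupported interpolation: {interpolation}")


a = [0, 1, 2, 3, 4, 5]
by_method = {
    name: quantile(a, 0.3, name)
    for name in ("nearest", "higher", "lower", "midpoint", "linear")
}
```

For this data the position is 1.5, halfway between the values 1 and 2, which is why 'midpoint' and 'linear' agree here.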

radians() → Expr[source]

Convert from degrees to radians.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [-720, -540, -360, -180, 0, 180, 360, 540, 720]})
>>> df.select(pl.col("a").radians())
shape: (9, 1)
┌────────────┐
│ a          │
│ ---        │
│ f64        │
╞════════════╡
│ -12.566371 │
│ -9.424778  │
│ -6.283185  │
│ -3.141593  │
│ 0.0        │
│ 3.141593   │
│ 6.283185   │
│ 9.424778   │
│ 12.566371  │
└────────────┘

rank( method: RankMethod = 'average', *, descending: bool = False, seed: int | None = None, ) → Expr[source]

Assign ranks to data, dealing with ties appropriately.

Parameters:

method{‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’, ‘random’}

The method used to assign ranks to tied elements. The following methods are available (default is ‘average’):

‘average’ : The average of the ranks that would have been assigned to all the tied values is assigned to each value.
‘min’ : The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
‘max’ : The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
‘dense’ : Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
‘ordinal’ : All values are given a distinct rank, corresponding to the order that the values occur in the Series.
‘random’ : Like ‘ordinal’, but the rank for ties is not dependent on the order that the values occur in the Series.

descending

Rank in descending order.

seed

If method="random", use this as seed.

Examples

The ‘average’ method:

>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]})
>>> df.select(pl.col("a").rank())
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 3.0 │
│ 4.5 │
│ 1.5 │
│ 1.5 │
│ 4.5 │
└─────┘

The ‘ordinal’ method:

>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]})
>>> df.select(pl.col("a").rank("ordinal"))
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 3   │
│ 4   │
│ 1   │
│ 2   │
│ 5   │
└─────┘

Use ‘rank’ with ‘over’ to rank within groups:

>>> df = pl.DataFrame({"a": [1, 1, 2, 2, 2], "b": [6, 7, 5, 14, 11]})
>>> df.with_columns(pl.col("b").rank().over("a").alias("rank"))
shape: (5, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ rank │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ f64  │
╞═════╪═════╪══════╡
│ 1   ┆ 6   ┆ 1.0  │
│ 1   ┆ 7   ┆ 2.0  │
│ 2   ┆ 5   ┆ 1.0  │
│ 2   ┆ 14  ┆ 3.0  │
│ 2   ┆ 11  ┆ 2.0  │
└─────┴─────┴──────┘

Divide by the length or number of non-null values to compute the percentile rank.

>>> df = pl.DataFrame({"a": [6, 7, None, 14, 11]})
>>> df.with_columns(
...     pct=pl.col("a").rank() / pl.len(),
...     pct_valid=pl.col("a").rank() / pl.count("a"),
... )
shape: (5, 3)
┌──────┬──────┬───────────┐
│ a    ┆ pct  ┆ pct_valid │
│ ---  ┆ ---  ┆ ---       │
│ i64  ┆ f64  ┆ f64       │
╞══════╪══════╪═══════════╡
│ 6    ┆ 0.2  ┆ 0.25      │
│ 7    ┆ 0.4  ┆ 0.5       │
│ null ┆ null ┆ null      │
│ 14   ┆ 0.8  ┆ 1.0       │
│ 11   ┆ 0.6  ┆ 0.75      │
└──────┴──────┴───────────┘

rechunk() → Expr[source]

Create a single chunk of memory for this Series.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})

Create a Series with 3 nulls, append column a, then rechunk.

>>> df.select(pl.repeat(None, 3).append(pl.col("a")).rechunk())
shape: (6, 1)
┌────────┐
│ repeat │
│ ---    │
│ i64    │
╞════════╡
│ null   │
│ null   │
│ null   │
│ 1      │
│ 1      │
│ 2      │
└────────┘

register_plugin( *, lib: str, symbol: str, args: list[IntoExpr] | None = None, kwargs: dict[Any, Any] | None = None, is_elementwise: bool = False, input_wildcard_expansion: bool = False, returns_scalar: bool = False, cast_to_supertypes: bool = False, pass_name_to_apply: bool = False, changes_length: bool = False, ) → Expr[source]

Register a plugin function.

Deprecated since version 0.20.16: Use polars.plugins.register_plugin_function() instead.

See the user guide for more information about plugins.

Parameters:

lib: Library to load.
symbol: Function to load.
args: Arguments (other than self) passed to this function. These arguments have to be of type Expression.
kwargs: Non-expression arguments. They must be JSON serializable.
is_elementwise: If the function only operates on scalars this will trigger fast paths.
input_wildcard_expansion: Expand expressions as input of this function.
returns_scalar: Automatically explode on unit length if it ran as final aggregation. This is the case for aggregations like sum, min, covariance, etc.
cast_to_supertypes: Cast the input datatypes to their supertype.
pass_name_to_apply: If set, the Series passed to the function in the group_by operation will have its name set. This costs an extra heap allocation per group.
changes_length: Set this if the function changes the length of its input, for example a unique or a slice.

Warning

This method is deprecated. Use the new polars.plugins.register_plugin_function function instead.

This is highly unsafe as this will call the C function loaded by lib::symbol.

The parameters you set dictate how Polars will handle the function. Make sure they are correct!

reinterpret(*, signed: bool = True) → Expr[source]

Reinterpret the underlying bits as a signed/unsigned integer.

This operation is only allowed for 64-bit integers. For integers with fewer bits, you can safely use the cast operation instead.

Parameters:

signed: If True, reinterpret as pl.Int64. Otherwise, reinterpret as pl.UInt64.

Examples

>>> s = pl.Series("a", [1, 1, 2], dtype=pl.UInt64)
>>> df = pl.DataFrame([s])
>>> df.select(
...     [
...         pl.col("a").reinterpret(signed=True).alias("reinterpreted"),
...         pl.col("a").alias("original"),
...     ]
... )
shape: (3, 2)
┌───────────────┬──────────┐
│ reinterpreted ┆ original │
│ ---           ┆ ---      │
│ i64           ┆ u64      │
╞═══════════════╪══════════╡
│ 1             ┆ 1        │
│ 1             ┆ 1        │
│ 2             ┆ 2        │
└───────────────┴──────────┘
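
The values in the example above are small, so both columns look identical. The effect of reinterpreting shows up once the top bit is set. The following standard-library sketch illustrates the bit-level idea only (it does not call Polars); `reinterpret_u64_as_i64` is a hypothetical name.

```python
import struct


def reinterpret_u64_as_i64(value):
    """Read the 8 bytes of an unsigned 64-bit integer as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


print(reinterpret_u64_as_i64(1))          # 1
print(reinterpret_u64_as_i64(2**64 - 1))  # -1 (all bits set)
print(reinterpret_u64_as_i64(2**63))      # -9223372036854775808 (only the top bit set)
```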

repeat_by( by: Series | Expr | str | int, ) → Expr[source]

Repeat the elements in this Series as specified in the given expression.

The repeated elements are expanded into a List.

Parameters:

by: Numeric column that determines how often the values will be repeated. The column will be coerced to UInt32. Give this dtype to make the coercion a no-op.

Returns:

Expr: Expression of data type List, where the inner data type is equal to the original data type.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": ["x", "y", "z"],
...         "n": [1, 2, 3],
...     }
... )
>>> df.select(pl.col("a").repeat_by("n"))
shape: (3, 1)
┌─────────────────┐
│ a               │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["x"]           │
│ ["y", "y"]      │
│ ["z", "z", "z"] │
└─────────────────┘
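
In plain-Python terms, each element is paired with its count and expanded into a list. A minimal sketch, with `repeat_by` here being an ordinary function written for illustration:

```python
def repeat_by(values, counts):
    """Repeat each value as many times as the matching count says."""
    return [[v] * n for v, n in zip(values, counts)]


print(repeat_by(["x", "y", "z"], [1, 2, 3]))
# [['x'], ['y', 'y'], ['z', 'z', 'z']]
```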

replace(old, new, *, default, return_dtype) → Expr

Replace the given values by different values of the same data type.

Parameters:

old: Value or sequence of values to replace. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Also accepts a mapping of values to their replacement as syntactic sugar for replace(old=Series(mapping.keys()), new=Series(mapping.values())).
new: Value or sequence of values to replace by. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.
default: Set values that were not replaced to this value. Defaults to keeping the original value. Accepts expression input. Non-expression inputs are parsed as literals.

Deprecated since version 1.0.0: Use replace_strict() instead to set a default while replacing values.
return_dtype: The data type of the resulting expression. If set to None (default), the data type of the original column is preserved.

Deprecated since version 1.0.0: Use replace_strict() instead to set a return data type while replacing values, or explicitly call cast() on the output.

See also

replace_strict
str.replace

Notes

The global string cache must be enabled when replacing categorical values.

Examples

Replace a single value by another value. Values that were not replaced remain unchanged.

>>> df = pl.DataFrame({"a": [1, 2, 2, 3]})
>>> df.with_columns(replaced=pl.col("a").replace(2, 100))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 1        │
│ 2   ┆ 100      │
│ 2   ┆ 100      │
│ 3   ┆ 3        │
└─────┴──────────┘

Replace multiple values by passing sequences to the old and new parameters.

>>> df.with_columns(replaced=pl.col("a").replace([2, 3], [100, 200]))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 1        │
│ 2   ┆ 100      │
│ 2   ┆ 100      │
│ 3   ┆ 200      │
└─────┴──────────┘

Passing a mapping with replacements is also supported as syntactic sugar.

>>> mapping = {2: 100, 3: 200}
>>> df.with_columns(replaced=pl.col("a").replace(mapping))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 1        │
│ 2   ┆ 100      │
│ 2   ┆ 100      │
│ 3   ┆ 200      │
└─────┴──────────┘

The original data type is preserved when replacing by values of a different data type. Use replace_strict() to replace and change the return data type.

>>> df = pl.DataFrame({"a": ["x", "y", "z"]})
>>> mapping = {"x": 1, "y": 2, "z": 3}
>>> df.with_columns(replaced=pl.col("a").replace(mapping))
shape: (3, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ str ┆ str      │
╞═════╪══════════╡
│ x   ┆ 1        │
│ y   ┆ 2        │
│ z   ┆ 3        │
└─────┴──────────┘

Expression input is supported.

>>> df = pl.DataFrame({"a": [1, 2, 2, 3], "b": [1.5, 2.5, 5.0, 1.0]})
>>> df.with_columns(
...     replaced=pl.col("a").replace(
...         old=pl.col("a").max(),
...         new=pl.col("b").sum(),
...     )
... )
shape: (4, 3)
┌─────┬─────┬──────────┐
│ a   ┆ b   ┆ replaced │
│ --- ┆ --- ┆ ---      │
│ i64 ┆ f64 ┆ i64      │
╞═════╪═════╪══════════╡
│ 1   ┆ 1.5 ┆ 1        │
│ 2   ┆ 2.5 ┆ 2        │
│ 2   ┆ 5.0 ┆ 2        │
│ 3   ┆ 1.0 ┆ 10       │
└─────┴─────┴──────────┘
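
The core behaviour, "replace what matches and keep everything else", can be sketched with a dictionary lookup. This illustration ignores data types and expression input, and the function below is a stand-in written for this sketch, not the Polars method.

```python
def replace(values, mapping):
    """Swap values found in `mapping`; leave everything else unchanged."""
    return [mapping.get(v, v) for v in values]


replaced = replace([1, 2, 2, 3], {2: 100, 3: 200})
print(replaced)  # [1, 100, 100, 200]
```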

replace_strict(old, new, *, default, return_dtype) → Expr

Replace all values by different values.

Parameters:

old: Value or sequence of values to replace. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Also accepts a mapping of values to their replacement as syntactic sugar for replace_strict(old=Series(mapping.keys()), new=Series(mapping.values())).
new: Value or sequence of values to replace by. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.
default: Set values that were not replaced to this value. If no default is specified (the default), an error is raised if any values were not replaced. Accepts expression input. Non-expression inputs are parsed as literals.
return_dtype: The data type of the resulting expression. If set to None (default), the data type is determined automatically based on the other inputs.

Raises:

InvalidOperationError: If any non-null values in the original column were not replaced, and no default was specified.

See also

replace
str.replace

Notes

The global string cache must be enabled when replacing categorical values.

Examples

Replace values by passing sequences to the old and new parameters.

>>> df = pl.DataFrame({"a": [1, 2, 2, 3]})
>>> df.with_columns(
...     replaced=pl.col("a").replace_strict([1, 2, 3], [100, 200, 300])
... )
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 100      │
│ 2   ┆ 200      │
│ 2   ┆ 200      │
│ 3   ┆ 300      │
└─────┴──────────┘

Passing a mapping with replacements is also supported as syntactic sugar.

>>> mapping = {1: 100, 2: 200, 3: 300}
>>> df.with_columns(replaced=pl.col("a").replace_strict(mapping))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 100      │
│ 2   ┆ 200      │
│ 2   ┆ 200      │
│ 3   ┆ 300      │
└─────┴──────────┘

By default, an error is raised if any non-null values were not replaced. Specify a default to set all values that were not matched.

>>> mapping = {2: 200, 3: 300}
>>> df.with_columns(
...     replaced=pl.col("a").replace_strict(mapping)
... )
Traceback (most recent call last):
...
polars.exceptions.InvalidOperationError: incomplete mapping specified for `replace_strict`
>>> df.with_columns(replaced=pl.col("a").replace_strict(mapping, default=-1))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ -1       │
│ 2   ┆ 200      │
│ 2   ┆ 200      │
│ 3   ┆ 300      │
└─────┴──────────┘

Replacing by values of a different data type sets the return type based on a combination of the new data type and the default data type.

>>> df = pl.DataFrame({"a": ["x", "y", "z"]})
>>> mapping = {"x": 1, "y": 2, "z": 3}
>>> df.with_columns(replaced=pl.col("a").replace_strict(mapping))
shape: (3, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ str ┆ i64      │
╞═════╪══════════╡
│ x   ┆ 1        │
│ y   ┆ 2        │
│ z   ┆ 3        │
└─────┴──────────┘
>>> df.with_columns(replaced=pl.col("a").replace_strict(mapping, default="x"))
shape: (3, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ str ┆ str      │
╞═════╪══════════╡
│ x   ┆ 1        │
│ y   ┆ 2        │
│ z   ┆ 3        │
└─────┴──────────┘

Set the return_dtype parameter to control the resulting data type directly.

>>> df.with_columns(
...     replaced=pl.col("a").replace_strict(mapping, return_dtype=pl.UInt8)
... )
shape: (3, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ str ┆ u8       │
╞═════╪══════════╡
│ x   ┆ 1        │
│ y   ┆ 2        │
│ z   ┆ 3        │
└─────┴──────────┘

Expression input is supported for all parameters.

>>> df = pl.DataFrame({"a": [1, 2, 2, 3], "b": [1.5, 2.5, 5.0, 1.0]})
>>> df.with_columns(
...     replaced=pl.col("a").replace_strict(
...         old=pl.col("a").max(),
...         new=pl.col("b").sum(),
...         default=pl.col("b"),
...     )
... )
shape: (4, 3)
┌─────┬─────┬──────────┐
│ a   ┆ b   ┆ replaced │
│ --- ┆ --- ┆ ---      │
│ i64 ┆ f64 ┆ f64      │
╞═════╪═════╪══════════╡
│ 1   ┆ 1.5 ┆ 1.5      │
│ 2   ┆ 2.5 ┆ 2.5      │
│ 2   ┆ 5.0 ┆ 5.0      │
│ 3   ┆ 1.0 ┆ 10.0     │
└─────┴─────┴──────────┘
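
The strict behaviour can be summarised in a short plain-Python sketch. It assumes an input without nulls and uses `ValueError` as a stand-in for `InvalidOperationError`; the function and the `_MISSING` sentinel are hypothetical names for illustration only.

```python
_MISSING = object()  # sentinel meaning "no default given"


def replace_strict(values, mapping, default=_MISSING):
    """Replace every value; an unmatched value raises unless a default is given."""
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif default is _MISSING:
            raise ValueError(f"incomplete mapping: no replacement for {v!r}")
        else:
            out.append(default)
    return out


mapping = {2: 200, 3: 300}
print(replace_strict([1, 2, 2, 3], mapping, default=-1))  # [-1, 200, 200, 300]
try:
    replace_strict([1, 2, 2, 3], mapping)
except ValueError as exc:
    print(exc)  # incomplete mapping: no replacement for 1
```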

reshape(dimensions: tuple[int, ...]) → Expr[source]

Reshape this Expr to a flat column or an Array column.

Parameters:

dimensions: Tuple of the dimension sizes. If -1 is used as the value for the first dimension, that dimension is inferred. Because the size of the Column may not be known in advance, it is only possible to use -1 for the first dimension.

Returns:

Expr: If a single dimension is given, results in an expression of the original data type. If multiple dimensions are given, results in an expression of data type Array with shape dimensions.

See also

Expr.list.explode: Explode a list column.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> square = df.select(pl.col("foo").reshape((3, 3)))
>>> square
shape: (3, 1)
┌───────────────┐
│ foo           │
│ ---           │
│ array[i64, 3] │
╞═══════════════╡
│ [1, 2, 3]     │
│ [4, 5, 6]     │
│ [7, 8, 9]     │
└───────────────┘
>>> square.select(pl.col("foo").reshape((9,)))
shape: (9, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
│ 6   │
│ 7   │
│ 8   │
│ 9   │
└─────┘
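
The rule that only the first dimension may be -1 can be illustrated with a plain-Python sketch. It handles only the one- and two-dimensional cases and uses nested lists in place of an Array column; `reshape` below is a hypothetical helper, not the Polars method.

```python
def reshape(flat, dimensions):
    """Reshape a flat sequence into rows; -1 is allowed only as the first dimension."""
    if len(dimensions) == 1:
        return list(flat)
    rows, width = dimensions  # this sketch handles the 2-D case only
    if rows == -1:
        # Infer the number of rows from the total length and the row width.
        rows, remainder = divmod(len(flat), width)
        if remainder:
            raise ValueError("length is not divisible by the row width")
    if rows * width != len(flat):
        raise ValueError("dimensions do not match the number of elements")
    return [list(flat[i * width:(i + 1) * width]) for i in range(rows)]


print(reshape(range(1, 10), (3, 3)))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(reshape(range(1, 10), (-1, 3)))  # same result, first dimension inferred
```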

reverse() → Expr[source]

Reverse the selection.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )
>>> df.select(
...     [
...         pl.all(),
...         pl.all().reverse().name.suffix("_reverse"),
...     ]
... )
shape: (5, 8)
┌─────┬────────┬─────┬────────┬───────────┬────────────────┬───────────┬──────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ A_reverse ┆ fruits_reverse ┆ B_reverse ┆ cars_reverse │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---       ┆ ---            ┆ ---       ┆ ---          │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64       ┆ str            ┆ i64       ┆ str          │
╞═════╪════════╪═════╪════════╪═══════════╪════════════════╪═══════════╪══════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 5         ┆ banana         ┆ 1         ┆ beetle       │
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 4         ┆ apple          ┆ 2         ┆ beetle       │
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 3         ┆ apple          ┆ 3         ┆ beetle       │
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2         ┆ banana         ┆ 4         ┆ audi         │
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 1         ┆ banana         ┆ 5         ┆ beetle       │
└─────┴────────┴─────┴────────┴───────────┴────────────────┴───────────┴──────────────┘

rle() → Expr[source]

Compress the column data using run-length encoding.

Run-length encoding (RLE) encodes data by storing each run of identical values as a single value and its length.

Returns:

Expr: Expression of data type Struct with fields len of data type UInt32 and value of the original data type.

See also

rle_id

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2, 1, None, 1, 3, 3]})
>>> df.select(pl.col("a").rle()).unnest("a")
shape: (6, 2)
┌─────┬───────┐
│ len ┆ value │
│ --- ┆ ---   │
│ u32 ┆ i64   │
╞═════╪═══════╡
│ 2   ┆ 1     │
│ 1   ┆ 2     │
│ 1   ┆ 1     │
│ 1   ┆ null  │
│ 1   ┆ 1     │
│ 2   ┆ 3     │
└─────┴───────┘
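
For comparison, the same encoding can be written with `itertools.groupby` from the standard library, which also groups consecutive equal values. This is a sketch of the idea only; it returns `(len, value)` tuples instead of a Struct.

```python
from itertools import groupby


def rle(values):
    """Return one (run length, value) pair per run of identical values."""
    return [(len(list(run)), value) for value, run in groupby(values)]


encoded = rle([1, 1, 2, 1, None, 1, 3, 3])
print(encoded)
# [(2, 1), (1, 2), (1, 1), (1, None), (1, 1), (2, 3)]
```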

rle_id() → Expr[source]

Get a distinct integer ID for each run of identical values.

The ID starts at 0 and increases by one each time the value of the column changes.

Returns:

Expr: Expression of data type UInt32.

See also

rle

Notes

This functionality is especially useful for defining a new group for every time a column’s value changes, rather than for every distinct value of that column.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 1, 1, 1],
...         "b": ["x", "x", None, "y", "y"],
...     }
... )
>>> df.with_columns(
...     rle_id_a=pl.col("a").rle_id(),
...     rle_id_ab=pl.struct("a", "b").rle_id(),
... )
shape: (5, 4)
┌─────┬──────┬──────────┬───────────┐
│ a   ┆ b    ┆ rle_id_a ┆ rle_id_ab │
│ --- ┆ ---  ┆ ---      ┆ ---       │
│ i64 ┆ str  ┆ u32      ┆ u32       │
╞═════╪══════╪══════════╪═══════════╡
│ 1   ┆ x    ┆ 0        ┆ 0         │
│ 2   ┆ x    ┆ 1        ┆ 1         │
│ 1   ┆ null ┆ 2        ┆ 2         │
│ 1   ┆ y    ┆ 2        ┆ 3         │
│ 1   ┆ y    ┆ 2        ┆ 3         │
└─────┴──────┴──────────┴───────────┘
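
A plain-Python sketch of the same counting rule is shown below. The helper `rle_id` is written for illustration; tuples stand in for the struct of columns `a` and `b`.

```python
def rle_id(values):
    """Number the runs: the ID increases by one each time the value changes."""
    ids = []
    current = -1
    previous = object()  # sentinel that is unequal to every real value
    for v in values:
        if v != previous:
            current += 1
            previous = v
        ids.append(current)
    return ids


a = [1, 2, 1, 1, 1]
b = ["x", "x", None, "y", "y"]
print(rle_id(a))                # [0, 1, 2, 2, 2]
print(rle_id(list(zip(a, b))))  # [0, 1, 2, 3, 3]
```

Because the ID only changes when the value changes, grouping by it keeps separate runs of the same value apart.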

rolling( index_column: str, *, period: str | timedelta, offset: str | timedelta | None = None, closed: ClosedInterval = 'right', ) → Expr[source]

Create rolling groups based on a temporal or integer column.

If you have a time series <t_0, t_1, ..., t_n>, then by default the windows created will be

(t_0 - period, t_0]

(t_1 - period, t_1]

…

(t_n - period, t_n]

whereas if you pass a non-default offset, then the windows will be

(t_0 + offset, t_0 + offset + period]

(t_1 + offset, t_1 + offset + period]

…

(t_n + offset, t_n + offset + period]

The period and offset arguments are created either from a timedelta, or by using the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.
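
To make the string language concrete, the sketch below splits such a string into (count, unit) pairs with a regular expression. It is not the parser Polars uses; `split_duration` is a hypothetical helper, and signs and calendar arithmetic are deliberately left out.

```python
import re

# Longer unit names come first so "ms" and "mo" are not read as "m".
_TOKEN = re.compile(r"(\d+)(ns|us|ms|mo|s|m|h|d|w|q|y|i)")


def split_duration(text):
    """Split a duration string into (count, unit) pairs."""
    pairs = _TOKEN.findall(text)
    # Reject input that contains anything other than count/unit tokens.
    if "".join(count + unit for count, unit in pairs) != text:
        raise ValueError(f"unrecognised duration string: {text!r}")
    return [(int(count), unit) for count, unit in pairs]


print(split_duration("3d12h4m25s"))  # [(3, 'd'), (12, 'h'), (4, 'm'), (25, 's')]
print(split_duration("1mo"))         # [(1, 'mo')]
```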

Parameters:

index_column: Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order. In case of a rolling group by on indices, dtype needs to be one of {UInt32, UInt64, Int32, Int64}. Note that the first three get temporarily cast to Int64, so if performance matters use an Int64 column.
period: Length of the window - must be non-negative.
offset: Offset of the window. Default is -period.
closed: {‘right’, ‘left’, ‘both’, ‘none’}. Define which sides of the temporal interval are closed (inclusive).

Examples

>>> dates = [
...     "2020-01-01 13:45:48",
...     "2020-01-01 16:42:13",
...     "2020-01-01 16:45:09",
...     "2020-01-02 18:12:48",
...     "2020-01-03 19:45:32",
...     "2020-01-08 23:16:43",
... ]
>>> df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_columns(
...     pl.col("dt").str.strptime(pl.Datetime).set_sorted()
... )
>>> df.with_columns(
...     sum_a=pl.sum("a").rolling(index_column="dt", period="2d"),
...     min_a=pl.min("a").rolling(index_column="dt", period="2d"),
...     max_a=pl.max("a").rolling(index_column="dt", period="2d"),
... )
shape: (6, 5)
┌─────────────────────┬─────┬───────┬───────┬───────┐
│ dt                  ┆ a   ┆ sum_a ┆ min_a ┆ max_a │
│ ---                 ┆ --- ┆ ---   ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ i64 ┆ i64   ┆ i64   ┆ i64   │
╞═════════════════════╪═════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3   ┆ 3     ┆ 3     ┆ 3     │
│ 2020-01-01 16:42:13 ┆ 7   ┆ 10    ┆ 3     ┆ 7     │
│ 2020-01-01 16:45:09 ┆ 5   ┆ 15    ┆ 3     ┆ 7     │
│ 2020-01-02 18:12:48 ┆ 9   ┆ 24    ┆ 3     ┆ 9     │
│ 2020-01-03 19:45:32 ┆ 2   ┆ 11    ┆ 2     ┆ 9     │
│ 2020-01-08 23:16:43 ┆ 1   ┆ 1     ┆ 1     ┆ 1     │
└─────────────────────┴─────┴───────┴───────┴───────┘
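
The window definition (t + offset, t + offset + period] can be checked against the `sum_a` column with a brute-force sketch in plain Python. It assumes `closed='right'` and treats "2d" as exactly 48 hours, which holds here because no daylight-saving change is involved; `rolling_sum` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def rolling_sum(times, values, period, offset=None):
    """Sum `values` over the window (t + offset, t + offset + period] for each t."""
    if offset is None:
        offset = -period  # default: the window ends at t itself
    out = []
    for t in times:
        lower, upper = t + offset, t + offset + period
        out.append(sum(v for s, v in zip(times, values) if lower < s <= upper))
    return out


times = [
    datetime.fromisoformat(s)
    for s in [
        "2020-01-01 13:45:48",
        "2020-01-01 16:42:13",
        "2020-01-01 16:45:09",
        "2020-01-02 18:12:48",
        "2020-01-03 19:45:32",
        "2020-01-08 23:16:43",
    ]
]
sums = rolling_sum(times, [3, 7, 5, 9, 2, 1], period=timedelta(days=2))
print(sums)  # [3, 10, 15, 24, 11, 1]
```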

rolling_kurtosis( window_size: int, *, fisher: bool = True, bias: bool = True, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Compute a rolling kurtosis.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Parameters:

window_size: Integer size of the rolling window.
fisher: bool, optional. If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
bias: bool, optional. If False, the calculations are corrected for statistical bias.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

See also

Expr.kurtosis

Examples

>>> df = pl.DataFrame({"a": [1, 4, 2, 9]})
>>> df.select(pl.col("a").rolling_kurtosis(3))
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ null │
│ null │
│ -1.5 │
│ -1.5 │
└──────┘
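
The values above can be reproduced by hand. The sketch below assumes the default settings (`fisher=True`, `bias=True`), an input without nulls, and windows that are not constant; the helper names are hypothetical and the code is not the Polars implementation.

```python
def kurtosis(window, fisher=True):
    """Biased sample kurtosis: m4 / m2**2, minus 3 for Fisher's definition."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in window) / n  # fourth central moment
    k = m4 / m2**2
    return k - 3.0 if fisher else k


def rolling_kurtosis(values, window_size):
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)  # not enough values yet
        else:
            out.append(kurtosis(values[i + 1 - window_size : i + 1]))
    return out


result = rolling_kurtosis([1, 4, 2, 9], 3)
print([None if r is None else round(r, 6) for r in result])
# [None, None, -1.5, -1.5]
```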

rolling_map( function: Callable[[Series], Any], window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Compute a custom rolling window function.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

function: Custom aggregation function.
window_size: The length of the window in number of elements.
weights: An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Warning

Computing custom functions is extremely slow. Use specialized rolling functions such as Expr.rolling_sum() if at all possible.

Examples

>>> from numpy import nansum
>>> df = pl.DataFrame({"a": [11.0, 2.0, 9.0, float("nan"), 8.0]})
>>> df.select(pl.col("a").rolling_map(nansum, window_size=3))
shape: (5, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ null │
│ null │
│ 22.0 │
│ 11.0 │
│ 17.0 │
└──────┘
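
Conceptually, the custom function is called once per window, which is also why it is slow. A plain-Python sketch of that loop follows, assuming no weights, no centering, and `min_samples` equal to `window_size`. Both helpers are hypothetical, and `nansum` stands in for `numpy.nansum`.

```python
import math


def rolling_map(function, values, window_size):
    """Call `function` on each full window of `window_size` consecutive values."""
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)  # window is not full yet
        else:
            out.append(function(values[i + 1 - window_size : i + 1]))
    return out


def nansum(window):
    """Sum that skips NaN values."""
    return sum(x for x in window if not math.isnan(x))


mapped = rolling_map(nansum, [11.0, 2.0, 9.0, float("nan"), 8.0], 3)
print(mapped)  # [None, None, 22.0, 11.0, 17.0]
```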

rolling_max( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Apply a rolling max (moving max) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their max.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_max=pl.col("A").rolling_max(window_size=2),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_max │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 2.0         │
│ 3.0 ┆ 3.0         │
│ 4.0 ┆ 4.0         │
│ 5.0 ┆ 5.0         │
│ 6.0 ┆ 6.0         │
└─────┴─────────────┘

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_max=pl.col("A").rolling_max(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_max │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.5         │
│ 3.0 ┆ 2.25        │
│ 4.0 ┆ 3.0         │
│ 5.0 ┆ 3.75        │
│ 6.0 ┆ 4.5         │
└─────┴─────────────┘

Center the values in the window:

>>> df.with_columns(
...     rolling_max=pl.col("A").rolling_max(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_max │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 3.0         │
│ 3.0 ┆ 4.0         │
│ 4.0 ┆ 5.0         │
│ 5.0 ┆ 6.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘
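
All three tables above follow from one small loop. The sketch below is plain Python, not the Polars implementation; it assumes no nulls, `min_samples` equal to `window_size`, and an odd `window_size` when `center=True`. The function name is reused here only for illustration.

```python
def rolling_max(values, window_size, weights=None, center=False):
    if weights is None:
        weights = [1.0] * window_size
    # A trailing window ends at the current row; a centered one is shifted forward.
    shift = window_size // 2 if center else 0
    out = []
    for i in range(len(values)):
        start = i + shift - window_size + 1
        stop = i + shift + 1
        if start < 0 or stop > len(values):
            out.append(None)  # window would extend past the data
            continue
        # Multiply each value by its weight, then take the maximum.
        out.append(max(w * x for w, x in zip(weights, values[start:stop])))
    return out


A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rolling_max(A, 2))                        # [None, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rolling_max(A, 2, weights=[0.25, 0.75]))  # [None, 1.5, 2.25, 3.0, 3.75, 4.5]
print(rolling_max(A, 3, center=True))           # [None, 3.0, 4.0, 5.0, 6.0, None]
```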

rolling_max_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Apply a rolling max based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, then closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integral ones require using 'i' in window size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed: {‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row number column

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling max with the temporal windows closed on the right (default)

>>> df_temporal.with_columns(
...     rolling_row_max=pl.col("index").rolling_max_by("date", window_size="2h")
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_max │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ u32             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 1               │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 2               │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 3               │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 4               │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 20              │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 21              │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 22              │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 23              │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 24              │
└───────┴─────────────────────┴─────────────────┘

Compute the rolling max with the closure of windows on both sides

>>> df_temporal.with_columns(
...     rolling_row_max=pl.col("index").rolling_max_by(
...         "date", window_size="2h", closed="both"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_max │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ u32             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 1               │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 2               │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 3               │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 4               │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 20              │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 21              │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 22              │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 23              │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 24              │
└───────┴─────────────────────┴─────────────────┘
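
The two tables above are identical because `index` increases with `date`, so the current row is always the maximum of its window regardless of whether the left edge is included. A plain-Python sketch with made-up decreasing values makes the effect of `closed` visible. The helpers are hypothetical, and the sketch treats an empty window as `None`.

```python
from datetime import datetime, timedelta


def in_window(s, t, window_size, closed="right"):
    """Is timestamp `s` inside the window that ends at `t`?"""
    lower = t - window_size
    above = s >= lower if closed in ("left", "both") else s > lower
    below = s <= t if closed in ("right", "both") else s < t
    return above and below


def rolling_max_by(values, by, window_size, closed="right"):
    out = []
    for t in by:
        window = [v for v, s in zip(values, by) if in_window(s, t, window_size, closed)]
        out.append(max(window) if window else None)
    return out


start = datetime(2001, 1, 1)
by = [start + timedelta(hours=h) for h in range(5)]
values = [4, 3, 2, 1, 0]  # illustrative data, decreasing over time
two_hours = timedelta(hours=2)
print(rolling_max_by(values, by, two_hours, closed="right"))  # [4, 4, 3, 2, 1]
print(rolling_max_by(values, by, two_hours, closed="both"))   # [4, 4, 4, 3, 2]
```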

rolling_mean( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Apply a rolling mean (moving mean) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their mean. Weights are normalized to sum to 1.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional slice with the same length as the window that will be multiplied elementwise with the values in the window, after being normalized to sum to 1.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_mean=pl.col("A").rolling_mean(window_size=2),
... )
shape: (6, 2)
┌─────┬──────────────┐
│ A   ┆ rolling_mean │
│ --- ┆ ---          │
│ f64 ┆ f64          │
╞═════╪══════════════╡
│ 1.0 ┆ null         │
│ 2.0 ┆ 1.5          │
│ 3.0 ┆ 2.5          │
│ 4.0 ┆ 3.5          │
│ 5.0 ┆ 4.5          │
│ 6.0 ┆ 5.5          │
└─────┴──────────────┘

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_mean=pl.col("A").rolling_mean(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬──────────────┐
│ A   ┆ rolling_mean │
│ --- ┆ ---          │
│ f64 ┆ f64          │
╞═════╪══════════════╡
│ 1.0 ┆ null         │
│ 2.0 ┆ 1.75         │
│ 3.0 ┆ 2.75         │
│ 4.0 ┆ 3.75         │
│ 5.0 ┆ 4.75         │
│ 6.0 ┆ 5.75         │
└─────┴──────────────┘

Center the values in the window:

>>> df.with_columns(
...     rolling_mean=pl.col("A").rolling_mean(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬──────────────┐
│ A   ┆ rolling_mean │
│ --- ┆ ---          │
│ f64 ┆ f64          │
╞═════╪══════════════╡
│ 1.0 ┆ null         │
│ 2.0 ┆ 2.0          │
│ 3.0 ┆ 3.0          │
│ 4.0 ┆ 4.0          │
│ 5.0 ┆ 5.0          │
│ 6.0 ┆ null         │
└─────┴──────────────┘
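
The normalization of weights described above can be sketched in plain Python. The code assumes a trailing window, no nulls, and `min_samples` equal to `window_size`; it illustrates the arithmetic and is not the Polars implementation.

```python
def rolling_mean(values, window_size, weights=None):
    if weights is None:
        weights = [1.0] * window_size
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize so the weights sum to 1
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)  # window is not full yet
        else:
            window = values[i + 1 - window_size : i + 1]
            out.append(sum(w * x for w, x in zip(norm, window)))
    return out


A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rolling_mean(A, 2))                        # [None, 1.5, 2.5, 3.5, 4.5, 5.5]
print(rolling_mean(A, 2, weights=[0.25, 0.75]))  # [None, 1.75, 2.75, 3.75, 4.75, 5.75]
```

In this sketch, weights of `[1, 3]` give the same result as `[0.25, 0.75]`, because both normalize to the same proportions.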

rolling_mean_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Apply a rolling mean based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, then closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integral ones require using 'i' in window size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed: {‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row number column

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling mean with the temporal windows closed on the right (default):

>>> df_temporal.with_columns(
...     rolling_row_mean=pl.col("index").rolling_mean_by(
...         "date", window_size="2h"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬──────────────────┐
│ index ┆ date                ┆ rolling_row_mean │
│ ---   ┆ ---                 ┆ ---              │
│ u32   ┆ datetime[μs]        ┆ f64              │
╞═══════╪═════════════════════╪══════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0.0              │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.5              │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.5              │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 2.5              │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 3.5              │
│ …     ┆ …                   ┆ …                │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 19.5             │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 20.5             │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 21.5             │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 22.5             │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 23.5             │
└───────┴─────────────────────┴──────────────────┘

Compute the rolling mean with the temporal windows closed on both sides:

>>> df_temporal.with_columns(
...     rolling_row_mean=pl.col("index").rolling_mean_by(
...         "date", window_size="2h", closed="both"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬──────────────────┐
│ index ┆ date                ┆ rolling_row_mean │
│ ---   ┆ ---                 ┆ ---              │
│ u32   ┆ datetime[μs]        ┆ f64              │
╞═══════╪═════════════════════╪══════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0.0              │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.5              │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.0              │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 2.0              │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 3.0              │
│ …     ┆ …                   ┆ …                │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 19.0             │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 20.0             │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 21.0             │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 22.0             │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 23.0             │
└───────┴─────────────────────┴──────────────────┘

rolling_median( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Compute a rolling median.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their median.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional list of weights with the same length as the window; the values in the window are multiplied elementwise by these weights.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_median=pl.col("A").rolling_median(window_size=2),
... )
shape: (6, 2)
┌─────┬────────────────┐
│ A   ┆ rolling_median │
│ --- ┆ ---            │
│ f64 ┆ f64            │
╞═════╪════════════════╡
│ 1.0 ┆ null           │
│ 2.0 ┆ 1.5            │
│ 3.0 ┆ 2.5            │
│ 4.0 ┆ 3.5            │
│ 5.0 ┆ 4.5            │
│ 6.0 ┆ 5.5            │
└─────┴────────────────┘
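The leading null follows from the default min_samples, which equals window_size. A plain-Python sketch of the trailing window (no weights, no centering) shows the same pattern; rolling_median_sketch is a hypothetical helper, not part of Polars, and the min_samples=1 result is what the parameter description implies rather than a documented output.

```python
from statistics import median


def rolling_median_sketch(values, window_size, min_samples=None):
    """Trailing fixed-size window: the row itself plus the rows before it."""
    if min_samples is None:
        min_samples = window_size  # documented default
    out = []
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        window = [v for v in values[start : i + 1] if v is not None]
        # Emit a result only once enough non-null values are available.
        out.append(median(window) if len(window) >= min_samples else None)
    return out


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
default_result = rolling_median_sketch(a, 2)
partial_result = rolling_median_sketch(a, 2, min_samples=1)
print(default_result)  # [None, 1.5, 2.5, 3.5, 4.5, 5.5]
print(partial_result)  # [1.0, 1.5, 2.5, 3.5, 4.5, 5.5]
```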

Specify weights for the values in each window:

>>> df.with_columns(
...     rolling_median=pl.col("A").rolling_median(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬────────────────┐
│ A   ┆ rolling_median │
│ --- ┆ ---            │
│ f64 ┆ f64            │
╞═════╪════════════════╡
│ 1.0 ┆ null           │
│ 2.0 ┆ 1.5            │
│ 3.0 ┆ 2.5            │
│ 4.0 ┆ 3.5            │
│ 5.0 ┆ 4.5            │
│ 6.0 ┆ 5.5            │
└─────┴────────────────┘

Center the window around each row:

>>> df.with_columns(
...     rolling_median=pl.col("A").rolling_median(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬────────────────┐
│ A   ┆ rolling_median │
│ --- ┆ ---            │
│ f64 ┆ f64            │
╞═════╪════════════════╡
│ 1.0 ┆ null           │
│ 2.0 ┆ 2.0            │
│ 3.0 ┆ 3.0            │
│ 4.0 ┆ 4.0            │
│ 5.0 ┆ 5.0            │
│ 6.0 ┆ null           │
└─────┴────────────────┘
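A plain-Python sketch of the centered case, restricted to odd window sizes, reproduces the table above. The helper name is hypothetical, and how even window sizes are aligned is not covered here.

```python
from statistics import median


def centered_rolling_median_sketch(values, window_size):
    """Odd window sizes only: the result is labelled at the middle row."""
    half = window_size // 2
    out = []
    for i in range(len(values)):
        lo, hi = i - half, i + half
        if lo < 0 or hi >= len(values):
            # Incomplete window: min_samples defaults to window_size.
            out.append(None)
        else:
            out.append(median(values[lo : hi + 1]))
    return out


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
centered = centered_rolling_median_sketch(a, 3)
print(centered)  # [None, 2.0, 3.0, 4.0, 5.0, None]
```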

rolling_median_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Compute a rolling median based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integer types require using the 'i' unit in window_size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.
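The calendar units in the window_size string language are not fixed-length durations. As a standard-library illustration that does not call Polars, the sketch below measures two consecutive calendar months of 2001; a "1mo" window anchored on the first of the month would span a different number of days in each case.

```python
from datetime import datetime

# Elapsed time across one calendar month, starting on the 1st.
january = datetime(2001, 2, 1) - datetime(2001, 1, 1)
february = datetime(2001, 3, 1) - datetime(2001, 2, 1)

print(january.days, february.days)  # 31 28
```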

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row index column:

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling median with the temporal windows closed on the right:

>>> df_temporal.with_columns(
...     rolling_row_median=pl.col("index").rolling_median_by(
...         "date", window_size="2h"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬────────────────────┐
│ index ┆ date                ┆ rolling_row_median │
│ ---   ┆ ---                 ┆ ---                │
│ u32   ┆ datetime[μs]        ┆ f64                │
╞═══════╪═════════════════════╪════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0.0                │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.5                │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.5                │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 2.5                │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 3.5                │
│ …     ┆ …                   ┆ …                  │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 19.5               │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 20.5               │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 21.5               │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 22.5               │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 23.5               │
└───────┴─────────────────────┴────────────────────┘

rolling_min( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Apply a rolling min (moving min) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their min.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional list of weights with the same length as the window; the values in the window are multiplied elementwise by these weights.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_min=pl.col("A").rolling_min(window_size=2),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_min │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 2.0         │
│ 4.0 ┆ 3.0         │
│ 5.0 ┆ 4.0         │
│ 6.0 ┆ 5.0         │
└─────┴─────────────┘

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_min=pl.col("A").rolling_min(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_min │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.25        │
│ 3.0 ┆ 0.5         │
│ 4.0 ┆ 0.75        │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ 1.25        │
└─────┴─────────────┘
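The weighted result can be checked by hand: each value in the window is multiplied by its weight first, and the minimum is taken over the products. A plain-Python sketch (weighted_rolling_min_sketch is a hypothetical helper, not Polars code) reproduces the table above.

```python
def weighted_rolling_min_sketch(values, weights):
    """Multiply each trailing window elementwise by weights, then take the min."""
    window_size = len(weights)
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)  # not enough rows yet
            continue
        window = values[i - window_size + 1 : i + 1]
        out.append(min(v * w for v, w in zip(window, weights)))
    return out


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weighted_min = weighted_rolling_min_sketch(a, [0.25, 0.75])
print(weighted_min)  # [None, 0.25, 0.5, 0.75, 1.0, 1.25]
```

For the window [1.0, 2.0] the products are 0.25 and 1.5, so the minimum is 0.25 rather than 1.0.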

Center the window around each row:

>>> df.with_columns(
...     rolling_min=pl.col("A").rolling_min(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_min │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 2.0         │
│ 4.0 ┆ 3.0         │
│ 5.0 ┆ 4.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘

rolling_min_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Apply a rolling min based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integer types require using the 'i' unit in window_size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row index column:

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling min with the temporal windows closed on the right (default):

>>> df_temporal.with_columns(
...     rolling_row_min=pl.col("index").rolling_min_by("date", window_size="2h")
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_min │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ u32             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0               │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1               │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 2               │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 3               │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 19              │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 20              │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 21              │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 22              │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 23              │
└───────┴─────────────────────┴─────────────────┘

rolling_quantile( quantile: float, interpolation: QuantileMethod = 'nearest', window_size: int = 2, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Compute a rolling quantile.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their quantile.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

quantile: Quantile between 0.0 and 1.0.
interpolation{‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’, ‘equiprobable’}: Interpolation method.
window_size: The length of the window in number of elements.
weights: An optional list of weights with the same length as the window; the values in the window are multiplied elementwise by these weights.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25, window_size=4
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ null             │
│ 4.0 ┆ 2.0              │
│ 5.0 ┆ 3.0              │
│ 6.0 ┆ 4.0              │
└─────┴──────────────────┘
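The 2.0 at the row with A = 4.0 can be traced by hand. The sketch below implements the usual textbook definitions of five of the interpolation methods on an already sorted window. It is an assumption that these match Polars in every detail; tie-breaking for 'nearest' may differ, and 'equiprobable' is not covered. The function name quantile_sketch is hypothetical.

```python
import math


def quantile_sketch(sorted_window, q, interpolation="nearest"):
    """Textbook quantile interpolation on a sorted, unweighted window."""
    pos = q * (len(sorted_window) - 1)  # fractional index into the window
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "lower":
        return sorted_window[lo]
    if interpolation == "higher":
        return sorted_window[hi]
    if interpolation == "midpoint":
        return (sorted_window[lo] + sorted_window[hi]) / 2
    if interpolation == "linear":
        return sorted_window[lo] + (pos - lo) * (sorted_window[hi] - sorted_window[lo])
    if interpolation == "nearest":
        # Python's round() breaks exact .5 ties toward the even index.
        return sorted_window[round(pos)]
    raise ValueError(f"unsupported interpolation: {interpolation}")


window = [1.0, 2.0, 3.0, 4.0]
nearest = quantile_sketch(window, 0.25)            # index 0.75 rounds to 1
linear = quantile_sketch(window, 0.25, "linear")   # 1.0 + 0.75 * (2.0 - 1.0)
print(nearest, linear)  # 2.0 1.75
```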

Specify weights for the values in each window:

>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25, window_size=4, weights=[0.2, 0.4, 0.4, 0.2]
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ null             │
│ 4.0 ┆ 2.0              │
│ 5.0 ┆ 3.0              │
│ 6.0 ┆ 4.0              │
└─────┴──────────────────┘

Specify weights and interpolation method:

>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25,
...         window_size=4,
...         weights=[0.2, 0.4, 0.4, 0.2],
...         interpolation="linear",
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ null             │
│ 4.0 ┆ 1.625            │
│ 5.0 ┆ 2.625            │
│ 6.0 ┆ 3.625            │
└─────┴──────────────────┘

Center the window around each row:

>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.2, window_size=5, center=True
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ 2.0              │
│ 4.0 ┆ 3.0              │
│ 5.0 ┆ null             │
│ 6.0 ┆ null             │
└─────┴──────────────────┘

rolling_quantile_by( by: IntoExpr, window_size: timedelta | str, *, quantile: float, interpolation: QuantileMethod = 'nearest', min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Compute a rolling quantile based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integer types require using the 'i' unit in window_size).

quantile

Quantile between 0.0 and 1.0.

interpolation{‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’, ‘equiprobable’}

Interpolation method.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row index column:

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling quantile with the temporal windows closed on the right:

>>> df_temporal.with_columns(
...     rolling_row_quantile=pl.col("index").rolling_quantile_by(
...         "date", window_size="2h", quantile=0.3
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬──────────────────────┐
│ index ┆ date                ┆ rolling_row_quantile │
│ ---   ┆ ---                 ┆ ---                  │
│ u32   ┆ datetime[μs]        ┆ f64                  │
╞═══════╪═════════════════════╪══════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0.0                  │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.0                  │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.0                  │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 2.0                  │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 3.0                  │
│ …     ┆ …                   ┆ …                    │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 19.0                 │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 20.0                 │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 21.0                 │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 22.0                 │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 23.0                 │
└───────┴─────────────────────┴──────────────────────┘

rolling_skew( window_size: int, *, bias: bool = True, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Compute a rolling skew.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Parameters:

window_size

Integer size of the rolling window.

bias

If False, the calculations are corrected for statistical bias.

min_samples

The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.

center

Set the labels at the center of the window.

See also

Expr.skew

Examples

>>> df = pl.DataFrame({"a": [1, 4, 2, 9]})
>>> df.select(pl.col("a").rolling_skew(3))
shape: (4, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ null     │
│ null     │
│ 0.381802 │
│ 0.47033  │
└──────────┘

Note how the values match the following:

>>> pl.Series([1, 4, 2]).skew(), pl.Series([4, 2, 9]).skew()
(0.38180177416060584, 0.47033046033698594)
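The same numbers can be recovered from the standard biased sample skewness, the third central moment divided by the second central moment raised to the power 1.5. The sketch below is plain Python under that assumption and covers only the default bias=True; the correction applied for bias=False is not shown.

```python
def biased_skew_sketch(xs):
    """Biased sample skewness: m3 / m2**1.5, using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2**1.5


first = biased_skew_sketch([1, 4, 2])
second = biased_skew_sketch([4, 2, 9])
print(round(first, 6), round(second, 6))  # 0.381802 0.47033
```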

rolling_std( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ddof: int = 1, ) → Expr[source]

Compute a rolling standard deviation.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their std. Weights are normalized to sum to 1.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional list of weights with the same length as the window. The weights are normalized to sum to 1 and then multiplied elementwise with the values in the window.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.
ddof: “Delta Degrees of Freedom”: the divisor used for a window of length N is N - ddof.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_std=pl.col("A").rolling_std(window_size=2),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_std │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.707107    │
│ 3.0 ┆ 0.707107    │
│ 4.0 ┆ 0.707107    │
│ 5.0 ┆ 0.707107    │
│ 6.0 ┆ 0.707107    │
└─────┴─────────────┘
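The constant 0.707107 follows from the default ddof=1: every window holds two consecutive values, so the squared deviations sum to 0.5 and the divisor is N - ddof = 1. A plain-Python sketch of the unweighted formula is below. The helper std_sketch is hypothetical, and returning None when N - ddof is not positive is an assumption made for the sketch.

```python
import math


def std_sketch(window, ddof=1):
    """Standard deviation with divisor N - ddof (unweighted)."""
    n = len(window)
    if n - ddof <= 0:
        return None  # assumption: no result when the divisor is not positive
    mean = sum(window) / n
    return math.sqrt(sum((x - mean) ** 2 for x in window) / (n - ddof))


sample_std = std_sketch([1.0, 2.0])               # divisor 2 - 1 = 1
population_std = std_sketch([1.0, 2.0], ddof=0)   # divisor 2 - 0 = 2
print(round(sample_std, 6), population_std)  # 0.707107 0.5
```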

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_std=pl.col("A").rolling_std(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_std │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.433013    │
│ 3.0 ┆ 0.433013    │
│ 4.0 ┆ 0.433013    │
│ 5.0 ┆ 0.433013    │
│ 6.0 ┆ 0.433013    │
└─────┴─────────────┘

Center the window around each row:

>>> df.with_columns(
...     rolling_std=pl.col("A").rolling_std(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_std │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 1.0         │
│ 4.0 ┆ 1.0         │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘

rolling_std_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ddof: int = 1, ) → Expr[source]

Compute a rolling standard deviation based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integer types require using the 'i' unit in window_size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

ddof

“Delta Degrees of Freedom”: the divisor used for a window of length N is N - ddof.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling instead, which can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row index column:

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling std with the temporal windows closed on the right (default):

>>> df_temporal.with_columns(
...     rolling_row_std=pl.col("index").rolling_std_by("date", window_size="2h")
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_std │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ f64             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.707107        │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 0.707107        │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 0.707107        │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 0.707107        │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 0.707107        │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 0.707107        │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 0.707107        │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 0.707107        │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 0.707107        │
└───────┴─────────────────────┴─────────────────┘

Compute the rolling std with the temporal windows closed on both sides:

>>> df_temporal.with_columns(
...     rolling_row_std=pl.col("index").rolling_std_by(
...         "date", window_size="2h", closed="both"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_std │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ f64             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.707107        │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.0             │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 1.0             │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 1.0             │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 1.0             │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 1.0             │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 1.0             │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 1.0             │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 1.0             │
└───────┴─────────────────────┴─────────────────┘

rolling_sum( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ) → Expr[source]

Apply a rolling sum (moving sum) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their sum.

The window at a given row will include the row itself, and the window_size - 1 elements before it.
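
The windowing described above can be sketched in plain Python. This is only an illustration of the arithmetic, not how Polars implements it: the helper name `rolling_sum_sketch` is hypothetical, and the sketch assumes there are no null values and that `min_samples` keeps its default (equal to `window_size`).

```python
def rolling_sum_sketch(values, window_size, weights=None):
    """Plain-Python illustration of a trailing rolling sum (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * window_size
    out = []
    for i in range(len(values)):
        start = i - window_size + 1
        if start < 0:
            # Fewer than window_size values are available: emit a null.
            out.append(None)
            continue
        # The window holds the current row and the window_size - 1 rows before it.
        window = values[start : i + 1]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
plain = rolling_sum_sketch(data, 2)
weighted = rolling_sum_sketch(data, 2, weights=[0.25, 0.75])
```

With these inputs, `plain` and `weighted` hold the same values as the first two examples in this section.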

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_sum=pl.col("A").rolling_sum(window_size=2),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_sum │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 3.0         │
│ 3.0 ┆ 5.0         │
│ 4.0 ┆ 7.0         │
│ 5.0 ┆ 9.0         │
│ 6.0 ┆ 11.0        │
└─────┴─────────────┘

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_sum=pl.col("A").rolling_sum(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_sum │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.75        │
│ 3.0 ┆ 2.75        │
│ 4.0 ┆ 3.75        │
│ 5.0 ┆ 4.75        │
│ 6.0 ┆ 5.75        │
└─────┴─────────────┘

Center the values in the window

>>> df.with_columns(
...     rolling_sum=pl.col("A").rolling_sum(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_sum │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 6.0         │
│ 3.0 ┆ 9.0         │
│ 4.0 ┆ 12.0        │
│ 5.0 ┆ 15.0        │
│ 6.0 ┆ null        │
└─────┴─────────────┘

rolling_sum_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ) → Expr[source]

Apply a rolling sum based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, then closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]
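
The effect of `closed` on these intervals can be sketched in plain Python. The helper `rolling_sum_by_sketch` is hypothetical and scans every row for every window, so it illustrates the interval semantics only. It assumes that `"left"` means `[t_i - window_size, t_i)`, `"both"` means `[t_i - window_size, t_i]`, and `"none"` means `(t_i - window_size, t_i)`.

```python
from datetime import datetime, timedelta


def rolling_sum_by_sketch(values, times, window, closed="right", min_samples=1):
    """Plain-Python illustration of a rolling sum over a `by` column."""
    out = []
    for t_i in times:
        lower = t_i - window
        in_window = []
        for v, t in zip(values, times):
            # The lower bound is inclusive only for "left" and "both".
            lower_ok = t >= lower if closed in ("left", "both") else t > lower
            # The upper bound is inclusive only for "right" and "both".
            upper_ok = t <= t_i if closed in ("right", "both") else t < t_i
            if lower_ok and upper_ok:
                in_window.append(v)
        out.append(sum(in_window) if len(in_window) >= min_samples else None)
    return out


start = datetime(2001, 1, 1)
times = [start + timedelta(hours=h) for h in range(25)]
index = list(range(25))
right = rolling_sum_by_sketch(index, times, timedelta(hours=2))
both = rolling_sum_by_sketch(index, times, timedelta(hours=2), closed="both")
```

With hourly timestamps and a two-hour window, `closed="right"` covers two rows per window and `closed="both"` covers three, which is why the two examples in this section differ from the fourth row onward.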

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integral ones require using 'i' in window size).

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row number column

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling sum with the temporal windows closed on the right (default)

>>> df_temporal.with_columns(
...     rolling_row_sum=pl.col("index").rolling_sum_by("date", window_size="2h")
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_sum │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ u32             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 1               │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 3               │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 5               │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 7               │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 39              │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 41              │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 43              │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 45              │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 47              │
└───────┴─────────────────────┴─────────────────┘

Compute the rolling sum with the windows closed on both sides

>>> df_temporal.with_columns(
...     rolling_row_sum=pl.col("index").rolling_sum_by(
...         "date", window_size="2h", closed="both"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_sum │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ u32             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 1               │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 3               │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 6               │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 9               │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 57              │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 60              │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 63              │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 66              │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 69              │
└───────┴─────────────────────┴─────────────────┘

rolling_var( window_size: int, weights: list[float] | None = None, *, min_samples: int | None = None, center: bool = False, ddof: int = 1, ) → Expr[source]

Compute a rolling variance.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their variance. Weights are normalized to sum to 1.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

window_size: The length of the window in number of elements.
weights: An optional slice with the same length as the window that will be multiplied elementwise with the values in the window after being normalized to sum to 1.
min_samples: The number of values in the window that should be non-null before computing a result. If set to None (default), it will be set equal to window_size.
center: Set the labels at the center of the window.
ddof: “Delta Degrees of Freedom”: The divisor for a length N window is N - ddof
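
The role of `ddof` can be illustrated with a plain-Python sketch of an unweighted rolling variance. The helper `rolling_var_sketch` is hypothetical; it assumes no null values, no weights, and the default `min_samples`, and it is not Polars' implementation.

```python
def rolling_var_sketch(values, window_size, ddof=1):
    """Plain-Python illustration of an unweighted trailing rolling variance."""
    out = []
    for i in range(len(values)):
        start = i - window_size + 1
        if start < 0:
            # Not enough values yet to fill the window.
            out.append(None)
            continue
        window = values[start : i + 1]
        mean = sum(window) / window_size
        squared_dev = sum((x - mean) ** 2 for x in window)
        # The divisor for a length-N window is N - ddof.
        out.append(squared_dev / (window_size - ddof))
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sample_var = rolling_var_sketch(data, 2)              # ddof=1
population_var = rolling_var_sketch(data, 2, ddof=0)  # ddof=0
```

For a window of two consecutive integers the sum of squared deviations is 0.5, so `ddof=1` divides by 1 and `ddof=0` divides by 2.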

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.with_columns(
...     rolling_var=pl.col("A").rolling_var(window_size=2),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_var │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.5         │
│ 3.0 ┆ 0.5         │
│ 4.0 ┆ 0.5         │
│ 5.0 ┆ 0.5         │
│ 6.0 ┆ 0.5         │
└─────┴─────────────┘

Specify weights to multiply the values in the window with:

>>> df.with_columns(
...     rolling_var=pl.col("A").rolling_var(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_var │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.1875      │
│ 3.0 ┆ 0.1875      │
│ 4.0 ┆ 0.1875      │
│ 5.0 ┆ 0.1875      │
│ 6.0 ┆ 0.1875      │
└─────┴─────────────┘

Center the values in the window

>>> df.with_columns(
...     rolling_var=pl.col("A").rolling_var(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_var │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 1.0         │
│ 4.0 ┆ 1.0         │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘

rolling_var_by( by: IntoExpr, window_size: timedelta | str, *, min_samples: int = 1, closed: ClosedInterval = 'right', ddof: int = 1, ) → Expr[source]

Compute a rolling variance based on another column.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Given a by column <t_0, t_1, ..., t_n>, then closed="right" (the default) means the windows will be:

(t_0 - window_size, t_0]

(t_1 - window_size, t_1]

…

(t_n - window_size, t_n]

Changed in version 1.21.0: The min_periods parameter was renamed min_samples.

Parameters:

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type (note that the integral ones require using 'i' in window size).

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)

By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.

min_samples

The number of values in the window that should be non-null before computing a result.

closed{‘left’, ‘right’, ‘both’, ‘none’}

Define which sides of the temporal interval are closed (inclusive), defaults to 'right'.

ddof

“Delta Degrees of Freedom”: The divisor for a length N window is N - ddof

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using rolling - this method can cache the window size computation.

Examples

Create a DataFrame with a datetime column and a row number column

>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.datetime_range(start, stop, "1h", eager=True)}
... ).with_row_index()
>>> df_temporal
shape: (25, 2)
┌───────┬─────────────────────┐
│ index ┆ date                │
│ ---   ┆ ---                 │
│ u32   ┆ datetime[μs]        │
╞═══════╪═════════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 │
│ 1     ┆ 2001-01-01 01:00:00 │
│ 2     ┆ 2001-01-01 02:00:00 │
│ 3     ┆ 2001-01-01 03:00:00 │
│ 4     ┆ 2001-01-01 04:00:00 │
│ …     ┆ …                   │
│ 20    ┆ 2001-01-01 20:00:00 │
│ 21    ┆ 2001-01-01 21:00:00 │
│ 22    ┆ 2001-01-01 22:00:00 │
│ 23    ┆ 2001-01-01 23:00:00 │
│ 24    ┆ 2001-01-02 00:00:00 │
└───────┴─────────────────────┘

Compute the rolling var with the temporal windows closed on the right (default)

>>> df_temporal.with_columns(
...     rolling_row_var=pl.col("index").rolling_var_by("date", window_size="2h")
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_var │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ f64             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.5             │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 0.5             │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 0.5             │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 0.5             │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 0.5             │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 0.5             │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 0.5             │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 0.5             │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 0.5             │
└───────┴─────────────────────┴─────────────────┘

Compute the rolling var with the windows closed on both sides

>>> df_temporal.with_columns(
...     rolling_row_var=pl.col("index").rolling_var_by(
...         "date", window_size="2h", closed="both"
...     )
... )
shape: (25, 3)
┌───────┬─────────────────────┬─────────────────┐
│ index ┆ date                ┆ rolling_row_var │
│ ---   ┆ ---                 ┆ ---             │
│ u32   ┆ datetime[μs]        ┆ f64             │
╞═══════╪═════════════════════╪═════════════════╡
│ 0     ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1     ┆ 2001-01-01 01:00:00 ┆ 0.5             │
│ 2     ┆ 2001-01-01 02:00:00 ┆ 1.0             │
│ 3     ┆ 2001-01-01 03:00:00 ┆ 1.0             │
│ 4     ┆ 2001-01-01 04:00:00 ┆ 1.0             │
│ …     ┆ …                   ┆ …               │
│ 20    ┆ 2001-01-01 20:00:00 ┆ 1.0             │
│ 21    ┆ 2001-01-01 21:00:00 ┆ 1.0             │
│ 22    ┆ 2001-01-01 22:00:00 ┆ 1.0             │
│ 23    ┆ 2001-01-01 23:00:00 ┆ 1.0             │
│ 24    ┆ 2001-01-02 00:00:00 ┆ 1.0             │
└───────┴─────────────────────┴─────────────────┘

round(decimals: int = 0, mode: RoundMode = 'half_to_even') → Expr[source]

Round underlying floating point data by decimals digits.

The default rounding mode is “half to even” (also known as “bankers’ rounding”).

Parameters:

decimals

Number of decimals to round by.

mode{‘half_to_even’, ‘half_away_from_zero’}

RoundMode.

half_to_even
round to the nearest value; ties are rounded to the nearest even number
half_away_from_zero
round to the nearest value; ties are rounded away from zero
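
The two modes differ only on ties. As a rough analogy (not Polars code), Python's standard `decimal` module offers the same two tie-breaking rules: `ROUND_HALF_EVEN` and `ROUND_HALF_UP`, where the latter sends ties away from zero.

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal

values = ["-2.5", "-1.5", "-0.5", "0.5", "1.5", "2.5"]

# ROUND_HALF_EVEN sends ties to the nearest even integer.
half_to_even = [
    int(Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)) for v in values
]
# ROUND_HALF_UP in the decimal module sends ties away from zero.
half_away_from_zero = [
    int(Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_UP)) for v in values
]
```

Here `half_to_even` is `[-2, -2, 0, 0, 2, 2]` and `half_away_from_zero` is `[-3, -2, -1, 1, 2, 3]`.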

Examples

>>> df = pl.DataFrame({"a": [0.33, 0.52, 1.02, 1.17]})
>>> df.select(pl.col("a").round(1))
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.3 │
│ 0.5 │
│ 1.0 │
│ 1.2 │
└─────┘

>>> df = pl.DataFrame(
...     {
...         "f64": [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5],
...         "d": ["-3.5", "-2.5", "-1.5", "-0.5", "0.5", "1.5", "2.5", "3.5"],
...     },
...     schema_overrides={"d": pl.Decimal(scale=1)},
... )
>>> df.with_columns(
...     pl.all().round(mode="half_away_from_zero").name.suffix("_away"),
...     pl.all().round(mode="half_to_even").name.suffix("_to_even"),
... )
shape: (8, 6)
┌──────┬───────────────┬──────────┬───────────────┬─────────────┬───────────────┐
│ f64  ┆ d             ┆ f64_away ┆ d_away        ┆ f64_to_even ┆ d_to_even     │
│ ---  ┆ ---           ┆ ---      ┆ ---           ┆ ---         ┆ ---           │
│ f64  ┆ decimal[38,1] ┆ f64      ┆ decimal[38,1] ┆ f64         ┆ decimal[38,1] │
╞══════╪═══════════════╪══════════╪═══════════════╪═════════════╪═══════════════╡
│ -3.5 ┆ -3.5          ┆ -4.0     ┆ -4.0          ┆ -4.0        ┆ -4.0          │
│ -2.5 ┆ -2.5          ┆ -3.0     ┆ -3.0          ┆ -2.0        ┆ -2.0          │
│ -1.5 ┆ -1.5          ┆ -2.0     ┆ -2.0          ┆ -2.0        ┆ -2.0          │
│ -0.5 ┆ -0.5          ┆ -1.0     ┆ -1.0          ┆ -0.0        ┆ 0.0           │
│ 0.5  ┆ 0.5           ┆ 1.0      ┆ 1.0           ┆ 0.0         ┆ 0.0           │
│ 1.5  ┆ 1.5           ┆ 2.0      ┆ 2.0           ┆ 2.0         ┆ 2.0           │
│ 2.5  ┆ 2.5           ┆ 3.0      ┆ 3.0           ┆ 2.0         ┆ 2.0           │
│ 3.5  ┆ 3.5           ┆ 4.0      ┆ 4.0           ┆ 4.0         ┆ 4.0           │
└──────┴───────────────┴──────────┴───────────────┴─────────────┴───────────────┘

round_sig_figs(digits: int) → Expr[source]

Round to a number of significant figures.

Parameters:

digits: Number of significant figures to round to.
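
Rounding to significant figures amounts to rounding at a decimal position that depends on the magnitude of each value. A minimal plain-Python sketch of that idea follows; the helper `round_sig_figs_sketch` is hypothetical, and its handling of exact ties follows Python's `round`, which may differ from Polars.

```python
import math


def round_sig_figs_sketch(x, digits):
    """Plain-Python illustration of rounding to `digits` significant figures."""
    if x == 0 or not math.isfinite(x):
        return x
    # Position of the leading digit: 0 for 3.333, -2 for 0.01234, 3 for 1234.0.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


rounded = [round_sig_figs_sketch(v, 2) for v in [0.01234, 3.333, 1234.0]]
```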

Examples

>>> df = pl.DataFrame({"a": [0.01234, 3.333, 1234.0]})
>>> df.with_columns(pl.col("a").round_sig_figs(2).alias("round_sig_figs"))
shape: (3, 2)
┌─────────┬────────────────┐
│ a       ┆ round_sig_figs │
│ ---     ┆ ---            │
│ f64     ┆ f64            │
╞═════════╪════════════════╡
│ 0.01234 ┆ 0.012          │
│ 3.333   ┆ 3.3            │
│ 1234.0  ┆ 1200.0         │
└─────────┴────────────────┘

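sample( n: int | IntoExprColumn | None = None, *, fraction: float | IntoExprColumn | None = None, with_replacement: bool = False, shuffle: bool = False, seed: int | None = None, ) → Expr[source]

Sample from this expression.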

Parameters:

n: Number of items to return. Cannot be used with fraction. Defaults to 1 if fraction is None.
fraction: Fraction of items to return. Cannot be used with n.
with_replacement: Allow values to be sampled more than once.
shuffle: Shuffle the order of sampled data points.
seed: Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").sample(fraction=1.0, with_replacement=True, seed=1))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 3   │
│ 3   │
│ 1   │
└─────┘

search_sorted( element: IntoExpr | np.ndarray[Any, Any], side: SearchSortedSide = 'any', *, descending: bool = False, ) → Expr[source]

Find indices where elements should be inserted to maintain order.

\[a[i-1] < v <= a[i]\]

Parameters:

element: Expression or scalar value.
side{‘any’, ‘left’, ‘right’}: If ‘any’, the index of the first suitable location found is given. If ‘left’, the index of the leftmost suitable location found is given. If ‘right’, the index of the rightmost suitable location found is given.
descending: Boolean indicating whether the values are descending or not (they are required to be sorted either way).
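
The difference between the ‘left’ and ‘right’ sides only shows up when the searched value is already present. As an analogy outside Polars, Python's standard `bisect` module makes the same distinction on an ascending list:

```python
from bisect import bisect_left, bisect_right

values = [1, 2, 3, 5]

# Leftmost insertion point: before any existing equal element.
left = [bisect_left(values, v) for v in (0, 3, 6)]
# Rightmost insertion point: after any existing equal element.
right = [bisect_right(values, v) for v in (0, 3, 6)]
```

`left` is `[0, 2, 4]` and `right` is `[0, 3, 4]`; the two differ only for the value 3, which already occurs in the list.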

Examples

>>> df = pl.DataFrame(
...     {
...         "values": [1, 2, 3, 5],
...     }
... )
>>> df.select(
...     [
...         pl.col("values").search_sorted(0).alias("zero"),
...         pl.col("values").search_sorted(3).alias("three"),
...         pl.col("values").search_sorted(6).alias("six"),
...     ]
... )
shape: (1, 3)
┌──────┬───────┬─────┐
│ zero ┆ three ┆ six │
│ ---  ┆ ---   ┆ --- │
│ u32  ┆ u32   ┆ u32 │
╞══════╪═══════╪═════╡
│ 0    ┆ 2     ┆ 4   │
└──────┴───────┴─────┘

set_sorted(*, descending: bool = False) → Expr[source]

Flags the expression as ‘sorted’.

Enables downstream code to use fast paths for sorted arrays.

Parameters:

descending: Whether the Series order is descending.

Warning

This can lead to incorrect results if the data is NOT sorted!! Use with care!
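
To see why a wrong flag is dangerous, consider a hypothetical fast path that trusts the flag instead of inspecting the data. The function below is an illustration only and does not describe any specific optimization inside Polars.

```python
def max_with_flag(values, sorted_ascending=False):
    """Hypothetical fast path: trust the sorted flag instead of scanning the data."""
    if sorted_ascending:
        # On ascending data the maximum is simply the last element.
        return values[-1]
    return max(values)


correct = max_with_flag([1, 2, 3], sorted_ascending=True)  # data really is sorted
wrong = max_with_flag([1, 3, 2], sorted_ascending=True)    # flag is a lie
```

`correct` is 3, but `wrong` is 2 even though the true maximum is 3: no error is raised, the result is just silently incorrect.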

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3]})
>>> df.select(pl.col("values").set_sorted().max())
shape: (1, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 3      │
└────────┘

shift(n: int | IntoExprColumn = 1, *, fill_value: IntoExpr | None = None) → Expr[source]

Shift values by the given number of indices.

Parameters:

n: Number of indices to shift forward. If a negative value is passed, values are shifted in the opposite direction instead.
fill_value: Fill the resulting null values with this scalar value.

See also

fill_null

Notes

This method is similar to the LAG operation in SQL when the value for n is positive. With a negative value for n, it is similar to LEAD.
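
The shifting rule can be written down in a few lines of plain Python. The helper `shift_sketch` is hypothetical and operates on a list, with `None` standing in for null.

```python
def shift_sketch(values, n=1, fill_value=None):
    """Plain-Python illustration of shifting by n positions."""
    size = len(values)
    if n >= 0:
        # Positive n behaves like LAG: vacated leading slots are filled.
        k = min(n, size)
        return [fill_value] * k + values[: size - k]
    # Negative n behaves like LEAD: vacated trailing slots are filled.
    k = min(-n, size)
    return values[k:] + [fill_value] * k


lagged = shift_sketch([1, 2, 3, 4])
led = shift_sketch([1, 2, 3, 4], -2)
led_filled = shift_sketch([1, 2, 3, 4], -2, fill_value=100)
```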

Examples

By default, values are shifted forward by one index.

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.with_columns(shift=pl.col("a").shift())
shape: (4, 2)
┌─────┬───────┐
│ a   ┆ shift │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 1   ┆ null  │
│ 2   ┆ 1     │
│ 3   ┆ 2     │
│ 4   ┆ 3     │
└─────┴───────┘

Pass a negative value to shift in the opposite direction instead.

>>> df.with_columns(shift=pl.col("a").shift(-2))
shape: (4, 2)
┌─────┬───────┐
│ a   ┆ shift │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 1   ┆ 3     │
│ 2   ┆ 4     │
│ 3   ┆ null  │
│ 4   ┆ null  │
└─────┴───────┘

Specify fill_value to fill the resulting null values.

>>> df.with_columns(shift=pl.col("a").shift(-2, fill_value=100))
shape: (4, 2)
┌─────┬───────┐
│ a   ┆ shift │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 1   ┆ 3     │
│ 2   ┆ 4     │
│ 3   ┆ 100   │
│ 4   ┆ 100   │
└─────┴───────┘

shrink_dtype() → Expr[source]

Shrink numeric columns to the minimal required datatype.

Shrink to the dtype needed to fit the extrema of this Series. This can be used to reduce memory pressure.

Changed in version 1.33.0: Deprecated and turned into a no-op. The operation does not match the Polars data-model during lazy execution since the output datatype cannot be known without inspecting the data.

Use Series.shrink_dtype instead.

Examples

>>> pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [1, 2, 2 << 32],
...         "c": [-1, 2, 1 << 30],
...         "d": [-112, 2, 112],
...         "e": [-112, 2, 129],
...         "f": ["a", "b", "c"],
...         "g": [0.1, 1.32, 0.12],
...         "h": [True, None, False],
...     }
... ).select(pl.all().shrink_dtype())  
shape: (3, 8)
┌─────┬────────────┬────────────┬──────┬──────┬─────┬──────┬───────┐
│ a   ┆ b          ┆ c          ┆ d    ┆ e    ┆ f   ┆ g    ┆ h     │
│ --- ┆ ---        ┆ ---        ┆ ---  ┆ ---  ┆ --- ┆ ---  ┆ ---   │
│ i8  ┆ i64        ┆ i32        ┆ i8   ┆ i16  ┆ str ┆ f32  ┆ bool  │
╞═════╪════════════╪════════════╪══════╪══════╪═════╪══════╪═══════╡
│ 1   ┆ 1          ┆ -1         ┆ -112 ┆ -112 ┆ a   ┆ 0.1  ┆ true  │
│ 2   ┆ 2          ┆ 2          ┆ 2    ┆ 2    ┆ b   ┆ 1.32 ┆ null  │
│ 3   ┆ 8589934592 ┆ 1073741824 ┆ 112  ┆ 129  ┆ c   ┆ 0.12 ┆ false │
└─────┴────────────┴────────────┴──────┴──────┴─────┴──────┴───────┘

shuffle(seed: int | None = None) → Expr[source]

Shuffle the contents of this expression.

Note that this is shuffled independently of any other column or expression. If you want each row to stay intact, use df.sample(shuffle=True).
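
The difference between shuffling one column and shuffling whole rows can be illustrated with plain Python lists. This sketch uses the standard `random` module rather than Polars, so the resulting orders will not match Polars output for the same seed.

```python
import random

rows = [(1, "a"), (2, "b"), (3, "c")]
rng = random.Random(1)

# Shuffling one column on its own: its pairing with the other column is not kept.
ids = [r[0] for r in rows]
labels = [r[1] for r in rows]
shuffled_ids = ids[:]
rng.shuffle(shuffled_ids)
independent = list(zip(shuffled_ids, labels))

# Shuffling whole rows keeps each (id, label) pair intact.
shuffled_rows = rows[:]
rng.shuffle(shuffled_rows)
```

Every tuple in `shuffled_rows` is one of the original rows, whereas `independent` may pair an id with a label it never appeared next to.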

Parameters:

seed: Seed for the random number generator. If set to None (default), a random seed is generated each time the shuffle is called.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").shuffle(seed=1))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 3   │
│ 1   │
└─────┘

sign() → Expr[source]

Compute the element-wise sign function on numeric types.

The returned value is computed as follows:

-1 if x < 0.
1 if x > 0.
x otherwise (typically 0, but could be NaN if the input is).

Null values are preserved as-is, and the dtype of the input is preserved.
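
These rules translate directly into plain Python. The helper `sign_sketch` is hypothetical, with `None` standing in for null; it shows why `-0.0` and `NaN` pass through unchanged.

```python
def sign_sketch(x):
    """Plain-Python illustration of the element-wise sign rules."""
    if x is None:
        return None  # nulls are preserved
    if x < 0:
        return type(x)(-1)  # keep the input type (int stays int, float stays float)
    if x > 0:
        return type(x)(1)
    # Neither comparison holds for 0, -0.0, or NaN, so x is returned as-is.
    return x


results = [sign_sketch(v) for v in [-9.0, -0.0, 0.0, 4.0, float("nan"), None]]
```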

Examples

>>> df = pl.DataFrame({"a": [-9.0, -0.0, 0.0, 4.0, float("nan"), None]})
>>> df.select(pl.col.a.sign())
shape: (6, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ -1.0 │
│ -0.0 │
│ 0.0  │
│ 1.0  │
│ NaN  │
│ null │
└──────┘

sin() → Expr[source]

Compute the element-wise value for the sine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").sin())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘

sinh() → Expr[source]

Compute the element-wise value for the hyperbolic sine.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").sinh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.175201 │
└──────────┘

skew(*, bias: bool = True) → Expr[source]

Compute the sample skewness of a data set.

For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.

See scipy.stats for more information.

Parameters:

bias: bool, optional. If False, the calculations are corrected for statistical bias.

Notes

The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.

\[g_1=\frac{m_3}{m_2^{3/2}}\]

where

\[m_i=\frac{1}{N}\sum_{n=1}^N(x[n]-\bar{x})^i\]

is the biased sample $i\texttt{th}$ central moment, and $\bar{x}$ is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.

\[G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2}\frac{m_3}{m_2^{3/2}}\]
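
The formulas above can be evaluated directly in plain Python. The helper `skew_sketch` is hypothetical and assumes a non-empty input without nulls (and more than two values when `bias=False`).

```python
def skew_sketch(values, bias=True):
    """Plain-Python evaluation of the Fisher-Pearson skewness formulas."""
    n = len(values)
    mean = sum(values) / n
    # Biased central moments m_2 and m_3.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2**1.5
    if bias:
        return g1
    # Adjusted coefficient G_1 = sqrt(N (N - 1)) / (N - 2) * g_1.
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1


biased = skew_sketch([1, 2, 3, 2, 1])
unbiased = skew_sketch([1, 2, 3, 2, 1], bias=False)
```

For `[1, 2, 3, 2, 1]` the mean is 1.8, m_2 = 0.56 and m_3 = 0.144, so `biased` is approximately 0.343622.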

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").skew())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.343622 │
└──────────┘

slice( offset: int | Expr, length: int | Expr | None = None, ) → Expr[source]

Get a slice of this expression.

Parameters:

offset: Start index. Negative indexing is supported.
length: Length of the slice. If set to None, all rows starting at the offset will be selected.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10, 11],
...         "b": [None, 4, 4, 4],
...     }
... )
>>> df.select(pl.all().slice(1, 2))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 9   ┆ 4   │
│ 10  ┆ 4   │
└─────┴─────┘

sort( *, descending: bool = False, nulls_last: bool = False, ) → Expr[source]

Sort this column.

When used in a projection/selection context, the whole column is sorted. When used in a group by context, the groups are sorted.

Parameters:

descending: Sort in descending order.
nulls_last: Place null values last.
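
Null placement is independent of the sort direction. A plain-Python sketch of the ordering follows; the helper `sort_sketch` is hypothetical, with `None` standing in for null.

```python
def sort_sketch(values, descending=False, nulls_last=False):
    """Plain-Python illustration of sorting with separate null placement."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [None] * (len(values) - len(non_null))
    # Nulls go first unless nulls_last is set, regardless of `descending`.
    return non_null + nulls if nulls_last else nulls + non_null


ascending = sort_sketch([1, None, 3, 2])
descending = sort_sketch([1, None, 3, 2], descending=True)
nulls_at_end = sort_sketch([1, None, 3, 2], nulls_last=True)
```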

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, None, 3, 2],
...     }
... )
>>> df.select(pl.col("a").sort())
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
│ 1    │
│ 2    │
│ 3    │
└──────┘
>>> df.select(pl.col("a").sort(descending=True))
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
│ 3    │
│ 2    │
│ 1    │
└──────┘
>>> df.select(pl.col("a").sort(nulls_last=True))
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
└──────┘

When sorting in a group by context, the groups are sorted.

>>> df = pl.DataFrame(
...     {
...         "group": ["one", "one", "one", "two", "two", "two"],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.group_by("group").agg(pl.col("value").sort())  
shape: (2, 2)
┌───────┬────────────┐
│ group ┆ value      │
│ ---   ┆ ---        │
│ str   ┆ list[i64]  │
╞═══════╪════════════╡
│ two   ┆ [3, 4, 99] │
│ one   ┆ [1, 2, 98] │
└───────┴────────────┘

sort_by( by: IntoExpr | Iterable[IntoExpr], *more_by: IntoExpr, descending: bool | Sequence[bool] = False, nulls_last: bool | Sequence[bool] = False, multithreaded: bool = True, maintain_order: bool = False, ) → Expr[source]

Sort this column by the ordering of other columns.

When used in a projection/selection context, the whole column is sorted. When used in a group by context, the groups are sorted.

Parameters:

by: Column(s) to sort by. Accepts expression input. Strings are parsed as column names.
*more_by: Additional columns to sort by, specified as positional arguments.
descending: Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.
nulls_last: Place null values last; can specify a single boolean applying to all columns or a sequence of booleans for per-column control.
multithreaded: Sort using multiple threads.
maintain_order: Whether the order should be maintained if elements are equal.
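
Conceptually, sorting one column by others means computing the row order from the key columns and applying it to the target column. The plain-Python helper `sort_by_sketch` below is hypothetical. It accepts a single `descending` flag only, and it always keeps equal keys in their original order, which corresponds to `maintain_order=True`.

```python
def sort_by_sketch(values, *keys, descending=False):
    """Plain-Python illustration of sorting `values` by one or more key lists."""
    order = sorted(
        range(len(values)),
        key=lambda i: tuple(k[i] for k in keys),
        reverse=descending,
    )
    return [values[i] for i in order]


group = ["a", "a", "b", "b"]
value1 = [1, 3, 4, 2]
value2 = [8, 7, 6, 5]

by_one = sort_by_sketch(group, value1)
by_sum = sort_by_sketch(group, [x + y for x, y in zip(value1, value2)])
by_two_desc = sort_by_sketch(group, value1, value2, descending=True)
```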

Examples

Pass a single column name to sort by that column.

>>> df = pl.DataFrame(
...     {
...         "group": ["a", "a", "b", "b"],
...         "value1": [1, 3, 4, 2],
...         "value2": [8, 7, 6, 5],
...     }
... )
>>> df.select(pl.col("group").sort_by("value1"))
shape: (4, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ a     │
│ b     │
│ a     │
│ b     │
└───────┘

Sorting by expressions is also supported.

>>> df.select(pl.col("group").sort_by(pl.col("value1") + pl.col("value2")))
shape: (4, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ b     │
│ a     │
│ a     │
│ b     │
└───────┘

Sort by multiple columns by passing a list of columns.

>>> df.select(pl.col("group").sort_by(["value1", "value2"], descending=True))
shape: (4, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ b     │
│ a     │
│ b     │
│ a     │
└───────┘

Or use positional arguments to sort by multiple columns in the same way.

>>> df.select(pl.col("group").sort_by("value1", "value2"))
shape: (4, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ a     │
│ b     │
│ a     │
│ b     │
└───────┘

When sorting in a group by context, the groups are sorted.

>>> df.group_by("group").agg(
...     pl.col("value1").sort_by("value2")
... )  
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value1    │
│ ---   ┆ ---       │
│ str   ┆ list[i64] │
╞═══════╪═══════════╡
│ a     ┆ [3, 1]    │
│ b     ┆ [2, 4]    │
└───────┴───────────┘

Take a single row from each group where a column attains its minimal value within that group.

>>> df.group_by("group").agg(
...     pl.all().sort_by("value2").first()
... )  
shape: (2, 3)
┌───────┬────────┬────────┐
│ group ┆ value1 ┆ value2 │
│ ---   ┆ ---    ┆ ---    │
│ str   ┆ i64    ┆ i64    │
╞═══════╪════════╪════════╡
│ a     ┆ 3      ┆ 7      │
│ b     ┆ 2      ┆ 5      │
└───────┴────────┴────────┘

sqrt() → Expr[source]

Compute the square root of the elements.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").sqrt())
shape: (3, 1)
┌──────────┐
│ values   │
│ ---      │
│ f64      │
╞══════════╡
│ 1.0      │
│ 1.414214 │
│ 2.0      │
└──────────┘

std(ddof: int = 1) → Expr[source]

Get standard deviation.

Parameters:

ddof: “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.
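
The effect of `ddof` on the divisor can be checked with a short plain-Python sketch. The helper `std_sketch` is hypothetical and assumes an input without nulls.

```python
import math


def std_sketch(values, ddof=1):
    """Plain-Python illustration of the standard deviation with a ddof divisor."""
    n = len(values)
    mean = sum(values) / n
    # The sum of squared deviations is divided by N - ddof.
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - ddof))


sample_std = std_sketch([-1, 0, 1])              # divisor 3 - 1 = 2
population_std = std_sketch([-1, 0, 1], ddof=0)  # divisor 3
```

`sample_std` is 1.0, and `population_std` is the square root of 2/3, roughly 0.8165.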

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").std())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

sub(other: Any) → Expr[source]

Method equivalent of subtraction operator expr - other.

Parameters:

other: Numeric literal or expression value.

Examples

>>> df = pl.DataFrame({"x": [0, 1, 2, 3, 4]})
>>> df.with_columns(
...     pl.col("x").sub(2).alias("x-2"),
...     pl.col("x").sub(pl.col("x").cum_sum()).alias("x-expr"),
... )
shape: (5, 3)
┌─────┬─────┬────────┐
│ x   ┆ x-2 ┆ x-expr │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ 0   ┆ -2  ┆ 0      │
│ 1   ┆ -1  ┆ 0      │
│ 2   ┆ 0   ┆ -1     │
│ 3   ┆ 1   ┆ -3     │
│ 4   ┆ 2   ┆ -6     │
└─────┴─────┴────────┘

sum() → Expr[source]

Get sum value.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.
If there are no non-null values, then the output is 0. If you would prefer empty sums to return None, you can use pl.when(expr.count()>0).then(expr.sum()) instead of expr.sum().
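
Both notes can be illustrated outside Polars with plain Python. The `ctypes` conversion shows how an 8-bit signed integer wraps around, which is the kind of overflow the upcast avoids. The hypothetical helper `sum_or_none` mirrors the idea of returning a null for an all-null input.

```python
import ctypes

# An 8-bit signed integer holds -128..127, so 100 + 100 wraps around.
wrapped = ctypes.c_int8(100 + 100).value


def sum_or_none(values):
    """Return None instead of 0 when there are no non-null values."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


empty_default = sum(v for v in [None, None] if v is not None)
empty_as_none = sum_or_none([None, None])
```

`wrapped` is -56 rather than 200, `empty_default` is 0, and `empty_as_none` is `None`.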

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").sum())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
└─────┘

tail(n: int | Expr = 10) → Expr[source]

Get the last n rows.

Parameters:

n: Number of rows to return.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.select(pl.col("foo").tail(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 5   │
│ 6   │
│ 7   │
└─────┘

tan() → Expr[source]

Compute the element-wise value for the tangent.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").tan().round(2))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ 1.56 │
└──────┘

tanh() → Expr[source]

Compute the element-wise value for the hyperbolic tangent.

Returns:

Expr: Expression of data type Float64.

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").tanh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.761594 │
└──────────┘

to_physical() → Expr[source]

Cast to physical representation of the logical dtype.

polars.datatypes.Date() -> polars.datatypes.Int32()
polars.datatypes.Datetime() -> polars.datatypes.Int64()
polars.datatypes.Time() -> polars.datatypes.Int64()
polars.datatypes.Duration() -> polars.datatypes.Int64()
polars.datatypes.Categorical() -> polars.datatypes.UInt32()
List(inner) -> List(physical of inner)
Array(inner) -> Array(physical of inner)
Struct(fields) -> Struct(physical of fields)

Other data types will be left unchanged.

Warning

The physical representations are an implementation detail and not guaranteed to be stable.

Examples

Replicating the pandas pd.factorize function.

>>> pl.DataFrame({"vals": ["a", "x", None, "a"]}).with_columns(
...     pl.col("vals").cast(pl.Categorical),
...     pl.col("vals")
...     .cast(pl.Categorical)
...     .to_physical()
...     .alias("vals_physical"),
... )
shape: (4, 2)
┌──────┬───────────────┐
│ vals ┆ vals_physical │
│ ---  ┆ ---           │
│ cat  ┆ u32           │
╞══════╪═══════════════╡
│ a    ┆ 0             │
│ x    ┆ 1             │
│ null ┆ null          │
│ a    ┆ 0             │
└──────┴───────────────┘
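
As a rough illustration of what a physical value is, the sketch below computes the integer that would underlie a Date, assuming the physical Int32 counts whole days since the Unix epoch (1970-01-01). That encoding is an assumption made here and, as the warning above notes, an implementation detail; the helper days_since_epoch is hypothetical.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)


def days_since_epoch(d):
    """Whole days between the Unix epoch and d (illustrative helper)."""
    return (d - UNIX_EPOCH).days


print(days_since_epoch(date(1970, 1, 2)))
print(days_since_epoch(date(2001, 1, 1)))
```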

top_k(k: int | IntoExprColumn = 5) → Expr[source]

Return the k largest elements.

Non-null elements are always preferred over null elements. The output is not guaranteed to be in any particular order; call sort() after this function if you want the output sorted.

This has time complexity:

\[O(n)\]

Parameters:

k: Number of elements to return.

See also

top_k_by
bottom_k
bottom_k_by

Examples

Get the 5 largest values in the series (with the 5 smallest shown alongside via bottom_k).

>>> df = pl.DataFrame({"value": [1, 98, 2, 3, 99, 4]})
>>> df.select(
...     pl.col("value").top_k().alias("top_k"),
...     pl.col("value").bottom_k().alias("bottom_k"),
... )
shape: (5, 2)
┌───────┬──────────┐
│ top_k ┆ bottom_k │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 4     ┆ 1        │
│ 98    ┆ 98       │
│ 2     ┆ 2        │
│ 3     ┆ 3        │
│ 99    ┆ 4        │
└───────┴──────────┘
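
For comparison, the same selection can be sketched with the standard library's heapq module. This is only an analogy for what is selected, not a description of the Polars implementation, and unlike top_k it returns its results already sorted.

```python
import heapq

values = [1, 98, 2, 3, 99, 4]

largest = heapq.nlargest(5, values)    # returned in descending order
smallest = heapq.nsmallest(5, values)  # returned in ascending order
print(largest)
print(smallest)
```

The elements match the two columns in the table above; only the order differs, which is why sort() is recommended when order matters.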

top_k_by(by: IntoExpr | Iterable[IntoExpr], k: int | IntoExprColumn = 5, *, reverse: bool | Sequence[bool] = False) → Expr[source]

Return the elements corresponding to the k largest elements of the by column(s).

Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; call sort() after this function if you want the output sorted.

This has time complexity:

\[O(n \log{n})\]

Changed in version 1.0.0: The descending parameter was renamed to reverse.

Parameters:

by: Column(s) used to determine the largest elements. Accepts expression input. Strings are parsed as column names.
k: Number of elements to return.
reverse: Consider the k smallest elements of the by column(s) (instead of the k largest). This can be specified per column by passing a sequence of booleans.

See also

top_k
bottom_k
bottom_k_by

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 4, 5, 6],
...         "b": [6, 5, 4, 3, 2, 1],
...         "c": ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
...     }
... )
>>> df
shape: (6, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ c      │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ str    │
╞═════╪═════╪════════╡
│ 1   ┆ 6   ┆ Apple  │
│ 2   ┆ 5   ┆ Orange │
│ 3   ┆ 4   ┆ Apple  │
│ 4   ┆ 3   ┆ Apple  │
│ 5   ┆ 2   ┆ Banana │
│ 6   ┆ 1   ┆ Banana │
└─────┴─────┴────────┘

Get the top 2 rows by column a or b.

>>> df.select(
...     pl.all().top_k_by("a", 2).name.suffix("_top_by_a"),
...     pl.all().top_k_by("b", 2).name.suffix("_top_by_b"),
... )
shape: (2, 6)
┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ a_top_by_a ┆ b_top_by_a ┆ c_top_by_a ┆ a_top_by_b ┆ b_top_by_b ┆ c_top_by_b │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│ i64        ┆ i64        ┆ str        ┆ i64        ┆ i64        ┆ str        │
╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡
│ 6          ┆ 1          ┆ Banana     ┆ 1          ┆ 6          ┆ Apple      │
│ 5          ┆ 2          ┆ Banana     ┆ 2          ┆ 5          ┆ Orange     │
└────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘

Get the top 2 rows by multiple columns with given order.

>>> df.select(
...     pl.all()
...     .top_k_by(["c", "a"], 2, reverse=[False, True])
...     .name.suffix("_by_ca"),
...     pl.all()
...     .top_k_by(["c", "b"], 2, reverse=[False, True])
...     .name.suffix("_by_cb"),
... )
shape: (2, 6)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ a_by_ca ┆ b_by_ca ┆ c_by_ca ┆ a_by_cb ┆ b_by_cb ┆ c_by_cb │
│ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ i64     ┆ i64     ┆ str     ┆ i64     ┆ i64     ┆ str     │
╞═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ 2       ┆ 5       ┆ Orange  ┆ 2       ┆ 5       ┆ Orange  │
│ 5       ┆ 2       ┆ Banana  ┆ 6       ┆ 1       ┆ Banana  │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Get the top 2 rows by column a in each group.

>>> (
...     df.group_by("c", maintain_order=True)
...     .agg(pl.all().top_k_by("a", 2))
...     .explode(pl.all().exclude("c"))
... )
shape: (5, 3)
┌────────┬─────┬─────┐
│ c      ┆ a   ┆ b   │
│ ---    ┆ --- ┆ --- │
│ str    ┆ i64 ┆ i64 │
╞════════╪═════╪═════╡
│ Apple  ┆ 4   ┆ 3   │
│ Apple  ┆ 3   ┆ 4   │
│ Orange ┆ 2   ┆ 5   │
│ Banana ┆ 6   ┆ 1   │
│ Banana ┆ 5   ┆ 2   │
└────────┴─────┴─────┘
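
The row selection in the first two examples can be sketched in plain Python by treating each row as a tuple and ranking by a key. This is an analogy under the assumption that ties and nulls do not matter for this data; the variable names are illustrative, and negating a numeric key stands in for reverse=True on that column.

```python
import heapq

a = [1, 2, 3, 4, 5, 6]
b = [6, 5, 4, 3, 2, 1]
c = ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
rows = list(zip(a, b, c))

# Top 2 rows by column a.
top_by_a = heapq.nlargest(2, rows, key=lambda row: row[0])

# Top 2 rows by c (largest first), then by smallest a within equal c,
# mirroring reverse=[False, True].
top_by_c_then_a = heapq.nlargest(2, rows, key=lambda row: (row[2], -row[0]))

print(top_by_a)
print(top_by_c_then_a)
```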

truediv(other: Any) → Expr[source]

Method equivalent of float division operator expr / other.

Parameters:

other: Numeric literal or expression value.

See also

floordiv

Notes

Zero-division behaviour follows IEEE-754:

0/0: Invalid operation; mathematically undefined, returns NaN.
n/0: On finite operands gives an exact infinite result, e.g. ±infinity.

Examples

>>> df = pl.DataFrame(
...     data={"x": [-2, -1, 0, 1, 2], "y": [0.5, 0.0, 0.0, -4.0, -0.5]}
... )
>>> df.with_columns(
...     pl.col("x").truediv(2).alias("x/2"),
...     pl.col("x").truediv(pl.col("y")).alias("x/y"),
... )
shape: (5, 4)
┌─────┬──────┬──────┬───────┐
│ x   ┆ y    ┆ x/2  ┆ x/y   │
│ --- ┆ ---  ┆ ---  ┆ ---   │
│ i64 ┆ f64  ┆ f64  ┆ f64   │
╞═════╪══════╪══════╪═══════╡
│ -2  ┆ 0.5  ┆ -1.0 ┆ -4.0  │
│ -1  ┆ 0.0  ┆ -0.5 ┆ -inf  │
│ 0   ┆ 0.0  ┆ 0.0  ┆ NaN   │
│ 1   ┆ -4.0 ┆ 0.5  ┆ -0.25 │
│ 2   ┆ -0.5 ┆ 1.0  ┆ -4.0  │
└─────┴──────┴──────┴───────┘
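
Plain Python raises ZeroDivisionError for these cases, so the IEEE-754 rules from the notes have to be written out by hand to compare. The helper ieee_divide below is a hypothetical sketch of those rules, not part of Polars.

```python
import math


def ieee_divide(x, y):
    """Float division returning NaN or a signed infinity instead of raising."""
    x, y = float(x), float(y)
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0.0 or math.isnan(x):
            return math.nan  # 0/0 is an invalid operation
        # n/0: infinite result whose sign combines the signs of both operands
        return math.copysign(math.inf, x) * math.copysign(1.0, y)


xs = [-2, -1, 0, 1, 2]
ys = [0.5, 0.0, 0.0, -4.0, -0.5]
quotients = [ieee_divide(x, y) for x, y in zip(xs, ys)]
print(quotients)
```

The resulting values line up with the x/y column above, including -inf for -1/0.0 and NaN for 0/0.0.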

unique(*, maintain_order: bool = False) → Expr[source]

Get unique values of this expression.

Parameters:

maintain_order: Maintain order of data. This requires more work.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").unique())  
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 1   │
└─────┘
>>> df.select(pl.col("a").unique(maintain_order=True))
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
└─────┘
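
The difference between the two calls resembles the difference between de-duplicating through a dict and through a set in plain Python. The sketch below is an analogy only and says nothing about how Polars implements either mode.

```python
values = [1, 1, 2]

# Order-preserving de-duplication: dict keys keep first-seen order.
ordered_unique = list(dict.fromkeys(values))

# Set-based de-duplication: no ordering guarantee, so sort before comparing.
unordered_unique = set(values)

print(ordered_unique)
print(sorted(unordered_unique))
```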

unique_counts() → Expr[source]

Return a count of the unique values in the order of appearance.

This method differs from value_counts in that it returns only the counts, not the values, and might be faster.

Examples

>>> df = pl.DataFrame(
...     {
...         "id": ["a", "b", "b", "c", "c", "c"],
...     }
... )
>>> df.select(
...     [
...         pl.col("id").unique_counts(),
...     ]
... )
shape: (3, 1)
┌─────┐
│ id  │
│ --- │
│ u32 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
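
A plain-Python sketch of "counts in order of appearance", using collections.Counter (which keeps keys in first-seen order). The variable names are illustrative; the last line shows the extra information that a value_counts-style result would carry.

```python
from collections import Counter

ids = ["a", "b", "b", "c", "c", "c"]
counts_by_value = Counter(ids)

counts_only = list(counts_by_value.values())       # counts without the values
values_and_counts = list(counts_by_value.items())  # values paired with counts
print(counts_only)
print(values_and_counts)
```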

upper_bound() → Expr[source]

Calculate the upper bound.

Returns a unit Series with the highest value possible for the dtype of this expression.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").upper_bound())
shape: (1, 1)
┌─────────────────────┐
│ a                   │
│ ---                 │
│ i64                 │
╞═════════════════════╡
│ 9223372036854775807 │
└─────────────────────┘
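
For integer dtypes the value shown above follows from the bit width: a signed two's-complement integer with n bits tops out at 2^(n-1) - 1. A small sketch of that arithmetic, with hypothetical helper names and covering integer dtypes only:

```python
def signed_int_upper_bound(bits):
    """Largest value of a signed two's-complement integer with this many bits."""
    return 2 ** (bits - 1) - 1


def unsigned_int_upper_bound(bits):
    """Largest value of an unsigned integer with this many bits."""
    return 2 ** bits - 1


print(signed_int_upper_bound(64))
print(unsigned_int_upper_bound(8))
```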

value_counts(*, sort: bool = False, parallel: bool = False, name: str | None = None, normalize: bool = False) → Expr[source]

Count the occurrence of unique values.

Parameters:

sort: Sort the output by count, in descending order. If set to False (default), the order is non-deterministic.
parallel: Execute the computation in parallel. Note: this option should likely not be enabled in a group_by context, as the computation will already be parallelized per group.
name: Give the resulting count column a specific name; if normalize is True this defaults to “proportion”, otherwise it defaults to “count”.
normalize: If True, the count is returned as the relative frequency of unique values normalized to 1.0.

Returns:

Expr: Expression of type Struct, mapping unique values to their count (or proportion).

Examples

>>> df = pl.DataFrame(
...     {"color": ["red", "blue", "red", "green", "blue", "blue"]}
... )
>>> df_count = df.select(pl.col("color").value_counts())
>>> df_count  
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"green",1} │
│ {"blue",3}  │
│ {"red",2}   │
└─────────────┘

>>> df_count.unnest("color")  
shape: (3, 2)
┌───────┬───────┐
│ color ┆ count │
│ ---   ┆ ---   │
│ str   ┆ u32   │
╞═══════╪═══════╡
│ green ┆ 1     │
│ blue  ┆ 3     │
│ red   ┆ 2     │
└───────┴───────┘

Sort the output by (descending) count, customize the field name, and normalize the count to its relative proportion (of 1.0).

>>> df_count = df.select(
...     pl.col("color").value_counts(
...         name="fraction",
...         normalize=True,
...         sort=True,
...     )
... )
>>> df_count
shape: (3, 1)
┌────────────────────┐
│ color              │
│ ---                │
│ struct[2]          │
╞════════════════════╡
│ {"blue",0.5}       │
│ {"red",0.333333}   │
│ {"green",0.166667} │
└────────────────────┘

>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
└───────┴──────────┘
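
The effect of sort and normalize can be reproduced by hand with collections.Counter. This is a plain-Python sketch of the arithmetic, not of the Polars implementation, and the variable names are illustrative.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "blue"]
counts = Counter(colors)

# sort=True: most frequent value first.
sorted_counts = counts.most_common()

# normalize=True: divide each count by the total number of values.
total = sum(counts.values())
fractions = [(color, n / total) for color, n in sorted_counts]

print(sorted_counts)
print(fractions)
```

The fractions sum to 1.0 and, rounded to six decimals, match the fraction column above.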

var(ddof: int = 1) → Expr[source]

Get variance.

Parameters:

ddof: “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").var())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

where(predicate: Expr) → Expr[source]

Filter a single column.

Deprecated since version 0.20.4: Use the filter() method instead.

Alias for filter().

Parameters:

predicate: Boolean expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "group_col": ["g1", "g1", "g2"],
...         "b": [1, 2, 3],
...     }
... )
>>> df.group_by("group_col").agg(  
...     [
...         pl.col("b").where(pl.col("b") < 2).sum().alias("lt"),
...         pl.col("b").where(pl.col("b") >= 2).sum().alias("gte"),
...     ]
... ).sort("group_col")
shape: (2, 3)
┌───────────┬─────┬─────┐
│ group_col ┆ lt  ┆ gte │
│ ---       ┆ --- ┆ --- │
│ str       ┆ i64 ┆ i64 │
╞═══════════╪═════╪═════╡
│ g1        ┆ 1   ┆ 2   │
│ g2        ┆ 0   ┆ 3   │
└───────────┴─────┴─────┘

xor(other: Any) → Expr[source]

Method equivalent of bitwise exclusive-or operator expr ^ other.

Parameters:

other: Integer or boolean value; accepts expression input.

Examples

>>> df = pl.DataFrame(
...     {"x": [True, False, True, False], "y": [True, True, False, False]}
... )
>>> df.with_columns(pl.col("x").xor(pl.col("y")).alias("x ^ y"))
shape: (4, 3)
┌───────┬───────┬───────┐
│ x     ┆ y     ┆ x ^ y │
│ ---   ┆ ---   ┆ ---   │
│ bool  ┆ bool  ┆ bool  │
╞═══════╪═══════╪═══════╡
│ true  ┆ true  ┆ false │
│ false ┆ true  ┆ true  │
│ true  ┆ false ┆ true  │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘

>>> def binary_string(n: int) -> str:
...     return bin(n)[2:].zfill(8)
>>>
>>> df = pl.DataFrame(
...     data={"x": [10, 8, 250, 66], "y": [1, 2, 3, 4]},
...     schema={"x": pl.UInt8, "y": pl.UInt8},
... )
>>> df.with_columns(
...     pl.col("x")
...     .map_elements(binary_string, return_dtype=pl.String)
...     .alias("bin_x"),
...     pl.col("y")
...     .map_elements(binary_string, return_dtype=pl.String)
...     .alias("bin_y"),
...     pl.col("x").xor(pl.col("y")).alias("xor_xy"),
...     pl.col("x")
...     .xor(pl.col("y"))
...     .map_elements(binary_string, return_dtype=pl.String)
...     .alias("bin_xor_xy"),
... )
shape: (4, 6)
┌─────┬─────┬──────────┬──────────┬────────┬────────────┐
│ x   ┆ y   ┆ bin_x    ┆ bin_y    ┆ xor_xy ┆ bin_xor_xy │
│ --- ┆ --- ┆ ---      ┆ ---      ┆ ---    ┆ ---        │
│ u8  ┆ u8  ┆ str      ┆ str      ┆ u8     ┆ str        │
╞═════╪═════╪══════════╪══════════╪════════╪════════════╡
│ 10  ┆ 1   ┆ 00001010 ┆ 00000001 ┆ 11     ┆ 00001011   │
│ 8   ┆ 2   ┆ 00001000 ┆ 00000010 ┆ 10     ┆ 00001010   │
│ 250 ┆ 3   ┆ 11111010 ┆ 00000011 ┆ 249    ┆ 11111001   │
│ 66  ┆ 4   ┆ 01000010 ┆ 00000100 ┆ 70     ┆ 01000110   │
└─────┴─────┴──────────┴──────────┴────────┴────────────┘
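
The same results can be checked with Python's built-in ^ operator, which works bit by bit on integers and as "exactly one is true" on booleans. A short sketch using the values from the examples above (variable names are illustrative):

```python
xs = [10, 8, 250, 66]
ys = [1, 2, 3, 4]

xor_values = [x ^ y for x, y in zip(xs, ys)]
xor_bits = [format(v, "08b") for v in xor_values]  # zero-padded 8-bit strings

# On booleans, ^ is true exactly when the operands differ.
bool_pairs = [(True, True), (False, True), (True, False), (False, False)]
bool_xor = [p ^ q for p, q in bool_pairs]

print(xor_values)
print(xor_bits)
print(bool_xor)
```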