Expressions#
This page gives an overview of all public polars expressions.
- class polars.Expr[source]
Expressions that can be used in various contexts.
Methods:
Compute absolute values.
Method equivalent of addition operator expr + other.
Get the group indexes of the group by operation.
Rename the expression.
Check if all boolean values in a Boolean column are True.
Method equivalent of bitwise "and" operator expr & other & ...
Check if any boolean value in a Boolean column is True.
Append expressions.
Apply a custom/user-defined function (UDF) in a GroupBy or Projection context.
Approximate count of unique values.
Compute the element-wise value for the inverse cosine.
Compute the element-wise value for the inverse hyperbolic cosine.
Compute the element-wise value for the inverse sine.
Compute the element-wise value for the inverse hyperbolic sine.
Compute the element-wise value for the inverse tangent.
Compute the element-wise value for the inverse hyperbolic tangent.
Get the index of the maximal value.
Get the index of the minimal value.
Get the index values that would sort this column.
Return indices where expression evaluates True.
Get index of first unique value.
Fill missing values with the next non-null value.
Return the k smallest elements.
Cache this expression so that it is executed only once per context.
Cast between data types.
Compute the cube root of the elements.
Rounds up to the nearest integer value.
Clip (limit) the values in an array to a min and max boundary.
Clip (limit) the values in an array to a max boundary.
Clip (limit) the values in an array to a min boundary.
Compute the element-wise value for the cosine.
Compute the element-wise value for the hyperbolic cosine.
Count the number of values in this expression.
Get an array with the cumulative count computed at every element.
Get an array with the cumulative max computed at every element.
Get an array with the cumulative min computed at every element.
Get an array with the cumulative product computed at every element.
Get an array with the cumulative sum computed at every element.
Run an expression over a sliding window that increases by 1 slot every iteration.
Bin continuous values into discrete categories.
Convert from radians to degrees.
Calculate the n-th discrete difference.
Compute the dot/inner product between two Expressions.
Drop floating point NaN values.
Drop all null values.
Computes the entropy.
Method equivalent of equality operator expr == other.
Method equivalent of equality operator expr == other where None == None.
Exponentially-weighted moving average.
Exponentially-weighted moving standard deviation.
Exponentially-weighted moving variance.
Exclude columns from a multi-column expression.
Compute the exponential, element-wise.
Explode a list expression.
Extremely fast method for extending the Series with 'n' copies of a value.
Fill floating point NaN values with a fill value.
Fill null values using the specified value or strategy.
Filter a single column.
Get the first value.
Flatten a list or string column.
Rounds down to the nearest integer value.
Method equivalent of integer division operator expr // other.
Fill missing values with the last non-null value.
Read an expression from a JSON encoded string to construct an Expression.
Method equivalent of "greater than or equal" operator expr >= other.
Method equivalent of "greater than" operator expr > other.
Hash the elements in the selection.
Get the first n rows.
Aggregate values into a list.
Print the value that this expression evaluates to and pass on the value.
Fill null values using interpolation.
Check if this expression is between the given start and end values.
Get mask of duplicated values.
Returns a boolean Series indicating which values are finite.
Get a mask of the first unique value.
Check if elements of this expression are present in the other Series.
Returns a boolean Series indicating which values are infinite.
Returns a boolean Series indicating which values are NaN.
Negate a boolean expression.
Returns a boolean Series indicating which values are not NaN.
Returns a boolean Series indicating which values are not null.
Returns a boolean Series indicating which values are null.
Get mask of unique values.
Keep the original root name of the expression.
Compute the kurtosis (Fisher or Pearson) of a dataset.
Get the last value.
Method equivalent of "less than or equal" operator expr <= other.
Count the number of values in this expression.
Get the first n rows (alias for Expr.head()).
Compute the logarithm to a given base.
Compute the base 10 logarithm of the input array, element-wise.
Compute the natural logarithm of each element plus one.
Calculate the lower bound.
Method equivalent of "less than" operator expr < other.
Apply a custom python function to a Series or sequence of Series.
Rename the output of an expression by mapping a function over the root name.
Replace values in column according to remapping dictionary.
Get maximum value.
Get mean value.
Get median value using linear interpolation.
Get minimum value.
Method equivalent of modulus operator expr % other.
Compute the most occurring value(s).
Method equivalent of multiplication operator expr * other.
Count unique values.
Get maximum value, but propagate/poison encountered NaN values.
Get minimum value, but propagate/poison encountered NaN values.
Method equivalent of inequality operator expr != other.
Method equivalent of inequality operator expr != other where None == None.
Count null values.
Method equivalent of bitwise "or" operator expr | other | ...
Compute expressions over the given groups.
Computes percentage change between values.
Offers a structured way to apply a sequence of user-defined functions (UDFs).
Method equivalent of exponentiation operator expr ** exponent.
Add a prefix to the root column name of the expression.
Compute the product of an expression.
Bin continuous values into discrete categories based on their quantiles.
Get quantile value.
Convert from degrees to radians.
Assign ranks to data, dealing with ties appropriately.
Create a single chunk of memory for this Series.
Reinterpret the underlying bits as a signed/unsigned integer.
Repeat the elements in this Series as specified in the given expression.
Reshape this Expr to a flat Series or a Series of Lists.
Reverse the selection.
Get the lengths of runs of identical values.
Map values to run IDs.
Apply a custom rolling window function.
Apply a rolling max (moving max) over the values in this array.
Apply a rolling mean (moving mean) over the values in this array.
Compute a rolling median.
Apply a rolling min (moving min) over the values in this array.
Compute a rolling quantile.
Compute a rolling skew.
Compute a rolling standard deviation.
Apply a rolling sum (moving sum) over the values in this array.
Compute a rolling variance.
Round underlying floating point data by decimals digits.
Sample from this expression.
Find indices where elements should be inserted to maintain order.
Flags the expression as 'sorted'.
Shift the values by a given period.
Shift the values by a given period and fill the resulting null values.
Shrink numeric columns to the minimal required datatype.
Shuffle the contents of this expression.
Compute the element-wise indication of the sign.
Compute the element-wise value for the sine.
Compute the element-wise value for the hyperbolic sine.
Compute the sample skewness of a data set.
Get a slice of this expression.
Sort this column.
Sort this column by the ordering of other columns.
Compute the square root of the elements.
Get standard deviation.
Method equivalent of subtraction operator expr - other.
Add a suffix to the root column name of the expression.
Get sum value.
Get the last n rows.
Take values by index.
Take every nth value in the Series and return as a new Series.
Compute the element-wise value for the tangent.
Compute the element-wise value for the hyperbolic tangent.
Cast to physical representation of the logical dtype.
Return the k largest elements.
Method equivalent of float division operator expr / other.
Get unique values of this expression.
Return a count of the unique values in the order of appearance.
Calculate the upper bound.
Count all unique values and create a struct mapping value to count.
Get variance.
Filter a single column.
Method equivalent of bitwise exclusive-or operator expr ^ other.
- abs() Self [source]
Compute absolute values.
Same as abs(expr).
Examples
>>> df = pl.DataFrame( ... { ... "A": [-1.0, 0.0, 1.0, 2.0], ... } ... ) >>> df.select(pl.col("A").abs()) shape: (4, 1) βββββββ β A β β --- β β f64 β βββββββ‘ β 1.0 β β 0.0 β β 1.0 β β 2.0 β βββββββ
- add(other: Any) Self [source]
Method equivalent of addition operator expr + other.
- Parameters:
- other
numeric or string value; accepts expression input.
Examples
>>> df = pl.DataFrame({"x": [1, 2, 3, 4, 5]}) >>> df.with_columns( ... pl.col("x").add(2).alias("x+int"), ... pl.col("x").add(pl.col("x").cumprod()).alias("x+expr"), ... ) shape: (5, 3) βββββββ¬ββββββββ¬βββββββββ β x β x+int β x+expr β β --- β --- β --- β β i64 β i64 β i64 β βββββββͺββββββββͺβββββββββ‘ β 1 β 3 β 2 β β 2 β 4 β 4 β β 3 β 5 β 9 β β 4 β 6 β 28 β β 5 β 7 β 125 β βββββββ΄ββββββββ΄βββββββββ
>>> df = pl.DataFrame( ... {"x": ["a", "d", "g"], "y": ["b", "e", "h"], "z": ["c", "f", "i"]} ... ) >>> df.with_columns(pl.col("x").add(pl.col("y")).add(pl.col("z")).alias("xyz")) shape: (3, 4) βββββββ¬ββββββ¬ββββββ¬ββββββ β x β y β z β xyz β β --- β --- β --- β --- β β str β str β str β str β βββββββͺββββββͺββββββͺββββββ‘ β a β b β c β abc β β d β e β f β def β β g β h β i β ghi β βββββββ΄ββββββ΄ββββββ΄ββββββ
- agg_groups() Self [source]
Get the group indexes of the group by operation.
Should be used in aggregation context only.
Examples
>>> df = pl.DataFrame( ... { ... "group": [ ... "one", ... "one", ... "one", ... "two", ... "two", ... "two", ... ], ... "value": [94, 95, 96, 97, 97, 99], ... } ... ) >>> df.groupby("group", maintain_order=True).agg(pl.col("value").agg_groups()) shape: (2, 2) βββββββββ¬ββββββββββββ β group β value β β --- β --- β β str β list[u32] β βββββββββͺββββββββββββ‘ β one β [0, 1, 2] β β two β [3, 4, 5] β βββββββββ΄ββββββββββββ
- alias(name: str) Self [source]
Rename the expression.
- Parameters:
- name
The new name.
Examples
Rename an expression to avoid overwriting an existing column.
>>> df = pl.DataFrame( ... { ... "a": [1, 2, 3], ... "b": ["x", "y", "z"], ... } ... ) >>> df.with_columns( ... pl.col("a") + 10, ... pl.col("b").str.to_uppercase().alias("c"), ... ) shape: (3, 3) βββββββ¬ββββββ¬ββββββ β a β b β c β β --- β --- β --- β β i64 β str β str β βββββββͺββββββͺββββββ‘ β 11 β x β X β β 12 β y β Y β β 13 β z β Z β βββββββ΄ββββββ΄ββββββ
Overwrite the default name of literal columns to prevent errors due to duplicate column names.
>>> df.with_columns( ... pl.lit(True).alias("c"), ... pl.lit(4.0).alias("d"), ... ) shape: (3, 4) βββββββ¬ββββββ¬βββββββ¬ββββββ β a β b β c β d β β --- β --- β --- β --- β β i64 β str β bool β f64 β βββββββͺββββββͺβββββββͺββββββ‘ β 1 β x β true β 4.0 β β 2 β y β true β 4.0 β β 3 β z β true β 4.0 β βββββββ΄ββββββ΄βββββββ΄ββββββ
- all(drop_nulls: bool = True) Self [source]
Check if all boolean values in a Boolean column are True.
This method is an expression, not to be confused with polars.all(), which is a function to select all columns.
- Parameters:
- drop_nulls
If False, return None if there are any nulls.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame( ... {"TT": [True, True], "TF": [True, False], "FF": [False, False]} ... ) >>> df.select(pl.col("*").all()) shape: (1, 3) ββββββββ¬ββββββββ¬ββββββββ β TT β TF β FF β β --- β --- β --- β β bool β bool β bool β ββββββββͺββββββββͺββββββββ‘ β true β false β false β ββββββββ΄ββββββββ΄ββββββββ >>> df = pl.DataFrame(dict(x=[None, False], y=[None, True])) >>> df.select(pl.col("x").all(True), pl.col("y").all(True)) shape: (1, 2) βββββββββ¬ββββββββ β x β y β β --- β --- β β bool β bool β βββββββββͺββββββββ‘ β false β false β βββββββββ΄ββββββββ >>> df.select(pl.col("x").all(False), pl.col("y").all(False)) shape: (1, 2) ββββββββ¬βββββββ β x β y β β --- β --- β β bool β bool β ββββββββͺβββββββ‘ β null β null β ββββββββ΄βββββββ
- and_(*others: Any) Self [source]
Method equivalent of bitwise "and" operator expr & other & ...
- Parameters:
- *others
One or more integer or boolean expressions to evaluate/combine.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [5, 6, 7, 4, 8], ... "y": [1.5, 2.5, 1.0, 4.0, -5.75], ... "z": [-9, 2, -1, 4, 8], ... } ... ) >>> df.select( ... (pl.col("x") >= pl.col("z")) ... .and_( ... pl.col("y") >= pl.col("z"), ... pl.col("y") == pl.col("y"), ... pl.col("z") <= pl.col("x"), ... pl.col("y") != pl.col("x"), ... ) ... .alias("all") ... ) shape: (5, 1) βββββββββ β all β β --- β β bool β βββββββββ‘ β true β β true β β true β β false β β false β βββββββββ
- any(drop_nulls: bool = True) Self [source]
Check if any boolean value in a Boolean column is True.
- Parameters:
- drop_nulls
If False, return None if there are nulls but no Trues.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame({"TF": [True, False], "FF": [False, False]}) >>> df.select(pl.all().any()) shape: (1, 2) ββββββββ¬ββββββββ β TF β FF β β --- β --- β β bool β bool β ββββββββͺββββββββ‘ β true β false β ββββββββ΄ββββββββ >>> df = pl.DataFrame(dict(x=[None, False], y=[None, True])) >>> df.select(pl.col("x").any(True), pl.col("y").any(True)) shape: (1, 2) βββββββββ¬βββββββ β x β y β β --- β --- β β bool β bool β βββββββββͺβββββββ‘ β false β true β βββββββββ΄βββββββ >>> df.select(pl.col("x").any(False), pl.col("y").any(False)) shape: (1, 2) ββββββββ¬βββββββ β x β y β β --- β --- β β bool β bool β ββββββββͺβββββββ‘ β null β true β ββββββββ΄βββββββ
- append(other: IntoExpr, *, upcast: bool = True) Self [source]
Append expressions.
This is done by adding the chunks of other to this Series.
- Parameters:
- other
Expression to append.
- upcast
Cast both Series to the same supertype.
Examples
>>> df = pl.DataFrame( ... { ... "a": [8, 9, 10], ... "b": [None, 4, 4], ... } ... ) >>> df.select(pl.all().head(1).append(pl.all().tail(1))) shape: (2, 2) βββββββ¬βββββββ β a β b β β --- β --- β β i64 β i64 β βββββββͺβββββββ‘ β 8 β null β β 10 β 4 β βββββββ΄βββββββ
- apply(
- function: Callable[[Series], Series] | Callable[[Any], Any],
- return_dtype: PolarsDataType | None = None,
- *,
- skip_nulls: bool = True,
- pass_name: bool = False,
- strategy: ApplyStrategy = 'thread_local',
) Self [source]
Apply a custom/user-defined function (UDF) in a GroupBy or Projection context.
Warning
This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
Depending on the context it has the following behavior:
- Selection
Expects f to be of type Callable[[Any], Any]. Applies a python function over each individual value in the column.
- GroupBy
Expects f to be of type Callable[[Series], Series]. Applies a python function over each group.
- Parameters:
- function
Lambda/function to apply.
- return_dtype
Dtype of the output Series. If not set, the dtype will be polars.Unknown.
- skip_nulls
Don't apply the function over values that contain nulls. This is faster.
- pass_name
Pass the Series name to the custom function. This is more expensive.
- strategy{'thread_local', 'threading'}
This functionality is in alpha stage. It may be removed or changed without being considered a breaking change.
- 'thread_local': run the python function on a single thread.
- 'threading': run the python function on separate threads. Use with care as this can slow performance. This might only speed up your code if the amount of work per element is significant and the python function releases the GIL (e.g. via calling a C function).
Warning
If return_dtype is not provided, this may lead to unexpected results. We allow this, but it is considered a bug in the user's query.
Notes
Using apply is strongly discouraged as you will be effectively running python "for" loops. This will be very slow. Wherever possible you should strongly prefer the native expression API to achieve the best performance.
If your function is expensive and you don't want it to be called more than once for a given input, consider applying an @lru_cache decorator to it. With suitable data you may achieve order-of-magnitude speedups (or more).
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, 3, 1], ... "b": ["a", "b", "c", "c"], ... } ... )
In a selection context, the function is applied by row.
>>> df.with_columns( ... pl.col("a").apply(lambda x: x * 2).alias("a_times_2"), ... ) shape: (4, 3) βββββββ¬ββββββ¬ββββββββββββ β a β b β a_times_2 β β --- β --- β --- β β i64 β str β i64 β βββββββͺββββββͺββββββββββββ‘ β 1 β a β 2 β β 2 β b β 4 β β 3 β c β 6 β β 1 β c β 2 β βββββββ΄ββββββ΄ββββββββββββ
It is better to implement this with an expression:
>>> df.with_columns( ... (pl.col("a") * 2).alias("a_times_2"), ... )
In a GroupBy context the function is applied by group:
>>> df.lazy().groupby("b", maintain_order=True).agg( ... pl.col("a").apply(lambda x: x.sum()) ... ).collect() shape: (3, 2) βββββββ¬ββββββ β b β a β β --- β --- β β str β i64 β βββββββͺββββββ‘ β a β 1 β β b β 2 β β c β 4 β βββββββ΄ββββββ
It is better to implement this with an expression:
>>> df.groupby("b", maintain_order=True).agg( ... pl.col("a").sum(), ... )
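As the notes above suggest, an expensive UDF can be memoized so that repeated input values are computed only once. A minimal sketch reusing df from above (expensive_udf is a hypothetical stand-in for a costly function):
>>> from functools import lru_cache
>>> @lru_cache(maxsize=None)
... def expensive_udf(x: int) -> int:
...     return x * 2  # stands in for an expensive computation
>>> df.with_columns(pl.col("a").apply(expensive_udf).alias("a_times_2"))
With the input column [1, 2, 3, 1], the repeated value 1 is served from the cache instead of re-running the function.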
- approx_n_unique() Self [source]
Approximate count of unique values.
This is done using the HyperLogLog++ algorithm for cardinality estimation.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").approx_n_unique()) shape: (1, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 2 β βββββββ
- arccos() Self [source]
Compute the element-wise value for the inverse cosine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [0.0]}) >>> df.select(pl.col("a").arccos()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.570796 β ββββββββββββ
- arccosh() Self [source]
Compute the element-wise value for the inverse hyperbolic cosine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").arccosh()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.0 β βββββββ
- arcsin() Self [source]
Compute the element-wise value for the inverse sine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").arcsin()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.570796 β ββββββββββββ
- arcsinh() Self [source]
Compute the element-wise value for the inverse hyperbolic sine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").arcsinh()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.881374 β ββββββββββββ
- arctan() Self [source]
Compute the element-wise value for the inverse tangent.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").arctan()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.785398 β ββββββββββββ
- arctanh() Self [source]
Compute the element-wise value for the inverse hyperbolic tangent.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").arctanh()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β inf β βββββββ
- arg_max() Self [source]
Get the index of the maximal value.
Examples
>>> df = pl.DataFrame( ... { ... "a": [20, 10, 30], ... } ... ) >>> df.select(pl.col("a").arg_max()) shape: (1, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 2 β βββββββ
- arg_min() Self [source]
Get the index of the minimal value.
Examples
>>> df = pl.DataFrame( ... { ... "a": [20, 10, 30], ... } ... ) >>> df.select(pl.col("a").arg_min()) shape: (1, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 1 β βββββββ
- arg_sort(*, descending: bool = False, nulls_last: bool = False) Self [source]
Get the index values that would sort this column.
- Parameters:
- descending
Sort in descending order.
- nulls_last
Place null values last instead of first.
- Returns:
- Expr
Expression of data type UInt32.
Examples
>>> df = pl.DataFrame( ... { ... "a": [20, 10, 30], ... } ... ) >>> df.select(pl.col("a").arg_sort()) shape: (3, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 1 β β 0 β β 2 β βββββββ
- arg_true() Self [source]
Return indices where expression evaluates True.
Warning
Modifies number of rows returned, so will fail in combination with other expressions. Use as only expression in select / with_columns.
See also
Series.arg_true
Return indices where Series is True
polars.arg_where
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2, 1]}) >>> df.select((pl.col("a") == 1).arg_true()) shape: (3, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 0 β β 1 β β 3 β βββββββ
- arg_unique() Self [source]
Get index of first unique value.
Examples
>>> df = pl.DataFrame( ... { ... "a": [8, 9, 10], ... "b": [None, 4, 4], ... } ... ) >>> df.select(pl.col("a").arg_unique()) shape: (3, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 0 β β 1 β β 2 β βββββββ >>> df.select(pl.col("b").arg_unique()) shape: (2, 1) βββββββ β b β β --- β β u32 β βββββββ‘ β 0 β β 1 β βββββββ
- backward_fill(limit: int | None = None) Self [source]
Fill missing values with the next non-null value.
- Parameters:
- limit
The number of consecutive null values to backward fill.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None], ... "b": [4, None, 6], ... "c": [None, None, 2], ... } ... ) >>> df.select(pl.all().backward_fill()) shape: (3, 3) ββββββββ¬ββββββ¬ββββββ β a β b β c β β --- β --- β --- β β i64 β i64 β i64 β ββββββββͺββββββͺββββββ‘ β 1 β 4 β 2 β β 2 β 6 β 2 β β null β 6 β 2 β ββββββββ΄ββββββ΄ββββββ >>> df.select(pl.all().backward_fill(limit=1)) shape: (3, 3) ββββββββ¬ββββββ¬βββββββ β a β b β c β β --- β --- β --- β β i64 β i64 β i64 β ββββββββͺββββββͺβββββββ‘ β 1 β 4 β null β β 2 β 6 β 2 β β null β 6 β 2 β ββββββββ΄ββββββ΄βββββββ
- bottom_k(k: int = 5) Self [source]
Return the k smallest elements.
This has time complexity:
\[O(n + k \log{n} - \frac{k}{2})\]
- Parameters:
- k
Number of elements to return.
See also
top_k
Examples
>>> df = pl.DataFrame( ... { ... "value": [1, 98, 2, 3, 99, 4], ... } ... ) >>> df.select( ... [ ... pl.col("value").top_k().alias("top_k"), ... pl.col("value").bottom_k().alias("bottom_k"), ... ] ... ) shape: (5, 2) βββββββββ¬βββββββββββ β top_k β bottom_k β β --- β --- β β i64 β i64 β βββββββββͺβββββββββββ‘ β 99 β 1 β β 98 β 2 β β 4 β 3 β β 3 β 4 β β 2 β 98 β βββββββββ΄βββββββββββ
- cache() Self [source]
Cache this expression so that it only is executed once per context.
Deprecated since version 0.18.9: This method now does nothing. It has been superseded by the comm_subexpr_elim setting on LazyFrame.collect, which automatically caches expressions that are equal.
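A minimal sketch of the replacement mentioned above, assuming this version's LazyFrame.collect supports the comm_subexpr_elim flag named in the deprecation note; the query below reuses a subexpression, and the setting caches the shared work automatically:
>>> lf = pl.LazyFrame({"a": [1, 2, 3]})
>>> shared = pl.col("a").cumsum() * 2  # subexpression used twice below
>>> lf.select(shared.alias("x"), (shared + 1).alias("y")).collect(
...     comm_subexpr_elim=True
... )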
- cast(dtype: PolarsDataType | type[Any], *, strict: bool = True) Self [source]
Cast between data types.
- Parameters:
- dtype
DataType to cast to.
- strict
Throw an error if a cast could not be done. For instance, due to an overflow.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, 3], ... "b": ["4", "5", "6"], ... } ... ) >>> df.with_columns( ... [ ... pl.col("a").cast(pl.Float64), ... pl.col("b").cast(pl.Int32), ... ] ... ) shape: (3, 2) βββββββ¬ββββββ β a β b β β --- β --- β β f64 β i32 β βββββββͺββββββ‘ β 1.0 β 4 β β 2.0 β 5 β β 3.0 β 6 β βββββββ΄ββββββ
- cbrt() Self [source]
Compute the cube root of the elements.
Examples
>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]}) >>> df.select(pl.col("values").cbrt()) shape: (3, 1) ββββββββββββ β values β β --- β β f64 β ββββββββββββ‘ β 1.0 β β 1.259921 β β 1.587401 β ββββββββββββ
- ceil() Self [source]
Rounds up to the nearest integer value.
Only works on floating point Series.
Examples
>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]}) >>> df.select(pl.col("a").ceil()) shape: (4, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β β 1.0 β β 1.0 β β 2.0 β βββββββ
- clip(lower_bound: int | float, upper_bound: int | float) Self [source]
Clip (limit) the values in an array to a min and max boundary.
Only works for numerical types.
If you want to clip other dtypes, consider writing a "when, then, otherwise" expression. See when() for more information.
- Parameters:
- lower_bound
Lower bound.
- upper_bound
Upper bound.
Examples
>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]}) >>> df.with_columns(pl.col("foo").clip(1, 10).alias("foo_clipped")) shape: (4, 2) ββββββββ¬ββββββββββββββ β foo β foo_clipped β β --- β --- β β i64 β i64 β ββββββββͺββββββββββββββ‘ β -50 β 1 β β 5 β 5 β β null β null β β 50 β 10 β ββββββββ΄ββββββββββββββ
- clip_max(upper_bound: int | float) Self [source]
Clip (limit) the values in an array to a max boundary.
Only works for numerical types.
If you want to clip other dtypes, consider writing a "when, then, otherwise" expression. See when() for more information.
- Parameters:
- upper_bound
Upper bound.
Examples
>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]}) >>> df.with_columns(pl.col("foo").clip_max(0).alias("foo_clipped")) shape: (4, 2) ββββββββ¬ββββββββββββββ β foo β foo_clipped β β --- β --- β β i64 β i64 β ββββββββͺββββββββββββββ‘ β -50 β -50 β β 5 β 0 β β null β null β β 50 β 0 β ββββββββ΄ββββββββββββββ
- clip_min(lower_bound: int | float) Self [source]
Clip (limit) the values in an array to a min boundary.
Only works for numerical types.
If you want to clip other dtypes, consider writing a "when, then, otherwise" expression. See when() for more information.
- Parameters:
- lower_bound
Lower bound.
Examples
>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]}) >>> df.with_columns(pl.col("foo").clip_min(0).alias("foo_clipped")) shape: (4, 2) ββββββββ¬ββββββββββββββ β foo β foo_clipped β β --- β --- β β i64 β i64 β ββββββββͺββββββββββββββ‘ β -50 β 0 β β 5 β 5 β β null β null β β 50 β 50 β ββββββββ΄ββββββββββββββ
- cos() Self [source]
Compute the element-wise value for the cosine.
- Returns:
- Expr
Expression of data type
Float64
.
Examples
>>> df = pl.DataFrame({"a": [0.0]}) >>> df.select(pl.col("a").cos()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ
- cosh() Self [source]
Compute the element-wise value for the hyperbolic cosine.
- Returns:
- Expr
Expression of data type
Float64
.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").cosh()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.543081 β ββββββββββββ
- count() Self [source]
Count the number of values in this expression.
Warning
null is deemed a value in this context.
Examples
>>> df = pl.DataFrame({"a": [8, 9, 10], "b": [None, 4, 4]}) >>> df.select(pl.all().count()) # counts nulls shape: (1, 2) βββββββ¬ββββββ β a β b β β --- β --- β β u32 β u32 β βββββββͺββββββ‘ β 3 β 3 β βββββββ΄ββββββ
- cumcount(*, reverse: bool = False) Self [source]
Get an array with the cumulative count computed at every element.
Counting from 0 to len - 1.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 4]}) >>> df.select( ... [ ... pl.col("a").cumcount(), ... pl.col("a").cumcount(reverse=True).alias("a_reverse"), ... ] ... ) shape: (4, 2) βββββββ¬ββββββββββββ β a β a_reverse β β --- β --- β β u32 β u32 β βββββββͺββββββββββββ‘ β 0 β 3 β β 1 β 2 β β 2 β 1 β β 3 β 0 β βββββββ΄ββββββββββββ
- cummax(*, reverse: bool = False) Self [source]
Get an array with the cumulative max computed at every element.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 4]}) >>> df.select( ... [ ... pl.col("a").cummax(), ... pl.col("a").cummax(reverse=True).alias("a_reverse"), ... ] ... ) shape: (4, 2) βββββββ¬ββββββββββββ β a β a_reverse β β --- β --- β β i64 β i64 β βββββββͺββββββββββββ‘ β 1 β 4 β β 2 β 4 β β 3 β 4 β β 4 β 4 β βββββββ΄ββββββββββββ
Null values are excluded, but can also be filled by calling forward_fill.
>>> df = pl.DataFrame({"values": [None, 10, None, 8, 9, None, 16, None]}) >>> df.with_columns( ... [ ... pl.col("values").cummax().alias("value_cummax"), ... pl.col("values") ... .cummax() ... .forward_fill() ... .alias("value_cummax_all_filled"), ... ] ... ) shape: (8, 3) ββββββββββ¬βββββββββββββββ¬ββββββββββββββββββββββββββ β values β value_cummax β value_cummax_all_filled β β --- β --- β --- β β i64 β i64 β i64 β ββββββββββͺβββββββββββββββͺββββββββββββββββββββββββββ‘ β null β null β null β β 10 β 10 β 10 β β null β null β 10 β β 8 β 10 β 10 β β 9 β 10 β 10 β β null β null β 10 β β 16 β 16 β 16 β β null β null β 16 β ββββββββββ΄βββββββββββββββ΄ββββββββββββββββββββββββββ
- cummin(*, reverse: bool = False) Self [source]
Get an array with the cumulative min computed at every element.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 4]}) >>> df.select( ... [ ... pl.col("a").cummin(), ... pl.col("a").cummin(reverse=True).alias("a_reverse"), ... ] ... ) shape: (4, 2) βββββββ¬ββββββββββββ β a β a_reverse β β --- β --- β β i64 β i64 β βββββββͺββββββββββββ‘ β 1 β 1 β β 1 β 2 β β 1 β 3 β β 1 β 4 β βββββββ΄ββββββββββββ
- cumprod(*, reverse: bool = False) Self [source]
Get an array with the cumulative product computed at every element.
- Parameters:
- reverse
Reverse the operation.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before multiplying to prevent overflow issues.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 4]}) >>> df.select( ... [ ... pl.col("a").cumprod(), ... pl.col("a").cumprod(reverse=True).alias("a_reverse"), ... ] ... ) shape: (4, 2) βββββββ¬ββββββββββββ β a β a_reverse β β --- β --- β β i64 β i64 β βββββββͺββββββββββββ‘ β 1 β 24 β β 2 β 24 β β 6 β 12 β β 24 β 4 β βββββββ΄ββββββββββββ
- cumsum(*, reverse: bool = False) Self [source]
Get an array with the cumulative sum computed at every element.
- Parameters:
- reverse
Reverse the operation.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 4]}) >>> df.select( ... [ ... pl.col("a").cumsum(), ... pl.col("a").cumsum(reverse=True).alias("a_reverse"), ... ] ... ) shape: (4, 2) βββββββ¬ββββββββββββ β a β a_reverse β β --- β --- β β i64 β i64 β βββββββͺββββββββββββ‘ β 1 β 10 β β 3 β 9 β β 6 β 7 β β 10 β 4 β βββββββ΄ββββββββββββ
Null values are excluded, but can also be filled by calling forward_fill.
>>> df = pl.DataFrame({"values": [None, 10, None, 8, 9, None, 16, None]}) >>> df.with_columns( ... [ ... pl.col("values").cumsum().alias("value_cumsum"), ... pl.col("values") ... .cumsum() ... .forward_fill() ... .alias("value_cumsum_all_filled"), ... ] ... ) shape: (8, 3) ββββββββββ¬βββββββββββββββ¬ββββββββββββββββββββββββββ β values β value_cumsum β value_cumsum_all_filled β β --- β --- β --- β β i64 β i64 β i64 β ββββββββββͺβββββββββββββββͺββββββββββββββββββββββββββ‘ β null β null β null β β 10 β 10 β 10 β β null β null β 10 β β 8 β 18 β 18 β β 9 β 27 β 27 β β null β null β 27 β β 16 β 43 β 43 β β null β null β 43 β ββββββββββ΄βββββββββββββββ΄ββββββββββββββββββββββββββ
- cumulative_eval(expr: Expr, min_periods: int = 1, parallel: bool = False) Self [source]
Run an expression over a sliding window that increases 1 slot every iteration.
- Parameters:
- expr
Expression to evaluate.
- min_periods
Number of valid values there should be in the window before the expression is evaluated (valid values = length - null_count).
- parallel
Run in parallel. Don't do this in a groupby or another operation that already has much parallelization.
Warning
This functionality is experimental and may change without it being considered a breaking change.
This can be really slow as it can have O(n^2) complexity. Don't use this for operations that visit all elements.
Examples
>>> df = pl.DataFrame({"values": [1, 2, 3, 4, 5]}) >>> df.select( ... [ ... pl.col("values").cumulative_eval( ... pl.element().first() - pl.element().last() ** 2 ... ) ... ] ... ) shape: (5, 1) ββββββββββ β values β β --- β β f64 β ββββββββββ‘ β 0.0 β β -3.0 β β -8.0 β β -15.0 β β -24.0 β ββββββββββ
- cut(
- breaks: Sequence[float],
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- include_breaks: bool = False,
) Self [source]
Bin continuous values into discrete categories.
- Parameters:
- breaks
List of unique cut points.
- labels
Names of the categories. The number of labels must be equal to the number of cut points plus one.
- left_closed
Set the intervals to be left-closed instead of right-closed.
- include_breaks
Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.
- Returns:
- Expr
Expression of data type Categorical if include_breaks is set to False (default), otherwise an expression of data type Struct.
See also
qcut
Examples
Divide a column into three categories.
>>> df = pl.DataFrame({"foo": [-2, -1, 0, 1, 2]}) >>> df.with_columns( ... pl.col("foo").cut([-1, 1], labels=["a", "b", "c"]).alias("cut") ... ) shape: (5, 2) βββββββ¬ββββββ β foo β cut β β --- β --- β β i64 β cat β βββββββͺββββββ‘ β -2 β a β β -1 β a β β 0 β b β β 1 β b β β 2 β c β βββββββ΄ββββββ
Add both the category and the breakpoint.
>>> df.with_columns( ... pl.col("foo").cut([-1, 1], include_breaks=True).alias("cut") ... ).unnest("cut") shape: (5, 3) βββββββ¬βββββββ¬βββββββββββββ β foo β brk β foo_bin β β --- β --- β --- β β i64 β f64 β cat β βββββββͺβββββββͺβββββββββββββ‘ β -2 β -1.0 β (-inf, -1] β β -1 β -1.0 β (-inf, -1] β β 0 β 1.0 β (-1, 1] β β 1 β 1.0 β (-1, 1] β β 2 β inf β (1, inf] β βββββββ΄βββββββ΄βββββββββββββ
- degrees() Self [source]
Convert from radians to degrees.
- Returns:
- Expr
Expression of data type
Float64
.
Examples
>>> import math >>> df = pl.DataFrame({"a": [x * math.pi for x in range(-4, 5)]}) >>> df.select(pl.col("a").degrees()) shape: (9, 1) ββββββββββ β a β β --- β β f64 β ββββββββββ‘ β -720.0 β β -540.0 β β -360.0 β β -180.0 β β 0.0 β β 180.0 β β 360.0 β β 540.0 β β 720.0 β ββββββββββ
- diff(n: int = 1, null_behavior: NullBehavior = 'ignore') Self [source]
Calculate the n-th discrete difference.
- Parameters:
- n
Number of slots to shift.
- null_behavior{'ignore', 'drop'}
How to handle null values.
Examples
>>> df = pl.DataFrame({"int": [20, 10, 30, 25, 35]}) >>> df.with_columns(change=pl.col("int").diff()) shape: (5, 2) βββββββ¬βββββββββ β int β change β β --- β --- β β i64 β i64 β βββββββͺβββββββββ‘ β 20 β null β β 10 β -10 β β 30 β 20 β β 25 β -5 β β 35 β 10 β βββββββ΄βββββββββ
>>> df.with_columns(change=pl.col("int").diff(n=2)) shape: (5, 2) βββββββ¬βββββββββ β int β change β β --- β --- β β i64 β i64 β βββββββͺβββββββββ‘ β 20 β null β β 10 β null β β 30 β 10 β β 25 β 15 β β 35 β 5 β βββββββ΄βββββββββ
>>> df.select(pl.col("int").diff(n=2, null_behavior="drop").alias("diff")) shape: (3, 1) ββββββββ β diff β β --- β β i64 β ββββββββ‘ β 10 β β 15 β β 5 β ββββββββ
- dot(other: Expr | str) Self [source]
Compute the dot/inner product between two Expressions.
- Parameters:
- other
Expression to compute dot product with.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 3, 5], ... "b": [2, 4, 6], ... } ... ) >>> df.select(pl.col("a").dot(pl.col("b"))) shape: (1, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 44 β βββββββ
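The single output value is the sum of the element-wise products, easy to verify by hand:
>>> sum(a * b for a, b in zip([1, 3, 5], [2, 4, 6]))  # 1*2 + 3*4 + 5*6
44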
- drop_nans() Self [source]
Drop floating point NaN values.
Warning
Note that NaN values are not null values! To drop null values, use drop_nulls().
Examples
>>> df = pl.DataFrame( ... { ... "a": [8, 9, 10, 11], ... "b": [None, 4.0, 4.0, float("nan")], ... } ... ) >>> df.select(pl.col("b").drop_nans()) shape: (3, 1) ββββββββ β b β β --- β β f64 β ββββββββ‘ β null β β 4.0 β β 4.0 β ββββββββ
- drop_nulls() Self [source]
Drop all null values.
Warning
Note that null values are not floating point NaN values! To drop NaN values, use drop_nans().
Examples
>>> df = pl.DataFrame( ... { ... "a": [8, 9, 10, 11], ... "b": [None, 4.0, 4.0, float("nan")], ... } ... ) >>> df.select(pl.col("b").drop_nulls()) shape: (3, 1) βββββββ β b β β --- β β f64 β βββββββ‘ β 4.0 β β 4.0 β β NaN β βββββββ
- entropy(base: float = 2.718281828459045, *, normalize: bool = True) Self [source]
Computes the entropy.
Uses the formula -sum(pk * log(pk)) where pk are discrete probabilities.
- Parameters:
- base
Given base, defaults to e.
- normalize
Normalize pk if it doesn't sum to 1.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").entropy(base=2)) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.459148 β ββββββββββββ >>> df.select(pl.col("a").entropy(base=2, normalize=False)) shape: (1, 1) βββββββββββββ β a β β --- β β f64 β βββββββββββββ‘ β -6.754888 β βββββββββββββ
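The normalized probabilities for [1, 2, 3] are [1/6, 2/6, 3/6], so the base-2 result above follows directly from the formula:
>>> import math
>>> pk = [1 / 6, 2 / 6, 3 / 6]  # values normalized to sum to 1
>>> round(-sum(p * math.log2(p) for p in pk), 6)
1.459148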
- eq(other: Any) Self [source]
Method equivalent of equality operator expr == other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [1.0, 2.0, float("nan"), 4.0], ... "y": [2.0, 2.0, float("nan"), 4.0], ... } ... ) >>> df.with_columns( ... pl.col("x").eq(pl.col("y")).alias("x == y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬βββββββββ β x β y β x == y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺβββββββββ‘ β 1.0 β 2.0 β false β β 2.0 β 2.0 β true β β NaN β NaN β false β β 4.0 β 4.0 β true β βββββββ΄ββββββ΄βββββββββ
- eq_missing(other: Any) Self [source]
Method equivalent of equality operator expr == other where None == None.
This differs from default eq where null values are propagated.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [1.0, 2.0, float("nan"), 4.0, None, None], ... "y": [2.0, 2.0, float("nan"), 4.0, 5.0, None], ... } ... ) >>> df.with_columns( ... pl.col("x").eq_missing(pl.col("y")).alias("x == y"), ... ) shape: (6, 3) ββββββββ¬βββββββ¬βββββββββ β x β y β x == y β β --- β --- β --- β β f64 β f64 β bool β ββββββββͺβββββββͺβββββββββ‘ β 1.0 β 2.0 β false β β 2.0 β 2.0 β true β β NaN β NaN β false β β 4.0 β 4.0 β true β β null β 5.0 β false β β null β null β true β ββββββββ΄βββββββ΄βββββββββ
- ewm_mean(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- min_periods: int = 1,
- ignore_nulls: bool = True,
) Self [source]
Exponentially-weighted moving average.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").ewm_mean(com=1)) shape: (3, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.0 β β 1.666667 β β 2.428571 β ββββββββββββ
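With com=1 we get \(\alpha = 0.5\); under the default adjust=True the last row is a plain weighted average with weights \((1 - \alpha)^i\) over the values from newest to oldest, reproducing the final value above:
>>> alpha = 0.5  # com=1 gives alpha = 1 / (1 + com)
>>> x = [1, 2, 3]
>>> w = [(1 - alpha) ** i for i in range(3)]  # weights for newest..oldest
>>> round(sum(wi * xi for wi, xi in zip(w, reversed(x))) / sum(w), 6)
2.428571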
- ewm_std(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- bias: bool = False,
- min_periods: int = 1,
- ignore_nulls: bool = True,
) Self [source]
Exponentially-weighted moving standard deviation.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- bias
When bias=False, apply a correction to make the estimate statistically unbiased.
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").ewm_std(com=1)) shape: (3, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.0 β β 0.707107 β β 0.963624 β ββββββββββββ
- ewm_var(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- bias: bool = False,
- min_periods: int = 1,
- ignore_nulls: bool = True,
) Self [source]
Exponentially-weighted moving variance.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- bias
When bias=False, apply a correction to make the estimate statistically unbiased.
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").ewm_var(com=1)) shape: (3, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.0 β β 0.5 β β 0.928571 β ββββββββββββ
- exclude(
- columns: str | PolarsDataType | Collection[str] | Collection[PolarsDataType],
- *more_columns: str | PolarsDataType,
) Self [source]
Exclude columns from a multi-column expression.
Only works after a wildcard or regex column selection, and you cannot provide both string column names and dtypes (you may prefer to use selectors instead).
- Parameters:
- columns
The name or datatype of the column(s) to exclude. Accepts regular expression input. Regular expressions should start with ^ and end with $.
- *more_columns
Additional names or datatypes of columns to exclude, specified as positional arguments.
Examples
>>> df = pl.DataFrame( ... { ... "aa": [1, 2, 3], ... "ba": ["a", "b", None], ... "cc": [None, 2.5, 1.5], ... } ... ) >>> df shape: (3, 3) βββββββ¬βββββββ¬βββββββ β aa β ba β cc β β --- β --- β --- β β i64 β str β f64 β βββββββͺβββββββͺβββββββ‘ β 1 β a β null β β 2 β b β 2.5 β β 3 β null β 1.5 β βββββββ΄βββββββ΄βββββββ
Exclude by column name(s):
>>> df.select(pl.all().exclude("ba")) shape: (3, 2) βββββββ¬βββββββ β aa β cc β β --- β --- β β i64 β f64 β βββββββͺβββββββ‘ β 1 β null β β 2 β 2.5 β β 3 β 1.5 β βββββββ΄βββββββ
Exclude by regex, e.g. removing all columns whose names end with the letter "a":
>>> df.select(pl.all().exclude("^.*a$")) shape: (3, 1) ββββββββ β cc β β --- β β f64 β ββββββββ‘ β null β β 2.5 β β 1.5 β ββββββββ
Exclude by dtype(s), e.g. removing all columns of type Int64 or Float64:
>>> df.select(pl.all().exclude([pl.Int64, pl.Float64])) shape: (3, 1) ββββββββ β ba β β --- β β str β ββββββββ‘ β a β β b β β null β ββββββββ
- exp() Self [source]
Compute the exponential, element-wise.
Examples
>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]}) >>> df.select(pl.col("values").exp()) shape: (3, 1) ββββββββββββ β values β β --- β β f64 β ββββββββββββ‘ β 2.718282 β β 7.389056 β β 54.59815 β ββββββββββββ
- explode() Self [source]
Explode a list expression.
This means that every item is expanded to a new row.
- Returns:
- Expr
Expression with the data type of the list elements.
See also
Expr.list.explode
Explode a list column.
Expr.str.explode
Explode a string column.
Examples
>>> df = pl.DataFrame( ... { ... "group": ["a", "b"], ... "values": [ ... [1, 2], ... [3, 4], ... ], ... } ... ) >>> df.select(pl.col("values").explode()) shape: (4, 1) ββββββββββ β values β β --- β β i64 β ββββββββββ‘ β 1 β β 2 β β 3 β β 4 β ββββββββββ
- extend_constant(value: PythonLiteral | None, n: int) Self [source]
Extremely fast method for extending the Series with 'n' copies of a value.
- Parameters:
- value
A constant literal value (not an expression) with which to extend the expression result Series; can pass None to extend with nulls.
- n
The number of additional values that will be added.
Examples
>>> df = pl.DataFrame({"values": [1, 2, 3]}) >>> df.select((pl.col("values") - 1).extend_constant(99, n=2)) shape: (5, 1) ββββββββββ β values β β --- β β i64 β ββββββββββ‘ β 0 β β 1 β β 2 β β 99 β β 99 β ββββββββββ
- fill_nan(value: int | float | Expr | None) Self [source]
Fill floating point NaN value with a fill value.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1.0, None, float("nan")], ... "b": [4.0, float("nan"), 6], ... } ... ) >>> df.with_columns(pl.col("b").fill_nan(0)) shape: (3, 2) ββββββββ¬ββββββ β a β b β β --- β --- β β f64 β f64 β ββββββββͺββββββ‘ β 1.0 β 4.0 β β null β 0.0 β β NaN β 6.0 β ββββββββ΄ββββββ
- fill_null(value: Any | None = None, strategy: FillNullStrategy | None = None, limit: int | None = None) Self [source]
Fill null values using the specified value or strategy.
To interpolate over null values see interpolate. See the examples below to fill nulls with an expression.
- Parameters:
- value
Value used to fill null values.
- strategy{None, 'forward', 'backward', 'min', 'max', 'mean', 'zero', 'one'}
Strategy used to fill null values.
- limit
Number of consecutive null values to fill when using the 'forward' or 'backward' strategy.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None], ... "b": [4, None, 6], ... } ... ) >>> df.with_columns(pl.col("b").fill_null(strategy="zero")) shape: (3, 2) ββββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β ββββββββͺββββββ‘ β 1 β 4 β β 2 β 0 β β null β 6 β ββββββββ΄ββββββ >>> df.with_columns(pl.col("b").fill_null(99)) shape: (3, 2) ββββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β ββββββββͺββββββ‘ β 1 β 4 β β 2 β 99 β β null β 6 β ββββββββ΄ββββββ >>> df.with_columns(pl.col("b").fill_null(strategy="forward")) shape: (3, 2) ββββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β ββββββββͺββββββ‘ β 1 β 4 β β 2 β 4 β β null β 6 β ββββββββ΄ββββββ >>> df.with_columns(pl.col("b").fill_null(pl.col("b").median())) shape: (3, 2) ββββββββ¬ββββββ β a β b β β --- β --- β β i64 β f64 β ββββββββͺββββββ‘ β 1 β 4.0 β β 2 β 5.0 β β null β 6.0 β ββββββββ΄ββββββ >>> df.with_columns(pl.all().fill_null(pl.all().median())) shape: (3, 2) βββββββ¬ββββββ β a β b β β --- β --- β β f64 β f64 β βββββββͺββββββ‘ β 1.0 β 4.0 β β 2.0 β 5.0 β β 1.5 β 6.0 β βββββββ΄ββββββ
- filter(predicate: Expr) Self [source]
Filter a single column.
Mostly useful in an aggregation context. If you want to filter on a DataFrame level, use LazyFrame.filter.
- Parameters:
- predicate
Boolean expression.
Examples
>>> df = pl.DataFrame( ... { ... "group_col": ["g1", "g1", "g2"], ... "b": [1, 2, 3], ... } ... ) >>> df.groupby("group_col").agg( ... [ ... pl.col("b").filter(pl.col("b") < 2).sum().alias("lt"), ... pl.col("b").filter(pl.col("b") >= 2).sum().alias("gte"), ... ] ... ).sort("group_col") shape: (2, 3) βββββββββββββ¬ββββββ¬ββββββ β group_col β lt β gte β β --- β --- β --- β β str β i64 β i64 β βββββββββββββͺββββββͺββββββ‘ β g1 β 1 β 2 β β g2 β 0 β 3 β βββββββββββββ΄ββββββ΄ββββββ
- first() Self [source]
Get the first value.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").first()) shape: (1, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 1 β βββββββ
- flatten() Self [source]
Flatten a list or string column.
Alias for polars.expr.list.ExprListNameSpace.explode().
Examples
>>> df = pl.DataFrame( ... { ... "group": ["a", "b", "b"], ... "values": [[1, 2], [2, 3], [4]], ... } ... ) >>> df.groupby("group").agg(pl.col("values").flatten()) shape: (2, 2) βββββββββ¬ββββββββββββ β group β values β β --- β --- β β str β list[i64] β βββββββββͺββββββββββββ‘ β a β [1, 2] β β b β [2, 3, 4] β βββββββββ΄ββββββββββββ
- floor() Self [source]
Rounds down to the nearest integer value.
Only works on floating point Series.
Examples
>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]}) >>> df.select(pl.col("a").floor()) shape: (4, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.0 β β 0.0 β β 1.0 β β 1.0 β βββββββ
- floordiv(other: Any) Self [source]
Method equivalent of integer division operator expr // other.
- Parameters:
- other
Numeric literal or expression value.
See also
truediv
Examples
>>> df = pl.DataFrame({"x": [1, 2, 3, 4, 5]}) >>> df.with_columns( ... pl.col("x").truediv(2).alias("x/2"), ... pl.col("x").floordiv(2).alias("x//2"), ... ) shape: (5, 3) βββββββ¬ββββββ¬βββββββ β x β x/2 β x//2 β β --- β --- β --- β β i64 β f64 β i64 β βββββββͺββββββͺβββββββ‘ β 1 β 0.5 β 0 β β 2 β 1.0 β 1 β β 3 β 1.5 β 1 β β 4 β 2.0 β 2 β β 5 β 2.5 β 2 β βββββββ΄ββββββ΄βββββββ
- forward_fill(limit: int | None = None) Self [source]
Fill missing values with the last non-null value.
- Parameters:
- limit
The number of consecutive null values to forward fill.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None], ... "b": [4, None, 6], ... } ... ) >>> df.select(pl.all().forward_fill()) shape: (3, 2) βββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β βββββββͺββββββ‘ β 1 β 4 β β 2 β 4 β β 2 β 6 β βββββββ΄ββββββ
- classmethod from_json(value: str) Self [source]
Read an expression from a JSON encoded string to construct an Expression.
- Parameters:
- value
JSON encoded string value.
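No example is shown above; a minimal round-trip sketch, assuming the expr.meta.write_json() serializer is available in this version:
>>> expr = pl.col("foo").sum().over("bar")
>>> json_str = expr.meta.write_json()  # serialize an existing expression
>>> roundtrip = pl.Expr.from_json(json_str)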
- ge(other: Any) Self [source]
Method equivalent of "greater than or equal" operator expr >= other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [5.0, 4.0, float("nan"), 2.0], ... "y": [5.0, 3.0, float("nan"), 1.0], ... } ... ) >>> df.with_columns( ... pl.col("x").ge(pl.col("y")).alias("x >= y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬βββββββββ β x β y β x >= y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺβββββββββ‘ β 5.0 β 5.0 β true β β 4.0 β 3.0 β true β β NaN β NaN β false β β 2.0 β 1.0 β true β βββββββ΄ββββββ΄βββββββββ
- gt(other: Any) Self [source]
Method equivalent of "greater than" operator expr > other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [5.0, 4.0, float("nan"), 2.0], ... "y": [5.0, 3.0, float("nan"), 1.0], ... } ... ) >>> df.with_columns( ... pl.col("x").gt(pl.col("y")).alias("x > y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬ββββββββ β x β y β x > y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺββββββββ‘ β 5.0 β 5.0 β false β β 4.0 β 3.0 β true β β NaN β NaN β false β β 2.0 β 1.0 β true β βββββββ΄ββββββ΄ββββββββ
- hash(seed: int = 0, seed_1: int | None = None, seed_2: int | None = None, seed_3: int | None = None) Self [source]
Hash the elements in the selection.
The hash value is of type UInt64.
- Parameters:
- seed
Random seed parameter. Defaults to 0.
- seed_1
Random seed parameter. Defaults to seed if not set.
- seed_2
Random seed parameter. Defaults to seed if not set.
- seed_3
Random seed parameter. Defaults to seed if not set.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None], ... "b": ["x", None, "z"], ... } ... ) >>> df.with_columns(pl.all().hash(10, 20, 30, 40)) shape: (3, 2) ββββββββββββββββββββββββ¬βββββββββββββββββββββββ β a β b β β --- β --- β β u64 β u64 β ββββββββββββββββββββββββͺβββββββββββββββββββββββ‘ β 9774092659964970114 β 13614470193936745724 β β 1101441246220388612 β 11638928888656214026 β β 11638928888656214026 β 13382926553367784577 β ββββββββββββββββββββββββ΄βββββββββββββββββββββββ
- head(n: int | Expr = 10) Self [source]
Get the first n rows.
- Parameters:
- n
Number of rows to return.
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]}) >>> df.head(3) shape: (3, 1) βββββββ β foo β β --- β β i64 β βββββββ‘ β 1 β β 2 β β 3 β βββββββ
- implode() Self [source]
Aggregate values into a list.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, 3], ... "b": [4, 5, 6], ... } ... ) >>> df.select(pl.all().implode()) shape: (1, 2) βββββββββββββ¬ββββββββββββ β a β b β β --- β --- β β list[i64] β list[i64] β βββββββββββββͺββββββββββββ‘ β [1, 2, 3] β [4, 5, 6] β βββββββββββββ΄ββββββββββββ
- inspect(fmt: str = '{}') Self [source]
Print the value that this expression evaluates to and pass on the value.
Examples
>>> df = pl.DataFrame({"foo": [1, 1, 2]}) >>> df.select(pl.col("foo").cumsum().inspect("value is: {}").alias("bar")) value is: shape: (3,) Series: 'foo' [i64] [ 1 2 4 ] shape: (3, 1) βββββββ β bar β β --- β β i64 β βββββββ‘ β 1 β β 2 β β 4 β βββββββ
- interpolate(method: InterpolationMethod = 'linear') Self [source]
Fill null values using interpolation.
- Parameters:
- method{'linear', 'nearest'}
Interpolation method.
Examples
Fill null values using linear interpolation.
>>> df = pl.DataFrame( ... { ... "a": [1, None, 3], ... "b": [1.0, float("nan"), 3.0], ... } ... ) >>> df.select(pl.all().interpolate()) shape: (3, 2) βββββββ¬ββββββ β a β b β β --- β --- β β i64 β f64 β βββββββͺββββββ‘ β 1 β 1.0 β β 2 β NaN β β 3 β 3.0 β βββββββ΄ββββββ
Fill null values using nearest interpolation.
>>> df.select(pl.all().interpolate("nearest")) shape: (3, 2) βββββββ¬ββββββ β a β b β β --- β --- β β i64 β f64 β βββββββͺββββββ‘ β 1 β 1.0 β β 3 β NaN β β 3 β 3.0 β βββββββ΄ββββββ
Regrid data to a new grid.
>>> df_original_grid = pl.DataFrame( ... { ... "grid_points": [1, 3, 10], ... "values": [2.0, 6.0, 20.0], ... } ... ) # Interpolate from this to the new grid >>> df_new_grid = pl.DataFrame({"grid_points": range(1, 11)}) >>> df_new_grid.join( ... df_original_grid, on="grid_points", how="left" ... ).with_columns(pl.col("values").interpolate()) shape: (10, 2) βββββββββββββββ¬βββββββββ β grid_points β values β β --- β --- β β i64 β f64 β βββββββββββββββͺβββββββββ‘ β 1 β 2.0 β β 2 β 4.0 β β 3 β 6.0 β β 4 β 8.0 β β β¦ β β¦ β β 7 β 14.0 β β 8 β 16.0 β β 9 β 18.0 β β 10 β 20.0 β βββββββββββββββ΄βββββββββ
- is_between(
- lower_bound: IntoExpr,
- upper_bound: IntoExpr,
- closed: ClosedInterval = 'both',
) Self [source]
Check if this expression is between the given start and end values.
- Parameters:
- lower_bound
Lower bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
- upper_bound
Upper bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
- closed{"both", "left", "right", "none"}
Define which sides of the interval are closed (inclusive).
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame({"num": [1, 2, 3, 4, 5]}) >>> df.with_columns(pl.col("num").is_between(2, 4).alias("is_between")) shape: (5, 2) βββββββ¬βββββββββββββ β num β is_between β β --- β --- β β i64 β bool β βββββββͺβββββββββββββ‘ β 1 β false β β 2 β true β β 3 β true β β 4 β true β β 5 β false β βββββββ΄βββββββββββββ
Use the closed argument to include or exclude the values at the bounds:
>>> df.with_columns(
...     pl.col("num").is_between(2, 4, closed="left").alias("is_between")
... )
shape: (5, 2)
┌─────┬────────────┐
│ num ┆ is_between │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ false      │
│ 2   ┆ true       │
│ 3   ┆ true       │
│ 4   ┆ false      │
│ 5   ┆ false      │
└─────┴────────────┘
You can also use strings as well as numeric/temporal values (note: ensure that string literals are wrapped with lit so as not to conflate them with column names):
>>> df = pl.DataFrame({"a": ["a", "b", "c", "d", "e"]})
>>> df.with_columns(
...     pl.col("a")
...     .is_between(pl.lit("a"), pl.lit("c"), closed="both")
...     .alias("is_between")
... )
shape: (5, 2)
┌─────┬────────────┐
│ a   ┆ is_between │
│ --- ┆ ---        │
│ str ┆ bool       │
╞═════╪════════════╡
│ a   ┆ true       │
│ b   ┆ true       │
│ c   ┆ true       │
│ d   ┆ false      │
│ e   ┆ false      │
└─────┴────────────┘
- is_duplicated() Self [source]
Get mask of duplicated values.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").is_duplicated()) shape: (3, 1) βββββββββ β a β β --- β β bool β βββββββββ‘ β true β β true β β false β βββββββββ
- is_finite() Self [source]
Returns a boolean Series indicating which values are finite.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame( ... { ... "A": [1.0, 2], ... "B": [3.0, float("inf")], ... } ... ) >>> df.select(pl.all().is_finite()) shape: (2, 2) ββββββββ¬ββββββββ β A β B β β --- β --- β β bool β bool β ββββββββͺββββββββ‘ β true β true β β true β false β ββββββββ΄ββββββββ
- is_first() Self [source]
Get a mask of the first unique value.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame( ... { ... "num": [1, 2, 3, 1, 5], ... } ... ) >>> df.with_columns(pl.col("num").is_first().alias("is_first")) shape: (5, 2) βββββββ¬βββββββββββ β num β is_first β β --- β --- β β i64 β bool β βββββββͺβββββββββββ‘ β 1 β true β β 2 β true β β 3 β true β β 1 β false β β 5 β true β βββββββ΄βββββββββββ
- is_in(other: Expr | Collection[Any] | Series) Self [source]
Check if elements of this expression are present in the other Series.
- Parameters:
- other
Series or sequence of primitive type.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame( ... {"sets": [[1, 2, 3], [1, 2], [9, 10]], "optional_members": [1, 2, 3]} ... ) >>> df.select([pl.col("optional_members").is_in("sets").alias("contains")]) shape: (3, 1) ββββββββββββ β contains β β --- β β bool β ββββββββββββ‘ β true β β true β β false β ββββββββββββ
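The other argument can also be a plain Python sequence rather than another column; a minimal sketch reusing the frame above (the alias in_list is just for illustration):
>>> df.select(
...     pl.col("optional_members").is_in([1, 2]).alias("in_list")
... )  # rows evaluate to true, true, false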
- is_infinite() Self [source]
Returns a boolean Series indicating which values are infinite.
- Returns:
- Expr
Expression of data type Boolean.
Examples
>>> df = pl.DataFrame( ... { ... "A": [1.0, 2], ... "B": [3.0, float("inf")], ... } ... ) >>> df.select(pl.all().is_infinite()) shape: (2, 2) βββββββββ¬ββββββββ β A β B β β --- β --- β β bool β bool β βββββββββͺββββββββ‘ β false β false β β false β true β βββββββββ΄ββββββββ
- is_nan() Self [source]
Returns a boolean Series indicating which values are NaN.
Notes
Floating point NaN (Not A Number) should not be confused with missing data represented as Null/None.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None, 1, 5], ... "b": [1.0, 2.0, float("nan"), 1.0, 5.0], ... } ... ) >>> df.with_columns(pl.col(pl.Float64).is_nan().suffix("_isnan")) shape: (5, 3) ββββββββ¬ββββββ¬ββββββββββ β a β b β b_isnan β β --- β --- β --- β β i64 β f64 β bool β ββββββββͺββββββͺββββββββββ‘ β 1 β 1.0 β false β β 2 β 2.0 β false β β null β NaN β true β β 1 β 1.0 β false β β 5 β 5.0 β false β ββββββββ΄ββββββ΄ββββββββββ
- is_not() Self [source]
Negate a boolean expression.
Examples
>>> df = pl.DataFrame( ... { ... "a": [True, False, False], ... "b": ["a", "b", None], ... } ... ) >>> df shape: (3, 2) βββββββββ¬βββββββ β a β b β β --- β --- β β bool β str β βββββββββͺβββββββ‘ β true β a β β false β b β β false β null β βββββββββ΄βββββββ >>> df.select(pl.col("a").is_not()) shape: (3, 1) βββββββββ β a β β --- β β bool β βββββββββ‘ β false β β true β β true β βββββββββ
- is_not_nan() Self [source]
Returns a boolean Series indicating which values are not NaN.
Notes
Floating point NaN (Not A Number) should not be confused with missing data represented as Null/None.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None, 1, 5], ... "b": [1.0, 2.0, float("nan"), 1.0, 5.0], ... } ... ) >>> df.with_columns(pl.col(pl.Float64).is_not_nan().suffix("_is_not_nan")) shape: (5, 3) ββββββββ¬ββββββ¬βββββββββββββββ β a β b β b_is_not_nan β β --- β --- β --- β β i64 β f64 β bool β ββββββββͺββββββͺβββββββββββββββ‘ β 1 β 1.0 β true β β 2 β 2.0 β true β β null β NaN β false β β 1 β 1.0 β true β β 5 β 5.0 β true β ββββββββ΄ββββββ΄βββββββββββββββ
- is_not_null() Self [source]
Returns a boolean Series indicating which values are not null.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None, 1, 5], ... "b": [1.0, 2.0, float("nan"), 1.0, 5.0], ... } ... ) >>> df.with_columns(pl.all().is_not_null().suffix("_not_null")) # nan != null shape: (5, 4) ββββββββ¬ββββββ¬βββββββββββββ¬βββββββββββββ β a β b β a_not_null β b_not_null β β --- β --- β --- β --- β β i64 β f64 β bool β bool β ββββββββͺββββββͺβββββββββββββͺβββββββββββββ‘ β 1 β 1.0 β true β true β β 2 β 2.0 β true β true β β null β NaN β false β true β β 1 β 1.0 β true β true β β 5 β 5.0 β true β true β ββββββββ΄ββββββ΄βββββββββββββ΄βββββββββββββ
- is_null() Self [source]
Returns a boolean Series indicating which values are null.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, None, 1, 5], ... "b": [1.0, 2.0, float("nan"), 1.0, 5.0], ... } ... ) >>> df.with_columns(pl.all().is_null().suffix("_isnull")) # nan != null shape: (5, 4) ββββββββ¬ββββββ¬βββββββββββ¬βββββββββββ β a β b β a_isnull β b_isnull β β --- β --- β --- β --- β β i64 β f64 β bool β bool β ββββββββͺββββββͺβββββββββββͺβββββββββββ‘ β 1 β 1.0 β false β false β β 2 β 2.0 β false β false β β null β NaN β true β false β β 1 β 1.0 β false β false β β 5 β 5.0 β false β false β ββββββββ΄ββββββ΄βββββββββββ΄βββββββββββ
- is_unique() Self [source]
Get mask of unique values.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").is_unique()) shape: (3, 1) βββββββββ β a β β --- β β bool β βββββββββ‘ β false β β false β β true β βββββββββ
- keep_name() Self [source]
Keep the original root name of the expression.
Notes
Due to implementation constraints, this method can only be called as the last expression in a chain.
Examples
Undo an alias operation.
>>> df = pl.DataFrame( ... { ... "a": [1, 2], ... "b": [3, 4], ... } ... ) >>> df.with_columns((pl.col("a") * 9).alias("c").keep_name()) shape: (2, 2) βββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β βββββββͺββββββ‘ β 9 β 3 β β 18 β 4 β βββββββ΄ββββββ
Prevent errors due to duplicate column names.
>>> df.select((pl.lit(10) / pl.all()).keep_name()) shape: (2, 2) ββββββββ¬βββββββββββ β a β b β β --- β --- β β f64 β f64 β ββββββββͺβββββββββββ‘ β 10.0 β 3.333333 β β 5.0 β 2.5 β ββββββββ΄βββββββββββ
- kurtosis(*, fisher: bool = True, bias: bool = True) Self [source]
Compute the kurtosis (Fisher or Pearson) of a dataset.
Kurtosis is the fourth central moment divided by the square of the variance. If Fisher's definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.
See scipy.stats for more information.
- Parameters:
- fisherbool, optional
If True, Fisher's definition is used (normal ==> 0.0). If False, Pearson's definition is used (normal ==> 3.0).
- biasbool, optional
If False, the calculations are corrected for statistical bias.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]}) >>> df.select(pl.col("a").kurtosis()) shape: (1, 1) βββββββββββββ β a β β --- β β f64 β βββββββββββββ‘ β -1.153061 β βββββββββββββ
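As a check on the definition above, the default (Fisher, biased) kurtosis can be recomputed from the raw central moments. A minimal sketch, assuming the same frame as the example; m2 and m4 are names introduced here for illustration:
>>> x = pl.col("a")
>>> m2 = ((x - x.mean()) ** 2).mean()  # second central moment (biased variance)
>>> m4 = ((x - x.mean()) ** 4).mean()  # fourth central moment
>>> df.select((m4 / m2**2 - 3).alias("kurtosis_manual"))  # -1.153061, matching above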
- last() Self [source]
Get the last value.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").last()) shape: (1, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 2 β βββββββ
- le(other: Any) Self [source]
Method equivalent of "less than or equal" operator expr <= other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [5.0, 4.0, float("nan"), 0.5], ... "y": [5.0, 3.5, float("nan"), 2.0], ... } ... ) >>> df.with_columns( ... pl.col("x").le(pl.col("y")).alias("x <= y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬βββββββββ β x β y β x <= y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺβββββββββ‘ β 5.0 β 5.0 β true β β 4.0 β 3.5 β false β β NaN β NaN β false β β 0.5 β 2.0 β true β βββββββ΄ββββββ΄βββββββββ
- len() Self [source]
Count the number of values in this expression.
Alias for count().
Examples
>>> df = pl.DataFrame( ... { ... "a": [8, 9, 10], ... "b": [None, 4, 4], ... } ... ) >>> df.select(pl.all().len()) # counts nulls shape: (1, 2) βββββββ¬ββββββ β a β b β β --- β --- β β u32 β u32 β βββββββͺββββββ‘ β 3 β 3 β βββββββ΄ββββββ
- limit(n: int | Expr = 10) Self [source]
Get the first n rows (alias for Expr.head()).
- Parameters:
- n
Number of rows to return.
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]}) >>> df.limit(3) shape: (3, 1) βββββββ β foo β β --- β β i64 β βββββββ‘ β 1 β β 2 β β 3 β βββββββ
- log(base: float = 2.718281828459045) Self [source]
Compute the logarithm to a given base.
- Parameters:
- base
Given base, defaults to e.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").log(base=2)) shape: (3, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.0 β β 1.0 β β 1.584963 β ββββββββββββ
- log10() Self [source]
Compute the base 10 logarithm of the input array, element-wise.
Examples
>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]}) >>> df.select(pl.col("values").log10()) shape: (3, 1) βββββββββββ β values β β --- β β f64 β βββββββββββ‘ β 0.0 β β 0.30103 β β 0.60206 β βββββββββββ
- log1p() Self [source]
Compute the natural logarithm of each element plus one.
This computes log(1 + x) but is more numerically stable for x close to zero.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").log1p()) shape: (3, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.693147 β β 1.098612 β β 1.386294 β ββββββββββββ
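To see the numerical-stability claim in practice, compare log1p with a naive log(1 + x) for an input very close to zero. A minimal sketch (the exact digits printed for the naive variant depend on floating-point rounding):
>>> tiny = pl.DataFrame({"x": [1e-10]})
>>> tiny.select(
...     pl.col("x").log1p().alias("stable"),  # ~1e-10, accurate
...     (1 + pl.col("x")).log().alias("naive"),  # 1 + x rounds away low-order digits
... )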
- lower_bound() Self [source]
Calculate the lower bound.
Returns a unit Series with the lowest value possible for the dtype of this expression.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]}) >>> df.select(pl.col("a").lower_bound()) shape: (1, 1) ββββββββββββββββββββββββ β a β β --- β β i64 β ββββββββββββββββββββββββ‘ β -9223372036854775808 β ββββββββββββββββββββββββ
- lt(other: Any) Self [source]
Method equivalent of "less than" operator expr < other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [1.0, 2.0, float("nan"), 3.0], ... "y": [2.0, 2.0, float("nan"), 4.0], ... } ... ) >>> df.with_columns( ... pl.col("x").lt(pl.col("y")).alias("x < y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬ββββββββ β x β y β x < y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺββββββββ‘ β 1.0 β 2.0 β true β β 2.0 β 2.0 β false β β NaN β NaN β false β β 3.0 β 4.0 β true β βββββββ΄ββββββ΄ββββββββ
- map(
- function: Callable[[Series], Series | Any],
- return_dtype: PolarsDataType | None = None,
- *,
- agg_list: bool = False,
Apply a custom python function to a Series or sequence of Series.
The output of this custom function must be a Series. If you want to apply a custom function elementwise over single values, see apply(). A use case for map is when you want to transform an expression with a third-party library.
Read more in the book.
- Parameters:
- function
Lambda/ function to apply.
- return_dtype
Dtype of the output Series.
- agg_list
Aggregate the values of the expression into a list before applying the function.
Warning
If return_dtype is not provided, this may lead to unexpected results. We allow this, but it is considered a bug in the user's query.
Examples
>>> df = pl.DataFrame( ... { ... "sine": [0.0, 1.0, 0.0, -1.0], ... "cosine": [1.0, 0.0, -1.0, 0.0], ... } ... ) >>> df.select(pl.all().map(lambda x: x.to_numpy().argmax())) shape: (1, 2) ββββββββ¬βββββββββ β sine β cosine β β --- β --- β β i64 β i64 β ββββββββͺβββββββββ‘ β 1 β 0 β ββββββββ΄βββββββββ
- map_alias(function: Callable[[str], str]) Self [source]
Rename the output of an expression by mapping a function over the root name.
- Parameters:
- function
Function that maps a root name to a new name.
Examples
Remove a common suffix and convert to lower case.
>>> df = pl.DataFrame( ... { ... "A_reverse": [3, 2, 1], ... "B_reverse": ["z", "y", "x"], ... } ... ) >>> df.with_columns( ... pl.all().reverse().map_alias(lambda c: c.rstrip("_reverse").lower()) ... ) shape: (3, 4) βββββββββββββ¬ββββββββββββ¬ββββββ¬ββββββ β A_reverse β B_reverse β a β b β β --- β --- β --- β --- β β i64 β str β i64 β str β βββββββββββββͺββββββββββββͺββββββͺββββββ‘ β 3 β z β 1 β x β β 2 β y β 2 β y β β 1 β x β 3 β z β βββββββββββββ΄ββββββββββββ΄ββββββ΄ββββββ
- map_dict(remapping: dict[Any, Any], *, default: Any = None, return_dtype: PolarsDataType | None = None) Self [source]
Replace values in column according to remapping dictionary.
Needs a global string cache for lazily evaluated queries on columns of type pl.Categorical.
- Parameters:
- remapping
Dictionary containing the before/after values to map.
- default
Value to use when the remapping dict does not contain the lookup value. Accepts expression input. Non-expression inputs are parsed as literals. Use pl.first() to keep the original value.
- return_dtype
Set return dtype to override automatic return dtype determination.
Examples
>>> country_code_dict = { ... "CA": "Canada", ... "DE": "Germany", ... "FR": "France", ... None: "Not specified", ... } >>> df = pl.DataFrame( ... { ... "country_code": ["FR", None, "ES", "DE"], ... } ... ).with_row_count() >>> df shape: (4, 2) ββββββββββ¬βββββββββββββββ β row_nr β country_code β β --- β --- β β u32 β str β ββββββββββͺβββββββββββββββ‘ β 0 β FR β β 1 β null β β 2 β ES β β 3 β DE β ββββββββββ΄βββββββββββββββ
>>> df.with_columns( ... pl.col("country_code").map_dict(country_code_dict).alias("remapped") ... ) shape: (4, 3) ββββββββββ¬βββββββββββββββ¬ββββββββββββββββ β row_nr β country_code β remapped β β --- β --- β --- β β u32 β str β str β ββββββββββͺβββββββββββββββͺββββββββββββββββ‘ β 0 β FR β France β β 1 β null β Not specified β β 2 β ES β null β β 3 β DE β Germany β ββββββββββ΄βββββββββββββββ΄ββββββββββββββββ
Set a default value for values that cannot be mapped…
>>> df.with_columns( ... pl.col("country_code") ... .map_dict(country_code_dict, default="unknown") ... .alias("remapped") ... ) shape: (4, 3) ββββββββββ¬βββββββββββββββ¬ββββββββββββββββ β row_nr β country_code β remapped β β --- β --- β --- β β u32 β str β str β ββββββββββͺβββββββββββββββͺββββββββββββββββ‘ β 0 β FR β France β β 1 β null β Not specified β β 2 β ES β unknown β β 3 β DE β Germany β ββββββββββ΄βββββββββββββββ΄ββββββββββββββββ
…or keep the original value, by making use of pl.first():
>>> df.with_columns(
...     pl.col("country_code")
...     .map_dict(country_code_dict, default=pl.first())
...     .alias("remapped")
... )
shape: (4, 3)
┌────────┬──────────────┬───────────────┐
│ row_nr ┆ country_code ┆ remapped      │
│ ---    ┆ ---          ┆ ---           │
│ u32    ┆ str          ┆ str           │
╞════════╪══════════════╪═══════════════╡
│ 0      ┆ FR           ┆ France        │
│ 1      ┆ null         ┆ Not specified │
│ 2      ┆ ES           ┆ ES            │
│ 3      ┆ DE           ┆ Germany       │
└────────┴──────────────┴───────────────┘
…or keep the original value, by explicitly referring to the column:
>>> df.with_columns( ... pl.col("country_code") ... .map_dict(country_code_dict, default=pl.col("country_code")) ... .alias("remapped") ... ) shape: (4, 3) ββββββββββ¬βββββββββββββββ¬ββββββββββββββββ β row_nr β country_code β remapped β β --- β --- β --- β β u32 β str β str β ββββββββββͺβββββββββββββββͺββββββββββββββββ‘ β 0 β FR β France β β 1 β null β Not specified β β 2 β ES β ES β β 3 β DE β Germany β ββββββββββ΄βββββββββββββββ΄ββββββββββββββββ
If you need to access different columns to set a default value, a struct needs to be constructed: the first field is the column that you want to remap and the rest of the fields are the other columns used in the default expression.
>>> df.with_columns( ... pl.struct(pl.col(["country_code", "row_nr"])).map_dict( ... remapping=country_code_dict, ... default=pl.col("row_nr").cast(pl.Utf8), ... ) ... ) shape: (4, 2) ββββββββββ¬ββββββββββββββββ β row_nr β country_code β β --- β --- β β u32 β str β ββββββββββͺββββββββββββββββ‘ β 0 β France β β 1 β Not specified β β 2 β 2 β β 3 β Germany β ββββββββββ΄ββββββββββββββββ
Override return dtype:
>>> df.with_columns( ... pl.col("row_nr") ... .map_dict({1: 7, 3: 4}, default=3, return_dtype=pl.UInt8) ... .alias("remapped") ... ) shape: (4, 3) ββββββββββ¬βββββββββββββββ¬βββββββββββ β row_nr β country_code β remapped β β --- β --- β --- β β u32 β str β u8 β ββββββββββͺβββββββββββββββͺβββββββββββ‘ β 0 β FR β 3 β β 1 β null β 7 β β 2 β ES β 3 β β 3 β DE β 4 β ββββββββββ΄βββββββββββββββ΄βββββββββββ
- max() Self [source]
Get maximum value.
Examples
>>> df = pl.DataFrame({"a": [-1, float("nan"), 1]}) >>> df.select(pl.col("a").max()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ
- mean() Self [source]
Get mean value.
Examples
>>> df = pl.DataFrame({"a": [-1, 0, 1]}) >>> df.select(pl.col("a").mean()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.0 β βββββββ
- median() Self [source]
Get median value using linear interpolation.
Examples
>>> df = pl.DataFrame({"a": [-1, 0, 1]}) >>> df.select(pl.col("a").median()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.0 β βββββββ
- min() Self [source]
Get minimum value.
Examples
>>> df = pl.DataFrame({"a": [-1, float("nan"), 1]}) >>> df.select(pl.col("a").min()) shape: (1, 1) ββββββββ β a β β --- β β f64 β ββββββββ‘ β -1.0 β ββββββββ
- mod(other: Any) Self [source]
Method equivalent of modulus operator expr % other.
- Parameters:
- other
Numeric literal or expression value.
Examples
>>> df = pl.DataFrame({"x": [0, 1, 2, 3, 4]}) >>> df.with_columns(pl.col("x").mod(2).alias("x%2")) shape: (5, 2) βββββββ¬ββββββ β x β x%2 β β --- β --- β β i64 β i64 β βββββββͺββββββ‘ β 0 β 0 β β 1 β 1 β β 2 β 0 β β 3 β 1 β β 4 β 0 β βββββββ΄ββββββ
- mode() Self [source]
Compute the most occurring value(s).
Can return multiple values.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 1, 2, 3], ... "b": [1, 1, 2, 2], ... } ... ) >>> df.select(pl.all().mode()) shape: (2, 2) βββββββ¬ββββββ β a β b β β --- β --- β β i64 β i64 β βββββββͺββββββ‘ β 1 β 1 β β 1 β 2 β βββββββ΄ββββββ
- mul(other: Any) Self [source]
Method equivalent of multiplication operator expr * other.
- Parameters:
- other
Numeric literal or expression value.
Examples
>>> df = pl.DataFrame({"x": [1, 2, 4, 8, 16]}) >>> df.with_columns( ... pl.col("x").mul(2).alias("x*2"), ... pl.col("x").mul(pl.col("x").log(2)).alias("x * xlog2"), ... ) shape: (5, 3) βββββββ¬ββββββ¬ββββββββββββ β x β x*2 β x * xlog2 β β --- β --- β --- β β i64 β i64 β f64 β βββββββͺββββββͺββββββββββββ‘ β 1 β 2 β 0.0 β β 2 β 4 β 2.0 β β 4 β 8 β 8.0 β β 8 β 16 β 24.0 β β 16 β 32 β 64.0 β βββββββ΄ββββββ΄ββββββββββββ
- n_unique() Self [source]
Count unique values.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").n_unique()) shape: (1, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 2 β βββββββ
- nan_max() Self [source]
Get maximum value, but propagate/poison encountered NaN values.
This differs from numpy's nanmax as numpy defaults to propagating NaN values, whereas polars defaults to ignoring them.
Examples
>>> df = pl.DataFrame({"a": [0, float("nan")]}) >>> df.select(pl.col("a").nan_max()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β NaN β βββββββ
- nan_min() Self [source]
Get minimum value, but propagate/poison encountered NaN values.
This differs from numpy's nanmin as numpy defaults to propagating NaN values, whereas polars defaults to ignoring them.
Examples
>>> df = pl.DataFrame({"a": [0, float("nan")]}) >>> df.select(pl.col("a").nan_min()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β NaN β βββββββ
- ne(other: Any) Self [source]
Method equivalent of inequality operator expr != other.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [1.0, 2.0, float("nan"), 4.0], ... "y": [2.0, 2.0, float("nan"), 4.0], ... } ... ) >>> df.with_columns( ... pl.col("x").ne(pl.col("y")).alias("x != y"), ... ) shape: (4, 3) βββββββ¬ββββββ¬βββββββββ β x β y β x != y β β --- β --- β --- β β f64 β f64 β bool β βββββββͺββββββͺβββββββββ‘ β 1.0 β 2.0 β true β β 2.0 β 2.0 β false β β NaN β NaN β true β β 4.0 β 4.0 β false β βββββββ΄ββββββ΄βββββββββ
- ne_missing(other: Any) Self [source]
Method equivalent of inequality operator expr != other, where None == None.
This differs from default ne, where null values are propagated.
- Parameters:
- other
A literal or expression value to compare with.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [1.0, 2.0, float("nan"), 4.0, None, None], ... "y": [2.0, 2.0, float("nan"), 4.0, 5.0, None], ... } ... ) >>> df.with_columns( ... pl.col("x").ne_missing(pl.col("y")).alias("x != y"), ... ) shape: (6, 3) ββββββββ¬βββββββ¬βββββββββ β x β y β x != y β β --- β --- β --- β β f64 β f64 β bool β ββββββββͺβββββββͺβββββββββ‘ β 1.0 β 2.0 β true β β 2.0 β 2.0 β false β β NaN β NaN β true β β 4.0 β 4.0 β false β β null β 5.0 β true β β null β null β false β ββββββββ΄βββββββ΄βββββββββ
- null_count() Self [source]
Count null values.
Examples
>>> df = pl.DataFrame( ... { ... "a": [None, 1, None], ... "b": [1, 2, 3], ... } ... ) >>> df.select(pl.all().null_count()) shape: (1, 2) βββββββ¬ββββββ β a β b β β --- β --- β β u32 β u32 β βββββββͺββββββ‘ β 2 β 0 β βββββββ΄ββββββ
- or_(*others: Any) Self [source]
Method equivalent of bitwise "or" operator expr | other | ....
- Parameters:
- *others
One or more integer or boolean expressions to evaluate/combine.
Examples
>>> df = pl.DataFrame( ... data={ ... "x": [5, 6, 7, 4, 8], ... "y": [1.5, 2.5, 1.0, 4.0, -5.75], ... "z": [-9, 2, -1, 4, 8], ... } ... ) >>> df.select( ... (pl.col("x") == pl.col("y")) ... .or_( ... pl.col("x") == pl.col("y"), ... pl.col("y") == pl.col("z"), ... pl.col("y").cast(int) == pl.col("z"), ... ) ... .alias("any") ... ) shape: (5, 1) βββββββββ β any β β --- β β bool β βββββββββ‘ β false β β true β β false β β true β β false β βββββββββ
- over(
- expr: IntoExpr | Iterable[IntoExpr],
- *more_exprs: IntoExpr,
- mapping_strategy: WindowMappingStrategy = 'group_to_rows',
Compute expressions over the given groups.
This expression is similar to performing a groupby aggregation and joining the result back into the original dataframe.
The outcome is similar to how window functions work in PostgreSQL.
- Parameters:
- expr
Column(s) to group by. Accepts expression input. Strings are parsed as column names.
- *more_exprs
Additional columns to group by, specified as positional arguments.
- mapping_strategy: {"group_to_rows", "join", "explode"}
- group_to_rows
If the aggregation results in multiple values, assign them back to their position in the DataFrame. This can only be done if the group yields the same elements before aggregation as after.
- join
Join the groups as "List<group_dtype>" to the row positions. Warning: this can be memory intensive.
- explode
Don't do any mapping, but simply flatten the group. This only makes sense if the input data is sorted.
Examples
Pass the name of a column to compute the expression over that column.
>>> df = pl.DataFrame( ... { ... "a": ["a", "a", "b", "b", "b"], ... "b": [1, 2, 3, 5, 3], ... "c": [5, 4, 3, 2, 1], ... } ... ) >>> df.with_columns(pl.col("c").max().over("a").suffix("_max")) shape: (5, 4) βββββββ¬ββββββ¬ββββββ¬ββββββββ β a β b β c β c_max β β --- β --- β --- β --- β β str β i64 β i64 β i64 β βββββββͺββββββͺββββββͺββββββββ‘ β a β 1 β 5 β 5 β β a β 2 β 4 β 5 β β b β 3 β 3 β 3 β β b β 5 β 2 β 3 β β b β 3 β 1 β 3 β βββββββ΄ββββββ΄ββββββ΄ββββββββ
Expression input is supported.
>>> df.with_columns(pl.col("c").max().over(pl.col("b") // 2).suffix("_max")) shape: (5, 4) βββββββ¬ββββββ¬ββββββ¬ββββββββ β a β b β c β c_max β β --- β --- β --- β --- β β str β i64 β i64 β i64 β βββββββͺββββββͺββββββͺββββββββ‘ β a β 1 β 5 β 5 β β a β 2 β 4 β 4 β β b β 3 β 3 β 4 β β b β 5 β 2 β 2 β β b β 3 β 1 β 4 β βββββββ΄ββββββ΄ββββββ΄ββββββββ
Group by multiple columns by passing a list of column names or expressions.
>>> df.with_columns(pl.col("c").min().over(["a", "b"]).suffix("_min")) shape: (5, 4) βββββββ¬ββββββ¬ββββββ¬ββββββββ β a β b β c β c_min β β --- β --- β --- β --- β β str β i64 β i64 β i64 β βββββββͺββββββͺββββββͺββββββββ‘ β a β 1 β 5 β 5 β β a β 2 β 4 β 4 β β b β 3 β 3 β 1 β β b β 5 β 2 β 2 β β b β 3 β 1 β 1 β βββββββ΄ββββββ΄ββββββ΄ββββββββ
Or use positional arguments to group by multiple columns in the same way.
>>> df.with_columns(pl.col("c").min().over("a", pl.col("b") % 2).suffix("_min")) shape: (5, 4) βββββββ¬ββββββ¬ββββββ¬ββββββββ β a β b β c β c_min β β --- β --- β --- β --- β β str β i64 β i64 β i64 β βββββββͺββββββͺββββββͺββββββββ‘ β a β 1 β 5 β 5 β β a β 2 β 4 β 4 β β b β 3 β 3 β 1 β β b β 5 β 2 β 1 β β b β 3 β 1 β 1 β βββββββ΄ββββββ΄ββββββ΄ββββββββ
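To make the group-by analogy concrete, here is a hedged sketch showing that a window expression produces the same values as aggregating and joining the result back (up to row order):
>>> out_window = df.with_columns(pl.col("c").max().over("a").alias("c_max"))
>>> out_joined = df.join(
...     df.groupby("a").agg(pl.col("c").max().alias("c_max")),
...     on="a",
...     how="left",
... )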
- pct_change(n: int = 1) Self [source]
Computes percentage change between values.
Percentage change (as fraction) between the current element and the most-recent non-null element at least n period(s) before the current element.
Computes the change from the previous row by default.
- Parameters:
- n
Periods to shift for forming percent change.
Examples
>>> df = pl.DataFrame( ... { ... "a": [10, 11, 12, None, 12], ... } ... ) >>> df.with_columns(pl.col("a").pct_change().alias("pct_change")) shape: (5, 2) ββββββββ¬βββββββββββββ β a β pct_change β β --- β --- β β i64 β f64 β ββββββββͺβββββββββββββ‘ β 10 β null β β 11 β 0.1 β β 12 β 0.090909 β β null β 0.0 β β 12 β 0.0 β ββββββββ΄βββββββββββββ
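For non-null inputs the default n=1 case is simply x[t] / x[t-1] - 1; a minimal sketch with shift, reusing the frame above (note that shift alone does not skip nulls the way pct_change does):
>>> df.with_columns(
...     (pl.col("a") / pl.col("a").shift(1) - 1).alias("manual")
... )  # 10 -> null, 11 -> 0.1, 12 -> 0.090909; the null rows differ from pct_change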
- pipe(
- function: Callable[Concatenate[Expr, P], T],
- *args: P.args,
- **kwargs: P.kwargs,
Offers a structured way to apply a sequence of user-defined functions (UDFs).
- Parameters:
- function
Callable; will receive the expression as the first parameter, followed by any given args/kwargs.
- *args
Arguments to pass to the UDF.
- **kwargs
Keyword arguments to pass to the UDF.
Examples
>>> def extract_number(expr: pl.Expr) -> pl.Expr: ... """Extract the digits from a string.""" ... return expr.str.extract(r"\d+", 0).cast(pl.Int64) >>> >>> def scale_negative_even(expr: pl.Expr, *, n: int = 1) -> pl.Expr: ... """Set even numbers negative, and scale by a user-supplied value.""" ... expr = pl.when(expr % 2 == 0).then(-expr).otherwise(expr) ... return expr * n >>> >>> df = pl.DataFrame({"val": ["a: 1", "b: 2", "c: 3", "d: 4"]}) >>> df.with_columns( ... udfs=( ... pl.col("val").pipe(extract_number).pipe(scale_negative_even, n=5) ... ), ... ) shape: (4, 2) ββββββββ¬βββββββ β val β udfs β β --- β --- β β str β i64 β ββββββββͺβββββββ‘ β a: 1 β 5 β β b: 2 β -10 β β c: 3 β 15 β β d: 4 β -20 β ββββββββ΄βββββββ
- pow(exponent: int | float | None | Series | Expr) Self [source]
Method equivalent of exponentiation operator expr ** exponent.
- Parameters:
- exponent
Numeric literal or expression exponent value.
Examples
>>> df = pl.DataFrame({"x": [1, 2, 4, 8]}) >>> df.with_columns( ... pl.col("x").pow(3).alias("cube"), ... pl.col("x").pow(pl.col("x").log(2)).alias("x ** xlog2"), ... ) shape: (4, 3) βββββββ¬ββββββββ¬βββββββββββββ β x β cube β x ** xlog2 β β --- β --- β --- β β i64 β f64 β f64 β βββββββͺββββββββͺβββββββββββββ‘ β 1 β 1.0 β 1.0 β β 2 β 8.0 β 2.0 β β 4 β 64.0 β 16.0 β β 8 β 512.0 β 512.0 β βββββββ΄ββββββββ΄βββββββββββββ
- prefix(prefix: str) Self [source]
Add a prefix to the root column name of the expression.
- Parameters:
- prefix
Prefix to add to the root column name.
Notes
This will undo any previous renaming operations on the expression.
Due to implementation constraints, this method can only be called as the last expression in a chain.
Examples
>>> df = pl.DataFrame( ... { ... "a": [1, 2, 3], ... "b": ["x", "y", "z"], ... } ... ) >>> df.with_columns(pl.all().reverse().prefix("reverse_")) shape: (3, 4) βββββββ¬ββββββ¬ββββββββββββ¬ββββββββββββ β a β b β reverse_a β reverse_b β β --- β --- β --- β --- β β i64 β str β i64 β str β βββββββͺββββββͺββββββββββββͺββββββββββββ‘ β 1 β x β 3 β z β β 2 β y β 2 β y β β 3 β z β 1 β x β βββββββ΄ββββββ΄ββββββββββββ΄ββββββββββββ
- product() Self [source]
Compute the product of an expression.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").product()) shape: (1, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 6 β βββββββ
- qcut(
- quantiles: Sequence[float] | int,
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- allow_duplicates: bool = False,
- include_breaks: bool = False,
Bin continuous values into discrete categories based on their quantiles.
- Parameters:
- quantiles
Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.
- labels
Names of the categories. The number of labels must be equal to the number of categories.
- left_closed
Set the intervals to be left-closed instead of right-closed.
- allow_duplicates
If set to True, duplicates in the resulting quantiles are dropped, rather than raising a DuplicateError. This can happen even with unique probabilities, depending on the data.
- include_breaks
Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.
- Returns:
- Expr
Expression of data type Categorical if include_breaks is set to False (default), otherwise an expression of data type Struct.
Examples
Divide a column into three categories according to pre-defined quantile probabilities.
>>> df = pl.DataFrame({"foo": [-2, -1, 0, 1, 2]}) >>> df.with_columns( ... pl.col("foo").qcut([0.25, 0.75], labels=["a", "b", "c"]).alias("qcut") ... ) shape: (5, 2) βββββββ¬βββββββ β foo β qcut β β --- β --- β β i64 β cat β βββββββͺβββββββ‘ β -2 β a β β -1 β a β β 0 β b β β 1 β b β β 2 β c β βββββββ΄βββββββ
Divide a column into two categories using uniform quantile probabilities.
>>> df.with_columns( ... pl.col("foo") ... .qcut(2, labels=["low", "high"], left_closed=True) ... .alias("qcut") ... ) shape: (5, 2) βββββββ¬βββββββ β foo β qcut β β --- β --- β β i64 β cat β βββββββͺβββββββ‘ β -2 β low β β -1 β low β β 0 β high β β 1 β high β β 2 β high β βββββββ΄βββββββ
Add both the category and the breakpoint.
>>> df.with_columns( ... pl.col("foo").qcut([0.25, 0.75], include_breaks=True).alias("qcut") ... ).unnest("qcut") shape: (5, 3) βββββββ¬βββββββ¬βββββββββββββ β foo β brk β foo_bin β β --- β --- β --- β β i64 β f64 β cat β βββββββͺβββββββͺβββββββββββββ‘ β -2 β -1.0 β (-inf, -1] β β -1 β -1.0 β (-inf, -1] β β 0 β 1.0 β (-1, 1] β β 1 β 1.0 β (-1, 1] β β 2 β inf β (1, inf] β βββββββ΄βββββββ΄βββββββββββββ
- quantile(
- quantile: float | Expr,
- interpolation: RollingInterpolationMethod = 'nearest',
Get quantile value.
- Parameters:
- quantile
Quantile between 0.0 and 1.0.
- interpolation{"nearest", "higher", "lower", "midpoint", "linear"}
Interpolation method.
Examples
>>> df = pl.DataFrame({"a": [0, 1, 2, 3, 4, 5]}) >>> df.select(pl.col("a").quantile(0.3)) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ >>> df.select(pl.col("a").quantile(0.3, interpolation="higher")) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 2.0 β βββββββ >>> df.select(pl.col("a").quantile(0.3, interpolation="lower")) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ >>> df.select(pl.col("a").quantile(0.3, interpolation="midpoint")) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.5 β βββββββ >>> df.select(pl.col("a").quantile(0.3, interpolation="linear")) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.5 β βββββββ
- radians() Self [source]
Convert from degrees to radians.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [-720, -540, -360, -180, 0, 180, 360, 540, 720]}) >>> df.select(pl.col("a").radians()) shape: (9, 1) ββββββββββββββ β a β β --- β β f64 β ββββββββββββββ‘ β -12.566371 β β -9.424778 β β -6.283185 β β -3.141593 β β 0.0 β β 3.141593 β β 6.283185 β β 9.424778 β β 12.566371 β ββββββββββββββ
- rank(method: RankMethod = 'average', *, descending: bool = False, seed: int | None = None) Self [source]
Assign ranks to data, dealing with ties appropriately.
- Parameters:
- method{"average", "min", "max", "dense", "ordinal", "random"}
The method used to assign ranks to tied elements. The following methods are available (default is "average"):
"average" : The average of the ranks that would have been assigned to all the tied values is assigned to each value.
"min" : The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as "competition" ranking.)
"max" : The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
"dense" : Like "min", but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
"ordinal" : All values are given a distinct rank, corresponding to the order that the values occur in the Series.
"random" : Like "ordinal", but the rank for ties is not dependent on the order that the values occur in the Series.
- descending
Rank in descending order.
- seed
If method="random", use this as seed.
Examples
The "average" method:
>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]}) >>> df.select(pl.col("a").rank()) shape: (5, 1) βββββββ β a β β --- β β f32 β βββββββ‘ β 3.0 β β 4.5 β β 1.5 β β 1.5 β β 4.5 β βββββββ
The "ordinal" method:
>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]}) >>> df.select(pl.col("a").rank("ordinal")) shape: (5, 1) βββββββ β a β β --- β β u32 β βββββββ‘ β 3 β β 4 β β 1 β β 2 β β 5 β βββββββ
Use "rank" with "over" to rank within groups:
>>> df = pl.DataFrame({"a": [1, 1, 2, 2, 2], "b": [6, 7, 5, 14, 11]}) >>> df.with_columns(pl.col("b").rank().over("a").alias("rank")) shape: (5, 3) βββββββ¬ββββββ¬βββββββ β a β b β rank β β --- β --- β --- β β i64 β i64 β f32 β βββββββͺββββββͺβββββββ‘ β 1 β 6 β 1.0 β β 1 β 7 β 2.0 β β 2 β 5 β 1.0 β β 2 β 14 β 3.0 β β 2 β 11 β 2.0 β βββββββ΄ββββββ΄βββββββ
- rechunk() Self [source]
Create a single chunk of memory for this Series.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]})
Create a Series with 3 nulls, append column a, then rechunk:
>>> df.select(pl.repeat(None, 3).append(pl.col("a")).rechunk()) shape: (6, 1) ββββββββββ β repeat β β --- β β i64 β ββββββββββ‘ β null β β null β β null β β 1 β β 1 β β 2 β ββββββββββ
- reinterpret(*, signed: bool = True) Self [source]
Reinterpret the underlying bits as a signed/unsigned integer.
This operation is only allowed for 64-bit integers. For integers with fewer bits, you can safely use the cast operation.
- Parameters:
- signed
If True, reinterpret as pl.Int64. Otherwise, reinterpret as pl.UInt64.
Examples
>>> s = pl.Series("a", [1, 1, 2], dtype=pl.UInt64) >>> df = pl.DataFrame([s]) >>> df.select( ... [ ... pl.col("a").reinterpret(signed=True).alias("reinterpreted"), ... pl.col("a").alias("original"), ... ] ... ) shape: (3, 2) βββββββββββββββββ¬βββββββββββ β reinterpreted β original β β --- β --- β β i64 β u64 β βββββββββββββββββͺβββββββββββ‘ β 1 β 1 β β 1 β 1 β β 2 β 2 β βββββββββββββββββ΄βββββββββββ
- repeat_by(by: Series | Expr | str | int) Self [source]
Repeat the elements in this Series as specified in the given expression.
The repeated elements are expanded into a List.
- Parameters:
- by
Numeric column that determines how often the values will be repeated. The column will be coerced to UInt32. Give this dtype to make the coercion a no-op.
- Returns:
- Expr
Expression of data type List, where the inner data type is equal to the original data type.
Examples
>>> df = pl.DataFrame( ... { ... "a": ["x", "y", "z"], ... "n": [1, 2, 3], ... } ... ) >>> df.select(pl.col("a").repeat_by("n")) shape: (3, 1) βββββββββββββββββββ β a β β --- β β list[str] β βββββββββββββββββββ‘ β ["x"] β β ["y", "y"] β β ["z", "z", "z"] β βββββββββββββββββββ
- reshape(dimensions: tuple[int, ...]) Self [source]
Reshape this Expr to a flat Series or a Series of Lists.
- Parameters:
- dimensions
Tuple of the dimension sizes. If a -1 is used in any of the dimensions, that dimension is inferred.
- Returns:
- Expr
If a single dimension is given, results in an expression of the original data type. If multiple dimensions are given, results in an expression of data type List with shape (rows, cols).
See also
Expr.list.explode
Explode a list column.
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]}) >>> df.select(pl.col("foo").reshape((3, 3))) shape: (3, 1) βββββββββββββ β foo β β --- β β list[i64] β βββββββββββββ‘ β [1, 2, 3] β β [4, 5, 6] β β [7, 8, 9] β βββββββββββββ
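A -1 entry is inferred from the total length, as described above; a brief sketch with the same frame:
>>> df.select(pl.col("foo").reshape((3, -1)))  # inferred as (3, 3); same result as above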
- reverse() Self [source]
Reverse the selection.
Examples
>>> df = pl.DataFrame( ... { ... "A": [1, 2, 3, 4, 5], ... "fruits": ["banana", "banana", "apple", "apple", "banana"], ... "B": [5, 4, 3, 2, 1], ... "cars": ["beetle", "audi", "beetle", "beetle", "beetle"], ... } ... ) >>> df.select( ... [ ... pl.all(), ... pl.all().reverse().suffix("_reverse"), ... ] ... ) shape: (5, 8) βββββββ¬βββββββββ¬ββββββ¬βββββββββ¬ββββββββββββ¬βββββββββββββββββ¬ββββββββββββ¬βββββββββββββββ β A β fruits β B β cars β A_reverse β fruits_reverse β B_reverse β cars_reverse β β --- β --- β --- β --- β --- β --- β --- β --- β β i64 β str β i64 β str β i64 β str β i64 β str β βββββββͺβββββββββͺββββββͺβββββββββͺββββββββββββͺβββββββββββββββββͺββββββββββββͺβββββββββββββββ‘ β 1 β banana β 5 β beetle β 5 β banana β 1 β beetle β β 2 β banana β 4 β audi β 4 β apple β 2 β beetle β β 3 β apple β 3 β beetle β 3 β apple β 3 β beetle β β 4 β apple β 2 β beetle β 2 β banana β 4 β audi β β 5 β banana β 1 β beetle β 1 β banana β 5 β beetle β βββββββ΄βββββββββ΄ββββββ΄βββββββββ΄ββββββββββββ΄βββββββββββββββββ΄ββββββββββββ΄βββββββββββββββ
- rle() Self [source]
Get the lengths of runs of identical values.
- Returns:
- Expr
Expression of data type Struct with fields "lengths" and "values".
Examples
>>> df = pl.DataFrame(pl.Series("s", [1, 1, 2, 1, None, 1, 3, 3])) >>> df.select(pl.col("s").rle()).unnest("s") shape: (6, 2) βββββββββββ¬βββββββββ β lengths β values β β --- β --- β β i32 β i64 β βββββββββββͺβββββββββ‘ β 2 β 1 β β 1 β 2 β β 1 β 1 β β 1 β null β β 1 β 1 β β 2 β 3 β βββββββββββ΄βββββββββ
- rle_id() Self [source]
Map values to run IDs.
Similar to RLE, but it maps each value to an ID corresponding to the run into which it falls. This is especially useful when you want to define groups by runs of identical values rather than the values themselves.
Examples
>>> df = pl.DataFrame(dict(a=[1, 2, 1, 1, 1], b=["x", "x", None, "y", "y"])) >>> # It works on structs of multiple values too! >>> df.with_columns(a_r=pl.col("a").rle_id(), ab_r=pl.struct("a", "b").rle_id()) shape: (5, 4) βββββββ¬βββββββ¬ββββββ¬βββββββ β a β b β a_r β ab_r β β --- β --- β --- β --- β β i64 β str β u32 β u32 β βββββββͺβββββββͺββββββͺβββββββ‘ β 1 β x β 0 β 0 β β 2 β x β 1 β 1 β β 1 β null β 2 β 2 β β 1 β y β 2 β 3 β β 1 β y β 2 β 3 β βββββββ΄βββββββ΄ββββββ΄βββββββ
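Since the run ID is an ordinary column, it can be used directly as a group key. A hedged sketch counting the length of each run in column a (the names run and run_length are introduced here for illustration):
>>> df.groupby(pl.col("a").rle_id().alias("run"), maintain_order=True).agg(
...     pl.col("a").first(),
...     pl.count().alias("run_length"),
... )  # runs of a are [1], [2], [1, 1, 1] -> lengths 1, 1, 3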
- rolling_apply(
- function: Callable[[Series], Any],
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a custom rolling window function.
Prefer the specific rolling window functions over this one, as they are faster.
Prefer:
rolling_min
rolling_max
rolling_mean
rolling_sum
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- function
Aggregation function
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window
Examples
>>> df = pl.DataFrame( ... { ... "A": [1.0, 2.0, 9.0, 2.0, 13.0], ... } ... ) >>> df.select( ... [ ... pl.col("A").rolling_apply(lambda s: s.std(), window_size=3), ... ] ... ) shape: (5, 1) ββββββββββββ β A β β --- β β f64 β ββββββββββββ‘ β null β β null β β 4.358899 β β 4.041452 β β 5.567764 β ββββββββββββ
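As a quick check of the advice to prefer the dedicated functions, a custom sum should agree with rolling_sum exactly; a sketch, not a benchmark:
>>> df.select(
...     pl.col("A").rolling_apply(lambda s: s.sum(), window_size=3).alias("custom"),
...     pl.col("A").rolling_sum(window_size=3).alias("builtin"),
... )  # the two columns agree; the dedicated rolling_sum is much faster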
- rolling_max(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Apply a rolling max (moving max) over the values in this array.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their max.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it.
If you pass a by column <t_0, t_1, ..., t_n>, then closed="left" means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed="right", the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with "_saturating" to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window
- by
If the window_size is temporal, for instance "5h" or "3s", you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{"left", "right", "both", "none"}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_max=pl.col("A").rolling_max(window_size=2), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_max β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 2.0 β β 3.0 β 3.0 β β 4.0 β 4.0 β β 5.0 β 5.0 β β 6.0 β 6.0 β βββββββ΄ββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns( ... rolling_max=pl.col("A").rolling_max( ... window_size=2, weights=[0.25, 0.75] ... ), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_max β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.5 β β 3.0 β 2.25 β β 4.0 β 3.0 β β 5.0 β 3.75 β β 6.0 β 4.5 β βββββββ΄ββββββββββββββ
Center the values in the window
>>> df.with_columns( ... rolling_max=pl.col("A").rolling_max(window_size=3, center=True), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_max β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 3.0 β β 3.0 β 4.0 β β 4.0 β 5.0 β β 5.0 β 6.0 β β 6.0 β null β βββββββ΄ββββββββββββββ
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime >>> start = datetime(2001, 1, 1) >>> stop = datetime(2001, 1, 2) >>> df_temporal = pl.DataFrame( ... {"date": pl.date_range(start, stop, "1h", eager=True)} ... ).with_row_count() >>> df_temporal shape: (25, 2) ββββββββββ¬ββββββββββββββββββββββ β row_nr β date β β --- β --- β β u32 β datetime[ΞΌs] β ββββββββββͺββββββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β β 1 β 2001-01-01 01:00:00 β β 2 β 2001-01-01 02:00:00 β β 3 β 2001-01-01 03:00:00 β β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β β 22 β 2001-01-01 22:00:00 β β 23 β 2001-01-01 23:00:00 β β 24 β 2001-01-02 00:00:00 β ββββββββββ΄ββββββββββββββββββββββ
Compute the rolling max with the default left closure of temporal windows
>>> df_temporal.with_columns( ... rolling_row_max=pl.col("row_nr").rolling_max( ... window_size="2h", by="date", closed="left" ... ) ... ) shape: (25, 3) ββββββββββ¬ββββββββββββββββββββββ¬ββββββββββββββββββ β row_nr β date β rolling_row_max β β --- β --- β --- β β u32 β datetime[ΞΌs] β u32 β ββββββββββͺββββββββββββββββββββββͺββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β null β β 1 β 2001-01-01 01:00:00 β 0 β β 2 β 2001-01-01 02:00:00 β 1 β β 3 β 2001-01-01 03:00:00 β 2 β β β¦ β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β 20 β β 22 β 2001-01-01 22:00:00 β 21 β β 23 β 2001-01-01 23:00:00 β 22 β β 24 β 2001-01-02 00:00:00 β 23 β ββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββ
Compute the rolling max with the closure of windows on both sides
>>> df_temporal.with_columns( ... rolling_row_max=pl.col("row_nr").rolling_max( ... window_size="2h", by="date", closed="both" ... ) ... ) shape: (25, 3) ββββββββββ¬ββββββββββββββββββββββ¬ββββββββββββββββββ β row_nr β date β rolling_row_max β β --- β --- β --- β β u32 β datetime[ΞΌs] β u32 β ββββββββββͺββββββββββββββββββββββͺββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β 0 β β 1 β 2001-01-01 01:00:00 β 1 β β 2 β 2001-01-01 02:00:00 β 2 β β 3 β 2001-01-01 03:00:00 β 3 β β β¦ β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β 21 β β 22 β 2001-01-01 22:00:00 β 22 β β 23 β 2001-01-01 23:00:00 β 23 β β 24 β 2001-01-02 00:00:00 β 24 β ββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββ
- rolling_mean(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Apply a rolling mean (moving mean) over the values in this array.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their mean.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it.
If you pass a by column <t_0, t_1, ..., t_n>, then closed="left" means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed="right", the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with "_saturating" to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window
- by
If the window_size is temporal, for instance "5h" or "3s", you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{"left", "right", "both", "none"}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_mean=pl.col("A").rolling_mean(window_size=2), ... ) shape: (6, 2) βββββββ¬βββββββββββββββ β A β rolling_mean β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.5 β β 3.0 β 2.5 β β 4.0 β 3.5 β β 5.0 β 4.5 β β 6.0 β 5.5 β βββββββ΄βββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns( ... rolling_mean=pl.col("A").rolling_mean( ... window_size=2, weights=[0.25, 0.75] ... ), ... ) shape: (6, 2) βββββββ¬βββββββββββββββ β A β rolling_mean β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.75 β β 3.0 β 2.75 β β 4.0 β 3.75 β β 5.0 β 4.75 β β 6.0 β 5.75 β βββββββ΄βββββββββββββββ
Center the values in the window
>>> df.with_columns( ... rolling_mean=pl.col("A").rolling_mean(window_size=3, center=True), ... ) shape: (6, 2) βββββββ¬βββββββββββββββ β A β rolling_mean β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββ‘ β 1.0 β null β β 2.0 β 2.0 β β 3.0 β 3.0 β β 4.0 β 4.0 β β 5.0 β 5.0 β β 6.0 β null β βββββββ΄βββββββββββββββ
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime >>> start = datetime(2001, 1, 1) >>> stop = datetime(2001, 1, 2) >>> df_temporal = pl.DataFrame( ... {"date": pl.date_range(start, stop, "1h", eager=True)} ... ).with_row_count() >>> df_temporal shape: (25, 2) ββββββββββ¬ββββββββββββββββββββββ β row_nr β date β β --- β --- β β u32 β datetime[ΞΌs] β ββββββββββͺββββββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β β 1 β 2001-01-01 01:00:00 β β 2 β 2001-01-01 02:00:00 β β 3 β 2001-01-01 03:00:00 β β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β β 22 β 2001-01-01 22:00:00 β β 23 β 2001-01-01 23:00:00 β β 24 β 2001-01-02 00:00:00 β ββββββββββ΄ββββββββββββββββββββββ
Compute the rolling mean with the default left closure of temporal windows
>>> df_temporal.with_columns( ... rolling_row_mean=pl.col("row_nr").rolling_mean( ... window_size="2h", by="date", closed="left" ... ) ... ) shape: (25, 3) ββββββββββ¬ββββββββββββββββββββββ¬βββββββββββββββββββ β row_nr β date β rolling_row_mean β β --- β --- β --- β β u32 β datetime[ΞΌs] β f64 β ββββββββββͺββββββββββββββββββββββͺβββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β null β β 1 β 2001-01-01 01:00:00 β 0.0 β β 2 β 2001-01-01 02:00:00 β 0.5 β β 3 β 2001-01-01 03:00:00 β 1.5 β β β¦ β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β 19.5 β β 22 β 2001-01-01 22:00:00 β 20.5 β β 23 β 2001-01-01 23:00:00 β 21.5 β β 24 β 2001-01-02 00:00:00 β 22.5 β ββββββββββ΄ββββββββββββββββββββββ΄βββββββββββββββββββ
Compute the rolling mean with the closure of windows on both sides
>>> df_temporal.with_columns( ... rolling_row_mean=pl.col("row_nr").rolling_mean( ... window_size="2h", by="date", closed="both" ... ) ... ) shape: (25, 3) ββββββββββ¬ββββββββββββββββββββββ¬βββββββββββββββββββ β row_nr β date β rolling_row_mean β β --- β --- β --- β β u32 β datetime[ΞΌs] β f64 β ββββββββββͺββββββββββββββββββββββͺβββββββββββββββββββ‘ β 0 β 2001-01-01 00:00:00 β 0.0 β β 1 β 2001-01-01 01:00:00 β 0.5 β β 2 β 2001-01-01 02:00:00 β 1.0 β β 3 β 2001-01-01 03:00:00 β 2.0 β β β¦ β β¦ β β¦ β β 21 β 2001-01-01 21:00:00 β 20.0 β β 22 β 2001-01-01 22:00:00 β 21.0 β β 23 β 2001-01-01 23:00:00 β 22.0 β β 24 β 2001-01-02 00:00:00 β 23.0 β ββββββββββ΄ββββββββββββββββββββββ΄βββββββββββββββββββ
- rolling_median(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Compute a rolling median.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it.
If you pass a by column <t_0, t_1, ..., t_n>, then closed="left" means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed="right", the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with "_saturating" to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that determines the relative contribution of each value in a window to the output.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window
- by
If the window_size is temporal, for instance "5h" or "3s", you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{"left", "right", "both", "none"}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_median=pl.col("A").rolling_median(window_size=2), ... ) shape: (6, 2) βββββββ¬βββββββββββββββββ β A β rolling_median β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.5 β β 3.0 β 2.5 β β 4.0 β 3.5 β β 5.0 β 4.5 β β 6.0 β 5.5 β βββββββ΄βββββββββββββββββ
Specify weights for the values in each window:
>>> df.with_columns( ... rolling_median=pl.col("A").rolling_median( ... window_size=2, weights=[0.25, 0.75] ... ), ... ) shape: (6, 2) βββββββ¬βββββββββββββββββ β A β rolling_median β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.5 β β 3.0 β 2.5 β β 4.0 β 3.5 β β 5.0 β 4.5 β β 6.0 β 5.5 β βββββββ΄βββββββββββββββββ
Center the values in the window
>>> df.with_columns( ... rolling_median=pl.col("A").rolling_median(window_size=3, center=True), ... ) shape: (6, 2) βββββββ¬βββββββββββββββββ β A β rolling_median β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββββ‘ β 1.0 β null β β 2.0 β 2.0 β β 3.0 β 3.0 β β 4.0 β 4.0 β β 5.0 β 5.0 β β 6.0 β null β βββββββ΄βββββββββββββββββ
- rolling_min(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Apply a rolling min (moving min) over the values in this array.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their minimum.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it. If you pass a by column <t_0, t_1, ..., t_n>, then closed='left' means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed='right', the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with '_saturating' to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By 'calendar day', we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for 'calendar week', 'calendar month', 'calendar quarter', and 'calendar year'.
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window.
- by
If the window_size is temporal, for instance '5h' or '3s', you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{'left', 'right', 'both', 'none'}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_min=pl.col("A").rolling_min(window_size=2), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_min β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 1.0 β β 3.0 β 2.0 β β 4.0 β 3.0 β β 5.0 β 4.0 β β 6.0 β 5.0 β βββββββ΄ββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns(
...     rolling_min=pl.col("A").rolling_min(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_min │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.25        │
│ 3.0 ┆ 0.5         │
│ 4.0 ┆ 0.75        │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ 1.25        │
└─────┴─────────────┘
Center the values in the window
>>> df.with_columns(
...     rolling_min=pl.col("A").rolling_min(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_min │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 2.0         │
│ 4.0 ┆ 3.0         │
│ 5.0 ┆ 4.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.date_range(start, stop, "1h", eager=True)}
... ).with_row_count()
>>> df_temporal
shape: (25, 2)
┌────────┬─────────────────────┐
│ row_nr ┆ date                │
│ ---    ┆ ---                 │
│ u32    ┆ datetime[μs]        │
╞════════╪═════════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 │
│ 1      ┆ 2001-01-01 01:00:00 │
│ 2      ┆ 2001-01-01 02:00:00 │
│ 3      ┆ 2001-01-01 03:00:00 │
│ …      ┆ …                   │
│ 21     ┆ 2001-01-01 21:00:00 │
│ 22     ┆ 2001-01-01 22:00:00 │
│ 23     ┆ 2001-01-01 23:00:00 │
│ 24     ┆ 2001-01-02 00:00:00 │
└────────┴─────────────────────┘
>>> df_temporal.with_columns(
...     rolling_row_min=pl.col("row_nr").rolling_min(
...         window_size="2h", by="date", closed="left"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_min │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ u32             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0               │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 0               │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 1               │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 19              │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 20              │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 21              │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 22              │
└────────┴─────────────────────┴─────────────────┘
- rolling_quantile(
- quantile: float,
- interpolation: RollingInterpolationMethod = 'nearest',
- window_size: int | timedelta | str = 2,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Compute a rolling quantile.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it. If you pass a by column <t_0, t_1, ..., t_n>, then closed='left' means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed='right', the left endpoint is not included and the right endpoint is included.
- Parameters:
- quantile
Quantile between 0.0 and 1.0.
- interpolation{'nearest', 'higher', 'lower', 'midpoint', 'linear'}
Interpolation method.
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with '_saturating' to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By 'calendar day', we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for 'calendar week', 'calendar month', 'calendar quarter', and 'calendar year'.
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that determines the relative contribution of each value in a window to the output.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window.
- by
If the window_size is temporal, for instance '5h' or '3s', you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{'left', 'right', 'both', 'none'}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_quantile=pl.col("A").rolling_quantile( ... quantile=0.25, window_size=4 ... ), ... ) shape: (6, 2) βββββββ¬βββββββββββββββββββ β A β rolling_quantile β β --- β --- β β f64 β f64 β βββββββͺβββββββββββββββββββ‘ β 1.0 β null β β 2.0 β null β β 3.0 β null β β 4.0 β 2.0 β β 5.0 β 3.0 β β 6.0 β 4.0 β βββββββ΄βββββββββββββββββββ
Specify weights for the values in each window:
>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25, window_size=4, weights=[0.2, 0.4, 0.4, 0.2]
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ null             │
│ 4.0 ┆ 2.0              │
│ 5.0 ┆ 3.0              │
│ 6.0 ┆ 4.0              │
└─────┴──────────────────┘
Specify weights and interpolation method
>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25,
...         window_size=4,
...         weights=[0.2, 0.4, 0.4, 0.2],
...         interpolation="linear",
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ null             │
│ 4.0 ┆ 1.625            │
│ 5.0 ┆ 2.625            │
│ 6.0 ┆ 3.625            │
└─────┴──────────────────┘
Center the values in the window
>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.2, window_size=5, center=True
...     ),
... )
shape: (6, 2)
┌─────┬──────────────────┐
│ A   ┆ rolling_quantile │
│ --- ┆ ---              │
│ f64 ┆ f64              │
╞═════╪══════════════════╡
│ 1.0 ┆ null             │
│ 2.0 ┆ null             │
│ 3.0 ┆ 2.0              │
│ 4.0 ┆ 3.0              │
│ 5.0 ┆ null             │
│ 6.0 ┆ null             │
└─────┴──────────────────┘
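Since min_periods defaults to the window size, leading rows whose windows are still incomplete come out as null. Lowering it allows results on partial windows; a minimal sketch reusing the df from the examples above (min_periods=1 is an illustrative choice, output omitted):
>>> df.with_columns(
...     rolling_quantile=pl.col("A").rolling_quantile(
...         quantile=0.25, window_size=4, min_periods=1
...     ),
... )
With min_periods=1 the first three rows are computed from windows of length 1, 2, and 3 instead of being null.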
- rolling_skew(window_size: int, *, bias: bool = True) Self [source]
Compute a rolling skew.
The window at a given row includes the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
Integer size of the rolling window.
- bias
If False, the calculations are corrected for statistical bias.
Examples
>>> df = pl.DataFrame({"a": [1, 4, 2, 9]}) >>> df.select(pl.col("a").rolling_skew(3)) shape: (4, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β null β β null β β 0.381802 β β 0.47033 β ββββββββββββ
Note how the values match the following:
>>> pl.Series([1, 4, 2]).skew(), pl.Series([4, 2, 9]).skew()
(0.38180177416060584, 0.47033046033698594)
- rolling_std(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
- ddof: int = 1,
Compute a rolling standard deviation.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it. If you pass a by column <t_0, t_1, ..., t_n>, then closed='left' means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed='right', the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with '_saturating' to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By 'calendar day', we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for 'calendar week', 'calendar month', 'calendar quarter', and 'calendar year'.
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that determines the relative contribution of each value in a window to the output.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window.
- by
If the window_size is temporal, for instance '5h' or '3s', you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{'left', 'right', 'both', 'none'}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
- ddof
'Delta Degrees of Freedom': the divisor for a length N window is N - ddof.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_std=pl.col("A").rolling_std(window_size=2), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_std β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 0.707107 β β 3.0 β 0.707107 β β 4.0 β 0.707107 β β 5.0 β 0.707107 β β 6.0 β 0.707107 β βββββββ΄ββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns(
...     rolling_std=pl.col("A").rolling_std(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_std │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.433013    │
│ 3.0 ┆ 0.433013    │
│ 4.0 ┆ 0.433013    │
│ 5.0 ┆ 0.433013    │
│ 6.0 ┆ 0.433013    │
└─────┴─────────────┘
Center the values in the window
>>> df.with_columns(
...     rolling_std=pl.col("A").rolling_std(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_std │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 1.0         │
│ 4.0 ┆ 1.0         │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.date_range(start, stop, "1h", eager=True)}
... ).with_row_count()
>>> df_temporal
shape: (25, 2)
┌────────┬─────────────────────┐
│ row_nr ┆ date                │
│ ---    ┆ ---                 │
│ u32    ┆ datetime[μs]        │
╞════════╪═════════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 │
│ 1      ┆ 2001-01-01 01:00:00 │
│ 2      ┆ 2001-01-01 02:00:00 │
│ 3      ┆ 2001-01-01 03:00:00 │
│ …      ┆ …                   │
│ 21     ┆ 2001-01-01 21:00:00 │
│ 22     ┆ 2001-01-01 22:00:00 │
│ 23     ┆ 2001-01-01 23:00:00 │
│ 24     ┆ 2001-01-02 00:00:00 │
└────────┴─────────────────────┘
Compute the rolling std with the default left closure of temporal windows
>>> df_temporal.with_columns(
...     rolling_row_std=pl.col("row_nr").rolling_std(
...         window_size="2h", by="date", closed="left"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_std │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ f64             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0.0             │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 0.707107        │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 0.707107        │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 0.707107        │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 0.707107        │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 0.707107        │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 0.707107        │
└────────┴─────────────────────┴─────────────────┘
Compute the rolling std with the closure of windows on both sides
>>> df_temporal.with_columns(
...     rolling_row_std=pl.col("row_nr").rolling_std(
...         window_size="2h", by="date", closed="both"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_std │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ f64             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ 0.0             │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0.707107        │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 1.0             │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 1.0             │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 1.0             │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 1.0             │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 1.0             │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 1.0             │
└────────┴─────────────────────┴─────────────────┘
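Because the divisor is N - ddof, setting ddof=0 yields the population standard deviation. For the adjacent-pair windows over this column that is exactly 0.5 rather than 0.707107: each window {x, x + 1} has deviations of ±0.5 from its mean, so the population variance is 0.25 and its square root 0.5. A sketch reusing the df from above (output omitted):
>>> df.with_columns(
...     rolling_std_pop=pl.col("A").rolling_std(window_size=2, ddof=0),
... )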
- rolling_sum(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
Apply a rolling sum (moving sum) over the values in this array.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their sum.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it. If you pass a by column <t_0, t_1, ..., t_n>, then closed='left' means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed='right', the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with '_saturating' to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By 'calendar day', we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for 'calendar week', 'calendar month', 'calendar quarter', and 'calendar year'.
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window.
- by
If the window_size is temporal, for instance '5h' or '3s', you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.
- closed{'left', 'right', 'both', 'none'}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_sum=pl.col("A").rolling_sum(window_size=2), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_sum β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 3.0 β β 3.0 β 5.0 β β 4.0 β 7.0 β β 5.0 β 9.0 β β 6.0 β 11.0 β βββββββ΄ββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns(
...     rolling_sum=pl.col("A").rolling_sum(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_sum │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.75        │
│ 3.0 ┆ 2.75        │
│ 4.0 ┆ 3.75        │
│ 5.0 ┆ 4.75        │
│ 6.0 ┆ 5.75        │
└─────┴─────────────┘
Center the values in the window
>>> df.with_columns(
...     rolling_sum=pl.col("A").rolling_sum(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_sum │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 6.0         │
│ 3.0 ┆ 9.0         │
│ 4.0 ┆ 12.0        │
│ 5.0 ┆ 15.0        │
│ 6.0 ┆ null        │
└─────┴─────────────┘
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.date_range(start, stop, "1h", eager=True)}
... ).with_row_count()
>>> df_temporal
shape: (25, 2)
┌────────┬─────────────────────┐
│ row_nr ┆ date                │
│ ---    ┆ ---                 │
│ u32    ┆ datetime[μs]        │
╞════════╪═════════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 │
│ 1      ┆ 2001-01-01 01:00:00 │
│ 2      ┆ 2001-01-01 02:00:00 │
│ 3      ┆ 2001-01-01 03:00:00 │
│ …      ┆ …                   │
│ 21     ┆ 2001-01-01 21:00:00 │
│ 22     ┆ 2001-01-01 22:00:00 │
│ 23     ┆ 2001-01-01 23:00:00 │
│ 24     ┆ 2001-01-02 00:00:00 │
└────────┴─────────────────────┘
Compute the rolling sum with the default left closure of temporal windows
>>> df_temporal.with_columns(
...     rolling_row_sum=pl.col("row_nr").rolling_sum(
...         window_size="2h", by="date", closed="left"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_sum │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ u32             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0               │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 1               │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 3               │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 39              │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 41              │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 43              │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 45              │
└────────┴─────────────────────┴─────────────────┘
Compute the rolling sum with the closure of windows on both sides
>>> df_temporal.with_columns(
...     rolling_row_sum=pl.col("row_nr").rolling_sum(
...         window_size="2h", by="date", closed="both"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_sum │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ u32             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ 0               │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 1               │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 3               │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 6               │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 60              │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 63              │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 66              │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 69              │
└────────┴─────────────────────┴─────────────────┘
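The '_saturating' suffix only matters for calendar-aware window sizes. A hedged sketch (the month-end frame is hypothetical, output omitted): for the 2022-03-31 row, the window start one month earlier would be the non-existent 2022-02-31, so it saturates to 2022-02-28 instead of raising an error.
>>> from datetime import datetime
>>> df_monthly = pl.DataFrame(
...     {
...         "date": [
...             datetime(2022, 1, 31),
...             datetime(2022, 2, 28),
...             datetime(2022, 3, 31),
...         ],
...         "value": [1, 2, 3],
...     }
... )
>>> df_monthly.with_columns(
...     monthly_sum=pl.col("value").rolling_sum(
...         window_size="1mo_saturating", by="date", closed="right"
...     )
... )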
- rolling_var(
- window_size: int | timedelta | str,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- by: str | None = None,
- closed: ClosedInterval = 'left',
- ddof: int = 1,
Compute a rolling variance.
If by has not been specified (the default), the window at a given row will include the row itself, and the window_size - 1 elements before it. If you pass a by column <t_0, t_1, ..., t_n>, then closed='left' means the windows will be:
[t_0 - window_size, t_0)
[t_1 - window_size, t_1)
…
[t_n - window_size, t_n)
With closed='right', the left endpoint is not included and the right endpoint is included.
- Parameters:
- window_size
The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Suffix with '_saturating' to indicate that dates too large for their month should saturate at the largest date (e.g. 2022-02-29 -> 2022-02-28) instead of erroring.
By 'calendar day', we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for 'calendar week', 'calendar month', 'calendar quarter', and 'calendar year'.
If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.
- weights
An optional slice with the same length as the window that determines the relative contribution of each value in a window to the output.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.
- center
Set the labels at the center of the window.
- by
If the window_size is temporal, for instance '5h' or '3s', you must set the column that will be used to determine the windows. This column must be of dtype Datetime.
- closed{'left', 'right', 'both', 'none'}
Define which sides of the temporal interval are closed (inclusive); only applicable if by has been set.
- ddof
'Delta Degrees of Freedom': the divisor for a length N window is N - ddof.
Warning
This functionality is experimental and may change without it being considered a breaking change.
Notes
If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.
Examples
>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}) >>> df.with_columns( ... rolling_var=pl.col("A").rolling_var(window_size=2), ... ) shape: (6, 2) βββββββ¬ββββββββββββββ β A β rolling_var β β --- β --- β β f64 β f64 β βββββββͺββββββββββββββ‘ β 1.0 β null β β 2.0 β 0.5 β β 3.0 β 0.5 β β 4.0 β 0.5 β β 5.0 β 0.5 β β 6.0 β 0.5 β βββββββ΄ββββββββββββββ
Specify weights to multiply the values in the window with:
>>> df.with_columns(
...     rolling_var=pl.col("A").rolling_var(
...         window_size=2, weights=[0.25, 0.75]
...     ),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_var │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 0.1875      │
│ 3.0 ┆ 0.1875      │
│ 4.0 ┆ 0.1875      │
│ 5.0 ┆ 0.1875      │
│ 6.0 ┆ 0.1875      │
└─────┴─────────────┘
Center the values in the window
>>> df.with_columns(
...     rolling_var=pl.col("A").rolling_var(window_size=3, center=True),
... )
shape: (6, 2)
┌─────┬─────────────┐
│ A   ┆ rolling_var │
│ --- ┆ ---         │
│ f64 ┆ f64         │
╞═════╪═════════════╡
│ 1.0 ┆ null        │
│ 2.0 ┆ 1.0         │
│ 3.0 ┆ 1.0         │
│ 4.0 ┆ 1.0         │
│ 5.0 ┆ 1.0         │
│ 6.0 ┆ null        │
└─────┴─────────────┘
Create a DataFrame with a datetime column and a row number column
>>> from datetime import timedelta, datetime
>>> start = datetime(2001, 1, 1)
>>> stop = datetime(2001, 1, 2)
>>> df_temporal = pl.DataFrame(
...     {"date": pl.date_range(start, stop, "1h", eager=True)}
... ).with_row_count()
>>> df_temporal
shape: (25, 2)
┌────────┬─────────────────────┐
│ row_nr ┆ date                │
│ ---    ┆ ---                 │
│ u32    ┆ datetime[μs]        │
╞════════╪═════════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 │
│ 1      ┆ 2001-01-01 01:00:00 │
│ 2      ┆ 2001-01-01 02:00:00 │
│ 3      ┆ 2001-01-01 03:00:00 │
│ …      ┆ …                   │
│ 21     ┆ 2001-01-01 21:00:00 │
│ 22     ┆ 2001-01-01 22:00:00 │
│ 23     ┆ 2001-01-01 23:00:00 │
│ 24     ┆ 2001-01-02 00:00:00 │
└────────┴─────────────────────┘
Compute the rolling var with the default left closure of temporal windows
>>> df_temporal.with_columns(
...     rolling_row_var=pl.col("row_nr").rolling_var(
...         window_size="2h", by="date", closed="left"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_var │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ f64             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ null            │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0.0             │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 0.5             │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 0.5             │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 0.5             │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 0.5             │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 0.5             │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 0.5             │
└────────┴─────────────────────┴─────────────────┘
Compute the rolling var with the closure of windows on both sides
>>> df_temporal.with_columns(
...     rolling_row_var=pl.col("row_nr").rolling_var(
...         window_size="2h", by="date", closed="both"
...     )
... )
shape: (25, 3)
┌────────┬─────────────────────┬─────────────────┐
│ row_nr ┆ date                ┆ rolling_row_var │
│ ---    ┆ ---                 ┆ ---             │
│ u32    ┆ datetime[μs]        ┆ f64             │
╞════════╪═════════════════════╪═════════════════╡
│ 0      ┆ 2001-01-01 00:00:00 ┆ 0.0             │
│ 1      ┆ 2001-01-01 01:00:00 ┆ 0.5             │
│ 2      ┆ 2001-01-01 02:00:00 ┆ 1.0             │
│ 3      ┆ 2001-01-01 03:00:00 ┆ 1.0             │
│ …      ┆ …                   ┆ …               │
│ 21     ┆ 2001-01-01 21:00:00 ┆ 1.0             │
│ 22     ┆ 2001-01-01 22:00:00 ┆ 1.0             │
│ 23     ┆ 2001-01-01 23:00:00 ┆ 1.0             │
│ 24     ┆ 2001-01-02 00:00:00 ┆ 1.0             │
└────────┴─────────────────────┴─────────────────┘
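rolling_std is simply the square root of rolling_var (with matching ddof), which is easy to verify. A sketch reusing the df from the examples above; both new columns hold 0.707107 for every complete window, since sqrt(0.5) ≈ 0.707107:
>>> df.with_columns(
...     std_direct=pl.col("A").rolling_std(window_size=2),
...     std_via_var=pl.col("A").rolling_var(window_size=2).sqrt(),
... )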
- round(decimals: int = 0) Self [source]
Round underlying floating point data by decimals digits.
- Parameters:
- decimals
Number of decimals to round by.
Examples
>>> df = pl.DataFrame({"a": [0.33, 0.52, 1.02, 1.17]}) >>> df.select(pl.col("a").round(1)) shape: (4, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.3 β β 0.5 β β 1.0 β β 1.2 β βββββββ
- sample(
- n: int | None = None,
- *,
- fraction: float | None = None,
- with_replacement: bool = False,
- shuffle: bool = False,
- seed: int | None = None,
- fixed_seed: bool = False,
Sample from this expression.
- Parameters:
- n
Number of items to return. Cannot be used with fraction. Defaults to 1 if fraction is None.
- fraction
Fraction of items to return. Cannot be used with n.
- with_replacement
Allow values to be sampled more than once.
- shuffle
Shuffle the order of sampled data points.
- seed
Seed for the random number generator. If set to None (default), a random seed is generated using the random module.
- fixed_seed
If True, the seed will not be incremented between draws. This can make output predictable because draw ordering can change due to threads being scheduled in a different order.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").sample(fraction=1.0, with_replacement=True, seed=1)) shape: (3, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 3 β β 1 β β 1 β βββββββ
- search_sorted(element: IntoExpr, side: SearchSortedSide = 'any') Self [source]
Find indices where elements should be inserted to maintain order.
\[a[i-1] < v <= a[i]\]
- Parameters:
- element
Expression or scalar value.
- side{βanyβ, βleftβ, βrightβ}
If 'any', the index of the first suitable location found is given. If 'left', the index of the leftmost suitable location found is given. If 'right', the index of the rightmost suitable location found is given.
Examples
>>> df = pl.DataFrame(
...     {
...         "values": [1, 2, 3, 5],
...     }
... )
>>> df.select(
...     [
...         pl.col("values").search_sorted(0).alias("zero"),
...         pl.col("values").search_sorted(3).alias("three"),
...         pl.col("values").search_sorted(6).alias("six"),
...     ]
... )
shape: (1, 3)
┌──────┬───────┬─────┐
│ zero ┆ three ┆ six │
│ ---  ┆ ---   ┆ --- │
│ u32  ┆ u32   ┆ u32 │
╞══════╪═══════╪═════╡
│ 0    ┆ 2     ┆ 4   │
└──────┴───────┴─────┘
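The side parameter matters once the searched value actually occurs, possibly repeatedly, in the column. A minimal sketch with a duplicated value: 'left' gives the first position where 2 could be inserted while keeping order (index 1), 'right' the last (index 3).
>>> df = pl.DataFrame({"values": [1, 2, 2, 3]})
>>> df.select(
...     [
...         pl.col("values").search_sorted(2, side="left").alias("left"),
...         pl.col("values").search_sorted(2, side="right").alias("right"),
...     ]
... )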
- set_sorted(*, descending: bool = False) Self [source]
Flags the expression as 'sorted'.
Enables downstream code to use fast paths for sorted arrays.
- Parameters:
- descending
Whether the Series order is descending.
Warning
This can lead to incorrect results if this Series is not sorted! Use with care!
Examples
>>> df = pl.DataFrame({"values": [1, 2, 3]}) >>> df.select(pl.col("values").set_sorted().max()) shape: (1, 1) ββββββββββ β values β β --- β β i64 β ββββββββββ‘ β 3 β ββββββββββ
- shift(periods: int = 1) Self [source]
Shift the values by a given period.
- Parameters:
- periods
Number of places to shift (may be negative).
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4]}) >>> df.select(pl.col("foo").shift(1)) shape: (4, 1) ββββββββ β foo β β --- β β i64 β ββββββββ‘ β null β β 1 β β 2 β β 3 β ββββββββ
- shift_and_fill(fill_value: IntoExpr, *, periods: int = 1) Self [source]
Shift the values by a given period and fill the resulting null values.
- Parameters:
- fill_value
Fill None values with the result of this expression.
- periods
Number of places to shift (may be negative).
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4]}) >>> df.select(pl.col("foo").shift_and_fill("a", periods=1)) shape: (4, 1) βββββββ β foo β β --- β β str β βββββββ‘ β a β β 1 β β 2 β β 3 β βββββββ
- shrink_dtype() Self [source]
Shrink numeric columns to the minimal required datatype.
Shrink to the dtype needed to fit the extrema of this Series. This can be used to reduce memory pressure.
Examples
>>> pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [1, 2, 2 << 32],
...         "c": [-1, 2, 1 << 30],
...         "d": [-112, 2, 112],
...         "e": [-112, 2, 129],
...         "f": ["a", "b", "c"],
...         "g": [0.1, 1.32, 0.12],
...         "h": [True, None, False],
...     }
... ).select(pl.all().shrink_dtype())
shape: (3, 8)
┌─────┬────────────┬────────────┬──────┬──────┬─────┬──────┬───────┐
│ a   ┆ b          ┆ c          ┆ d    ┆ e    ┆ f   ┆ g    ┆ h     │
│ --- ┆ ---        ┆ ---        ┆ ---  ┆ ---  ┆ --- ┆ ---  ┆ ---   │
│ i8  ┆ i64        ┆ i32        ┆ i8   ┆ i16  ┆ str ┆ f32  ┆ bool  │
╞═════╪════════════╪════════════╪══════╪══════╪═════╪══════╪═══════╡
│ 1   ┆ 1          ┆ -1         ┆ -112 ┆ -112 ┆ a   ┆ 0.1  ┆ true  │
│ 2   ┆ 2          ┆ 2          ┆ 2    ┆ 2    ┆ b   ┆ 1.32 ┆ null  │
│ 3   ┆ 8589934592 ┆ 1073741824 ┆ 112  ┆ 129  ┆ c   ┆ 0.12 ┆ false │
└─────┴────────────┴────────────┴──────┴──────┴─────┴──────┴───────┘
- shuffle(seed: int | None = None, fixed_seed: bool = False) Self [source]
Shuffle the contents of this expression.
- Parameters:
- seed
Seed for the random number generator. If set to None (default), a random seed is generated using the random module.
- fixed_seed
If True, the seed will not be incremented between draws. This can make output predictable because draw ordering can change due to threads being scheduled in a different order.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3]}) >>> df.select(pl.col("a").shuffle(seed=1)) shape: (3, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 2 β β 1 β β 3 β βββββββ
- sign() Self [source]
Compute the element-wise indication of the sign.
The returned values can be -1, 0, or 1:
-1 if x < 0.
0 if x == 0.
1 if x > 0.
(null values are preserved as-is).
Examples
>>> df = pl.DataFrame({"a": [-9.0, -0.0, 0.0, 4.0, None]}) >>> df.select(pl.col("a").sign()) shape: (5, 1) ββββββββ β a β β --- β β i64 β ββββββββ‘ β -1 β β 0 β β 0 β β 1 β β null β ββββββββ
- sin() Self [source]
Compute the element-wise value for the sine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [0.0]}) >>> df.select(pl.col("a").sin()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 0.0 β βββββββ
- sinh() Self [source]
Compute the element-wise value for the hyperbolic sine.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").sinh()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 1.175201 β ββββββββββββ
- skew(*, bias: bool = True) Self [source]
Compute the sample skewness of a data set.
For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.
See scipy.stats for more information.
- Parameters:
- bias : bool, optional
If False, the calculations are corrected for statistical bias.
Notes
The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.
\[g_1 = \frac{m_3}{m_2^{3/2}}\]
where
\[m_i = \frac{1}{N} \sum_{n=1}^{N} (x[n] - \bar{x})^i\]
is the biased sample \(i\texttt{th}\) central moment, and \(\bar{x}\) is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.
\[G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2} \frac{m_3}{m_2^{3/2}}\]
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]}) >>> df.select(pl.col("a").skew()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.343622 β ββββββββββββ
- slice(offset: int | Expr, length: int | Expr | None = None) Self [source]
Get a slice of this expression.
- Parameters:
- offset
Start index. Negative indexing is supported.
- length
Length of the slice. If set to None, all rows starting at the offset will be selected.
Examples
>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10, 11],
...         "b": [None, 4, 4, 4],
...     }
... )
>>> df.select(pl.all().slice(1, 2))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 9   ┆ 4   │
│ 10  ┆ 4   │
└─────┴─────┘
- sort(*, descending: bool = False, nulls_last: bool = False) Self [source]
Sort this column.
When used in a projection/selection context, the whole column is sorted. When used in a groupby context, the groups are sorted.
- Parameters:
- descending
Sort in descending order.
- nulls_last
Place null values last.
Examples
>>> df = pl.DataFrame(
...     {
...         "a": [1, None, 3, 2],
...     }
... )
>>> df.select(pl.col("a").sort())
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
│ 1    │
│ 2    │
│ 3    │
└──────┘
>>> df.select(pl.col("a").sort(descending=True))
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
│ 3    │
│ 2    │
│ 1    │
└──────┘
>>> df.select(pl.col("a").sort(nulls_last=True))
shape: (4, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
└──────┘
When sorting in a groupby context, the groups are sorted.
>>> df = pl.DataFrame(
...     {
...         "group": ["one", "one", "one", "two", "two", "two"],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.groupby("group").agg(pl.col("value").sort())
shape: (2, 2)
┌───────┬────────────┐
│ group ┆ value      │
│ ---   ┆ ---        │
│ str   ┆ list[i64]  │
╞═══════╪════════════╡
│ two   ┆ [3, 4, 99] │
│ one   ┆ [1, 2, 98] │
└───────┴────────────┘
- sort_by(by: IntoExpr | Iterable[IntoExpr], *more_by: IntoExpr, descending: bool | Sequence[bool] = False) Self [source]
Sort this column by the ordering of other columns.
When used in a projection/selection context, the whole column is sorted. When used in a groupby context, the groups are sorted.
- Parameters:
- by
Column(s) to sort by. Accepts expression input. Strings are parsed as column names.
- *more_by
Additional columns to sort by, specified as positional arguments.
- descending
Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.
Examples
Pass a single column name to sort by that column.
>>> df = pl.DataFrame(
...     {
...         "group": ["a", "a", "b", "b"],
...         "value1": [1, 3, 4, 2],
...         "value2": [8, 7, 6, 5],
...     }
... )
>>> df.select(pl.col("group").sort_by("value1"))
shape: (4, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ a     │
│ b     │
│ a     │
│ b     │
└───────┘
Sorting by expressions is also supported.
>>> df.select(pl.col("group").sort_by(pl.col("value1") + pl.col("value2"))) shape: (4, 1) βββββββββ β group β β --- β β str β βββββββββ‘ β b β β a β β a β β b β βββββββββ
Sort by multiple columns by passing a list of columns.
>>> df.select(pl.col("group").sort_by(["value1", "value2"], descending=True)) shape: (4, 1) βββββββββ β group β β --- β β str β βββββββββ‘ β b β β a β β b β β a β βββββββββ
Or use positional arguments to sort by multiple columns in the same way.
>>> df.select(pl.col("group").sort_by("value1", "value2")) shape: (4, 1) βββββββββ β group β β --- β β str β βββββββββ‘ β a β β b β β a β β b β βββββββββ
When sorting in a groupby context, the groups are sorted.
>>> df.groupby("group").agg( ... pl.col("value1").sort_by("value2") ... ) shape: (2, 2) βββββββββ¬ββββββββββββ β group β value1 β β --- β --- β β str β list[i64] β βββββββββͺββββββββββββ‘ β a β [3, 1] β β b β [2, 4] β βββββββββ΄ββββββββββββ
Take a single row from each group where a column attains its minimal value within that group.
>>> df.groupby("group").agg( ... pl.all().sort_by("value2").first() ... ) shape: (2, 3) βββββββββ¬βββββββββ¬βββββββββ β group β value1 β value2 | β --- β --- β --- β β str β i64 β i64 | βββββββββͺβββββββββͺβββββββββ‘ β a β 3 β 7 | β b β 2 β 5 | βββββββββ΄βββββββββ΄βββββββββ
- sqrt() Self [source]
Compute the square root of the elements.
Examples
>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]}) >>> df.select(pl.col("values").sqrt()) shape: (3, 1) ββββββββββββ β values β β --- β β f64 β ββββββββββββ‘ β 1.0 β β 1.414214 β β 2.0 β ββββββββββββ
- std(ddof: int = 1) Self [source]
Get standard deviation.
- Parameters:
- ddof
'Delta Degrees of Freedom': the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.
Examples
>>> df = pl.DataFrame({"a": [-1, 0, 1]}) >>> df.select(pl.col("a").std()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ
- sub(other: Any) Self [source]
Method equivalent of subtraction operator expr - other.
- Parameters:
- other
Numeric literal or expression value.
Examples
>>> df = pl.DataFrame({"x": [0, 1, 2, 3, 4]}) >>> df.with_columns( ... pl.col("x").sub(2).alias("x-2"), ... pl.col("x").sub(pl.col("x").cumsum()).alias("x-expr"), ... ) shape: (5, 3) βββββββ¬ββββββ¬βββββββββ β x β x-2 β x-expr β β --- β --- β --- β β i64 β i64 β i64 β βββββββͺββββββͺβββββββββ‘ β 0 β -2 β 0 β β 1 β -1 β 0 β β 2 β 0 β -1 β β 3 β 1 β -3 β β 4 β 2 β -6 β βββββββ΄ββββββ΄βββββββββ
- suffix(suffix: str) Self [source]
Add a suffix to the root column name of the expression.
- Parameters:
- suffix
Suffix to add to the root column name.
See also
Notes
This will undo any previous renaming operations on the expression.
Due to implementation constraints, this method can only be called as the last expression in a chain.
Examples
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["x", "y", "z"],
...     }
... )
>>> df.with_columns(pl.all().reverse().suffix("_reverse"))
shape: (3, 4)
┌─────┬─────┬───────────┬───────────┐
│ a   ┆ b   ┆ a_reverse ┆ b_reverse │
│ --- ┆ --- ┆ ---       ┆ ---       │
│ i64 ┆ str ┆ i64       ┆ str       │
╞═════╪═════╪═══════════╪═══════════╡
│ 1   ┆ x   ┆ 3         ┆ z         │
│ 2   ┆ y   ┆ 2         ┆ y         │
│ 3   ┆ z   ┆ 1         ┆ x         │
└─────┴─────┴───────────┴───────────┘
- sum() Self [source]
Get sum value.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.
Examples
>>> df = pl.DataFrame({"a": [-1, 0, 1]}) >>> df.select(pl.col("a").sum()) shape: (1, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 0 β βββββββ
- tail(n: int | Expr = 10) Self [source]
Get the last n rows.
- Parameters:
- n
Number of rows to return.
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]}) >>> df.tail(3) shape: (3, 1) βββββββ β foo β β --- β β i64 β βββββββ‘ β 5 β β 6 β β 7 β βββββββ
- take(indices: int | list[int] | Expr | Series | np.ndarray[Any, Any]) Self [source]
Take values by index.
- Parameters:
- indices
An expression that leads to a UInt32 dtyped Series.
- Returns:
- Expr
Expression of the same data type.
Examples
>>> df = pl.DataFrame(
...     {
...         "group": ["one", "one", "one", "two", "two", "two"],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.groupby("group", maintain_order=True).agg(pl.col("value").take(1))
shape: (2, 2)
┌───────┬───────┐
│ group ┆ value │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ one   ┆ 98    │
│ two   ┆ 99    │
└───────┴───────┘
- take_every(n: int) Self [source]
Take every nth value in the Series and return as a new Series.
Examples
>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]}) >>> df.select(pl.col("foo").take_every(3)) shape: (3, 1) βββββββ β foo β β --- β β i64 β βββββββ‘ β 1 β β 4 β β 7 β βββββββ
- tan() Self [source]
Compute the element-wise value for the tangent.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").tan().round(2)) shape: (1, 1) ββββββββ β a β β --- β β f64 β ββββββββ‘ β 1.56 β ββββββββ
- tanh() Self [source]
Compute the element-wise value for the hyperbolic tangent.
- Returns:
- Expr
Expression of data type Float64.
Examples
>>> df = pl.DataFrame({"a": [1.0]}) >>> df.select(pl.col("a").tanh()) shape: (1, 1) ββββββββββββ β a β β --- β β f64 β ββββββββββββ‘ β 0.761594 β ββββββββββββ
- to_physical() Self [source]
Cast to physical representation of the logical dtype.
polars.datatypes.Date() -> polars.datatypes.Int32()
polars.datatypes.Datetime() -> polars.datatypes.Int64()
polars.datatypes.Time() -> polars.datatypes.Int64()
polars.datatypes.Duration() -> polars.datatypes.Int64()
polars.datatypes.Categorical() -> polars.datatypes.UInt32()
List(inner) -> List(physical of inner)
Other data types will be left unchanged.
Examples
Replicating the pandas pd.factorize function.
>>> pl.DataFrame({"vals": ["a", "x", None, "a"]}).with_columns( ... [ ... pl.col("vals").cast(pl.Categorical), ... pl.col("vals") ... .cast(pl.Categorical) ... .to_physical() ... .alias("vals_physical"), ... ] ... ) shape: (4, 2) ββββββββ¬ββββββββββββββββ β vals β vals_physical β β --- β --- β β cat β u32 β ββββββββͺββββββββββββββββ‘ β a β 0 β β x β 1 β β null β null β β a β 0 β ββββββββ΄ββββββββββββββββ
- top_k(k: int = 5) Self [source]
Return the k largest elements.
This has time complexity:
\[O(n + k \log{} n - \frac{k}{2})\]
- Parameters:
- k
Number of elements to return.
See also
Examples
>>> df = pl.DataFrame(
...     {
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.select(
...     [
...         pl.col("value").top_k().alias("top_k"),
...         pl.col("value").bottom_k().alias("bottom_k"),
...     ]
... )
shape: (5, 2)
┌───────┬──────────┐
│ top_k ┆ bottom_k │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 99    ┆ 1        │
│ 98    ┆ 2        │
│ 4     ┆ 3        │
│ 3     ┆ 4        │
│ 2     ┆ 98       │
└───────┴──────────┘
- truediv(other: Any) Self [source]
Method equivalent of float division operator expr / other.
- Parameters:
- other
Numeric literal or expression value.
See also
Notes
Zero-division behaviour follows IEEE-754:
0/0: invalid operation - mathematically undefined, returns NaN.
n/0: on finite operands gives an exact infinite result, e.g. ±infinity.
Examples
>>> df = pl.DataFrame(
...     data={"x": [-2, -1, 0, 1, 2], "y": [0.5, 0.0, 0.0, -4.0, -0.5]}
... )
>>> df.with_columns(
...     pl.col("x").truediv(2).alias("x/2"),
...     pl.col("x").truediv(pl.col("y")).alias("x/y"),
... )
shape: (5, 4)
┌─────┬──────┬──────┬───────┐
│ x   ┆ y    ┆ x/2  ┆ x/y   │
│ --- ┆ ---  ┆ ---  ┆ ---   │
│ i64 ┆ f64  ┆ f64  ┆ f64   │
╞═════╪══════╪══════╪═══════╡
│ -2  ┆ 0.5  ┆ -1.0 ┆ -4.0  │
│ -1  ┆ 0.0  ┆ -0.5 ┆ -inf  │
│ 0   ┆ 0.0  ┆ 0.0  ┆ NaN   │
│ 1   ┆ -4.0 ┆ 0.5  ┆ -0.25 │
│ 2   ┆ -0.5 ┆ 1.0  ┆ -4.0  │
└─────┴──────┴──────┴───────┘
- unique(*, maintain_order: bool = False) Self [source]
Get unique values of this expression.
- Parameters:
- maintain_order
Maintain order of data. This requires more work.
Examples
>>> df = pl.DataFrame({"a": [1, 1, 2]}) >>> df.select(pl.col("a").unique()) shape: (2, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 2 β β 1 β βββββββ >>> df.select(pl.col("a").unique(maintain_order=True)) shape: (2, 1) βββββββ β a β β --- β β i64 β βββββββ‘ β 1 β β 2 β βββββββ
- unique_counts() Self [source]
Return a count of the unique values in the order of appearance.
This method differs from value_counts in that it does not return the values, only the counts, and it might be faster.
Examples
>>> df = pl.DataFrame(
...     {
...         "id": ["a", "b", "b", "c", "c", "c"],
...     }
... )
>>> df.select(
...     [
...         pl.col("id").unique_counts(),
...     ]
... )
shape: (3, 1)
┌─────┐
│ id  │
│ --- │
│ u32 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
- upper_bound() Self [source]
Calculate the upper bound.
Returns a unit Series with the highest value possible for the dtype of this expression.
Examples
>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]}) >>> df.select(pl.col("a").upper_bound()) shape: (1, 1) βββββββββββββββββββββββ β a β β --- β β i64 β βββββββββββββββββββββββ‘ β 9223372036854775807 β βββββββββββββββββββββββ
- value_counts(*, multithreaded: bool = False, sort: bool = False) Self [source]
Count all unique values and create a struct mapping value to count.
- Parameters:
- multithreaded
Better to turn this off in the aggregation context, as it can lead to contention.
- sort
Ensure the output is sorted from most values to least.
- Returns:
- Expr
Expression of data type Struct.
Examples
>>> df = pl.DataFrame(
...     {
...         "id": ["a", "b", "b", "c", "c", "c"],
...     }
... )
>>> df.select(
...     [
...         pl.col("id").value_counts(sort=True),
...     ]
... )
shape: (3, 1)
┌───────────┐
│ id        │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {"c",3}   │
│ {"b",2}   │
│ {"a",1}   │
└───────────┘
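The Struct column can be expanded into separate value and count columns; a hedged sketch assuming DataFrame.unnest is available in your version (output omitted):
>>> df.select(pl.col("id").value_counts(sort=True)).unnest("id")
This yields one column with the unique ids and one with their counts.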
- var(ddof: int = 1) Self [source]
Get variance.
- Parameters:
- ddof
'Delta Degrees of Freedom': the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.
Examples
>>> df = pl.DataFrame({"a": [-1, 0, 1]}) >>> df.select(pl.col("a").var()) shape: (1, 1) βββββββ β a β β --- β β f64 β βββββββ‘ β 1.0 β βββββββ
- where(predicate: Expr) Self [source]
Filter a single column.
Alias for filter().
- Parameters:
- predicate
Boolean expression.
Examples
>>> df = pl.DataFrame(
...     {
...         "group_col": ["g1", "g1", "g2"],
...         "b": [1, 2, 3],
...     }
... )
>>> df.groupby("group_col").agg(
...     [
...         pl.col("b").where(pl.col("b") < 2).sum().alias("lt"),
...         pl.col("b").where(pl.col("b") >= 2).sum().alias("gte"),
...     ]
... ).sort("group_col")
shape: (2, 3)
┌───────────┬─────┬─────┐
│ group_col ┆ lt  ┆ gte │
│ ---       ┆ --- ┆ --- │
│ str       ┆ i64 ┆ i64 │
╞═══════════╪═════╪═════╡
│ g1        ┆ 1   ┆ 2   │
│ g2        ┆ 0   ┆ 3   │
└───────────┴─────┴─────┘
- xor(other: Any) Self [source]
Method equivalent of bitwise exclusive-or operator expr ^ other.
- Parameters:
- other
Integer or boolean value; accepts expression input.
Examples
>>> df = pl.DataFrame(
...     {"x": [True, False, True, False], "y": [True, True, False, False]}
... )
>>> df.with_columns(pl.col("x").xor(pl.col("y")).alias("x ^ y"))
shape: (4, 3)
┌───────┬───────┬───────┐
│ x     ┆ y     ┆ x ^ y │
│ ---   ┆ ---   ┆ ---   │
│ bool  ┆ bool  ┆ bool  │
╞═══════╪═══════╪═══════╡
│ true  ┆ true  ┆ false │
│ false ┆ true  ┆ true  │
│ true  ┆ false ┆ true  │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘
>>> def binary_string(n: int) -> str:
...     return bin(n)[2:].zfill(8)
>>>
>>> df = pl.DataFrame(
...     data={"x": [10, 8, 250, 66], "y": [1, 2, 3, 4]},
...     schema={"x": pl.UInt8, "y": pl.UInt8},
... )
>>> df.with_columns(
...     pl.col("x").apply(binary_string).alias("bin_x"),
...     pl.col("y").apply(binary_string).alias("bin_y"),
...     pl.col("x").xor(pl.col("y")).alias("xor_xy"),
...     pl.col("x").xor(pl.col("y")).apply(binary_string).alias("bin_xor_xy"),
... )
shape: (4, 6)
┌─────┬─────┬──────────┬──────────┬────────┬────────────┐
│ x   ┆ y   ┆ bin_x    ┆ bin_y    ┆ xor_xy ┆ bin_xor_xy │
│ --- ┆ --- ┆ ---      ┆ ---      ┆ ---    ┆ ---        │
│ u8  ┆ u8  ┆ str      ┆ str      ┆ u8     ┆ str        │
╞═════╪═════╪══════════╪══════════╪════════╪════════════╡
│ 10  ┆ 1   ┆ 00001010 ┆ 00000001 ┆ 11     ┆ 00001011   │
│ 8   ┆ 2   ┆ 00001000 ┆ 00000010 ┆ 10     ┆ 00001010   │
│ 250 ┆ 3   ┆ 11111010 ┆ 00000011 ┆ 249    ┆ 11111001   │
│ 66  ┆ 4   ┆ 01000010 ┆ 00000100 ┆ 70     ┆ 01000110   │
└─────┴─────┴──────────┴──────────┴────────┴────────────┘