Series#
This page gives an overview of all public Series methods.
- class polars.Series(
- name: str | ArrayLike | None = None,
- values: ArrayLike | None = None,
- dtype: PolarsDataType | None = None,
- *,
- strict: bool = True,
- nan_to_null: bool = False,
- dtype_if_empty: PolarsDataType = Null,
A Series represents a single column in a polars DataFrame.
- Parameters:
- name : str, default None
Name of the Series. Will be used as a column name when used in a DataFrame. When not specified, name is set to an empty string.
- values : ArrayLike, default None
One-dimensional data in various forms. Supported are: Sequence, Series, pyarrow Array, and numpy ndarray.
- dtype : DataType, default None
Data type of the resulting Series. If set to None (default), the data type is inferred from the values input. The strategy for data type inference depends on the strict parameter:
If strict is set to True (default), the inferred data type is that of the first non-null value, or Null if all values are null.
If strict is set to False, the inferred data type is the supertype of the values, or Object if no supertype can be found. WARNING: A full pass over the values is required to determine the supertype.
If no values were passed, the resulting data type is Null.
- strict : bool, default True
Throw an error if any value does not exactly match the given or inferred data type. If set to False, values that do not match the data type are cast to that data type or, if casting is not possible, set to null instead.
- nan_to_null : bool, default False
In case a numpy array is used to create this Series, indicate how to deal with np.nan values. (This parameter is a no-op on non-numpy data).
- dtype_if_empty : DataType, default Null
Data type of the Series if values contains no non-null data.
Deprecated since version 0.20.6: The data type for empty Series will always be Null, unless dtype is specified. To preserve behavior, check if the resulting Series has data type Null and cast to the desired data type. This parameter will be removed in the next breaking release.
Examples
Constructing a Series by specifying name and values positionally:
>>> s = pl.Series("a", [1, 2, 3])
>>> s
shape: (3,)
Series: 'a' [i64]
[
        1
        2
        3
]
Notice that the dtype is automatically inferred as a polars Int64:
>>> s.dtype
Int64
Constructing a Series with a specific dtype:
>>> s2 = pl.Series("a", [1, 2, 3], dtype=pl.Float32)
>>> s2
shape: (3,)
Series: 'a' [f32]
[
        1.0
        2.0
        3.0
]
It is possible to construct a Series with values as the first positional argument. This syntax is considered an anti-pattern, but it can be useful in certain scenarios. You must specify any other arguments through keywords.
>>> s3 = pl.Series([1, 2, 3])
>>> s3
shape: (3,)
Series: '' [i64]
[
        1
        2
        3
]
- abs() Series [source]
Compute absolute values.
Same as abs(series).
Examples
>>> s = pl.Series([1, -2, -3])
>>> s.abs()
shape: (3,)
Series: '' [i64]
[
        1
        2
        3
]
- alias(name: str) Series [source]
Rename the series.
- Parameters:
- name
The new name.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s.alias("b")
shape: (3,)
Series: 'b' [i64]
[
        1
        2
        3
]
- all(*, ignore_nulls: bool = True) bool | None [source]
Return whether all values in the column are True.
Only works on columns of data type Boolean.
- Parameters:
- ignore_nulls
Ignore null values (default).
If set to False, Kleene logic is used to deal with nulls: if the column contains any null values and no False values, the output is None.
- Returns:
- bool or None
Examples
>>> pl.Series([True, True]).all()
True
>>> pl.Series([False, True]).all()
False
>>> pl.Series([None, True]).all()
True
Enable Kleene logic by setting ignore_nulls=False.
>>> pl.Series([None, True]).all(ignore_nulls=False)  # Returns None
- any(*, ignore_nulls: bool = True) bool | None [source]
Return whether any of the values in the column are True.
Only works on columns of data type Boolean.
- Parameters:
- ignore_nulls
Ignore null values (default).
If set to False, Kleene logic is used to deal with nulls: if the column contains any null values and no True values, the output is None.
- Returns:
- bool or None
Examples
>>> pl.Series([True, False]).any()
True
>>> pl.Series([False, False]).any()
False
>>> pl.Series([None, False]).any()
False
Enable Kleene logic by setting ignore_nulls=False.
>>> pl.Series([None, False]).any(ignore_nulls=False)  # Returns None
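The ignore_nulls rules described for all() and any() can be illustrated in plain Python. This is only a sketch of the documented semantics, with None standing in for a null value; the helper names kleene_all and kleene_any are hypothetical and are not part of Polars.

```python
def kleene_all(values, ignore_nulls=True):
    """Sketch of all(): None stands in for a null value."""
    if any(v is False for v in values):
        return False  # a single False decides the result
    if not ignore_nulls and any(v is None for v in values):
        return None  # nulls present and no False: the result is unknown
    return True


def kleene_any(values, ignore_nulls=True):
    """Sketch of any(): None stands in for a null value."""
    if any(v is True for v in values):
        return True  # a single True decides the result
    if not ignore_nulls and any(v is None for v in values):
        return None  # nulls present and no True: the result is unknown
    return False


print(kleene_all([None, True]))                       # True
print(kleene_all([None, True], ignore_nulls=False))   # None
print(kleene_any([None, False]))                      # False
print(kleene_any([None, False], ignore_nulls=False))  # None
```

A decisive value (False for all, True for any) always wins, whether or not nulls are present.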
- append(other: Series) Self [source]
Append a Series to this one.
The resulting series will consist of multiple chunks.
- Parameters:
- other
Series to append.
Warning
This method modifies the series in-place. The series is returned for convenience only.
Examples
>>> a = pl.Series("a", [1, 2, 3])
>>> b = pl.Series("b", [4, 5])
>>> a.append(b)
shape: (5,)
Series: 'a' [i64]
[
        1
        2
        3
        4
        5
]
The resulting series will consist of multiple chunks.
>>> a.n_chunks()
2
- apply(
- function: Callable[[Any], Any],
- return_dtype: PolarsDataType | None = None,
- *,
- skip_nulls: bool = True,
Apply a custom/user-defined function (UDF) over elements in this Series.
Deprecated since version 0.19.0: This method has been renamed to Series.map_elements().
- Parameters:
- function
Custom function or lambda.
- return_dtype
Output datatype. If none is given, the same datatype as this Series will be used.
- skip_nulls
Nulls will be skipped and not passed to the Python function. This is faster because the Python call is avoided for those values and because more specialized functions can be used.
- arccos() Series [source]
Compute the element-wise value for the inverse cosine.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0])
>>> s.arccos()
shape: (3,)
Series: 'a' [f64]
[
        0.0
        1.570796
        3.141593
]
- arccosh() Series [source]
Compute the element-wise value for the inverse hyperbolic cosine.
Examples
>>> s = pl.Series("a", [5.0, 1.0, 0.0, -1.0])
>>> s.arccosh()
shape: (4,)
Series: 'a' [f64]
[
        2.292432
        0.0
        NaN
        NaN
]
- arcsin() Series [source]
Compute the element-wise value for the inverse sine.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0])
>>> s.arcsin()
shape: (3,)
Series: 'a' [f64]
[
        1.570796
        0.0
        -1.570796
]
- arcsinh() Series [source]
Compute the element-wise value for the inverse hyperbolic sine.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0])
>>> s.arcsinh()
shape: (3,)
Series: 'a' [f64]
[
        0.881374
        0.0
        -0.881374
]
- arctan() Series [source]
Compute the element-wise value for the inverse tangent.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0])
>>> s.arctan()
shape: (3,)
Series: 'a' [f64]
[
        0.785398
        0.0
        -0.785398
]
- arctanh() Series [source]
Compute the element-wise value for the inverse hyperbolic tangent.
Examples
>>> s = pl.Series("a", [2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -1.1])
>>> s.arctanh()
shape: (7,)
Series: 'a' [f64]
[
        NaN
        inf
        0.549306
        0.0
        -0.549306
        -inf
        NaN
]
- arg_max() int | None [source]
Get the index of the maximal value.
- Returns:
- int
Examples
>>> s = pl.Series("a", [3, 2, 1])
>>> s.arg_max()
0
- arg_min() int | None [source]
Get the index of the minimal value.
- Returns:
- int
Examples
>>> s = pl.Series("a", [3, 2, 1])
>>> s.arg_min()
2
- arg_sort( ) Series [source]
Get the index values that would sort this Series.
- Parameters:
- descending
Sort in descending order.
- nulls_last
Place null values last instead of first.
See also
Series.gather
Take values by index.
Series.rank
Get the rank of each row.
Examples
>>> s = pl.Series("a", [5, 3, 4, 1, 2])
>>> s.arg_sort()
shape: (5,)
Series: 'a' [u32]
[
        3
        4
        1
        2
        0
]
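To make the meaning of "index values that would sort" concrete, the same result can be reproduced with a plain Python list. This is an illustrative sketch only, not how Polars computes it; the variable names are arbitrary.

```python
values = [5, 3, 4, 1, 2]

# Sort the positions 0..n-1 by the value found at each position.
order = sorted(range(len(values)), key=values.__getitem__)
print(order)  # [3, 4, 1, 2, 0]

# Gathering the values in that order yields the sorted data.
print([values[i] for i in order])  # [1, 2, 3, 4, 5]
```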
- arg_true() Series [source]
Get index values where Boolean Series evaluate True.
- Returns:
- Series
Series of data type UInt32.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> (s == 2).arg_true()
shape: (1,)
Series: 'a' [u32]
[
        1
]
- arg_unique() Series [source]
Get the index of the first occurrence of each unique value as a Series.
- Returns:
- Series
Examples
>>> s = pl.Series("a", [1, 2, 2, 3])
>>> s.arg_unique()
shape: (3,)
Series: 'a' [u32]
[
        0
        1
        3
]
- bottom_k(k: int = 5) Series [source]
Return the k smallest elements.
This has time complexity:
\[O(n + k \log{n})\]
- Parameters:
- k
Number of elements to return.
Examples
>>> s = pl.Series("a", [2, 5, 1, 4, 3])
>>> s.bottom_k(3)
shape: (3,)
Series: 'a' [i64]
[
        1
        2
        3
]
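One classical way to reach the stated O(n + k log n) bound is to build a binary heap in O(n) and pop the minimum k times at O(log n) each. The sketch below uses Python's heapq module to show that idea; it is an assumption for illustration and does not claim to be the algorithm Polars uses. The name bottom_k_sketch is hypothetical.

```python
import heapq


def bottom_k_sketch(values, k=5):
    """Return the k smallest values in ascending order."""
    heap = list(values)
    heapq.heapify(heap)  # builds a min-heap in O(n)
    k = min(k, len(heap))
    # k pops, each O(log n), give the O(k log n) term
    return [heapq.heappop(heap) for _ in range(k)]


print(bottom_k_sketch([2, 5, 1, 4, 3], 3))  # [1, 2, 3]
```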
- cast( ) Self [source]
Cast between data types.
- Parameters:
- dtype
DataType to cast to.
- strict
Throw an error if a cast could not be done (for instance, due to an overflow).
Examples
>>> s = pl.Series("a", [True, False, True])
>>> s
shape: (3,)
Series: 'a' [bool]
[
        true
        false
        true
]
>>> s.cast(pl.UInt32)
shape: (3,)
Series: 'a' [u32]
[
        1
        0
        1
]
- cbrt() Series [source]
Compute the cube root of the elements.
Optimization for
>>> pl.Series([1, 2]) ** (1.0 / 3)
shape: (2,)
Series: '' [f64]
[
        1.0
        1.259921
]
Examples
>>> s = pl.Series([1, 2, 3])
>>> s.cbrt()
shape: (3,)
Series: '' [f64]
[
        1.0
        1.259921
        1.44225
]
- ceil() Series [source]
Rounds up to the nearest integer value.
Only works on floating point Series.
Examples
>>> s = pl.Series("a", [1.12345, 2.56789, 3.901234])
>>> s.ceil()
shape: (3,)
Series: 'a' [f64]
[
        2.0
        3.0
        4.0
]
- chunk_lengths() list[int] [source]
Get the length of each individual chunk.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s2 = pl.Series("a", [4, 5, 6])
Concatenate Series with rechunk = True
>>> pl.concat([s, s2], rechunk=True).chunk_lengths()
[6]
Concatenate Series with rechunk = False
>>> pl.concat([s, s2], rechunk=False).chunk_lengths()
[3, 3]
- clear(n: int = 0) Series [source]
Create an empty copy of the current Series, with zero to ‘n’ elements.
The copy has an identical name/dtype, but no data.
- Parameters:
- n
Number of (empty) elements to return in the cleared Series.
See also
clone
Cheap deepcopy/clone.
Examples
>>> s = pl.Series("a", [None, True, False])
>>> s.clear()
shape: (0,)
Series: 'a' [bool]
[
]
>>> s.clear(n=2)
shape: (2,)
Series: 'a' [bool]
[
        null
        null
]
- clip(
- lower_bound: NumericLiteral | TemporalLiteral | IntoExprColumn | None = None,
- upper_bound: NumericLiteral | TemporalLiteral | IntoExprColumn | None = None,
Set values outside the given boundaries to the boundary value.
- Parameters:
- lower_bound
Lower bound. Accepts expression input. Non-expression inputs are parsed as literals. If set to None (default), no lower bound is applied.
- upper_bound
Upper bound. Accepts expression input. Non-expression inputs are parsed as literals. If set to None (default), no upper bound is applied.
Notes
This method only works for numeric and temporal columns. To clip other data types, consider writing a when-then-otherwise expression. See when().
Examples
Specifying both a lower and upper bound:
>>> s = pl.Series([-50, 5, 50, None])
>>> s.clip(1, 10)
shape: (4,)
Series: '' [i64]
[
        1
        5
        10
        null
]
Specifying only a single bound:
>>> s.clip(upper_bound=10)
shape: (4,)
Series: '' [i64]
[
        -50
        5
        10
        null
]
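The clipping rule, including the optional bounds and the pass-through of nulls seen in the examples, can be sketched in plain Python. None stands in for null here, and clip_sketch is a hypothetical helper written for illustration, not a Polars function.

```python
def clip_sketch(values, lower_bound=None, upper_bound=None):
    """Clamp each value to the given bounds; a bound of None is not applied."""
    out = []
    for v in values:
        if v is None:
            out.append(None)  # nulls are left untouched
            continue
        if lower_bound is not None and v < lower_bound:
            v = lower_bound
        if upper_bound is not None and v > upper_bound:
            v = upper_bound
        out.append(v)
    return out


print(clip_sketch([-50, 5, 50, None], 1, 10))            # [1, 5, 10, None]
print(clip_sketch([-50, 5, 50, None], upper_bound=10))   # [-50, 5, 10, None]
```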
- clip_max(
- upper_bound: NumericLiteral | TemporalLiteral | IntoExprColumn,
Clip (limit) the values in an array to a max boundary.
Deprecated since version 0.19.12: Use clip() instead.
- Parameters:
- upper_bound
Upper bound.
- clip_min(
- lower_bound: NumericLiteral | TemporalLiteral | IntoExprColumn,
Clip (limit) the values in an array to a min boundary.
Deprecated since version 0.19.12: Use clip() instead.
- Parameters:
- lower_bound
Lower bound.
- clone() Self [source]
Create a copy of this Series.
This is a cheap operation that does not copy data.
See also
clear
Create an empty copy of the current Series, with identical schema but no data.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s.clone()
shape: (3,)
Series: 'a' [i64]
[
        1
        2
        3
]
- cos() Series [source]
Compute the element-wise value for the cosine.
Examples
>>> import math
>>> s = pl.Series("a", [0.0, math.pi / 2.0, math.pi])
>>> s.cos()
shape: (3,)
Series: 'a' [f64]
[
        1.0
        6.1232e-17
        -1.0
]
- cosh() Series [source]
Compute the element-wise value for the hyperbolic cosine.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0])
>>> s.cosh()
shape: (3,)
Series: 'a' [f64]
[
        1.543081
        1.0
        1.543081
]
- cot() Series [source]
Compute the element-wise value for the cotangent.
Examples
>>> import math
>>> s = pl.Series("a", [0.0, math.pi / 2.0, math.pi])
>>> s.cot()
shape: (3,)
Series: 'a' [f64]
[
        inf
        6.1232e-17
        -8.1656e15
]
- count() int [source]
Return the number of non-null elements in the column.
Examples
>>> s = pl.Series("a", [1, 2, None])
>>> s.count()
2
- cum_count(*, reverse: bool = False) Self [source]
Return the cumulative count of the non-null values in the column.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> s = pl.Series(["x", "k", None, "d"])
>>> s.cum_count()
shape: (4,)
Series: '' [u32]
[
        1
        2
        2
        3
]
- cum_max(*, reverse: bool = False) Series [source]
Get an array with the cumulative max computed at every element.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> s = pl.Series("s", [3, 5, 1])
>>> s.cum_max()
shape: (3,)
Series: 's' [i64]
[
        3
        5
        5
]
- cum_min(*, reverse: bool = False) Series [source]
Get an array with the cumulative min computed at every element.
- Parameters:
- reverse
Reverse the operation.
Examples
>>> s = pl.Series("s", [1, 2, 3])
>>> s.cum_min()
shape: (3,)
Series: 's' [i64]
[
        1
        1
        1
]
- cum_prod(*, reverse: bool = False) Series [source]
Get an array with the cumulative product computed at every element.
- Parameters:
- reverse
Reverse the operation.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before multiplying to prevent overflow issues.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s.cum_prod()
shape: (3,)
Series: 'a' [i64]
[
        1
        2
        6
]
- cum_sum(*, reverse: bool = False) Series [source]
Get an array with the cumulative sum computed at every element.
- Parameters:
- reverse
Reverse the operation.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s.cum_sum()
shape: (3,)
Series: 'a' [i64]
[
        1
        3
        6
]
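The four cumulative methods above share one pattern: fold a binary operation over the values and keep every intermediate result. A plain-Python sketch with itertools.accumulate reproduces the example outputs. The reading of reverse used here (accumulate from the last element, keep the original positions) is an assumption, and cumulative_sketch is a hypothetical helper that ignores nulls entirely.

```python
import operator
from itertools import accumulate


def cumulative_sketch(values, func, reverse=False):
    """Running fold of func over values; assumes no nulls."""
    seq = list(values)[::-1] if reverse else list(values)
    out = list(accumulate(seq, func))
    # Assumed meaning of reverse: results stay aligned with the input order.
    return out[::-1] if reverse else out


print(cumulative_sketch([3, 5, 1], max))           # [3, 5, 5]
print(cumulative_sketch([1, 2, 3], min))           # [1, 1, 1]
print(cumulative_sketch([1, 2, 3], operator.mul))  # [1, 2, 6]
print(cumulative_sketch([1, 2, 3], operator.add))  # [1, 3, 6]
```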
- cummax(*, reverse: bool = False) Series [source]
Get an array with the cumulative max computed at every element.
Deprecated since version 0.19.14: This method has been renamed to cum_max().
- Parameters:
- reverse
Reverse the operation.
- cummin(*, reverse: bool = False) Series [source]
Get an array with the cumulative min computed at every element.
Deprecated since version 0.19.14: This method has been renamed to cum_min().
- Parameters:
- reverse
Reverse the operation.
- cumprod(*, reverse: bool = False) Series [source]
Get an array with the cumulative product computed at every element.
Deprecated since version 0.19.14: This method has been renamed to cum_prod().
- Parameters:
- reverse
Reverse the operation.
- cumsum(*, reverse: bool = False) Series [source]
Get an array with the cumulative sum computed at every element.
Deprecated since version 0.19.14: This method has been renamed to cum_sum().
- Parameters:
- reverse
Reverse the operation.
- cumulative_eval( ) Series [source]
Run an expression over a sliding window that increases 1 slot every iteration.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- expr
Expression to evaluate.
- min_periods
Number of valid values there should be in the window before the expression is evaluated. valid values = length - null_count
- parallel
Run in parallel. Don’t do this in a group by or another operation that already has much parallelization.
Warning
This can be really slow as it can have O(n^2) complexity. Don’t use this for operations that visit all elements.
Examples
>>> s = pl.Series("values", [1, 2, 3, 4, 5])
>>> s.cumulative_eval(pl.element().first() - pl.element().last() ** 2)
shape: (5,)
Series: 'values' [i64]
[
        0
        -3
        -8
        -15
        -24
]
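The growing window, and the reason for the O(n^2) warning, can be shown with a plain-Python loop: iteration i evaluates the function on the first i + 1 values, so any function that visits the whole window makes the total work quadratic. The helper cumulative_eval_sketch and its callable argument are hypothetical stand-ins for the expression and do not mirror the Polars implementation.

```python
def cumulative_eval_sketch(values, func, min_periods=1):
    """Evaluate func on values[:1], values[:2], ... (None stands in for null)."""
    out = []
    for i in range(len(values)):
        window = values[: i + 1]  # the window grows by one slot per iteration
        n_valid = sum(v is not None for v in window)  # length - null_count
        out.append(func(window) if n_valid >= min_periods else None)
    return out


# Same computation as the example: first element minus last element squared.
result = cumulative_eval_sketch([1, 2, 3, 4, 5], lambda w: w[0] - w[-1] ** 2)
print(result)  # [0, -3, -8, -15, -24]
```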
- cut(
- breaks: Sequence[float],
- *,
- labels: Sequence[str] | None = None,
- break_point_label: str = 'break_point',
- category_label: str = 'category',
- left_closed: bool = False,
- include_breaks: bool = False,
- as_series: bool = True,
Bin continuous values into discrete categories.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- breaks
List of unique cut points.
- labels
Names of the categories. The number of labels must be equal to the number of cut points plus one.
- break_point_label
Name of the breakpoint column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- category_label
Name of the category column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- left_closed
Set the intervals to be left-closed instead of right-closed.
- include_breaks
Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.
- as_series
If set to False, return a DataFrame containing the original values, the breakpoints, and the categories.
Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting include_breaks=True, unnesting the resulting struct Series, and adding the result to the original Series.
- Returns:
- Series
Series of data type Categorical if include_breaks is set to False (default), otherwise a Series of data type Struct.
Examples
Divide the column into three categories.
>>> s = pl.Series("foo", [-2, -1, 0, 1, 2])
>>> s.cut([-1, 1], labels=["a", "b", "c"])
shape: (5,)
Series: 'foo' [cat]
[
        "a"
        "a"
        "b"
        "b"
        "c"
]
Create a DataFrame with the breakpoint and category for each value.
>>> cut = s.cut([-1, 1], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘
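How a value is assigned to a bin, and what left_closed changes, can be sketched with Python's bisect module. With right-closed intervals a value equal to a break belongs to the bin on its left, which is what bisect_left gives; left-closed intervals correspond to bisect_right. This is an illustration that assumes sorted breaks, and cut_sketch is a hypothetical helper, not the Polars implementation.

```python
from bisect import bisect_left, bisect_right


def cut_sketch(values, breaks, labels, left_closed=False):
    """Map each value to the label of its bin; breaks must be sorted."""
    if len(labels) != len(breaks) + 1:
        raise ValueError("need exactly len(breaks) + 1 labels")
    # Right-closed bins (a, b]: a value equal to a break stays in the lower bin.
    # Left-closed bins [a, b): a value equal to a break moves to the upper bin.
    locate = bisect_right if left_closed else bisect_left
    return [labels[locate(breaks, v)] for v in values]


print(cut_sketch([-2, -1, 0, 1, 2], [-1, 1], ["a", "b", "c"]))
# ['a', 'a', 'b', 'b', 'c']
```

Only values that coincide with a break move when left_closed is switched on.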
- describe(
- percentiles: Sequence[float] | float | None = (0.25, 0.5, 0.75),
- interpolation: RollingInterpolationMethod = 'nearest',
Quick summary statistics of a Series.
Series with mixed datatypes will return summary statistics for the datatype of the first value.
- Parameters:
- percentiles
One or more percentiles to include in the summary statistics (if the Series has a numeric dtype). All values must be in the range [0, 1].
- interpolation : {‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}
Interpolation method used when calculating percentiles.
- Returns:
- DataFrame
Mapping with summary statistics of a Series.
Notes
The median is included by default as the 50% percentile.
Examples
>>> s = pl.Series([1, 2, 3, 4, 5])
>>> s.describe()
shape: (9, 2)
┌────────────┬──────────┐
│ statistic  ┆ value    │
│ ---        ┆ ---      │
│ str        ┆ f64      │
╞════════════╪══════════╡
│ count      ┆ 5.0      │
│ null_count ┆ 0.0      │
│ mean       ┆ 3.0      │
│ std        ┆ 1.581139 │
│ min        ┆ 1.0      │
│ 25%        ┆ 2.0      │
│ 50%        ┆ 3.0      │
│ 75%        ┆ 4.0      │
│ max        ┆ 5.0      │
└────────────┴──────────┘
Non-numeric data types may not have all statistics available.
>>> s = pl.Series(["aa", "aa", None, "bb", "cc"])
>>> s.describe()
shape: (4, 2)
┌────────────┬───────┐
│ statistic  ┆ value │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ count      ┆ 4     │
│ null_count ┆ 1     │
│ min        ┆ aa    │
│ max        ┆ cc    │
└────────────┴───────┘
- diff(n: int = 1, null_behavior: NullBehavior = 'ignore') Series [source]
Calculate the first discrete difference between shifted items.
- Parameters:
- n
Number of slots to shift.
- null_behavior : {‘ignore’, ‘drop’}
How to handle null values.
Examples
>>> s = pl.Series("s", values=[20, 10, 30, 25, 35], dtype=pl.Int8)
>>> s.diff()
shape: (5,)
Series: 's' [i8]
[
        null
        -10
        20
        -5
        10
]
>>> s.diff(n=2)
shape: (5,)
Series: 's' [i8]
[
        null
        null
        10
        15
        5
]
>>> s.diff(n=2, null_behavior="drop")
shape: (3,)
Series: 's' [i8]
[
        10
        15
        5
]
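The three outputs above follow from one rule: element i is values[i] - values[i - n], the first n positions have no partner and become null, and null_behavior="drop" removes those leading nulls. A plain-Python sketch, assuming n >= 0 and an input without nulls (diff_sketch is a hypothetical name):

```python
def diff_sketch(values, n=1, null_behavior="ignore"):
    """Difference between each value and the one n slots earlier."""
    out = [
        None if i < n else values[i] - values[i - n]  # None stands in for null
        for i in range(len(values))
    ]
    # "drop" discards the n leading nulls; "ignore" keeps them.
    return out[n:] if null_behavior == "drop" else out


data = [20, 10, 30, 25, 35]
print(diff_sketch(data))                             # [None, -10, 20, -5, 10]
print(diff_sketch(data, n=2))                        # [None, None, 10, 15, 5]
print(diff_sketch(data, n=2, null_behavior="drop"))  # [10, 15, 5]
```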
- dot(other: Series | ArrayLike) int | float | None [source]
Compute the dot/inner product between two Series.
- Parameters:
- other
Series (or array) to compute dot product with.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s2 = pl.Series("b", [4.0, 5.0, 6.0])
>>> s.dot(s2)
32.0
- drop_nans() Series [source]
Drop all floating point NaN values.
The original order of the remaining elements is preserved.
Notes
A NaN value is not the same as a null value. To drop null values, use drop_nulls().
Examples
>>> s = pl.Series([1.0, None, 3.0, float("nan")])
>>> s.drop_nans()
shape: (3,)
Series: '' [f64]
[
        1.0
        null
        3.0
]
- drop_nulls() Series [source]
Drop all null values.
The original order of the remaining elements is preserved.
Notes
A null value is not the same as a NaN value. To drop NaN values, use drop_nans().
Examples
>>> s = pl.Series([1.0, None, 3.0, float("nan")])
>>> s.drop_nulls()
shape: (3,)
Series: '' [f64]
[
        1.0
        3.0
        NaN
]
- property dtype: DataType[source]
Get the data type of this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3])
>>> s.dtype
Int64
- entropy( ) float | None [source]
Computes the entropy.
Uses the formula -sum(pk * log(pk)) where pk are discrete probabilities.
- Parameters:
- base
Given base, defaults to e.
- normalize
Normalize pk if it doesn’t sum to 1.
Examples
>>> a = pl.Series([0.99, 0.005, 0.005])
>>> a.entropy(normalize=True)
0.06293300616044681
>>> b = pl.Series([0.65, 0.10, 0.25])
>>> b.entropy(normalize=True)
0.8568409950394724
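The formula -sum(pk * log(pk)) can be checked against the example values with a few lines of plain Python. In this sketch, normalize rescales pk so that it sums to 1 and zero probabilities are skipped; the skipping of zeros and the name entropy_sketch are assumptions made for illustration, not documented Polars behavior.

```python
import math


def entropy_sketch(pk, base=math.e, *, normalize):
    """Entropy of discrete probabilities pk in the given base."""
    if normalize:
        total = sum(pk)
        pk = [p / total for p in pk]  # rescale so the probabilities sum to 1
    # Assumption: terms with p == 0 contribute nothing.
    return -sum(p * math.log(p, base) for p in pk if p > 0)


print(entropy_sketch([0.99, 0.005, 0.005], normalize=True))  # about 0.0629
print(entropy_sketch([0.65, 0.10, 0.25], normalize=True))    # about 0.8568
```

Passing base=2 gives the entropy in bits, so a fair coin, [0.5, 0.5], has entropy 1.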
- eq(other: Any) Series | Expr [source]
Method equivalent of operator expression series == other.
- eq_missing(other: Any) Series | Expr [source]
Method equivalent of equality operator series == other where None == None.
This differs from the standard eq where null values are propagated.
- Parameters:
- other
A literal or expression value to compare with.
See also
ne_missing
eq
Examples
>>> s1 = pl.Series("a", [333, 200, None])
>>> s2 = pl.Series("a", [100, 200, None])
>>> s1.eq(s2)
shape: (3,)
Series: 'a' [bool]
[
        false
        true
        null
]
>>> s1.eq_missing(s2)
shape: (3,)
Series: 'a' [bool]
[
        false
        true
        true
]
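The difference between eq and eq_missing comes down to how a null on either side is treated. In the plain-Python sketch below, None stands in for null: the first helper propagates it, the second compares it like any other value. Both helper names are hypothetical and the code is an illustration of the semantics shown in the example, not the Polars implementation.

```python
def eq_sketch(left, right):
    """Element-wise ==, where a null on either side yields null."""
    return [None if a is None or b is None else a == b for a, b in zip(left, right)]


def eq_missing_sketch(left, right):
    """Element-wise ==, where null equals null and differs from any value."""
    return [a == b for a, b in zip(left, right)]  # None == None is True in Python


s1 = [333, 200, None]
s2 = [100, 200, None]
print(eq_sketch(s1, s2))          # [False, True, None]
print(eq_missing_sketch(s1, s2))  # [False, True, True]
```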
- equals( ) bool [source]
Check whether the Series is equal to another Series.
- Parameters:
- other
Series to compare with.
- check_dtypes
Require data types to match.
- null_equal
Consider null values as equal.
See also
assert_series_equal
Examples
>>> s1 = pl.Series("a", [1, 2, 3]) >>> s2 = pl.Series("b", [4, 5, 6]) >>> s1.equals(s1) True >>> s1.equals(s2) False
- estimated_size(unit: SizeUnit = 'b') int | float [source]
Return an estimation of the total (heap) allocated size of the Series.
Estimated size is given in the specified unit (bytes by default).
This estimation is the sum of the size of its buffers and validity, including nested arrays. Multiple arrays may share buffers and bitmaps. Therefore, the size of two arrays is not the sum of the sizes computed from this function. In particular, a StructArray's size is an upper bound.
When an array is sliced, its allocated size remains constant because the buffer is unchanged. However, this function will yield a smaller number. This is because this function returns the visible size of the buffer, not its total capacity.
FFI buffers are included in this estimation.
- Parameters:
- unit{‘b’, ‘kb’, ‘mb’, ‘gb’, ‘tb’}
Scale the returned size to the given unit.
Examples
>>> s = pl.Series("values", list(range(1_000_000)), dtype=pl.UInt32) >>> s.estimated_size() 4000000 >>> s.estimated_size("mb") 3.814697265625
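The numbers in the example follow from simple arithmetic: one million UInt32 values take 4 bytes each, and the "mb" result is consistent with dividing by 1024². The sketch below assumes binary (1024-based) unit factors, which is inferred from the example output rather than stated in the text; the helper name is hypothetical.

```python
# Assumed unit factors (powers of 1024), inferred from the example output.
UNIT_FACTORS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def scale_size(n_bytes, unit="b"):
    """Scale a byte count to the requested unit."""
    if unit == "b":
        return n_bytes
    return n_bytes / UNIT_FACTORS[unit]


n_bytes = 1_000_000 * 4  # one million UInt32 values, 4 bytes each
print(scale_size(n_bytes))        # 4000000
print(scale_size(n_bytes, "mb"))  # 3.814697265625
```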
- ewm_mean(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- min_periods: int = 1,
- ignore_nulls: bool | None = None,
Exponentially-weighted moving average.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True (the default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False, the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (current default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.ewm_mean(com=1, ignore_nulls=False) shape: (3,) Series: '' [f64] [ 1.0 1.666667 2.428571 ]
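The two weighting schemes described for adjust can be reproduced with a short plain-Python sketch. It is not the Polars implementation: the function name and list interface are illustrative, and null handling and min_periods are left out.

```python
def ewm_mean(xs, alpha, adjust=True):
    """Exponentially weighted mean of a list of numbers without nulls."""
    out = []
    if adjust:
        # Weighted average with weights w_i = (1 - alpha)^i, newest first.
        num = 0.0  # running sum of w_i * x_{t-i}
        den = 0.0  # running sum of w_i
        for x in xs:
            num = x + (1 - alpha) * num
            den = 1 + (1 - alpha) * den
            out.append(num / den)
    else:
        # Recursive form: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t.
        y = None
        for x in xs:
            y = float(x) if y is None else (1 - alpha) * y + alpha * x
            out.append(y)
    return out


alpha = 1 / (1 + 1)  # com = 1 gives alpha = 1 / (1 + com) = 0.5
print(ewm_mean([1, 2, 3], alpha))                # approximately [1.0, 1.666667, 2.428571]
print(ewm_mean([1, 2, 3], alpha, adjust=False))  # [1.0, 1.5, 2.25]
```

With com=1 the adjusted result matches the example output above.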
- ewm_mean_by(by: IntoExpr, *, half_life: str | timedelta) Series [source]
Calculate time-based exponentially weighted moving average.
Given observations \(x_0, x_1, \ldots, x_n\) at times \(t_0, t_1, \ldots, t_n\), the EWMA is calculated as
\[\begin{aligned}y_0 &= x_0\\ \alpha_i &= 1 - \exp(-\lambda(t_i - t_{i-1}))\\ y_i &= \alpha_i x_i + (1 - \alpha_i) y_{i-1}; \quad i > 0\end{aligned}\]
where \(\lambda\) equals \(\ln(2) / \text{half\_life}\).
- Parameters:
- by
Times to calculate average by. Should be Datetime, Date, UInt64, UInt32, Int64, or Int32 data type.
- half_life
Unit over which observation decays to half its value.
Can be created either from a timedelta, or by using the following string language:
1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 day)
1w (1 week)
1i (1 index count)
Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds
Note that half_life is treated as a constant duration - calendar durations such as months (or even days in the time-zone-aware case) are not supported. Please express your duration in an approximately equivalent number of hours (e.g. '730h' instead of '1mo').
- check_sorted
Check whether by column is sorted. Incorrectly setting this to False will lead to incorrect output.
- Returns:
- Series
Float32 if input is Float32, otherwise Float64.
Examples
>>> from datetime import date, timedelta
>>> df = pl.DataFrame(
...     {
...         "values": [0, 1, 2, None, 4],
...         "times": [
...             date(2020, 1, 1),
...             date(2020, 1, 3),
...             date(2020, 1, 10),
...             date(2020, 1, 15),
...             date(2020, 1, 17),
...         ],
...     }
... ).sort("times")
>>> df["values"].ewm_mean_by(df["times"], half_life="4d")
shape: (5,)
Series: 'values' [f64]
[
    0.0
    0.292893
    1.492474
    null
    3.254508
]
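The time-based recurrence can be traced in plain Python. The sketch uses \(\alpha_i = 1 - \exp(-\lambda \Delta t)\), which is the weighting that reproduces the example output above. The function name is illustrative, and the null handling (a null yields null and elapsed time is measured from the last non-null observation) is an assumption inferred from that output, not a documented guarantee.

```python
import math
from datetime import date, timedelta


def ewm_mean_by(values, times, half_life):
    """Time-based EWMA over parallel lists; `half_life` is a timedelta."""
    lam = math.log(2) / half_life.total_seconds()
    out = []
    prev_y = None  # last computed average
    prev_t = None  # time of the last non-null observation
    for x, t in zip(values, times):
        if x is None:
            out.append(None)  # assumption: nulls are skipped
            continue
        if prev_y is None:
            y = float(x)  # y_0 = x_0
        else:
            dt = (t - prev_t).total_seconds()
            alpha = 1 - math.exp(-lam * dt)
            y = alpha * x + (1 - alpha) * prev_y
        out.append(y)
        prev_y, prev_t = y, t
    return out


values = [0, 1, 2, None, 4]
times = [date(2020, 1, d) for d in (1, 3, 10, 15, 17)]
print(ewm_mean_by(values, times, timedelta(days=4)))
# approximately [0.0, 0.292893, 1.492474, None, 3.254508]
```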
- ewm_std(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- bias: bool = False,
- min_periods: int = 1,
- ignore_nulls: bool | None = None,
Exponentially-weighted moving standard deviation.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True (the default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False, the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- bias
When bias=False, apply a correction to make the estimate statistically unbiased.
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (current default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.ewm_std(com=1, ignore_nulls=False) shape: (3,) Series: 'a' [f64] [ 0.0 0.707107 0.963624 ]
- ewm_var(
- com: float | None = None,
- span: float | None = None,
- half_life: float | None = None,
- alpha: float | None = None,
- *,
- adjust: bool = True,
- bias: bool = False,
- min_periods: int = 1,
- ignore_nulls: bool | None = None,
Exponentially-weighted moving variance.
- Parameters:
- com
Specify decay in terms of center of mass, \(\gamma\), with
\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
- span
Specify decay in terms of span, \(\theta\), with
\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
- half_life
Specify decay in terms of half-life, \(\lambda\), with
\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
- alpha
Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).
- adjust
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings.
When adjust=True (the default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\).
When adjust=False, the EW function is calculated recursively by
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
- bias
When bias=False, apply a correction to make the estimate statistically unbiased.
- min_periods
Minimum number of observations in window required to have a value (otherwise result is null).
- ignore_nulls
Ignore missing values when calculating weights.
When ignore_nulls=False, weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_nulls=True (current default), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.ewm_var(com=1, ignore_nulls=False) shape: (3,) Series: 'a' [f64] [ 0.0 0.5 0.928571 ]
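The variance and standard deviation examples can be reproduced from the adjust=True weights. The sketch below is an assumption-laden illustration, not the Polars implementation: it uses the common reliability-weights correction for bias=False and returns 0.0 when a single observation makes that correction undefined, which is what the example output shows for the first element.

```python
import math


def ewm_var(xs, alpha, bias=False):
    """Exponentially weighted variance (adjust=True weights, no nulls)."""
    out = []
    for t in range(len(xs)):
        window = xs[: t + 1]
        # Newest observation has weight 1; older ones decay by (1 - alpha).
        weights = [(1 - alpha) ** (t - i) for i in range(t + 1)]
        w_sum = sum(weights)
        mean = sum(w * x for w, x in zip(weights, window)) / w_sum
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, window)) / w_sum
        if not bias:
            # Assumed debiasing factor: (sum w)^2 / ((sum w)^2 - sum w^2).
            denom = w_sum**2 - sum(w * w for w in weights)
            var = var * w_sum**2 / denom if denom > 0 else 0.0
        out.append(var)
    return out


def ewm_std(xs, alpha, bias=False):
    """Exponentially weighted standard deviation: square root of the variance."""
    return [math.sqrt(v) for v in ewm_var(xs, alpha, bias)]


print(ewm_var([1, 2, 3], alpha=0.5))  # approximately [0.0, 0.5, 0.928571]
print(ewm_std([1, 2, 3], alpha=0.5))  # approximately [0.0, 0.707107, 0.963624]
```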
- exp() Series [source]
Compute the exponential, element-wise.
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.exp() shape: (3,) Series: '' [f64] [ 2.718282 7.389056 20.085537 ]
- explode() Series [source]
Explode a list Series.
This means that every item is expanded to a new row.
- Returns:
- Series
Series with the data type of the list elements.
See also
Series.list.explode
Explode a list column.
Examples
>>> s = pl.Series("a", [[1, 2, 3], [4, 5, 6]]) >>> s shape: (2,) Series: 'a' [list[i64]] [ [1, 2, 3] [4, 5, 6] ] >>> s.explode() shape: (6,) Series: 'a' [i64] [ 1 2 3 4 5 6 ]
- extend(other: Series) Self [source]
Extend the memory backed by this Series with the values from another.
Different from append, which adds the chunks from other to the chunks of this series, extend appends the data from other to the underlying memory locations and thus may cause a reallocation (which is expensive).
If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.
Prefer extend over append when you want to do a query after a single append. For instance, during online operations where you add n rows and rerun a query.
Prefer append over extend when you want to append many times before doing a query. For instance, when you read in multiple files and want to store them in a single Series. In the latter case, finish the sequence of append operations with a rechunk.
- Parameters:
- other
Series to extend the series with.
Warning
This method modifies the series in-place. The series is returned for convenience only.
See also
Examples
>>> a = pl.Series("a", [1, 2, 3]) >>> b = pl.Series("b", [4, 5]) >>> a.extend(b) shape: (5,) Series: 'a' [i64] [ 1 2 3 4 5 ]
The resulting series will consist of a single chunk.
>>> a.n_chunks() 1
- extend_constant(value: IntoExpr, n: int | IntoExprColumn) Series [source]
Extremely fast method for extending the Series with ‘n’ copies of a value.
- Parameters:
- value
A constant literal value or a unit expression with which to extend the expression result Series; can pass None to extend with nulls.
- n
The number of additional values that will be added.
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.extend_constant(99, n=2) shape: (5,) Series: '' [i64] [ 1 2 3 99 99 ]
- fill_nan(value: int | float | Expr | None) Series [source]
Fill floating point NaN value with a fill value.
- Parameters:
- value
Value used to fill NaN values.
Warning
Note that floating point NaNs (Not a Number) are not missing values. To replace missing values, use fill_null().
See also
Examples
>>> s = pl.Series("a", [1, 2, 3, float("nan")]) >>> s.fill_nan(0) shape: (4,) Series: 'a' [f64] [ 1.0 2.0 3.0 0.0 ]
- fill_null(
- value: Any | Expr | None = None,
- strategy: FillNullStrategy | None = None,
- limit: int | None = None,
Fill null values using the specified value or strategy.
- Parameters:
- value
Value used to fill null values.
- strategy{None, ‘forward’, ‘backward’, ‘min’, ‘max’, ‘mean’, ‘zero’, ‘one’}
Strategy used to fill null values.
- limit
Number of consecutive null values to fill when using the ‘forward’ or ‘backward’ strategy.
See also
Examples
>>> s = pl.Series("a", [1, 2, 3, None]) >>> s.fill_null(strategy="forward") shape: (4,) Series: 'a' [i64] [ 1 2 3 3 ] >>> s.fill_null(strategy="min") shape: (4,) Series: 'a' [i64] [ 1 2 3 1 ] >>> s = pl.Series("b", ["x", None, "z"]) >>> s.fill_null(pl.lit("")) shape: (3,) Series: 'b' [str] [ "x" "" "z" ]
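The interaction between the 'forward' strategy and limit can be sketched in plain Python, with None standing in for null. The helper name and list interface are hypothetical, and the behaviour shown (leading nulls stay null, at most limit consecutive nulls are filled) is an illustration of the parameter description, not the Polars implementation.

```python
def fill_forward(values, limit=None):
    """Forward-fill None entries, filling at most `limit` consecutive ones."""
    out = []
    last = None  # most recent non-null value seen so far
    run = 0      # nulls filled since that value
    for v in values:
        if v is not None:
            last, run = v, 0
            out.append(v)
        elif last is not None and (limit is None or run < limit):
            run += 1
            out.append(last)
        else:
            out.append(None)  # nothing to carry forward, or limit reached
    return out


print(fill_forward([1, 2, 3, None]))                    # [1, 2, 3, 3]
print(fill_forward([1, None, None, None, 5], limit=2))  # [1, 1, 1, None, 5]
```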
- filter(predicate: Series | list[bool]) Self [source]
Filter elements by a boolean mask.
The original order of the remaining elements is preserved.
Elements where the filter does not evaluate to True are discarded, including nulls.
- Parameters:
- predicate
Boolean mask.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> mask = pl.Series("", [True, False, True]) >>> s.filter(mask) shape: (2,) Series: 'a' [i64] [ 1 3 ]
- property flags: dict[str, bool][source]
Get flags that are set on the Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.flags {'SORTED_ASC': False, 'SORTED_DESC': False}
- floor() Series [source]
Rounds down to the nearest integer value.
Only works on floating point Series.
Examples
>>> s = pl.Series("a", [1.12345, 2.56789, 3.901234]) >>> s.floor() shape: (3,) Series: 'a' [f64] [ 1.0 2.0 3.0 ]
- gather(indices: int | list[int] | Expr | Series | np.ndarray[Any, Any]) Series [source]
Take values by index.
- Parameters:
- indices
Index location used for selection.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4]) >>> s.gather([1, 3]) shape: (2,) Series: 'a' [i64] [ 2 4 ]
- gather_every(n: int, offset: int = 0) Series [source]
Take every nth value in the Series and return as new Series.
- Parameters:
- n
Gather every n-th row.
- offset
Start the row index at this offset.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4]) >>> s.gather_every(2) shape: (2,) Series: 'a' [i64] [ 1 3 ] >>> s.gather_every(2, offset=1) shape: (2,) Series: 'a' [i64] [ 2 4 ]
- ge(other: Any) Series | Expr [source]
Method equivalent of operator expression
series >= other
.
- get_chunks() list[Series] [source]
Get the chunks of this Series as a list of Series.
Examples
>>> s1 = pl.Series("a", [1, 2, 3]) >>> s2 = pl.Series("a", [4, 5, 6]) >>> s = pl.concat([s1, s2], rechunk=False) >>> s.get_chunks() [shape: (3,) Series: 'a' [i64] [ 1 2 3 ], shape: (3,) Series: 'a' [i64] [ 4 5 6 ]]
- gt(other: Any) Series | Expr [source]
Method equivalent of operator expression
series > other
.
- has_nulls() bool [source]
Check whether the Series contains one or more null values.
Examples
>>> s = pl.Series([1, 2, None]) >>> s.has_nulls() True >>> s[:2].has_nulls() False
- has_validity() bool [source]
Return True if the Series has a validity bitmask.
Deprecated since version 0.20.30: Use has_nulls() instead.
If there is no mask, it means that there are no null values.
Notes
While the absence of a validity bitmask guarantees that a Series does not have null values, the converse is not true: the presence of a bitmask does not mean that there are null values, as every entry of the bitmask could mark its value as valid.
To confirm that a column has null values, use has_nulls().
- hash( ) Series [source]
Hash the Series.
The hash value is of type UInt64.
- Parameters:
- seed
Random seed parameter. Defaults to 0.
- seed_1
Random seed parameter. Defaults to seed if not set.
- seed_2
Random seed parameter. Defaults to seed if not set.
- seed_3
Random seed parameter. Defaults to seed if not set.
Notes
This implementation of hash does not guarantee stable results across different Polars versions. Its stability is only guaranteed within a single version.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.hash(seed=42) shape: (3,) Series: 'a' [u64] [ 10734580197236529959 3022416320763508302 13756996518000038261 ]
- head(n: int = 10) Series [source]
Get the first n elements.
- Parameters:
- n
Number of elements to return. If a negative value is passed, return all elements except the last abs(n).
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.head(3) shape: (3,) Series: 'a' [i64] [ 1 2 3 ]
Pass a negative value to get all rows except the last abs(n).
>>> s.head(-3) shape: (2,) Series: 'a' [i64] [ 1 2 ]
- hist(
- bins: list[float] | None = None,
- *,
- bin_count: int | None = None,
- include_category: bool = True,
- include_breakpoint: bool = True,
Bin values into buckets and count their occurrences.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- bins
Discretizations to make. If None given, we determine the boundaries based on the data.
- bin_count
If no bins provided, this will be used to determine the distance of the bins
- include_breakpoint
Include a column that indicates the upper breakpoint.
- include_category
Include a column that shows the intervals as categories.
- Returns:
- DataFrame
Examples
>>> a = pl.Series("a", [1, 3, 8, 8, 2, 1, 3])
>>> a.hist(bin_count=4)
shape: (5, 3)
┌─────────────┬─────────────┬───────┐
│ break_point ┆ category    ┆ count │
│ ---         ┆ ---         ┆ ---   │
│ f64         ┆ cat         ┆ u32   │
╞═════════════╪═════════════╪═══════╡
│ 0.0         ┆ (-inf, 0.0] ┆ 0     │
│ 2.25        ┆ (0.0, 2.25] ┆ 3     │
│ 4.5         ┆ (2.25, 4.5] ┆ 2     │
│ 6.75        ┆ (4.5, 6.75] ┆ 0     │
│ inf         ┆ (6.75, inf] ┆ 2     │
└─────────────┴─────────────┴───────┘
- implode() Self [source]
Aggregate values into a list.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.implode() shape: (1,) Series: 'a' [list[i64]] [ [1, 2, 3] ]
- interpolate(method: InterpolationMethod = 'linear') Series [source]
Fill null values using interpolation.
- Parameters:
- method{‘linear’, ‘nearest’}
Interpolation method.
Examples
>>> s = pl.Series("a", [1, 2, None, None, 5]) >>> s.interpolate() shape: (5,) Series: 'a' [f64] [ 1.0 2.0 3.0 4.0 5.0 ]
- interpolate_by(by: IntoExpr) Series [source]
Fill null values using interpolation based on another column.
- Parameters:
- by
Column to interpolate values based on.
Examples
Fill null values using linear interpolation.
>>> s = pl.Series([1, None, None, 3]) >>> by = pl.Series([1, 2, 7, 8]) >>> s.interpolate_by(by) shape: (4,) Series: '' [f64] [ 1.0 1.285714 2.714286 3.0 ]
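The example values follow from ordinary linear interpolation along by rather than along the row position. A minimal plain-Python sketch is shown below; the function name is illustrative, by is assumed to be strictly increasing, and leading or trailing nulls are simply left as None.

```python
def interpolate_by(values, by):
    """Linearly interpolate None entries of `values` using positions in `by`."""
    out = [None if v is None else float(v) for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    # Fill each gap between two consecutive known points.
    for left, right in zip(known, known[1:]):
        x0, x1 = by[left], by[right]
        y0, y1 = values[left], values[right]
        for i in range(left + 1, right):
            out[i] = y0 + (y1 - y0) * (by[i] - x0) / (x1 - x0)
    return out


print(interpolate_by([1, None, None, 3], [1, 2, 7, 8]))
# approximately [1.0, 1.285714, 2.714286, 3.0]
```

The second value is 1 + 2 * (2 - 1) / (8 - 1) and the third is 1 + 2 * (7 - 1) / (8 - 1), which is why the filled values are not evenly spaced.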
- is_between(
- lower_bound: IntoExpr,
- upper_bound: IntoExpr,
- closed: ClosedInterval = 'both',
Get a boolean mask of the values that are between the given lower/upper bounds.
- Parameters:
- lower_bound
Lower bound value. Accepts expression input. Non-expression inputs (including strings) are parsed as literals.
- upper_bound
Upper bound value. Accepts expression input. Non-expression inputs (including strings) are parsed as literals.
- closed{‘both’, ‘left’, ‘right’, ‘none’}
Define which sides of the interval are closed (inclusive).
Notes
If the value of the lower_bound is greater than that of the upper_bound then the result will be False, as no value can satisfy the condition.
Examples
>>> s = pl.Series("num", [1, 2, 3, 4, 5]) >>> s.is_between(2, 4) shape: (5,) Series: 'num' [bool] [ false true true true false ]
Use the closed argument to include or exclude the values at the bounds:
>>> s.is_between(2, 4, closed="left") shape: (5,) Series: 'num' [bool] [ false true true false false ]
You can also use strings as well as numeric/temporal values:
>>> s = pl.Series("s", ["a", "b", "c", "d", "e"]) >>> s.is_between("b", "d", closed="both") shape: (5,) Series: 's' [bool] [ false true true true false ]
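The four closed options map onto strict or non-strict comparisons at each bound. The plain-Python sketch below illustrates that mapping on lists; the helper is a stand-in written for this explanation, and null handling is left out.

```python
def is_between(values, lower, upper, closed="both"):
    """Boolean mask of values inside the interval; `closed` picks inclusive sides."""
    def check(v):
        above = v >= lower if closed in ("both", "left") else v > lower
        below = v <= upper if closed in ("both", "right") else v < upper
        return above and below

    return [check(v) for v in values]


nums = [1, 2, 3, 4, 5]
print(is_between(nums, 2, 4))                 # [False, True, True, True, False]
print(is_between(nums, 2, 4, closed="left"))  # [False, True, True, False, False]
print(is_between(nums, 2, 4, closed="none"))  # [False, False, True, False, False]
print(is_between(nums, 4, 2))                 # [False, False, False, False, False]
```

The last call shows the case from the Notes: when the lower bound exceeds the upper bound, no value can satisfy both comparisons.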
- is_boolean() bool [source]
Check if this Series is a Boolean.
Deprecated since version 0.19.14: Use Series.dtype == pl.Boolean instead.
Examples
>>> s = pl.Series("a", [True, False, True]) >>> s.is_boolean() True
- is_duplicated() Series [source]
Get mask of all duplicated values.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [1, 2, 2, 3]) >>> s.is_duplicated() shape: (4,) Series: 'a' [bool] [ false true true false ]
- is_empty() bool [source]
Check if the Series is empty.
Examples
>>> s = pl.Series("a", [], dtype=pl.Float32) >>> s.is_empty() True
- is_finite() Series [source]
Returns a boolean Series indicating which values are finite.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> import numpy as np >>> s = pl.Series("a", [1.0, 2.0, np.inf]) >>> s.is_finite() shape: (3,) Series: 'a' [bool] [ true true false ]
- is_first() Series [source]
Return a boolean mask indicating the first occurrence of each distinct value.
Deprecated since version 0.19.3: This method has been renamed to Series.is_first_distinct().
- Returns:
- Series
Series of data type
Boolean
.
- is_first_distinct() Series [source]
Return a boolean mask indicating the first occurrence of each distinct value.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series([1, 1, 2, 3, 2]) >>> s.is_first_distinct() shape: (5,) Series: '' [bool] [ true false true true false ]
- is_float() bool [source]
Check if this Series has floating point numbers.
Deprecated since version 0.19.13: Use Series.dtype.is_float() instead.
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0]) >>> s.is_float() True
- is_in(
- other: Series | Collection[Any],
Check if elements of this Series are in the other Series.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s2 = pl.Series("b", [2, 4]) >>> s2.is_in(s) shape: (2,) Series: 'b' [bool] [ true false ]
>>> # check if some values are a member of sublists >>> sets = pl.Series("sets", [[1, 2, 3], [1, 2], [9, 10]]) >>> optional_members = pl.Series("optional_members", [1, 2, 3]) >>> print(sets) shape: (3,) Series: 'sets' [list[i64]] [ [1, 2, 3] [1, 2] [9, 10] ] >>> print(optional_members) shape: (3,) Series: 'optional_members' [i64] [ 1 2 3 ] >>> optional_members.is_in(sets) shape: (3,) Series: 'optional_members' [bool] [ true true false ]
- is_infinite() Series [source]
Returns a boolean Series indicating which values are infinite.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> import numpy as np >>> s = pl.Series("a", [1.0, 2.0, np.inf]) >>> s.is_infinite() shape: (3,) Series: 'a' [bool] [ false false true ]
- is_integer(signed: bool | None = None) bool [source]
Check if this Series datatype is an integer (signed or unsigned).
Deprecated since version 0.19.13: Use Series.dtype.is_integer() instead. For signed/unsigned variants, use Series.dtype.is_signed_integer() or Series.dtype.is_unsigned_integer().
- Parameters:
- signed
If None, both signed and unsigned integer dtypes will match.
If True, only signed integer dtypes will be considered a match.
If False, only unsigned integer dtypes will be considered a match.
Examples
>>> s = pl.Series("a", [1, 2, 3], dtype=pl.UInt32) >>> s.is_integer() True >>> s.is_integer(signed=False) True >>> s.is_integer(signed=True) False
- is_last() Series [source]
Return a boolean mask indicating the last occurrence of each distinct value.
Deprecated since version 0.19.3: This method has been renamed to Series.is_last_distinct().
- Returns:
- Series
Series of data type
Boolean
.
- is_last_distinct() Series [source]
Return a boolean mask indicating the last occurrence of each distinct value.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series([1, 1, 2, 3, 2]) >>> s.is_last_distinct() shape: (5,) Series: '' [bool] [ false true false true true ]
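The two masks can be described with a single pass over the values. The plain-Python sketch below assumes hashable values and is only an illustration of the semantics, not how Polars computes them.

```python
def is_first_distinct(values):
    """True at the first occurrence of each distinct value."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v not in seen)
        seen.add(v)
    return mask


def is_last_distinct(values):
    """True at the last occurrence: the first occurrence when scanning backwards."""
    return is_first_distinct(values[::-1])[::-1]


data = [1, 1, 2, 3, 2]
print(is_first_distinct(data))  # [True, False, True, True, False]
print(is_last_distinct(data))   # [False, True, False, True, True]
```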
- is_nan() Series [source]
Returns a boolean Series indicating which values are NaN.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> import numpy as np >>> s = pl.Series("a", [1.0, 2.0, 3.0, np.nan]) >>> s.is_nan() shape: (4,) Series: 'a' [bool] [ false false false true ]
- is_not_nan() Series [source]
Returns a boolean Series indicating which values are not NaN.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> import numpy as np >>> s = pl.Series("a", [1.0, 2.0, 3.0, np.nan]) >>> s.is_not_nan() shape: (4,) Series: 'a' [bool] [ true true true false ]
- is_not_null() Series [source]
Returns a boolean Series indicating which values are not null.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, None]) >>> s.is_not_null() shape: (4,) Series: 'a' [bool] [ true true true false ]
- is_null() Series [source]
Returns a boolean Series indicating which values are null.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, None]) >>> s.is_null() shape: (4,) Series: 'a' [bool] [ false false false true ]
- is_numeric() bool [source]
Check if this Series datatype is numeric.
Deprecated since version 0.19.13: Use Series.dtype.is_numeric() instead.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.is_numeric() True
- is_sorted(*, descending: bool = False, nulls_last: bool = False) bool [source]
Check if the Series is sorted.
- Parameters:
- descending
Check if the Series is sorted in descending order
- nulls_last
Set nulls at the end of the Series in sorted check.
Examples
>>> s = pl.Series([1, 3, 2]) >>> s.is_sorted() False
>>> s = pl.Series([3, 2, 1]) >>> s.is_sorted(descending=True) True
- is_temporal(excluding: OneOrMoreDataTypes | None = None) bool [source]
Check if this Series datatype is temporal.
Deprecated since version 0.19.13: Use Series.dtype.is_temporal() instead.
- Parameters:
- excluding
Optionally exclude one or more temporal dtypes from matching.
Examples
>>> from datetime import date >>> s = pl.Series([date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]) >>> s.is_temporal() True >>> s.is_temporal(excluding=[pl.Date]) False
- is_unique() Series [source]
Get mask of all unique values.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [1, 2, 2, 3]) >>> s.is_unique() shape: (4,) Series: 'a' [bool] [ true false false true ]
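is_unique and is_duplicated are complementary masks based on how often each value occurs. A plain-Python illustration using collections.Counter, assuming hashable values; the helpers are stand-ins and not the Polars implementation.

```python
from collections import Counter


def is_unique(values):
    """True where the value occurs exactly once."""
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


def is_duplicated(values):
    """True where the value occurs more than once."""
    counts = Counter(values)
    return [counts[v] > 1 for v in values]


data = [1, 2, 2, 3]
print(is_unique(data))      # [True, False, False, True]
print(is_duplicated(data))  # [False, True, True, False]
```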
- is_utf8() bool [source]
Check if this Series datatype is a String.
Deprecated since version 0.19.14: Use Series.dtype == pl.String instead.
Examples
>>> s = pl.Series("x", ["a", "b", "c"]) >>> s.is_utf8() True
- item(index: int | None = None) Any [source]
Return the Series as a scalar, or return the element at the given index.
If no index is provided, this is equivalent to s[0], with a check that the shape is (1,). With an index, this is equivalent to s[index].
Examples
>>> s1 = pl.Series("a", [1]) >>> s1.item() 1 >>> s2 = pl.Series("a", [9, 8, 7]) >>> s2.cum_sum().item(-1) 24
- kurtosis(*, fisher: bool = True, bias: bool = True) float | None [source]
Compute the kurtosis (Fisher or Pearson) of a dataset.
Kurtosis is the fourth central moment divided by the square of the variance. If Fisher's definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.
See scipy.stats for more information.
- Parameters:
- fisherbool, optional
If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
- biasbool, optional
If False, the calculations are corrected for statistical bias.
Examples
>>> s = pl.Series("grades", [66, 79, 54, 97, 96, 70, 69, 85, 93, 75]) >>> s.kurtosis() -1.0522623626787952 >>> s.kurtosis(fisher=False) 1.9477376373212048 >>> s.kurtosis(fisher=False, bias=False) 2.1040361802642726
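The definition can be worked through directly from the central moments. The sketch below is a plain-Python illustration, not the Polars implementation; the bias=False branch uses the sample-size correction found in scipy.stats.kurtosis, which is assumed here because the text points to scipy.stats.

```python
def kurtosis(xs, fisher=True, bias=True):
    """Kurtosis from central moments: m4 / m2**2, minus 3 for Fisher's definition."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased variance
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    if bias:
        excess = m4 / m2**2 - 3.0
    else:
        # Assumed scipy-style correction based on k statistics (needs n > 3).
        excess = ((n**2 - 1) * m4 / m2**2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))
    return excess if fisher else excess + 3.0


grades = [66, 79, 54, 97, 96, 70, 69, 85, 93, 75]
print(kurtosis(grades))                            # approximately -1.052262
print(kurtosis(grades, fisher=False))              # approximately 1.947738
print(kurtosis(grades, fisher=False, bias=False))  # approximately 2.104036
```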
- le(other: Any) Series | Expr [source]
Method equivalent of operator expression
series <= other
.
- len() int [source]
Return the number of elements in the Series.
Null values count towards the total.
See also
Examples
>>> s = pl.Series("a", [1, 2, None]) >>> s.len() 3
- limit(n: int = 10) Series [source]
Get the first n elements.
Alias for Series.head().
- Parameters:
- n
Number of elements to return. If a negative value is passed, return all elements except the last abs(n).
See also
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.limit(3) shape: (3,) Series: 'a' [i64] [ 1 2 3 ]
Pass a negative value to get all rows except the last abs(n).
>>> s.limit(-3) shape: (2,) Series: 'a' [i64] [ 1 2 ]
- log(base: float = 2.718281828459045) Series [source]
Compute the logarithm to a given base.
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.log() shape: (3,) Series: '' [f64] [ 0.0 0.693147 1.098612 ]
- log10() Series [source]
Compute the base 10 logarithm of the input array, element-wise.
Examples
>>> s = pl.Series([10, 100, 1000]) >>> s.log10() shape: (3,) Series: '' [f64] [ 1.0 2.0 3.0 ]
- log1p() Series [source]
Compute the natural logarithm of the input array plus one, element-wise.
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.log1p() shape: (3,) Series: '' [f64] [ 0.693147 1.098612 1.386294 ]
- lower_bound() Self [source]
Return the lower bound of this Series’ dtype as a unit Series.
See also
upper_bound
return the upper bound of the given Series’ dtype.
Examples
>>> s = pl.Series("s", [-1, 0, 1], dtype=pl.Int32) >>> s.lower_bound() shape: (1,) Series: 's' [i32] [ -2147483648 ]
>>> s = pl.Series("s", [1.0, 2.5, 3.0], dtype=pl.Float32) >>> s.lower_bound() shape: (1,) Series: 's' [f32] [ -inf ]
- lt(other: Any) Series | Expr [source]
Method equivalent of operator expression
series < other
.
- map_dict( ) Self [source]
Replace values in the Series using a remapping dictionary.
Deprecated since version 0.19.16: This method has been renamed to replace(). The default behavior has changed to keep any values not present in the mapping unchanged. Pass default=None to keep existing behavior.
- Parameters:
- mapping
Dictionary containing the before/after values to map.
- default
Value to use when the remapping dict does not contain the lookup value. Use pl.first() to keep the original value.
- return_dtype
Set return dtype to override automatic return dtype determination.
- map_elements(
- function: Callable[[Any], Any],
- return_dtype: PolarsDataType | None = None,
- *,
- skip_nulls: bool = True,
Map a custom/user-defined function (UDF) over elements in this Series.
Warning
This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
Suppose that the function is: x ↦ sqrt(x):
For mapping elements of a series, consider: s.sqrt().
For mapping inner elements of lists, consider: s.list.eval(pl.element().sqrt()).
If the function returns a different datatype, the return_dtype arg should be set, otherwise the method will fail.
Implementing logic using a Python function is almost always significantly slower and more memory intensive than implementing the same logic using the native expression API because:
The native expression engine runs in Rust; UDFs run in Python.
Use of Python UDFs forces the DataFrame to be materialized in memory.
Polars-native expressions can be parallelised (UDFs typically cannot).
Polars-native expressions can be logically optimised (UDFs cannot).
Wherever possible you should strongly prefer the native expression API to achieve the best performance.
- Parameters:
- function
Custom function or lambda.
- return_dtype
Output datatype. If not set, the dtype will be inferred based on the first non-null value that is returned by the function.
- skip_nulls
Nulls will be skipped and not passed to the python function. This is faster because python can be skipped and because we call more specialized functions.
- Returns:
- Series
Warning
If return_dtype is not provided, this may lead to unexpected results. We allow this, but it is considered a bug in the user's query.
Notes
If your function is expensive and you don't want it to be called more than once for a given input, consider applying an @lru_cache decorator to it. If your data is suitable you may achieve significant speedups.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.map_elements(lambda x: x + 10, return_dtype=pl.Int64) shape: (3,) Series: 'a' [i64] [ 11 12 13 ]
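The effect of the @lru_cache suggestion in the Notes can be seen without Polars. In the sketch below a list comprehension stands in for the per-element calls that map_elements would make; slow_square and the calls counter are hypothetical names used only to count how often the function body runs. Caching requires hashable inputs.

```python
from functools import lru_cache

calls = 0  # number of times the function body actually runs


@lru_cache(maxsize=None)
def slow_square(x):
    """Stand-in for an expensive function; counts its real invocations."""
    global calls
    calls += 1
    return x * x


values = [3, 1, 3, 3, 1]
result = [slow_square(v) for v in values]  # stand-in for per-element mapping

print(result)  # [9, 1, 9, 9, 1]
print(calls)   # 2, one call per distinct input
```

The speedup therefore depends on how many repeated values the data contains.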
- max() PythonLiteral | None [source]
Get the maximum value in this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.max() 3
- mean() PythonLiteral | None [source]
Reduce this Series to the mean value.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.mean() 2.0
- median() PythonLiteral | None [source]
Get the median of this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.median() 2.0
- min() PythonLiteral | None [source]
Get the minimal value in this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.min() 1
- mode() Series [source]
Compute the most frequently occurring value(s).
Can return multiple values if there is a tie.
Examples
>>> s = pl.Series("a", [1, 2, 2, 3]) >>> s.mode() shape: (1,) Series: 'a' [i64] [ 2 ]
- n_chunks() int [source]
Get the number of chunks that this Series contains.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.n_chunks() 1 >>> s2 = pl.Series("a", [4, 5, 6])
Concatenate Series with rechunk = True
>>> pl.concat([s, s2], rechunk=True).n_chunks() 1
Concatenate Series with rechunk = False
>>> pl.concat([s, s2], rechunk=False).n_chunks() 2
- n_unique() int [source]
Count the number of unique values in this Series.
Examples
>>> s = pl.Series("a", [1, 2, 2, 3]) >>> s.n_unique() 3
- property name: str[source]
Get the name of this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.name 'a'
- nan_max() int | float | date | datetime | timedelta | str [source]
Get maximum value, but propagate/poison encountered NaN values.
Note the naming difference with numpy: numpy’s nanmax ignores NaN values, whereas nan_max propagates them (like numpy’s max). The regular polars max ignores NaN values.
Examples
>>> s = pl.Series("a", [1, 3, 4]) >>> s.nan_max() 4
>>> s = pl.Series("a", [1, float("nan"), 4]) >>> s.nan_max() nan
- nan_min() int | float | date | datetime | timedelta | str [source]
Get minimum value, but propagate/poison encountered NaN values.
Note the naming difference with numpy: numpy’s nanmin ignores NaN values, whereas nan_min propagates them (like numpy’s min). The regular polars min ignores NaN values.
Examples
>>> s = pl.Series("a", [1, 3, 4]) >>> s.nan_min() 1
>>> s = pl.Series("a", [1, float("nan"), 4]) >>> s.nan_min() nan
- ne(other: Any) Series | Expr [source]
Method equivalent of operator expression series != other.
- ne_missing(other: Any) Series | Expr [source]
Method equivalent of inequality operator series != other where None == None.
This differs from the standard ne where null values are propagated.
- Parameters:
- other
A literal or expression value to compare with.
See also
eq_missing
ne
Examples
>>> s1 = pl.Series("a", [333, 200, None])
>>> s2 = pl.Series("a", [100, 200, None])
>>> s1.ne(s2)
shape: (3,)
Series: 'a' [bool]
[
        true
        false
        null
]
>>> s1.ne_missing(s2)
shape: (3,)
Series: 'a' [bool]
[
        true
        false
        false
]
- new_from_index(index: int, length: int) Self [source]
Create a new Series filled with values from the given index.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.new_from_index(1, 3) shape: (3,) Series: 'a' [i64] [ 2 2 2 ]
- not_() Series [source]
Negate a boolean Series.
- Returns:
- Series
Series of data type
Boolean
.
Examples
>>> s = pl.Series("a", [True, False, False]) >>> s.not_() shape: (3,) Series: 'a' [bool] [ false true true ]
- null_count() int [source]
Count the null values in this Series.
Examples
>>> s = pl.Series([1, None, None]) >>> s.null_count() 2
- pct_change(n: int | IntoExprColumn = 1) Series [source]
Computes percentage change between values.
Percentage change (as fraction) between current element and most-recent non-null element at least n period(s) before the current element.
Computes the change from the previous row by default.
- Parameters:
- n
Periods to shift for forming percent change.
Examples
>>> pl.Series(range(10)).pct_change() shape: (10,) Series: '' [f64] [ null inf 1.0 0.5 0.333333 0.25 0.2 0.166667 0.142857 0.125 ]
>>> pl.Series([1, 2, 4, 8, 16, 32, 64, 128, 256, 512]).pct_change(2) shape: (10,) Series: '' [f64] [ null null 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 ]
- peak_max() Self [source]
Get a boolean mask of the local maximum peaks.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.peak_max() shape: (5,) Series: 'a' [bool] [ false false false false true ]
- peak_min() Self [source]
Get a boolean mask of the local minimum peaks.
Examples
>>> s = pl.Series("a", [4, 1, 3, 2, 5]) >>> s.peak_min() shape: (5,) Series: 'a' [bool] [ false true false true false ]
- property plot: hvPlotTabularPolars[source]
Create a plot namespace.
Polars does not implement plotting logic itself, but instead defers to hvplot. Please see the hvplot reference gallery for more information and documentation.
Examples
Histogram:
>>> s = pl.Series("values", [1, 4, 2])
>>> s.plot.hist()
KDE plot (note: in addition to hvplot, this one also requires scipy):
>>> s.plot.kde()
For more info on what you can pass, you can use hvplot.help:
>>> import hvplot
>>> hvplot.help("hist")
- pow( ) Series [source]
Raise to the power of the given exponent.
- Parameters:
- exponent
The exponent. Accepts Series input.
Examples
>>> s = pl.Series("foo", [1, 2, 3, 4]) >>> s.pow(3) shape: (4,) Series: 'foo' [i64] [ 1 8 27 64 ]
- product() int | float [source]
Reduce this Series to the product value.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.product() 6
- qcut(
- quantiles: Sequence[float] | int,
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- allow_duplicates: bool = False,
- include_breaks: bool = False,
- break_point_label: str = 'break_point',
- category_label: str = 'category',
- as_series: bool = True,
Bin continuous values into discrete categories based on their quantiles.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- quantiles
Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.
- labels
Names of the categories. The number of labels must be equal to the number of cut points plus one.
- left_closed
Set the intervals to be left-closed instead of right-closed.
- allow_duplicates
If set to True, duplicates in the resulting quantiles are dropped, rather than raising a DuplicateError. This can happen even with unique probabilities, depending on the data.
- include_breaks
Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.
- break_point_label
Name of the breakpoint column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- category_label
Name of the category column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- as_series
If set to False, return a DataFrame containing the original values, the breakpoints, and the categories.
Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting include_breaks=True, unnesting the resulting struct Series, and adding the result to the original Series.
- Returns:
- Series
Series of data type Categorical if include_breaks is set to False (default), otherwise a Series of data type Struct.
See also
Examples
Divide a column into three categories according to pre-defined quantile probabilities.
>>> s = pl.Series("foo", [-2, -1, 0, 1, 2]) >>> s.qcut([0.25, 0.75], labels=["a", "b", "c"]) shape: (5,) Series: 'foo' [cat] [ "a" "a" "b" "b" "c" ]
Divide a column into two categories using uniform quantile probabilities.
>>> s.qcut(2, labels=["low", "high"], left_closed=True) shape: (5,) Series: 'foo' [cat] [ "low" "low" "high" "high" "high" ]
Create a DataFrame with the breakpoint and category for each value.
>>> cut = s.qcut([0.25, 0.75], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘
- quantile(
- quantile: float,
- interpolation: RollingInterpolationMethod = 'nearest',
Get the quantile value of this Series.
- Parameters:
- quantile
Quantile between 0.0 and 1.0.
- interpolation{‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}
Interpolation method.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.quantile(0.5) 2.0
- rank( ) Series [source]
Assign ranks to data, dealing with ties appropriately.
- Parameters:
- method{‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’, ‘random’}
The method used to assign ranks to tied elements. The following methods are available (default is ‘average’):
‘average’ : The average of the ranks that would have been assigned to all the tied values is assigned to each value.
‘min’ : The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
‘max’ : The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
‘dense’ : Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
‘ordinal’ : All values are given a distinct rank, corresponding to the order that the values occur in the Series.
‘random’ : Like ‘ordinal’, but the rank for ties is not dependent on the order that the values occur in the Series.
- descending
Rank in descending order.
- seed
If
method="random"
, use this as seed.
Examples
The ‘average’ method:
>>> s = pl.Series("a", [3, 6, 1, 1, 6]) >>> s.rank() shape: (5,) Series: 'a' [f64] [ 3.0 4.5 1.5 1.5 4.5 ]
The ‘ordinal’ method:
>>> s = pl.Series("a", [3, 6, 1, 1, 6]) >>> s.rank("ordinal") shape: (5,) Series: 'a' [u32] [ 3 4 1 2 5 ]
- rechunk(*, in_place: bool = False) Self [source]
Create a single chunk of memory for this Series.
- Parameters:
- in_place
In place or not.
Examples
>>> s1 = pl.Series("a", [1, 2, 3]) >>> s1.n_chunks() 1 >>> s2 = pl.Series("a", [4, 5, 6]) >>> s = pl.concat([s1, s2], rechunk=False) >>> s.n_chunks() 2 >>> s.rechunk(in_place=True) shape: (6,) Series: 'a' [i64] [ 1 2 3 4 5 6 ] >>> s.n_chunks() 1
- reinterpret(*, signed: bool = True) Series [source]
Reinterpret the underlying bits as a signed/unsigned integer.
This operation is only allowed for 64-bit integers. For integers with fewer bits, you can safely use the cast operation instead.
- Parameters:
- signed
If True, reinterpret as pl.Int64. Otherwise, reinterpret as pl.UInt64.
Examples
>>> s = pl.Series("a", [-(2**60), -2, 3]) >>> s shape: (3,) Series: 'a' [i64] [ -1152921504606846976 -2 3 ] >>> s.reinterpret(signed=False) shape: (3,) Series: 'a' [u64] [ 17293822569102704640 18446744073709551614 3 ]
- rename(name: str) Series [source]
Rename this Series.
Alias for Series.alias().
- Parameters:
- name
New name.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.rename("b") shape: (3,) Series: 'b' [i64] [ 1 2 3 ]
- replace(
- old: IntoExpr | Sequence[Any] | Mapping[Any, Any],
- new: IntoExpr | Sequence[Any] | NoDefault = _NoDefault.no_default,
- *,
- default: IntoExpr | NoDefault = _NoDefault.no_default,
- return_dtype: PolarsDataType | None = None,
Replace values by different values.
- Parameters:
- old
Value or sequence of values to replace. Also accepts a mapping of values to their replacement as syntactic sugar for replace(old=Series(mapping.keys()), new=Series(mapping.values())).
- new
Value or sequence of values to replace by. Length must match the length of old or have length 1.
- default
Set values that were not replaced to this value. Defaults to keeping the original value. Accepts expression input. Non-expression inputs are parsed as literals.
- return_dtype
The data type of the resulting Series. If set to None (default), the data type is determined automatically based on the other inputs.
See also
Notes
The global string cache must be enabled when replacing categorical values.
Examples
Replace a single value by another value. Values that were not replaced remain unchanged.
>>> s = pl.Series([1, 2, 2, 3]) >>> s.replace(2, 100) shape: (4,) Series: '' [i64] [ 1 100 100 3 ]
Replace multiple values by passing sequences to the old and new parameters.
>>> s.replace([2, 3], [100, 200])
shape: (4,)
Series: '' [i64]
[
        1
        100
        100
        200
]
Passing a mapping with replacements is also supported as syntactic sugar. Specify a default to set all values that were not matched.
>>> mapping = {2: 100, 3: 200} >>> s.replace(mapping, default=-1) shape: (4,) Series: '' [i64] [ -1 100 100 200 ]
The default can be another Series.
>>> default = pl.Series([2.5, 5.0, 7.5, 10.0]) >>> s.replace(2, 100, default=default) shape: (4,) Series: '' [f64] [ 2.5 100.0 100.0 10.0 ]
Replacing by values of a different data type sets the return type based on a combination of the new data type and either the original data type or the default data type if it was set.
>>> s = pl.Series(["x", "y", "z"])
>>> mapping = {"x": 1, "y": 2, "z": 3}
>>> s.replace(mapping)
shape: (3,)
Series: '' [str]
[
        "1"
        "2"
        "3"
]
>>> s.replace(mapping, default=None)
shape: (3,)
Series: '' [i64]
[
        1
        2
        3
]
Set the return_dtype parameter to control the resulting data type directly.
>>> s.replace(mapping, return_dtype=pl.UInt8)
shape: (3,)
Series: '' [u8]
[
        1
        2
        3
]
- reshape( ) Series [source]
Reshape this Series to a flat Series or a Series of Lists.
- Parameters:
- dimensions
Tuple of the dimension sizes. If a -1 is used in any of the dimensions, that dimension is inferred.
- nested_type
The nested data type to create. List only supports 2 dimensions, whereas Array supports an arbitrary number of dimensions.
- Returns:
- Series
If a single dimension is given, results in a Series of the original data type. If multiple dimensions are given, results in a Series of data type List with shape (rows, cols) or Array with shape dimensions.
See also
Series.list.explode
Explode a list column.
Examples
>>> s = pl.Series("foo", [1, 2, 3, 4, 5, 6, 7, 8, 9]) >>> s.reshape((3, 3)) shape: (3,) Series: 'foo' [list[i64]] [ [1, 2, 3] [4, 5, 6] [7, 8, 9] ]
- reverse() Series [source]
Return Series in reverse order.
Examples
>>> s = pl.Series("a", [1, 2, 3], dtype=pl.Int8) >>> s.reverse() shape: (3,) Series: 'a' [i8] [ 3 2 1 ]
- rle() Series [source]
Compress the Series data using run-length encoding.
Run-length encoding (RLE) encodes data by storing each run of identical values as a single value and its length.
- Returns:
- Series
Series of data type Struct with fields lengths of data type Int32 and values of the original data type.
Examples
>>> s = pl.Series("s", [1, 1, 2, 1, None, 1, 3, 3])
>>> s.rle().struct.unnest()
shape: (6, 2)
┌─────────┬────────┐
│ lengths ┆ values │
│ ---     ┆ ---    │
│ i32     ┆ i64    │
╞═════════╪════════╡
│ 2       ┆ 1      │
│ 1       ┆ 2      │
│ 1       ┆ 1      │
│ 1       ┆ null   │
│ 1       ┆ 1      │
│ 2       ┆ 3      │
└─────────┴────────┘
- rle_id() Series [source]
Get a distinct integer ID for each run of identical values.
The ID starts at 0 and increases by one each time the value of the column changes.
- Returns:
- Series
Series of data type
UInt32
.
See also
Notes
This functionality is especially useful for defining a new group for every time a column’s value changes, rather than for every distinct value of that column.
Examples
>>> s = pl.Series("s", [1, 1, 2, 1, None, 1, 3, 3]) >>> s.rle_id() shape: (8,) Series: 's' [u32] [ 0 0 1 2 3 4 5 5 ]
- rolling_apply(
- function: Callable[[Series], Any],
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a custom rolling window function.
Deprecated since version 0.19.0: This method has been renamed to Series.rolling_map().
- Parameters:
- function
Aggregation function
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
- rolling_map(
- function: Callable[[Series], Any],
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Compute a custom rolling window function.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- function
Custom aggregation function.
- window_size
Size of the window. The window at a given row will include the row itself and the window_size - 1 elements before it.
- weights
A list of weights with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window.
Warning
Computing custom functions is extremely slow. Use specialized rolling functions such as Series.rolling_sum() if at all possible.
Examples
>>> from numpy import nansum >>> s = pl.Series([11.0, 2.0, 9.0, float("nan"), 8.0]) >>> s.rolling_map(nansum, window_size=3) shape: (5,) Series: '' [f64] [ null null 22.0 11.0 17.0 ]
- rolling_max(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a rolling max (moving max) over the values in this array.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their max.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
Examples
>>> s = pl.Series("a", [100, 200, 300, 400, 500]) >>> s.rolling_max(window_size=2) shape: (5,) Series: 'a' [i64] [ null 200 300 400 500 ]
- rolling_mean(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a rolling mean (moving mean) over the values in this array.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their mean.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
Examples
>>> s = pl.Series("a", [100, 200, 300, 400, 500]) >>> s.rolling_mean(window_size=2) shape: (5,) Series: 'a' [f64] [ null 150.0 250.0 350.0 450.0 ]
- rolling_median(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Compute a rolling median.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
The window at a given row will include the row itself and the window_size - 1 elements before it.
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]) >>> s.rolling_median(window_size=3) shape: (6,) Series: 'a' [f64] [ null null 2.0 3.0 4.0 6.0 ]
- rolling_min(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a rolling min (moving min) over the values in this array.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their min.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
Examples
>>> s = pl.Series("a", [100, 200, 300, 400, 500]) >>> s.rolling_min(window_size=3) shape: (5,) Series: 'a' [i64] [ null null 100 200 300 ]
- rolling_quantile(
- quantile: float,
- interpolation: RollingInterpolationMethod = 'nearest',
- window_size: int = 2,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Compute a rolling quantile.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- quantile
Quantile between 0.0 and 1.0.
- interpolation{‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}
Interpolation method.
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]) >>> s.rolling_quantile(quantile=0.33, window_size=3) shape: (6,) Series: 'a' [f64] [ null null 1.0 2.0 3.0 4.0 ] >>> s.rolling_quantile(quantile=0.33, interpolation="linear", window_size=3) shape: (6,) Series: 'a' [f64] [ null null 1.66 2.66 3.66 5.32 ]
- rolling_skew( ) Series [source]
Compute a rolling skew.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
The window at a given row includes the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
Integer size of the rolling window.
- bias
If False, the calculations are corrected for statistical bias.
Examples
>>> pl.Series([1, 4, 2, 9]).rolling_skew(3) shape: (4,) Series: '' [f64] [ null null 0.381802 0.47033 ]
Note how the values match
>>> pl.Series([1, 4, 2]).skew(), pl.Series([4, 2, 9]).skew() (0.38180177416060584, 0.47033046033698594)
- rolling_std(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- ddof: int = 1,
Compute a rolling std dev.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their std dev.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
- ddof
“Delta Degrees of Freedom”: The divisor for a length N window is N - ddof
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]) >>> s.rolling_std(window_size=3) shape: (6,) Series: 'a' [f64] [ null null 1.0 1.0 1.527525 2.0 ]
- rolling_sum(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
Apply a rolling sum (moving sum) over the values in this array.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their sum.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.rolling_sum(window_size=2) shape: (5,) Series: 'a' [i64] [ null 3 5 7 9 ]
- rolling_var(
- window_size: int,
- weights: list[float] | None = None,
- min_periods: int | None = None,
- *,
- center: bool = False,
- ddof: int = 1,
Compute a rolling variance.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated to their variance.
The window at a given row will include the row itself and the window_size - 1 elements before it.
- Parameters:
- window_size
The length of the window.
- weights
An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.
- min_periods
The number of values in the window that should be non-null before computing a result. If None, it will be set equal to:
the window size, if window_size is a fixed integer
1, if window_size is a dynamic temporal size
- center
Set the labels at the center of the window
- ddof
“Delta Degrees of Freedom”: The divisor for a length N window is N - ddof
Examples
>>> s = pl.Series("a", [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]) >>> s.rolling_var(window_size=3) shape: (6,) Series: 'a' [f64] [ null null 1.0 1.0 2.333333 4.0 ]
- round(decimals: int = 0) Series [source]
Round underlying floating point data by decimals digits.
- Parameters:
- decimals
Number of decimals to round by.
Examples
>>> s = pl.Series("a", [1.12345, 2.56789, 3.901234]) >>> s.round(2) shape: (3,) Series: 'a' [f64] [ 1.12 2.57 3.9 ]
- round_sig_figs(digits: int) Series [source]
Round to a number of significant figures.
- Parameters:
- digits
Number of significant figures to round to.
Examples
>>> s = pl.Series([0.01234, 3.333, 1234.0]) >>> s.round_sig_figs(2) shape: (3,) Series: '' [f64] [ 0.012 3.3 1200.0 ]
- sample(
- n: int | None = None,
- *,
- fraction: float | None = None,
- with_replacement: bool = False,
- shuffle: bool = False,
- seed: int | None = None,
Sample from this Series.
- Parameters:
- n
Number of items to return. Cannot be used with fraction. Defaults to 1 if fraction is None.
- fraction
Fraction of items to return. Cannot be used with n.
- with_replacement
Allow values to be sampled more than once.
- shuffle
Shuffle the order of sampled data points.
- seed
Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.sample(2, seed=0) shape: (2,) Series: 'a' [i64] [ 1 5 ]
- scatter(
- indices: Series | Iterable[int] | int | np.ndarray[Any, Any],
- values: Series | Iterable[PythonLiteral] | PythonLiteral | None,
Set values at the index locations.
- Parameters:
- indices
Integers representing the index locations.
- values
Replacement values.
Notes
Use of this function is frequently an anti-pattern, as it can block optimization (predicate pushdown, etc). Consider using pl.when(predicate).then(value).otherwise(self) instead.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.scatter(1, 10) shape: (3,) Series: 'a' [i64] [ 1 10 3 ]
It is better to implement this as follows:
>>> s.to_frame().with_row_index().select(
...     pl.when(pl.col("index") == 1).then(10).otherwise(pl.col("a"))
... )
shape: (3, 1)
┌─────────┐
│ literal │
│ ---     │
│ i64     │
╞═════════╡
│ 1       │
│ 10      │
│ 3       │
└─────────┘
- search_sorted(
- element: IntoExpr | np.ndarray[Any, Any] | None,
- side: SearchSortedSide = 'any',
Find indices where elements should be inserted to maintain order.
\[a[i-1] < v \le a[i]\]
- Parameters:
- element
Expression or scalar value.
- side{‘any’, ‘left’, ‘right’}
If ‘any’, the index of the first suitable location found is given. If ‘left’, the index of the leftmost suitable location found is given. If ‘right’, the index of the rightmost suitable location found is given.
Examples
>>> s = pl.Series("set", [1, 2, 3, 4, 4, 5, 6, 7])
>>> s.search_sorted(4)
3
>>> s.search_sorted(4, "left")
3
>>> s.search_sorted(4, "right")
5
>>> s.search_sorted([1, 4, 5])
shape: (3,)
Series: 'set' [u32]
[
        0
        3
        5
]
>>> s.search_sorted([1, 4, 5], "left")
shape: (3,)
Series: 'set' [u32]
[
        0
        3
        5
]
>>> s.search_sorted([1, 4, 5], "right")
shape: (3,)
Series: 'set' [u32]
[
        1
        5
        6
]
- series_equal( ) bool [source]
Check whether the Series is equal to another Series.
Deprecated since version 0.19.16: This method has been renamed to equals().
- Parameters:
- other
Series to compare with.
- null_equal
Consider null values as equal.
- strict
Don’t allow different numerical dtypes, e.g. comparing pl.UInt32 with a pl.Int64 will return False.
- set( ) Series [source]
Set masked values.
- Parameters:
- filter
Boolean mask.
- value
Value with which to replace the masked values.
Notes
Use of this function is frequently an anti-pattern, as it can block optimisation (predicate pushdown, etc). Consider using pl.when(predicate).then(value).otherwise(self) instead.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.set(s == 2, 10) shape: (3,) Series: 'a' [i64] [ 1 10 3 ]
It is better to implement this as follows:
>>> s.to_frame().select(
...     pl.when(pl.col("a") == 2).then(10).otherwise(pl.col("a"))
... )
shape: (3, 1)
┌─────────┐
│ literal │
│ ---     │
│ i64     │
╞═════════╡
│ 1       │
│ 10      │
│ 3       │
└─────────┘
- set_at_idx(
- indices: Series | ndarray[Any, Any] | Sequence[int] | int,
- values: int | float | str | bool | date | datetime | Sequence[int] | Sequence[float] | Sequence[bool] | Sequence[str] | Sequence[date] | Sequence[datetime] | Series | None,
Set values at the index locations.
Deprecated since version 0.19.14: This method has been renamed to scatter().
- Parameters:
- indices
Integers representing the index locations.
- values
Replacement values.
- set_sorted(*, descending: bool = False) Self [source]
Flags the Series as ‘sorted’.
Enables downstream code to use fast paths for sorted arrays.
- Parameters:
- descending
Whether the Series order is descending.
Warning
This can lead to incorrect results if this Series is not sorted! Use with care!
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.set_sorted().max() 3
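To see why a wrong ‘sorted’ flag is dangerous, consider a toy fast path that trusts the flag instead of scanning the data. The function fast_max below is hypothetical and written in plain Python; it is not how polars is implemented, only an illustration of the failure mode the warning describes.

```python
def fast_max(values, is_sorted_ascending):
    """Hypothetical fast path: trust the flag instead of scanning."""
    if is_sorted_ascending:
        return values[-1]  # O(1): last element of an ascending sequence
    return max(values)  # O(n) fallback when nothing is known about the order


sorted_values = [1, 2, 3]
unsorted_values = [3, 1, 2]

# Flag is truthful: the shortcut gives the right answer.
correct = fast_max(sorted_values, is_sorted_ascending=True)

# Flag is wrong: the shortcut silently returns the last element, not the max.
wrong = fast_max(unsorted_values, is_sorted_ascending=True)

print(correct, wrong)
```

No error is raised in the second call, which is what makes a mis-set flag hard to notice.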
- property shape: tuple[int][source]
Shape of this Series.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.shape (3,)
- shift(n: int = 1, *, fill_value: IntoExpr | None = None) Series [source]
Shift values by the given number of indices.
- Parameters:
- n
Number of indices to shift forward. If a negative value is passed, values are shifted in the opposite direction instead.
- fill_value
Fill the resulting null values with this value. Accepts expression input. Non-expression inputs are parsed as literals.
Notes
This method is similar to the LAG operation in SQL when the value for n is positive. With a negative value for n, it is similar to LEAD.
Examples
By default, values are shifted forward by one index.
>>> s = pl.Series([1, 2, 3, 4]) >>> s.shift() shape: (4,) Series: '' [i64] [ null 1 2 3 ]
Pass a negative value to shift in the opposite direction instead.
>>> s.shift(-2) shape: (4,) Series: '' [i64] [ 3 4 null null ]
Specify fill_value to fill the resulting null values.
>>> s.shift(-2, fill_value=100) shape: (4,) Series: '' [i64] [ 3 4 100 100 ]
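The shifting rules can be sketched in plain Python on a list, with None standing in for null. The helper name shift_list is an assumption made for this illustration; it mirrors the documented behaviour for positive n, negative n, and fill_value, but is not the polars implementation.

```python
def shift_list(values, n=1, fill_value=None):
    """Shift a list by n positions, padding the vacated slots with fill_value."""
    size = len(values)
    k = min(abs(n), size)  # never pad more slots than the list has
    pad = [fill_value] * k
    if n >= 0:
        # Shift forward: pad at the front, drop the last k elements.
        return pad + list(values[: size - k])
    # Shift backward: drop the first k elements, pad at the end.
    return list(values[k:]) + pad


print(shift_list([1, 2, 3, 4]))
print(shift_list([1, 2, 3, 4], -2))
print(shift_list([1, 2, 3, 4], -2, fill_value=100))
```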
- shift_and_fill(fill_value: int | Expr, *, n: int = 1) Series [source]
Shift values by the given number of places and fill the resulting null values.
Deprecated since version 0.19.12: Use shift() instead.
- Parameters:
- fill_value
Fill None values with the result of this expression.
- n
Number of places to shift (may be negative).
- shrink_dtype() Series [source]
Shrink numeric columns to the minimal required datatype.
Shrink to the dtype needed to fit the extrema of this Series. This can be used to reduce memory pressure.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5, 6]) >>> s shape: (6,) Series: 'a' [i64] [ 1 2 3 4 5 6 ] >>> s.shrink_dtype() shape: (6,) Series: 'a' [i8] [ 1 2 3 4 5 6 ]
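The idea of fitting the extrema can be sketched for signed integers as follows. This is a simplified plain-Python illustration: the helper smallest_signed_int is hypothetical, it only considers signed integer widths, and the exact rules polars applies to unsigned and floating-point data are not covered here.

```python
SIGNED_INT_BITS = (8, 16, 32, 64)


def smallest_signed_int(values):
    """Return the name of the narrowest signed integer type that holds all values."""
    lo, hi = min(values), max(values)
    for bits in SIGNED_INT_BITS:
        type_min = -(2 ** (bits - 1))
        type_max = 2 ** (bits - 1) - 1
        if type_min <= lo and hi <= type_max:
            return f"Int{bits}"
    raise OverflowError("values do not fit in a 64-bit signed integer")


print(smallest_signed_int([1, 2, 3, 4, 5, 6]))
print(smallest_signed_int([1, 128]))
```

Only the minimum and maximum matter, so the check needs one pass over the data regardless of the candidate widths.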
- shrink_to_fit(*, in_place: bool = False) Series [source]
Shrink Series memory usage.
Shrinks the underlying array capacity to exactly fit the actual data. (Note that this function does not change the Series data type).
- shuffle(seed: int | None = None) Series [source]
Shuffle the contents of this Series.
- Parameters:
- seed
Seed for the random number generator. If set to None (default), a random seed is generated each time the shuffle is called.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.shuffle(seed=1) shape: (3,) Series: 'a' [i64] [ 2 1 3 ]
- sign() Series [source]
Compute the element-wise indication of the sign.
The returned values can be -1, 0, or 1:
-1 if x < 0.
0 if x == 0.
1 if x > 0.
(null values are preserved as-is).
Examples
>>> s = pl.Series("a", [-9.0, -0.0, 0.0, 4.0, None]) >>> s.sign() shape: (5,) Series: 'a' [i64] [ -1 0 0 1 null ]
- sin() Series [source]
Compute the element-wise value for the sine.
Examples
>>> import math >>> s = pl.Series("a", [0.0, math.pi / 2.0, math.pi]) >>> s.sin() shape: (3,) Series: 'a' [f64] [ 0.0 1.0 1.2246e-16 ]
- sinh() Series [source]
Compute the element-wise value for the hyperbolic sine.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0]) >>> s.sinh() shape: (3,) Series: 'a' [f64] [ 1.175201 0.0 -1.175201 ]
- skew(*, bias: bool = True) float | None [source]
Compute the sample skewness of a data set.
For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.
See scipy.stats for more information.
- Parameters:
- biasbool, optional
If False, the calculations are corrected for statistical bias.
Notes
The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.
\[g_1=\frac{m_3}{m_2^{3/2}}\]
where
\[m_i=\frac{1}{N}\sum_{n=1}^N(x[n]-\bar{x})^i\]
is the biased sample \(i\texttt{th}\) central moment, and \(\bar{x}\) is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.
\[G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2}\frac{m_3}{m_2^{3/2}}\]
Examples
>>> s = pl.Series([1, 2, 2, 4, 5]) >>> s.skew() 0.34776706224699483
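The formulas in the Notes can be checked by hand with a few lines of plain Python. The function name skewness is an assumption for this sketch; it follows the definitions of m_2, m_3, g_1, and G_1 given above and is not the polars implementation.

```python
import math


def skewness(values, bias=True):
    """Fisher-Pearson skewness g_1, or the adjusted G_1 when bias is False."""
    n = len(values)
    mean = sum(values) / n
    # Biased central moments m_2 and m_3.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    if bias:
        return g1
    # Bias correction factor sqrt(N(N-1)) / (N-2).
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


data = [1, 2, 2, 4, 5]
print(skewness(data))              # approximately 0.3478, as in the example above
print(skewness(data, bias=False))  # larger in magnitude after the correction
```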
- slice(offset: int, length: int | None = None) Series [source]
Get a slice of this Series.
- Parameters:
- offset
Start index. Negative indexing is supported.
- length
Length of the slice. If set to None, all rows starting at the offset will be selected.
Examples
>>> s = pl.Series("a", [1, 2, 3, 4]) >>> s.slice(1, 2) shape: (2,) Series: 'a' [i64] [ 2 3 ]
- sort(
- *,
- descending: bool = False,
- nulls_last: bool = False,
- multithreaded: bool = True,
- in_place: bool = False,
Sort this Series.
- Parameters:
- descending
Sort in descending order.
- nulls_last
Place null values last instead of first.
- multithreaded
Sort using multiple threads.
- in_place
Sort in-place.
Examples
>>> s = pl.Series("a", [1, 3, 4, 2]) >>> s.sort() shape: (4,) Series: 'a' [i64] [ 1 2 3 4 ] >>> s.sort(descending=True) shape: (4,) Series: 'a' [i64] [ 4 3 2 1 ]
- sqrt() Series [source]
Compute the square root of the elements.
Syntactic sugar for
>>> pl.Series([1, 2]) ** 0.5 shape: (2,) Series: '' [f64] [ 1.0 1.414214 ]
Examples
>>> s = pl.Series([1, 2, 3]) >>> s.sqrt() shape: (3,) Series: '' [f64] [ 1.0 1.414214 1.732051 ]
- std(ddof: int = 1) float | timedelta | None [source]
Get the standard deviation of this Series.
- Parameters:
- ddof
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.std() 1.0
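The role of ddof is easiest to see by writing the divisor out explicitly. The helpers below are a plain-Python sketch under the definition given in the parameter description (divisor N - ddof); their names are illustrative.

```python
import math


def variance(values, ddof=1):
    """Sum of squared deviations from the mean, divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, ddof=1):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(values, ddof))


print(std([1, 2, 3]))          # 1.0 with the default ddof=1
print(std([1, 2, 3], ddof=0))  # smaller: the divisor is N instead of N - 1
```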
- sum() int | float [source]
Reduce this Series to the sum value.
Notes
Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.sum() 6
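The overflow that the upcast avoids can be simulated in plain Python. The helper wrap_int8 is hypothetical and models two's-complement wraparound for an 8-bit signed integer; it is meant only to show why summing in the narrow type would be unsafe.

```python
def wrap_int8(total):
    """Simulate two's-complement wraparound of an 8-bit signed integer."""
    return (total + 128) % 256 - 128


values = [100, 100]  # each value fits in Int8 (range -128..127)

narrow_sum = wrap_int8(sum(values))  # what an 8-bit accumulator would hold
wide_sum = sum(values)               # what a 64-bit accumulator holds

print(narrow_sum, wide_sum)
```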
- tail(n: int = 10) Series [source]
Get the last n elements.
- Parameters:
- n
Number of elements to return. If a negative value is passed, return all elements except the first abs(n).
Examples
>>> s = pl.Series("a", [1, 2, 3, 4, 5]) >>> s.tail(3) shape: (3,) Series: 'a' [i64] [ 3 4 5 ]
Pass a negative value to get all rows except the first abs(n).
>>> s.tail(-3) shape: (2,) Series: 'a' [i64] [ 4 5 ]
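The two cases for n map onto ordinary list slicing. The helper tail_list is a plain-Python sketch with an assumed name, not the polars implementation.

```python
def tail_list(values, n=10):
    """Last n elements, or everything except the first abs(n) when n is negative."""
    if n < 0:
        return values[abs(n):]
    start = max(len(values) - n, 0)  # clamp so n larger than the length is fine
    return values[start:]


print(tail_list([1, 2, 3, 4, 5], 3))
print(tail_list([1, 2, 3, 4, 5], -3))
```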
- take(indices: int | list[int] | Expr | Series | np.ndarray[Any, Any]) Series [source]
Take values by index.
Deprecated since version 0.19.14: This method has been renamed to gather().
- Parameters:
- indices
Index location used for selection.
- take_every(n: int, offset: int = 0) Series [source]
Take every nth value in the Series and return as new Series.
Deprecated since version 0.19.14: This method has been renamed to gather_every().
- Parameters:
- n
Gather every n-th row.
- offset
Starting index.
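Assuming that n is the step and offset is the first index taken, the selection corresponds to an extended slice on a plain Python list. This is a sketch of the semantics as described by the parameters, not the polars implementation.

```python
values = [1, 2, 3, 4, 5]

# Every 2nd value, starting at index 0.
every_second = values[0::2]

# Every 2nd value, starting at index 1.
every_second_from_one = values[1::2]

print(every_second, every_second_from_one)
```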
- tan() Series [source]
Compute the element-wise value for the tangent.
Examples
>>> import math >>> s = pl.Series("a", [0.0, math.pi / 2.0, math.pi]) >>> s.tan() shape: (3,) Series: 'a' [f64] [ 0.0 1.6331e16 -1.2246e-16 ]
- tanh() Series [source]
Compute the element-wise value for the hyperbolic tangent.
Examples
>>> s = pl.Series("a", [1.0, 0.0, -1.0]) >>> s.tanh() shape: (3,) Series: 'a' [f64] [ 0.761594 0.0 -0.761594 ]
- to_arrow() Array [source]
Return the underlying Arrow array.
If the Series contains only a single chunk this operation is zero copy.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s = s.to_arrow() >>> s <pyarrow.lib.Int64Array object at ...> [ 1, 2, 3 ]
- to_dummies(*, separator: str = '_', drop_first: bool = False) DataFrame [source]
Get dummy/indicator variables.
- Parameters:
- separator
Separator/delimiter used when generating column names.
- drop_first
Remove the first category from the variable being encoded.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.to_dummies() shape: (3, 3) ┌─────┬─────┬─────┐ │ a_1 ┆ a_2 ┆ a_3 │ │ --- ┆ --- ┆ --- │ │ u8 ┆ u8 ┆ u8 │ ╞═════╪═════╪═════╡ │ 1 ┆ 0 ┆ 0 │ │ 0 ┆ 1 ┆ 0 │ │ 0 ┆ 0 ┆ 1 │ └─────┴─────┴─────┘
>>> s.to_dummies(drop_first=True) shape: (3, 2) ┌─────┬─────┐ │ a_2 ┆ a_3 │ │ --- ┆ --- │ │ u8 ┆ u8 │ ╞═════╪═════╡ │ 0 ┆ 0 │ │ 1 ┆ 0 │ │ 0 ┆ 1 │ └─────┴─────┘
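The encoding shown above can be reproduced with a small plain-Python helper that returns a dict of columns. The name dummies and the choice to order categories by sorting are assumptions for this sketch; how polars orders categories and picks the "first" one in general is not specified here.

```python
def dummies(name, values, separator="_", drop_first=False):
    """One 0/1 indicator column per distinct value, keyed by column name."""
    categories = sorted(set(values))  # assumption: categories in sorted order
    if drop_first:
        categories = categories[1:]
    return {
        f"{name}{separator}{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


print(dummies("a", [1, 2, 3]))
print(dummies("a", [1, 2, 3], drop_first=True))
```

With drop_first=True, a row of all zeros identifies the dropped category, so no information is lost.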
- to_frame(name: str | None = None) DataFrame [source]
Cast this Series to a DataFrame.
- Parameters:
- name
Optionally name/rename the Series column in the new DataFrame.
Examples
>>> s = pl.Series("a", [123, 456]) >>> df = s.to_frame() >>> df shape: (2, 1) ┌─────┐ │ a │ │ --- │ │ i64 │ ╞═════╡ │ 123 │ │ 456 │ └─────┘
>>> df = s.to_frame("xyz") >>> df shape: (2, 1) ┌─────┐ │ xyz │ │ --- │ │ i64 │ ╞═════╡ │ 123 │ │ 456 │ └─────┘
- to_init_repr(n: int = 1000) str [source]
Convert Series to instantiatable string representation.
- Parameters:
- n
Only use first n elements.
Examples
>>> s = pl.Series("a", [1, 2, None, 4], dtype=pl.Int16) >>> print(s.to_init_repr()) pl.Series("a", [1, 2, None, 4], dtype=pl.Int16) >>> s_from_str_repr = eval(s.to_init_repr()) >>> s_from_str_repr shape: (4,) Series: 'a' [i16] [ 1 2 null 4 ]
- to_jax(device: jax.Device | str | None = None) jax.Array [source]
Convert this Series to a Jax Array.
New in version 0.20.27.
Warning
This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- device
Specify the jax Device on which the array will be created; can provide a string (such as “cpu”, “gpu”, or “tpu”) in which case the device is retrieved as jax.devices(string)[0]. For more specific control you can supply the instantiated Device directly. If None, arrays are created on the default device.
Examples
>>> s = pl.Series("x", [10.5, 0.0, -10.0, 5.5]) >>> s.to_jax() Array([ 10.5, 0. , -10. , 5.5], dtype=float32)
- to_list(*, use_pyarrow: bool | None = None) list[Any] [source]
Convert this Series to a Python list.
This operation copies data.
- Parameters:
- use_pyarrow
Use PyArrow to perform the conversion.
Deprecated since version 0.19.9: This parameter will be removed. The function can safely be called without the parameter - it should give the exact same result.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.to_list() [1, 2, 3] >>> type(s.to_list()) <class 'list'>
- to_numpy(
- *,
- writable: bool = False,
- allow_copy: bool = True,
- use_pyarrow: bool | None = None,
- zero_copy_only: bool | None = None,
Convert this Series to a NumPy ndarray.
This operation copies data only when necessary. The conversion is zero copy when all of the following hold:
The data type is an integer, float, Datetime, Duration, or Array.
The Series contains no null values.
The Series consists of a single chunk.
The writable parameter is set to False (default).
- Parameters:
- writable
Ensure the resulting array is writable. This will force a copy of the data if the array was created without copy as the underlying Arrow data is immutable.
- allow_copy
Allow memory to be copied to perform the conversion. If set to False, causes conversions that are not zero-copy to fail.
- use_pyarrow
First convert to PyArrow, then call pyarrow.Array.to_numpy to convert to NumPy. If set to False, Polars’ own conversion logic is used.
Deprecated since version 0.20.28: Polars now uses its native engine by default for conversion to NumPy. To use PyArrow’s engine, call .to_arrow().to_numpy() instead.
- zero_copy_only
Raise an exception if the conversion to a NumPy array would require copying the underlying data. Data copy occurs, for example, when the Series contains nulls or non-numeric types.
Deprecated since version 0.20.10: Use the allow_copy parameter instead, which is the inverse of this one.
Examples
Numeric data without nulls can be converted without copying data. The resulting array will not be writable.
>>> s = pl.Series([1, 2, 3], dtype=pl.Int8) >>> arr = s.to_numpy() >>> arr array([1, 2, 3], dtype=int8) >>> arr.flags.writeable False
Set writable=True to force data copy to make the array writable.
>>> s.to_numpy(writable=True).flags.writeable True
Integer Series containing nulls will be cast to a float type with nan representing a null value. This requires data to be copied.
>>> s = pl.Series([1, 2, None], dtype=pl.UInt16) >>> s.to_numpy() array([ 1., 2., nan], dtype=float32)
Set allow_copy=False to raise an error if data would be copied.
>>> s.to_numpy(allow_copy=False) Traceback (most recent call last): ... RuntimeError: copy not allowed: cannot convert to a NumPy array without copying data
Series of data type Array and Struct will result in an array with more than one dimension.
>>> s = pl.Series([[1, 2, 3], [4, 5, 6]], dtype=pl.Array(pl.Int64, 3)) >>> s.to_numpy() array([[1, 2, 3], [4, 5, 6]])
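The four zero-copy conditions listed for to_numpy can be read as a single predicate. The function below is a hypothetical checklist in plain Python: the name is_zero_copy, its arguments, and the string labels for dtype kinds are all assumptions for illustration, and polars exposes no such function.

```python
# Labels for the dtype kinds named in the zero-copy conditions (illustrative).
ZERO_COPY_KINDS = {"integer", "float", "Datetime", "Duration", "Array"}


def is_zero_copy(dtype_kind, null_count, n_chunks, writable=False):
    """True only when every documented zero-copy condition holds."""
    return (
        dtype_kind in ZERO_COPY_KINDS
        and null_count == 0
        and n_chunks == 1
        and not writable
    )


print(is_zero_copy("integer", null_count=0, n_chunks=1))
print(is_zero_copy("integer", null_count=1, n_chunks=1))
print(is_zero_copy("integer", null_count=0, n_chunks=1, writable=True))
```

Failing any single condition is enough to require a copy, which is when allow_copy=False raises an error.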
- to_pandas(
- *,
- use_pyarrow_extension_array: bool = False,
- **kwargs: Any,
Convert this Series to a pandas Series.
This operation copies data if use_pyarrow_extension_array is not enabled.
- Parameters:
- use_pyarrow_extension_array
Use a PyArrow-backed extension array instead of a NumPy array for the pandas Series. This allows zero copy operations and preservation of null values. Subsequent operations on the resulting pandas Series may trigger conversion to NumPy if those operations are not supported by PyArrow compute functions.
- **kwargs
Additional keyword arguments to be passed to pyarrow.Array.to_pandas().
- Returns:
Notes
This operation requires that both pandas and pyarrow are installed.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.to_pandas() 0 1 1 2 2 3 Name: a, dtype: int64
Null values are converted to NaN.
>>> s = pl.Series("b", [1, 2, None]) >>> s.to_pandas() 0 1.0 1 2.0 2 NaN Name: b, dtype: float64
Pass use_pyarrow_extension_array=True to get a pandas Series backed by a PyArrow extension array. This will preserve null values.
>>> s.to_pandas(use_pyarrow_extension_array=True) 0 1 1 2 2 <NA> Name: b, dtype: int64[pyarrow]
- to_physical() Series [source]
Cast to physical representation of the logical dtype.
List(inner) -> List(physical of inner)
Other data types will be left unchanged.
Examples
Replicating the pandas pd.Series.factorize method.
>>> s = pl.Series("values", ["a", None, "x", "a"]) >>> s.cast(pl.Categorical).to_physical() shape: (4,) Series: 'values' [u32] [ 0 null 1 0 ]
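The integer codes in the example above follow a first-appearance numbering, which can be sketched in plain Python. The helper factorize is an illustration with an assumed name; it reproduces the codes shown for this input but says nothing about how polars stores categoricals internally.

```python
def factorize(values):
    """Assign each distinct non-null value an integer code in order of first appearance."""
    codes = {}
    result = []
    for value in values:
        if value is None:
            result.append(None)  # nulls stay null
        else:
            # setdefault assigns the next free code the first time a value is seen.
            result.append(codes.setdefault(value, len(codes)))
    return result


print(factorize(["a", None, "x", "a"]))
```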
- to_torch() torch.Tensor [source]
Convert this Series to a PyTorch Tensor.
New in version 0.20.23.
Warning
This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.
Notes
PyTorch tensors do not support UInt16, UInt32, or UInt64; these dtypes will be automatically cast to Int32, Int64, and Int64, respectively.
Examples
>>> s = pl.Series("x", [1, 0, 1, 2, 0], dtype=pl.UInt8) >>> s.to_torch() tensor([1, 0, 1, 2, 0], dtype=torch.uint8) >>> s = pl.Series("x", [5.5, -10.0, 2.5], dtype=pl.Float32) >>> s.to_torch() tensor([ 5.5000, -10.0000, 2.5000])
- top_k(k: int = 5) Series [source]
Return the k largest elements.
This has time complexity:
\[O(n + k \log{n})\]
- Parameters:
- k
Number of elements to return.
See also
Examples
>>> s = pl.Series("a", [2, 5, 1, 4, 3]) >>> s.top_k(3) shape: (3,) Series: 'a' [i64] [ 5 4 3 ]
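One way to reach a cost of O(n + k log n) is to build a heap in linear time and pop from it k times. The sketch below does this with Python's heapq module; the function name top_k_heap is an assumption, and this is not claimed to be the algorithm polars uses, only one with the same complexity.

```python
import heapq


def top_k_heap(values, k=5):
    """Return the k largest values in descending order using a binary heap."""
    # heapq is a min-heap, so negate the values to pop the largest first.
    heap = [-v for v in values]
    heapq.heapify(heap)  # O(n)
    count = min(k, len(heap))
    # Each pop costs O(log n), for O(k log n) in total.
    return [-heapq.heappop(heap) for _ in range(count)]


print(top_k_heap([2, 5, 1, 4, 3], 3))
```

For small k this avoids the O(n log n) cost of sorting the whole input.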
- unique(*, maintain_order: bool = False) Series [source]
Get unique elements in series.
- Parameters:
- maintain_order
Maintain order of data. This requires more work.
Examples
>>> s = pl.Series("a", [1, 2, 2, 3]) >>> s.unique().sort() shape: (3,) Series: 'a' [i64] [ 1 2 3 ]
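The effect of maintain_order can be illustrated with plain Python containers: a set gives distinct values with no order guarantee, while dict.fromkeys keeps the order of first appearance. This is an analogy for the parameter's meaning, not the polars implementation.

```python
values = [3, 1, 3, 2, 1]

# Order of first appearance is preserved (dicts keep insertion order).
ordered_unique = list(dict.fromkeys(values))

# Distinct values only; the iteration order is not guaranteed.
unordered_unique = set(values)

print(ordered_unique)
print(sorted(unordered_unique))
```

This is also why the example above calls sort() after unique(): without maintain_order, the result order should not be relied on.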
- unique_counts() Series [source]
Return a count of the unique values in the order of appearance.
Examples
>>> s = pl.Series("id", ["a", "b", "b", "c", "c", "c"]) >>> s.unique_counts() shape: (3,) Series: 'id' [u32] [ 1 2 3 ]
- upper_bound() Self [source]
Return the upper bound of this Series’ dtype as a unit Series.
See also
lower_bound: return the lower bound of the given Series’ dtype.
Examples
>>> s = pl.Series("s", [-1, 0, 1], dtype=pl.Int8) >>> s.upper_bound() shape: (1,) Series: 's' [i8] [ 127 ]
>>> s = pl.Series("s", [1.0, 2.5, 3.0], dtype=pl.Float64) >>> s.upper_bound() shape: (1,) Series: 's' [f64] [ inf ]
- value_counts( ) DataFrame [source]
Count the occurrences of unique values.
- Parameters:
- sort
Sort the output by count in descending order. If set to False (default), the order of the output is random.
- parallel
Execute the computation in parallel.
Note
This option should likely not be enabled in a group by context, as the computation is already parallelized per group.
- name
Give the resulting count column a specific name; defaults to “count”.
- Returns:
- DataFrame
Mapping of unique values to their count.
Examples
>>> s = pl.Series("color", ["red", "blue", "red", "green", "blue", "blue"]) >>> s.value_counts() shape: (3, 2) ┌───────┬───────┐ │ color ┆ count │ │ --- ┆ --- │ │ str ┆ u32 │ ╞═══════╪═══════╡ │ red ┆ 2 │ │ green ┆ 1 │ │ blue ┆ 3 │ └───────┴───────┘
Sort the output by count and customize the count column name.
>>> s.value_counts(sort=True, name="n") shape: (3, 2) ┌───────┬─────┐ │ color ┆ n │ │ --- ┆ --- │ │ str ┆ u32 │ ╞═══════╪═════╡ │ blue ┆ 3 │ │ red ┆ 2 │ │ green ┆ 1 │ └───────┴─────┘
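The same counts can be cross-checked with collections.Counter from the standard library. This is a plain-Python sketch of the value-to-count mapping and of the sort option; the row order polars returns when sort is False should not be inferred from it.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "blue"]
counts = Counter(colors)

# Unsorted view: (value, count) pairs, order not meaningful.
unsorted_rows = list(counts.items())

# Sorted view: pairs ordered by count, descending.
sorted_rows = counts.most_common()

print(sorted_rows)
```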
- var(ddof: int = 1) float | timedelta | None [source]
Get variance of this Series.
- Parameters:
- ddof
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is 1.
Examples
>>> s = pl.Series("a", [1, 2, 3]) >>> s.var() 1.0
- view(*, ignore_nulls: bool = False) SeriesView [source]
Get a view into this Series data with a numpy array.
Deprecated since version 0.19.14: This method will be removed in a future version.
This operation doesn’t clone data, but does not include missing values. Don’t use this unless you know what you are doing.
- Parameters:
- ignore_nulls
If True then nulls are converted to 0. If False then an Exception is raised if nulls are present.
- zip_with(mask: Series, other: Series) Self [source]
Take values from self or other based on the given mask.
Where mask evaluates true, take values from self. Where mask evaluates false, take values from other.
- Parameters:
- mask
Boolean Series.
- other
Series of same type.
- Returns:
- Series
Examples
>>> s1 = pl.Series([1, 2, 3, 4, 5]) >>> s2 = pl.Series([5, 4, 3, 2, 1]) >>> s1.zip_with(s1 < s2, s2) shape: (5,) Series: '' [i64] [ 1 2 3 2 1 ] >>> mask = pl.Series([True, False, True, False, True]) >>> s1.zip_with(mask, s2) shape: (5,) Series: '' [i64] [ 1 4 3 2 5 ]
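The selection rule is an element-wise choice between two sequences, which a plain-Python list comprehension expresses directly. The helper zip_with_lists is an assumed name for this sketch and works on lists rather than Series.

```python
def zip_with_lists(self_values, mask, other_values):
    """Take from self_values where mask is True, otherwise from other_values."""
    return [a if m else b for a, m, b in zip(self_values, mask, other_values)]


s1 = [1, 2, 3, 4, 5]
s2 = [5, 4, 3, 2, 1]

# Mask computed element-wise, like s1 < s2 on two Series.
less_than = [a < b for a, b in zip(s1, s2)]

print(zip_with_lists(s1, less_than, s2))
print(zip_with_lists(s1, [True, False, True, False, True], s2))
```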