polars.Expr.map_batches
Expr.map_batches(
    function: Callable[[Series], Series | Any],
    return_dtype: PolarsDataType | DataTypeExpr | None = None,
    *,
    agg_list: bool = False,
    is_elementwise: bool = False,
    returns_scalar: bool = False,
    _is_ufunc: bool = False,
) → Expr
Apply a custom Python function to a whole Series or sequence of Series.
The output of this custom function is presumed to be either a Series, a NumPy array (in which case it will be automatically converted into a Series), or a scalar that will be converted into a Series. If the result is a scalar and you want it to stay as a scalar, pass in returns_scalar=True. If you want to apply a custom function elementwise over single values, see map_elements(). A reasonable use case for map functions is transforming the values represented by an expression using a third-party library.
- Parameters:
- function
Lambda/function to apply.
- return_dtype
Dtype of the output Series. If not set, the dtype will be inferred based on the first non-null value that is returned by the function.
- agg_list
First implode when in a group-by aggregation.
Deprecated since version 1.32.0: Use expr.implode().map_batches(..) instead.
- is_elementwise
If set to true, this can run in the streaming engine, but it may yield incorrect results in group-by. Ensure you know what you are doing!
- returns_scalar
If the function returns a scalar, by default it will be wrapped in a list in the output, since the assumption is that the function always returns something Series-like. If you want to keep the result as a scalar, set this argument to True.
Warning
If return_dtype is not provided, this may lead to unexpected results. We allow this, but it is considered a bug in the user’s query. In the future this will raise in Lazy queries.
See also
Examples
>>> df = pl.DataFrame(
...     {
...         "sine": [0.0, 1.0, 0.0, -1.0],
...         "cosine": [1.0, 0.0, -1.0, 0.0],
...     }
... )
>>> df.select(
...     pl.all().map_batches(
...         lambda x: x.to_numpy().argmax(), return_dtype=pl.Int64
...     )
... )
shape: (1, 2)
┌──────┬────────┐
│ sine ┆ cosine │
│ ---  ┆ ---    │
│ i64  ┆ i64    │
╞══════╪════════╡
│ 1    ┆ 0      │
└──────┴────────┘
Here’s an example of a function that returns a scalar, where we want it to stay as a scalar:
>>> df = pl.DataFrame(
...     {
...         "a": [0, 1, 0, 1],
...         "b": [1, 2, 3, 4],
...     }
... )
>>> df.group_by("a").agg(
...     pl.col("b").map_batches(
...         lambda x: x.max(), returns_scalar=True, return_dtype=pl.self_dtype()
...     )
... )
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 0   ┆ 3   │
└─────┴─────┘
Call a function that takes multiple arguments by creating a struct and referencing its fields inside the function call.
>>> df = pl.DataFrame(
...     {
...         "a": [5, 1, 0, 3],
...         "b": [4, 2, 3, 4],
...     }
... )
>>> df.with_columns(
...     a_times_b=pl.struct("a", "b").map_batches(
...         lambda x: np.multiply(x.struct.field("a"), x.struct.field("b")),
...         return_dtype=pl.Int64,
...     )
... )
shape: (4, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_times_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 5   ┆ 4   ┆ 20        │
│ 1   ┆ 2   ┆ 2         │
│ 0   ┆ 3   ┆ 0         │
│ 3   ┆ 4   ┆ 12        │
└─────┴─────┴───────────┘