polars.Expr.map_elements

Expr.map_elements(function: Callable[[Any], Any], return_dtype: PolarsDataType | DataTypeExpr | None = None, *, skip_nulls: bool = True, pass_name: bool = False, strategy: MapElementsStrategy = 'thread_local', returns_scalar: bool = False) → Expr

Map a custom/user-defined function (UDF) to each element of a column.

Warning

This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.

Suppose that the function is x ↦ sqrt(x):

For mapping elements of a series, consider: pl.col("col_name").sqrt().
For mapping inner elements of lists, consider: pl.col("col_name").list.eval(pl.element().sqrt()).
For mapping elements of struct fields, consider: pl.col("col_name").struct.field("field_name").sqrt().

If you want to replace the original column or field, consider .with_columns and .with_fields.

Parameters:

function

Lambda/function to map.

return_dtype

Datatype of the output Series.

It is recommended to set this whenever possible. If this is None, it tries to infer the datatype by calling the function with dummy data and looking at the output.

skip_nulls

Don’t pass null values to the function; they are kept as null in the output (this is faster).

pass_name

Pass the Series name to the custom function (this is more expensive).

returns_scalar

Deprecated since version 1.32.0: Is ignored and will be removed in 2.0.

strategy {‘thread_local’, ‘threading’}

The threading strategy to use.

‘thread_local’: run the Python function on a single thread.
‘threading’: run the Python function on separate threads. Use with care, as this can hurt performance. It is only likely to speed up your code if the amount of work per element is significant and the Python function releases the GIL (e.g. by calling a C function).

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Notes

Using map_elements is strongly discouraged, as you will effectively be running Python for loops, which will be very slow. Wherever possible, prefer the native expression API to achieve the best performance.
If your function is expensive and you don’t want it to be called more than once for a given input, consider applying an @lru_cache decorator to it. If your data contains many repeated values, this can give significant speedups.
Window function application using over is considered a GroupBy context here, so map_elements can be used to map functions over window groups.
A UDF passed to map_elements must be pure, meaning that it cannot modify or depend on state other than its arguments. Polars may call the function with arbitrary input data.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["a", "b", "c", "c"],
...     }
... )

The function is applied to each element of column 'a':

>>> df.with_columns(  
...     pl.col("a")
...     .map_elements(lambda x: x * 2, return_dtype=pl.self_dtype())
...     .alias("a_times_2"),
... )
shape: (4, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_times_2 │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ str ┆ i64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ a   ┆ 2         │
│ 2   ┆ b   ┆ 4         │
│ 3   ┆ c   ┆ 6         │
│ 1   ┆ c   ┆ 2         │
└─────┴─────┴───────────┘

Tip: it is better to implement this with an expression:

>>> df.with_columns(
...     (pl.col("a") * 2).alias("a_times_2"),
... )  

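In a GroupBy context, implode the column so that the function receives each group as a single Series:

>>> (
...     df.lazy()
...     .group_by("b", maintain_order=True)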
...     .agg(
...         pl.col("a")
...         .implode()
...         .map_elements(lambda x: x.sum(), return_dtype=pl.Int64)
...     )
...     .collect()
... )  
shape: (3, 2)
┌─────┬─────┐
│ b   ┆ a   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 4   │
└─────┴─────┘

Tip: again, it is better to implement this with an expression:

>>> (
...     df.lazy()
...     .group_by("b", maintain_order=True)
...     .agg(pl.col("a").sum())
...     .collect()
... )  

Window function application using over will behave as a GroupBy context, with your function receiving individual window groups:

>>> df = pl.DataFrame(
...     {
...         "key": ["x", "x", "y", "x", "y", "z"],
...         "val": [1, 1, 1, 1, 1, 1],
...     }
... )
>>> df.with_columns(
...     scaled=pl.col("val")
...     .implode()
...     .map_elements(lambda s: s * len(s), return_dtype=pl.List(pl.Int64))
...     .explode()
...     .over("key"),
... ).sort("key")
shape: (6, 3)
┌─────┬─────┬────────┐
│ key ┆ val ┆ scaled │
│ --- ┆ --- ┆ ---    │
│ str ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ x   ┆ 1   ┆ 3      │
│ x   ┆ 1   ┆ 3      │
│ x   ┆ 1   ┆ 3      │
│ y   ┆ 1   ┆ 2      │
│ y   ┆ 1   ┆ 2      │
│ z   ┆ 1   ┆ 1      │
└─────┴─────┴────────┘

Note that this function would also be better implemented natively:

>>> df.with_columns(
...     scaled=(pl.col("val") * pl.col("val").count()).over("key"),
... ).sort("key")