polars.Expr.list.to_struct
Expr.list.to_struct(
    n_field_strategy: ListToStructWidthStrategy = 'first_non_null',
    fields: Sequence[str] | Callable[[int], str] | None = None,
    upper_bound: int = 0,
) → Expr
Convert the Series of type List to a Series of type Struct.

- Parameters:
- n_field_strategy : {"first_non_null", "max_width"}
  Strategy to determine the number of fields of the struct.
  "first_non_null": set the number of fields equal to the length of the first sublist with non-zero length.
  "max_width": set the number of fields to the maximum length across all sublists.
- fields
  If the names and number of the desired fields are known in advance, a list of field names can be given; the names are assigned by index. Otherwise, to assign field names dynamically, a custom function that maps a field index to a name can be used. If neither is set, the fields are named field_0, field_1 .. field_n.
- upper_bound
  A Polars LazyFrame needs to know the schema at all times, so the caller must provide an upper bound on the number of struct fields that will be created; if it is set incorrectly, subsequent operations may fail. (For example, an all().sum() expression looks in the current schema to determine which columns to select.)
  When operating on a DataFrame, the schema does not need to be tracked or predetermined, because the result is evaluated eagerly, so you can leave this parameter unset.
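The difference between the two n_field_strategy values can be illustrated with a small plain-Python model. This is only a sketch of the documented behavior, not Polars' implementation: the helper names infer_width and to_fixed_width are hypothetical, and the handling of null sublists is an assumption of the model.

```python
def infer_width(rows, strategy="first_non_null"):
    """Hypothetical helper: number of struct fields chosen by each strategy."""
    if strategy == "first_non_null":
        for row in rows:
            if row:  # skips None and zero-length sublists
                return len(row)
        return 0
    if strategy == "max_width":
        # Scans every sublist, which is why this strategy costs more.
        return max((len(row) for row in rows if row is not None), default=0)
    raise ValueError(f"unknown strategy: {strategy!r}")


def to_fixed_width(rows, width):
    """Hypothetical helper: truncate or null-pad each sublist to `width` values."""
    out = []
    for row in rows:
        if row is None:
            out.append(None)  # assumption of this model: null stays null
            continue
        values = list(row)[:width]
        out.append(values + [None] * (width - len(values)))
    return out


rows = [[0, 1], [0, 1, 2]]
narrow = to_fixed_width(rows, infer_width(rows, "first_non_null"))
wide = to_fixed_width(rows, infer_width(rows, "max_width"))
print(narrow)  # [[0, 1], [0, 1]]
print(wide)    # [[0, 1, None], [0, 1, 2]]
```

With "first_non_null" the width is fixed by the first usable sublist, so longer sublists lose their trailing values; with "max_width" shorter sublists are padded with nulls instead.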
Notes
It is recommended to set upper_bound to the correct output size of the struct. If it is not set, Polars will not know the output type of this operation and will set it to Unknown, which can lead to errors because Polars cannot resolve the query.
For performance reasons, the length of the first non-null sublist is used to determine the number of output fields. If the sublists can have different lengths, n_field_strategy="max_width" must be used to obtain the expected result.

Examples
Convert list to struct with default field name assignment:
>>> import polars as pl
>>> df = pl.DataFrame({"n": [[0, 1], [0, 1, 2]]})
>>> df.with_columns(
...     struct=pl.col("n").list.to_struct()
... )
shape: (2, 2)
┌───────────┬───────────┐
│ n         ┆ struct    │
│ ---       ┆ ---       │
│ list[i64] ┆ struct[2] │  # <- struct with 2 fields
╞═══════════╪═══════════╡
│ [0, 1]    ┆ {0,1}     │  # OK
│ [0, 1, 2] ┆ {0,1}     │  # NOT OK - last value missing
└───────────┴───────────┘
As the shorter sublist comes first, we must use the max_width strategy to force a search for the longest.
>>> df.with_columns(
...     struct=pl.col("n").list.to_struct(n_field_strategy="max_width")
... )
shape: (2, 2)
┌───────────┬────────────┐
│ n         ┆ struct     │
│ ---       ┆ ---        │
│ list[i64] ┆ struct[3]  │  # <- struct with 3 fields
╞═══════════╪════════════╡
│ [0, 1]    ┆ {0,1,null} │  # OK
│ [0, 1, 2] ┆ {0,1,2}    │  # OK
└───────────┴────────────┘
Convert list to struct with field name assignment by function/index:
>>> df = pl.DataFrame({"n": [[0, 1], [2, 3]]})
>>> df.select(pl.col("n").list.to_struct(fields=lambda idx: f"n{idx}")).rows(
...     named=True
... )
[{'n': {'n0': 0, 'n1': 1}}, {'n': {'n0': 2, 'n1': 3}}]
Convert list to struct with field name assignment by index from a list of names:
>>> df.select(pl.col("n").list.to_struct(fields=["one", "two"])).rows(
...     named=True
... )
[{'n': {'one': 0, 'two': 1}}, {'n': {'one': 2, 'two': 3}}]
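The three ways of naming fields (default names, a function of the index, or a list of names) can be modeled in plain Python as follows. This is a rough sketch for illustration only: resolve_field_names and rows_to_structs are hypothetical helpers, not part of Polars, and the sketch assumes all sublists have the same length.

```python
def resolve_field_names(width, fields=None):
    """Hypothetical helper: pick the struct field names."""
    if fields is None:
        # Default naming: field_0, field_1, ...
        return [f"field_{i}" for i in range(width)]
    if callable(fields):
        # A function maps each field index to a name.
        return [fields(i) for i in range(width)]
    # A sequence of names is assigned by index.
    return list(fields)


def rows_to_structs(rows, fields=None):
    """Hypothetical helper: turn equal-length sublists into dicts."""
    width = len(rows[0]) if rows else 0  # assumes equal-length sublists
    names = resolve_field_names(width, fields)
    return [dict(zip(names, row)) for row in rows]


rows = [[0, 1], [2, 3]]
print(rows_to_structs(rows))
# [{'field_0': 0, 'field_1': 1}, {'field_0': 2, 'field_1': 3}]
print(rows_to_structs(rows, fields=lambda idx: f"n{idx}"))
# [{'n0': 0, 'n1': 1}, {'n0': 2, 'n1': 3}]
print(rows_to_structs(rows, fields=["one", "two"]))
# [{'one': 0, 'two': 1}, {'one': 2, 'two': 3}]
```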