polars.LazyFrame.match_to_schema#

LazyFrame.match_to_schema(
schema: SchemaDict | Schema,
*,
missing_columns: Literal['insert', 'raise'] | Mapping[str, Literal['insert', 'raise'] | Expr] = 'raise',
missing_struct_fields: Literal['insert', 'raise'] | Mapping[str, Literal['insert', 'raise']] = 'raise',
extra_columns: Literal['ignore', 'raise'] = 'raise',
extra_struct_fields: Literal['ignore', 'raise'] | Mapping[str, Literal['ignore', 'raise']] = 'raise',
integer_cast: Literal['upcast', 'forbid'] | Mapping[str, Literal['upcast', 'forbid']] = 'forbid',
float_cast: Literal['upcast', 'forbid'] | Mapping[str, Literal['upcast', 'forbid']] = 'forbid',
) LazyFrame[source]#

Match or evolve the schema of a LazyFrame into a specific schema.

By default, match_to_schema returns an error if the input schema does not exactly match the target schema. It also allows columns to be freely reordered, with additional coercion rules available through optional parameters.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
schema

Target schema to match or evolve to.

missing_columns

Raise of insert missing columns from the input with respect to the schema.

This can also be an expression per column with what to insert if it is missing.

missing_struct_fields

Raise of insert missing struct fields from the input with respect to the schema.

extra_columns

Raise of ignore extra columns from the input with respect to the schema.

extra_struct_fields

Raise of ignore extra struct fields from the input with respect to the schema.

integer_cast

Forbid of upcast for integer columns from the input to the respective column in schema.

float_cast

Forbid of upcast for float columns from the input to the respective column in schema.

Examples

Ensuring the schema matches

>>> lf = pl.LazyFrame({"a": [1, 2, 3], "b": ["A", "B", "C"]})
>>> lf.match_to_schema({"a": pl.Int64, "b": pl.String}).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ A   │
│ 2   ┆ B   │
│ 3   ┆ C   │
└─────┴─────┘
>>> (lf.match_to_schema({"a": pl.Int64}).collect())  
polars.exceptions.SchemaError: extra columns in `match_to_schema`: "b"

Adding missing columns

>>> (
...     pl.LazyFrame({"a": [1, 2, 3]})
...     .match_to_schema(
...         {"a": pl.Int64, "b": pl.String},
...         missing_columns="insert",
...     )
...     .collect()
... )
shape: (3, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ null │
│ 2   ┆ null │
│ 3   ┆ null │
└─────┴──────┘
>>> (
...     pl.LazyFrame({"a": [1, 2, 3]})
...     .match_to_schema(
...         {"a": pl.Int64, "b": pl.String},
...         missing_columns={"b": pl.col.a.cast(pl.String)},
...     )
...     .collect()
... )
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 2   │
│ 3   ┆ 3   │
└─────┴─────┘

Removing extra columns

>>> (
...     pl.LazyFrame({"a": [1, 2, 3], "b": ["A", "B", "C"]})
...     .match_to_schema(
...         {"a": pl.Int64},
...         extra_columns="ignore",
...     )
...     .collect()
... )
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

Upcasting integers and floats

>>> (
...     pl.LazyFrame(
...         {"a": [1, 2, 3], "b": [1.0, 2.0, 3.0]},
...         schema={"a": pl.Int32, "b": pl.Float32},
...     )
...     .match_to_schema(
...         {"a": pl.Int64, "b": pl.Float64},
...         integer_cast="upcast",
...         float_cast="upcast",
...     )
...     .collect()
... )
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 1.0 │
│ 2   ┆ 2.0 │
│ 3   ┆ 3.0 │
└─────┴─────┘