polars.LazyFrame.match_to_schema#
- LazyFrame.match_to_schema(
- schema: SchemaDict | Schema,
- *,
- missing_columns: Literal['insert', 'raise'] | Mapping[str, Literal['insert', 'raise'] | Expr] = 'raise',
- missing_struct_fields: Literal['insert', 'raise'] | Mapping[str, Literal['insert', 'raise']] = 'raise',
- extra_columns: Literal['ignore', 'raise'] = 'raise',
- extra_struct_fields: Literal['ignore', 'raise'] | Mapping[str, Literal['ignore', 'raise']] = 'raise',
- integer_cast: Literal['upcast', 'forbid'] | Mapping[str, Literal['upcast', 'forbid']] = 'forbid',
- float_cast: Literal['upcast', 'forbid'] | Mapping[str, Literal['upcast', 'forbid']] = 'forbid',
Match or evolve the schema of a LazyFrame into a specific schema.
By default, match_to_schema returns an error if the input schema does not exactly match the target schema. It also allows columns to be freely reordered, with additional coercion rules available through optional parameters.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- schema
Target schema to match or evolve to.
- missing_columns
Raise of insert missing columns from the input with respect to the
schema
.This can also be an expression per column with what to insert if it is missing.
- missing_struct_fields
Raise of insert missing struct fields from the input with respect to the
schema
.- extra_columns
Raise of ignore extra columns from the input with respect to the
schema
.- extra_struct_fields
Raise of ignore extra struct fields from the input with respect to the
schema
.- integer_cast
Forbid of upcast for integer columns from the input to the respective column in
schema
.- float_cast
Forbid of upcast for float columns from the input to the respective column in
schema
.
Examples
Ensuring the schema matches
>>> lf = pl.LazyFrame({"a": [1, 2, 3], "b": ["A", "B", "C"]}) >>> lf.match_to_schema({"a": pl.Int64, "b": pl.String}).collect() shape: (3, 2) ┌─────┬─────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ str │ ╞═════╪═════╡ │ 1 ┆ A │ │ 2 ┆ B │ │ 3 ┆ C │ └─────┴─────┘ >>> (lf.match_to_schema({"a": pl.Int64}).collect()) polars.exceptions.SchemaError: extra columns in `match_to_schema`: "b"
Adding missing columns
>>> ( ... pl.LazyFrame({"a": [1, 2, 3]}) ... .match_to_schema( ... {"a": pl.Int64, "b": pl.String}, ... missing_columns="insert", ... ) ... .collect() ... ) shape: (3, 2) ┌─────┬──────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ str │ ╞═════╪══════╡ │ 1 ┆ null │ │ 2 ┆ null │ │ 3 ┆ null │ └─────┴──────┘ >>> ( ... pl.LazyFrame({"a": [1, 2, 3]}) ... .match_to_schema( ... {"a": pl.Int64, "b": pl.String}, ... missing_columns={"b": pl.col.a.cast(pl.String)}, ... ) ... .collect() ... ) shape: (3, 2) ┌─────┬─────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ str │ ╞═════╪═════╡ │ 1 ┆ 1 │ │ 2 ┆ 2 │ │ 3 ┆ 3 │ └─────┴─────┘
Removing extra columns
>>> ( ... pl.LazyFrame({"a": [1, 2, 3], "b": ["A", "B", "C"]}) ... .match_to_schema( ... {"a": pl.Int64}, ... extra_columns="ignore", ... ) ... .collect() ... ) shape: (3, 1) ┌─────┐ │ a │ │ --- │ │ i64 │ ╞═════╡ │ 1 │ │ 2 │ │ 3 │ └─────┘
Upcasting integers and floats
>>> ( ... pl.LazyFrame( ... {"a": [1, 2, 3], "b": [1.0, 2.0, 3.0]}, ... schema={"a": pl.Int32, "b": pl.Float32}, ... ) ... .match_to_schema( ... {"a": pl.Int64, "b": pl.Float64}, ... integer_cast="upcast", ... float_cast="upcast", ... ) ... .collect() ... ) shape: (3, 2) ┌─────┬─────┐ │ a ┆ b │ │ --- ┆ --- │ │ i64 ┆ f64 │ ╞═════╪═════╡ │ 1 ┆ 1.0 │ │ 2 ┆ 2.0 │ │ 3 ┆ 3.0 │ └─────┴─────┘