polars.LazyFrame.drop_nans#
- LazyFrame.drop_nans(
- subset: ColumnNameOrSelector | Collection[ColumnNameOrSelector] | None = None,
Drop all rows that contain one or more NaN values.
The original order of the remaining rows is preserved.
- Parameters:
- subset
Column name(s) for which NaN values are considered; if set to
None
(default), use all columns (note that only floating-point columns can contain NaNs).
Examples
>>> lf = pl.LazyFrame( ... { ... "foo": [-20.5, float("nan"), 80.0], ... "bar": [float("nan"), 110.0, 25.5], ... "ham": ["xxx", "yyy", None], ... } ... )
The default behavior of this method is to drop rows where any single value in the row is NaN:
>>> lf.drop_nans().collect() shape: (1, 3) ┌──────┬──────┬──────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ str │ ╞══════╪══════╪══════╡ │ 80.0 ┆ 25.5 ┆ null │ └──────┴──────┴──────┘
This behaviour can be constrained to consider only a subset of columns, as defined by name, or with a selector. For example, dropping rows only if there is a NaN in the “bar” column:
>>> lf.drop_nans(subset=["bar"]).collect() shape: (2, 3) ┌──────┬───────┬──────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ str │ ╞══════╪═══════╪══════╡ │ NaN ┆ 110.0 ┆ yyy │ │ 80.0 ┆ 25.5 ┆ null │ └──────┴───────┴──────┘
Dropping a row only if all values are NaN requires a different formulation:
>>> lf = pl.LazyFrame( ... { ... "a": [float("nan"), float("nan"), float("nan"), float("nan")], ... "b": [10.0, 2.5, float("nan"), 5.25], ... "c": [65.75, float("nan"), float("nan"), 10.5], ... } ... ) >>> lf.filter(~pl.all_horizontal(pl.all().is_nan())).collect() shape: (3, 3) ┌─────┬──────┬───────┐ │ a ┆ b ┆ c │ │ --- ┆ --- ┆ --- │ │ f64 ┆ f64 ┆ f64 │ ╞═════╪══════╪═══════╡ │ NaN ┆ 10.0 ┆ 65.75 │ │ NaN ┆ 2.5 ┆ NaN │ │ NaN ┆ 5.25 ┆ 10.5 │ └─────┴──────┴───────┘