polars.DataFrame.drop_nans#

DataFrame.drop_nans( subset: ColumnNameOrSelector | Collection[ColumnNameOrSelector] | None = None, ) → DataFrame[source]#

Drop all rows that contain one or more NaN values.

The original order of the remaining rows is preserved.

Parameters:

subset: Column name(s) for which NaN values are considered; if set to None (default), use all columns (note that only floating-point columns can contain NaNs).

See also

drop_nulls

Notes

A NaN value is not the same as a null value. To drop null values, use drop_nulls().

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [-20.5, float("nan"), 80.0],
...         "bar": [float("nan"), 110.0, 25.5],
...         "ham": ["xxx", "yyy", None],
...     }
... )

The default behavior of this method is to drop rows where any single value in the row is NaN:

>>> df.drop_nans()
shape: (1, 3)
┌──────┬──────┬──────┐
│ foo  ┆ bar  ┆ ham  │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 80.0 ┆ 25.5 ┆ null │
└──────┴──────┴──────┘

This behaviour can be constrained to consider only a subset of columns, as defined by name, or with a selector. For example, dropping rows only if there is a NaN in the “bar” column:

>>> df.drop_nans(subset=["bar"])
shape: (2, 3)
┌──────┬───────┬──────┐
│ foo  ┆ bar   ┆ ham  │
│ ---  ┆ ---   ┆ ---  │
│ f64  ┆ f64   ┆ str  │
╞══════╪═══════╪══════╡
│ NaN  ┆ 110.0 ┆ yyy  │
│ 80.0 ┆ 25.5  ┆ null │
└──────┴───────┴──────┘

Dropping a row only if all values are NaN requires a different formulation:

>>> df = pl.DataFrame(
...     {
...         "a": [float("nan"), float("nan"), float("nan"), float("nan")],
...         "b": [10.0, 2.5, float("nan"), 5.25],
...         "c": [65.75, float("nan"), float("nan"), 10.5],
...     }
... )
>>> df.filter(~pl.all_horizontal(pl.all().is_nan()))
shape: (3, 3)
┌─────┬──────┬───────┐
│ a   ┆ b    ┆ c     │
│ --- ┆ ---  ┆ ---   │
│ f64 ┆ f64  ┆ f64   │
╞═════╪══════╪═══════╡
│ NaN ┆ 10.0 ┆ 65.75 │
│ NaN ┆ 2.5  ┆ NaN   │
│ NaN ┆ 5.25 ┆ 10.5  │
└─────┴──────┴───────┘