polars.LazyFrame.drop_nulls#

LazyFrame.drop_nulls(
subset: ColumnNameOrSelector | Collection[ColumnNameOrSelector] | None = None,
) LazyFrame[source]#

Drop all rows that contain one or more null values.

The original order of the remaining rows is preserved.

Parameters:
subset

Column name(s) for which null values are considered. If set to None (default), use all columns.

Examples

>>> lf = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6, None, 8],
...         "ham": ["a", "b", None],
...     }
... )

The default behavior of this method is to drop rows where any single value in the row is null:

>>> lf.drop_nulls().collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

This behaviour can be constrained to consider only a subset of columns, as defined by name or with a selector. For example, dropping rows if there is a null in any of the integer columns:

>>> import polars.selectors as cs
>>> lf.drop_nulls(subset=cs.integer()).collect()
shape: (2, 3)
┌─────┬─────┬──────┐
│ foo ┆ bar ┆ ham  │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ str  │
╞═════╪═════╪══════╡
│ 1   ┆ 6   ┆ a    │
│ 3   ┆ 8   ┆ null │
└─────┴─────┴──────┘

Dropping a row only if all values are null requires a different formulation:

>>> lf = pl.LazyFrame(
...     {
...         "a": [None, None, None, None],
...         "b": [1, 2, None, 1],
...         "c": [1, None, None, 1],
...     }
... )
>>> lf.filter(~pl.all_horizontal(pl.all().is_null())).collect()
shape: (3, 3)
┌──────┬─────┬──────┐
│ a    ┆ b   ┆ c    │
│ ---  ┆ --- ┆ ---  │
│ null ┆ i64 ┆ i64  │
╞══════╪═════╪══════╡
│ null ┆ 1   ┆ 1    │
│ null ┆ 2   ┆ null │
│ null ┆ 1   ┆ 1    │
└──────┴─────┴──────┘