polars.LazyFrame.filter#

LazyFrame.filter(
*predicates: IntoExprColumn | Iterable[IntoExprColumn] | bool | list[bool] | np.ndarray[Any, Any],
**constraints: Any,
) LazyFrame[source]#

Filter rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Rows where the filter predicate does not evaluate to True are discarded (this includes rows where the predicate evaluates as null).

Parameters:
predicates

Expression that evaluates to a boolean Series.

constraints

Column filters; use name = value to filter columns using the supplied value. Each constraint behaves the same as pl.col(name).eq(value), and is implicitly joined with the other filter conditions using &.

See also

remove

Notes

If you are transitioning from Pandas, and performing filter operations based on the comparison of two or more columns, please note that in Polars any comparison involving null values will result in a null result, not boolean True or False. As a result, these rows will not be retained. Ensure that null values are handled appropriately to avoid unexpected behaviour (see examples below).

Examples

>>> lf = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3, None, 4, None, 0],
...         "bar": [6, 7, 8, None, None, 9, 0],
...         "ham": ["a", "b", "c", None, "d", "e", "f"],
...     }
... )

Filter on one condition:

>>> lf.filter(pl.col("foo") > 1).collect()
shape: (3, 3)
┌─────┬──────┬─────┐
│ foo ┆ bar  ┆ ham │
│ --- ┆ ---  ┆ --- │
│ i64 ┆ i64  ┆ str │
╞═════╪══════╪═════╡
│ 2   ┆ 7    ┆ b   │
│ 3   ┆ 8    ┆ c   │
│ 4   ┆ null ┆ d   │
└─────┴──────┴─────┘

Filter on multiple conditions:

>>> lf.filter((pl.col("foo") < 3) & (pl.col("ham") == "a")).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Provide multiple filters using *args syntax:

>>> lf.filter(
...     pl.col("foo") == 1,
...     pl.col("ham") == "a",
... ).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Provide multiple filters using **kwargs syntax:

>>> lf.filter(foo=1, ham="a").collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Filter on an OR condition:

>>> lf.filter(
...     (pl.col("foo") == 1) | (pl.col("ham") == "c"),
... ).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Filter by comparing two columns against each other

>>> lf.filter(
...     pl.col("foo") == pl.col("bar"),
... ).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ f   │
└─────┴─────┴─────┘
>>> lf.filter(
...     pl.col("foo") != pl.col("bar"),
... ).collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Notice how the row with None values is filtered out; using ne_missing ensures that null values compare equal, and we get similar behaviour to Pandas:

>>> lf.filter(
...     pl.col("foo").ne_missing(pl.col("bar")),
... ).collect()
shape: (5, 3)
┌──────┬──────┬─────┐
│ foo  ┆ bar  ┆ ham │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ str │
╞══════╪══════╪═════╡
│ 1    ┆ 6    ┆ a   │
│ 2    ┆ 7    ┆ b   │
│ 3    ┆ 8    ┆ c   │
│ 4    ┆ null ┆ d   │
│ null ┆ 9    ┆ e   │
└──────┴──────┴─────┘