polars.DataFrame.filter#

DataFrame.filter(
*predicates: IntoExprColumn | Iterable[IntoExprColumn] | bool | list[bool] | np.ndarray[Any, Any],
**constraints: Any,
) DataFrame[source]#

Filter rows, retaining those that match the given predicate expression(s).

The original order of the remaining rows is preserved.

Only rows where the predicate resolves as True are retained; when the predicate result is False (or null), the row is discarded.

Parameters:
predicates

Expression(s) that evaluate to a boolean Series.

constraints

Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as pl.col(name).eq(value), and be implicitly joined with the other filter conditions using &.

See also

remove

Notes

If you are transitioning from Pandas, and performing filter operations based on the comparison of two or more columns, please note that in Polars any comparison involving null values will result in a null result, not boolean True or False. As a result, these rows will not be retained. Ensure that null values are handled appropriately to avoid unexpected behaviour (see examples below).

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, None, 4, None, 0],
...         "bar": [6, 7, 8, None, None, 9, 0],
...         "ham": ["a", "b", "c", None, "d", "e", "f"],
...     }
... )

Filter rows matching a condition:

>>> df.filter(pl.col("foo") > 1)
shape: (3, 3)
┌─────┬──────┬─────┐
│ foo ┆ bar  ┆ ham │
│ --- ┆ ---  ┆ --- │
│ i64 ┆ i64  ┆ str │
╞═════╪══════╪═════╡
│ 2   ┆ 7    ┆ b   │
│ 3   ┆ 8    ┆ c   │
│ 4   ┆ null ┆ d   │
└─────┴──────┴─────┘

Filter on multiple conditions, combined with and/or operators:

>>> df.filter(
...     (pl.col("foo") < 3) & (pl.col("ham") == "a"),
... )
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> df.filter(
...     (pl.col("foo") == 1) | (pl.col("ham") == "c"),
... )
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Provide multiple filters using *args syntax:

>>> df.filter(
...     pl.col("foo") <= 2,
...     ~pl.col("ham").is_in(["b", "c"]),
... )
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 0   ┆ 0   ┆ f   │
└─────┴─────┴─────┘

Provide multiple filters using **kwargs syntax:

>>> df.filter(foo=2, ham="b")
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
└─────┴─────┴─────┘

Filter by comparing two columns against each other:

>>> df.filter(
...     pl.col("foo") == pl.col("bar"),
... )
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ f   │
└─────┴─────┴─────┘
>>> df.filter(
...     pl.col("foo") != pl.col("bar"),
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Notice how the row with None values is filtered out. In order to keep the same behavior as pandas, use:

>>> df.filter(
...     pl.col("foo").ne_missing(pl.col("bar")),
... )
shape: (5, 3)
┌──────┬──────┬─────┐
│ foo  ┆ bar  ┆ ham │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ str │
╞══════╪══════╪═════╡
│ 1    ┆ 6    ┆ a   │
│ 2    ┆ 7    ┆ b   │
│ 3    ┆ 8    ┆ c   │
│ 4    ┆ null ┆ d   │
│ null ┆ 9    ┆ e   │
└──────┴──────┴─────┘