polars.LazyFrame.filter#
- LazyFrame.filter(
- *predicates: IntoExprColumn | Iterable[IntoExprColumn] | bool | list[bool] | np.ndarray[Any, Any],
- **constraints: Any,
Filter the rows in the LazyFrame based on a predicate expression.
The original order of the remaining rows is preserved.
Rows where the filter does not evaluate to True are discarded, including nulls.
- Parameters:
- predicates
Expression that evaluates to a boolean Series.
- constraints
Column filters; use
name = value
to filter columns by the supplied value. Each constraint will behave the same aspl.col(name).eq(value)
, and will be implicitly joined with the other filter conditions using&
.
Notes
If you are transitioning from pandas and performing filter operations based on the comparison of two or more columns, please note that in Polars, any comparison involving null values will always result in null. As a result, these rows will be filtered out. Ensure to handle null values appropriately to avoid unintended filtering (See examples below).
Examples
>>> lf = pl.LazyFrame( ... { ... "foo": [1, 2, 3, None, 4, None, 0], ... "bar": [6, 7, 8, None, None, 9, 0], ... "ham": ["a", "b", "c", None, "d", "e", "f"], ... } ... )
Filter on one condition:
>>> lf.filter(pl.col("foo") > 1).collect() shape: (3, 3) ┌─────┬──────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪══════╪═════╡ │ 2 ┆ 7 ┆ b │ │ 3 ┆ 8 ┆ c │ │ 4 ┆ null ┆ d │ └─────┴──────┴─────┘
Filter on multiple conditions:
>>> lf.filter((pl.col("foo") < 3) & (pl.col("ham") == "a")).collect() shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6 ┆ a │ └─────┴─────┴─────┘
Provide multiple filters using
*args
syntax:>>> lf.filter( ... pl.col("foo") == 1, ... pl.col("ham") == "a", ... ).collect() shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6 ┆ a │ └─────┴─────┴─────┘
Provide multiple filters using
**kwargs
syntax:>>> lf.filter(foo=1, ham="a").collect() shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6 ┆ a │ └─────┴─────┴─────┘
Filter on an OR condition:
>>> lf.filter((pl.col("foo") == 1) | (pl.col("ham") == "c")).collect() shape: (2, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6 ┆ a │ │ 3 ┆ 8 ┆ c │ └─────┴─────┴─────┘
Filter by comparing two columns against each other
>>> lf.filter(pl.col("foo") == pl.col("bar")).collect() shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 0 ┆ 0 ┆ f │ └─────┴─────┴─────┘
>>> lf.filter(pl.col("foo") != pl.col("bar")).collect() shape: (3, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6 ┆ a │ │ 2 ┆ 7 ┆ b │ │ 3 ┆ 8 ┆ c │ └─────┴─────┴─────┘
Notice how the row with
None
values is filtered out. In order to keep the same behavior as pandas, use:>>> lf.filter(pl.col("foo").ne_missing(pl.col("bar"))).collect() shape: (5, 3) ┌──────┬──────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str │ ╞══════╪══════╪═════╡ │ 1 ┆ 6 ┆ a │ │ 2 ┆ 7 ┆ b │ │ 3 ┆ 8 ┆ c │ │ 4 ┆ null ┆ d │ │ null ┆ 9 ┆ e │ └──────┴──────┴─────┘