Module polars_lazy::dsl
Domain specific language for the Lazy API.
This DSL revolves around the Expr type, which represents an abstract operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation.
In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing the result of applying the function and passed expressions to the consumed LazyFrame.
At runtime, when LazyFrame::collect is called, the expressions that comprise the LazyFrame’s logical plan are materialized on the actual underlying Series.
For instance, let expr = col("x").pow(lit(2)).alias("x2"); would produce an expression representing the abstract operation of squaring the column "x" and naming the resulting column "x2".
To apply this operation to a LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);
(Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding operations on the LazyFrame.)
This module exports many free functions that produce an Expr from scratch; col and lit are two examples.
Expressions also have several methods, such as pow and alias, that consume them and produce a new expression.
Several expressions are only available when the necessary feature is enabled.
Examples of features that unlock specialized expressions include strings, temporal, and dtype-categorical.
These specialized expressions provide implementations of functions that you’d otherwise have to implement by hand.
Because of how abstract and flexible the Expr type is, care must be taken to ensure you only attempt to perform sensible operations with it.
For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame.
Furthermore, there is nothing stopping you from calling, for example, any with an expression that will yield an f64 column (instead of bool), or col("string") - col("f64"), which would attempt to subtract an f64 Series from a string Series.
These kinds of invalid operations will only yield an error at runtime, when collect is called on the LazyFrame.
Re-exports§
pub use functions::*;
Modules§
- cat (feature: dtype-categorical)
- dt (feature: temporal)
- Functions
- python_udf (feature: python)
- string (feature: strings)
Structs§
- Specialized expressions for Series of DataType::Array.
- Specialized expressions for Categorical dtypes.
- Utility struct for the when-then-otherwise expression.
- Utility struct for the when-then-otherwise expression.
- Arguments used by datetime in order to produce an Expr of Datetime
- Specialized expressions for modifying the name of existing expressions.
- Specialized expressions for Series of DataType::List.
- Specialized expressions for Categorical dtypes.
- Wrapper type that has special equality properties depending on the inner type specialization
- Specialized expressions for Struct dtypes.
- Utility struct for the when-then-otherwise expression.
- Represents a user-defined function
- Utility struct for the when-then-otherwise expression.
Enums§
- Expressions that can be used in various contexts. Queries consist of multiple expressions. When using the polars lazy API, don’t construct an Expr directly; instead, create one using the functions in the polars_lazy::dsl module. See that module’s docs for more info.
Traits§
- ExprEvalExtension (feature: cumulative_eval or list_eval)
- IntoListNameSpace (feature: list_eval)
- ListNameSpaceExtension (feature: list_eval)
- A wrapper trait for any binary closure Fn(Series, Series) -> PolarsResult<Series>
- A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
Functions§
- Selects all columns. Shorthand for col("*").
- Create a new column with the bitwise-and of the elements in each row.
- Create a new column with the bitwise-or of the elements in each row.
- Like map_binary, but used in a group_by-aggregation context.
- Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
- Generate a range of integers.
- arg_sort_by (feature: range): Find the indexes that would sort these series in order of appearance. That means that the first Series will be used to determine the ordering until duplicates are found. Once duplicates are found, the next Series will be used and so on.
- arg_where (feature: arg_where): Get the indices where condition evaluates true.
- Take several expressions and collect them into a StructChunked.
- Find the mean of all the values in the column named name. Alias for mean.
- Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
- business_day_count (feature: dtype-date)
- Casts the column given by Expr to a different type.
- Folds the expressions from left to right keeping the first non-null values.
- Create a Column Expression based on a column name.
- Select multiple columns by name.
- Concat lists entries.
- concat_str (feature: concat_str and strings): Horizontally concat string columns in linear time
- Compute the covariance between two columns.
- cum_fold_exprs (feature: dtype-struct): Accumulate over multiple columns horizontally / row wise.
- cum_reduce_exprs (feature: dtype-struct): Accumulate over multiple columns horizontally / row wise.
- date_ranges (feature: temporal): Create a column of date ranges from a start and stop expression.
- datetime (feature: temporal): Construct a column of Datetime from the provided DatetimeArgs.
- datetime_range (feature: dtype-datetime): Create a datetime range from a start and stop expression.
- datetime_ranges (feature: dtype-datetime): Create a column of datetime ranges from a start and stop expression.
- Select multiple columns by dtype.
- Select multiple columns by dtype.
- duration (feature: temporal): Construct a column of Duration from the provided DurationArgs
- First column in a DataFrame.
- Accumulate over multiple columns horizontally / row wise.
- format_str (feature: concat_str and strings): Format the results of an array of expressions using a format string
- Select multiple columns by index.
- Generate a range of integers.
- Generate a range of integers for each row of the input columns.
- A column which is false wherever expr is null, true elsewhere.
- A column which is true wherever expr is null, false elsewhere.
- Last column in a DataFrame.
- Return the number of rows in the context.
- Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
- Apply a function/closure over multiple columns once the logical plan gets executed.
- Apply a function/closure over multiple columns once the logical plan gets executed.
- Find the maximum of all the values in the column named name. Shorthand for col(name).max().
- Create a new column with the maximum value per row.
- Find the mean of all the values in the column named name. Shorthand for col(name).mean().
- Compute the mean of all values horizontally across columns.
- Find the median of all the values in the column named name. Shorthand for col(name).median().
- Find the minimum of all the values in the column named name. Shorthand for col(name).min().
- Create a new column with the minimum value per row.
- Negates a boolean column.
- Nth column in a DataFrame.
- Compute the Pearson correlation between two columns.
- Find a specific quantile of all the values in the column named name.
- Analogous to Iterator::reduce.
- Create a column of length n containing n copies of the literal value. Generally you won’t need this function, as lit(value) already represents a column containing only value whose length is automatically set to the correct number of rows.
- rolling_corr (feature: rolling_window)
- rolling_cov (feature: rolling_window)
- spearman_rank_corr (feature: rank and propagate_nans): Compute the Spearman rank correlation between two columns. Missing data will be excluded from the computation.
- Sum all the values in the column named name. Shorthand for col(name).sum().
- Sum all values horizontally across columns.
- time_ranges (feature: dtype-time): Create a column of time ranges from a start and stop expression.
- Start a when-then-otherwise expression.
Type Aliases§
- FieldsNameMapper (feature: dtype-struct)