Module polars_lazy::dsl
Domain specific language for the Lazy API.
This DSL revolves around the Expr type, which represents an abstract
operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation.
In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing
the result of applying the function and passed expressions to the consumed LazyFrame.
At runtime, when LazyFrame::collect is called, the expressions that comprise
the LazyFrame’s logical plan are materialized on the actual underlying Series.
For instance, let expr = col("x").pow(lit(2)).alias("x2"); would produce an expression representing the abstract
operation of squaring the column "x" and naming the resulting column "x2", and to apply this operation to a
LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);.
(Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding
operations on the LazyFrame.)
There are many, many free functions that this module exports that produce an Expr from scratch; col and
lit are two examples.
Expressions also have several methods, such as pow and alias, that consume them
and produce a new expression.
Several expressions are only available when the necessary feature is enabled.
Examples of features that unlock specialized expressions include strings, temporal, and dtype-categorical.
These specialized expressions provide implementations of functions that you’d otherwise have to implement by hand.
Because of how abstract and flexible the Expr type is, care must be taken to ensure you only attempt to perform
sensible operations with it.
For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame.
Furthermore, there is nothing stopping you from calling, for example, any with an expression
that will yield an f64 column (instead of bool), or col("string") - col("f64"), which would attempt
to subtract an f64 Series from a string Series.
These kinds of invalid operations will only yield an error at runtime, when
collect is called on the LazyFrame.
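The deferred-error behaviour described above can be illustrated with a toy model in plain Rust. This is a sketch only and not how polars is implemented: the Column, Frame and Expr types, the minus method, and the evaluate, collect and demo functions are all hypothetical stand-ins that reuse the names col, lit and alias for familiarity. It shows that building an expression never inspects any data, so a missing column or a type mismatch can only be reported once the expression is evaluated.

```rust
use std::collections::HashMap;

// Hypothetical toy types for illustration; NOT the polars API.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    F64(Vec<f64>),
    Str(Vec<String>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::F64(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }
}

type Frame = HashMap<String, Column>;

// An expression is just a description of work; it holds no data.
#[derive(Debug, Clone)]
enum Expr {
    Col(String),
    Lit(f64),
    Minus(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

// Free functions that produce an expression from scratch.
fn col(name: &str) -> Expr { Expr::Col(name.to_string()) }
fn lit(value: f64) -> Expr { Expr::Lit(value) }

impl Expr {
    // Methods consume `self` and return a new, larger expression.
    fn minus(self, rhs: Expr) -> Expr { Expr::Minus(Box::new(self), Box::new(rhs)) }
    fn alias(self, name: &str) -> Expr { Expr::Alias(Box::new(self), name.to_string()) }

    fn output_name(&self) -> String {
        match self {
            Expr::Col(name) | Expr::Alias(_, name) => name.clone(),
            Expr::Lit(_) => "literal".to_string(),
            Expr::Minus(lhs, _) => lhs.output_name(),
        }
    }
}

// Validation happens only here, when the expression meets real data.
fn evaluate(expr: &Expr, frame: &Frame, height: usize) -> Result<Column, String> {
    match expr {
        Expr::Col(name) => frame
            .get(name)
            .cloned()
            .ok_or_else(|| format!("column not found: {}", name)),
        // A literal is broadcast to the height of the frame.
        Expr::Lit(value) => Ok(Column::F64(vec![*value; height])),
        Expr::Alias(inner, _) => evaluate(inner, frame, height),
        Expr::Minus(lhs, rhs) => {
            let left = evaluate(lhs, frame, height)?;
            let right = evaluate(rhs, frame, height)?;
            match (left, right) {
                (Column::F64(a), Column::F64(b)) => {
                    Ok(Column::F64(a.iter().zip(b.iter()).map(|(x, y)| x - y).collect()))
                }
                _ => Err("cannot subtract: both sides must be f64 columns".to_string()),
            }
        }
    }
}

// Toy counterpart of "collect": evaluate the expression and name the result.
fn collect(expr: &Expr, frame: &Frame) -> Result<(String, Column), String> {
    let height = frame.values().map(Column::len).max().unwrap_or(0);
    let column = evaluate(expr, frame, height)?;
    Ok((expr.output_name(), column))
}

// Builds one valid and two invalid expressions, then evaluates each.
fn demo() -> Vec<Result<(String, Column), String>> {
    let mut frame = Frame::new();
    frame.insert("x".to_string(), Column::F64(vec![3.0, 5.0]));
    frame.insert("s".to_string(), Column::Str(vec!["a".to_string(), "b".to_string()]));

    // Construction always succeeds, even for the two nonsensical expressions.
    let plans = vec![
        col("x").minus(lit(1.0)).alias("x_minus_1"),
        col("s").minus(col("x")),
        col("missing").alias("y"),
    ];
    plans.iter().map(|p| collect(p, &frame)).collect()
}
```

In this sketch, demo() returns an Ok value for the first expression and an Err for the other two. All three expressions were built without complaint, and the errors appear only when the toy collect runs. This mirrors the point made above about invalid operations surfacing at runtime.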
Re-exports
pub use functions::*;
Modules
- cat (feature: dtype-categorical)
- dt (feature: temporal)
- Functions
- python_udf (feature: python)
- string (feature: strings)
Structs
- Specialized expressions for Series of DataType::Array.
- Specialized expressions for Categorical dtypes.
- Utility struct for the when-then-otherwise expression.
- Utility struct for the when-then-otherwise expression.
- Arguments used by datetime in order to produce an Expr of Datetime
- Specialized expressions for modifying the name of existing expressions.
- Specialized expressions for Series of DataType::List.
- Specialized expressions for Categorical dtypes.
- Wrapper type that has special equality properties depending on the inner type specialization
- Specialized expressions for Struct dtypes.
- Utility struct for the when-then-otherwise expression.
- Represents a user-defined function
- Utility struct for the when-then-otherwise expression.
Enums
- Expressions that can be used in various contexts. Queries consist of multiple expressions. When using the polars lazy API, don’t construct an Expr directly; instead, create one using the functions in the polars_lazy::dsl module. See that module’s docs for more info.
Traits
- ExprEvalExtension (features: cumulative_eval or list_eval)
- IntoListNameSpace (feature: list_eval)
- ListNameSpaceExtension (feature: list_eval)
- A wrapper trait for any binary closure Fn(Series, Series) -> PolarsResult<Series>
- A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
Functions
- Selects all columns. Shorthand for col("*").
- Create a new column with the bitwise-and of the elements in each row.
- Create a new column with the bitwise-or of the elements in each row.
- Like map_binary, but used in a group_by-aggregation context.
- Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
- Generate a range of integers.
- arg_sort_by (feature: range): Find the indexes that would sort these series in order of appearance. That means that the first Series will be used to determine the ordering until duplicates are found. Once duplicates are found, the next Series will be used and so on.
- arg_where (feature: arg_where): Get the indices where condition evaluates true.
- Take several expressions and collect them into a StructChunked.
- Find the mean of all the values in the column named name. Alias for mean.
- Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
- business_day_count (feature: dtype-date)
- Casts the column given by Expr to a different type.
Exprto a different type. - Folds the expressions from left to right keeping the first non-null values.
- Create a Column Expression based on a column name.
- Select multiple columns by name.
- Concat lists entries.
- concat_str (features: concat_str and strings): Horizontally concat string columns in linear time
- Compute the covariance between two columns.
- cum_fold_exprs (feature: dtype-struct): Accumulate over multiple columns horizontally / row wise.
- cum_reduce_exprs (feature: dtype-struct): Accumulate over multiple columns horizontally / row wise.
- date_ranges (feature: temporal): Create a column of date ranges from a start and stop expression.
- datetime (feature: temporal): Construct a column of Datetime from the provided DatetimeArgs.
- datetime_range (feature: dtype-datetime): Create a datetime range from a start and stop expression.
- datetime_ranges (feature: dtype-datetime): Create a column of datetime ranges from a start and stop expression.
- Select multiple columns by dtype.
- Select multiple columns by dtype.
- duration (feature: temporal): Construct a column of Duration from the provided DurationArgs
- First column in a DataFrame.
- Accumulate over multiple columns horizontally / row wise.
- format_str (features: concat_str and strings): Format the results of an array of expressions using a format string
- Select multiple columns by index.
- Generate a range of integers.
- Generate a range of integers for each row of the input columns.
- A column which is false wherever expr is null, true elsewhere.
- A column which is true wherever expr is null, false elsewhere.
- Last column in a DataFrame.
- Return the number of rows in the context.
- Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
- Apply a function/closure over multiple columns once the logical plan gets executed.
- Apply a function/closure over multiple columns once the logical plan gets executed.
- Find the maximum of all the values in the column named name. Shorthand for col(name).max().
- Create a new column with the maximum value per row.
- Find the mean of all the values in the column named name. Shorthand for col(name).mean().
- Compute the mean of all values horizontally across columns.
- Find the median of all the values in the column named name. Shorthand for col(name).median().
- Find the minimum of all the values in the column named name. Shorthand for col(name).min().
- Create a new column with the minimum value per row.
- Negates a boolean column.
- Nth column in a DataFrame.
- Compute the pearson correlation between two columns.
- Find a specific quantile of all the values in the column named name.
- Analogous to Iterator::reduce.
- Create a column of length n containing n copies of the literal value. Generally you won’t need this function, as lit(value) already represents a column containing only value whose length is automatically set to the correct number of rows.
- rolling_corr (feature: rolling_window)
- rolling_cov (feature: rolling_window)
- spearman_rank_corr (features: rank and propagate_nans): Compute the spearman rank correlation between two columns. Missing data will be excluded from the computation.
- Sum all the values in the column named name. Shorthand for col(name).sum().
- Sum all values horizontally across columns.
- time_ranges (feature: dtype-time): Create a column of time ranges from a start and stop expression.
- Start a when-then-otherwise expression.
Type Aliases
- FieldsNameMapper (feature: dtype-struct)