Module dsl

Domain specific language for the Lazy API.

This DSL revolves around the Expr type, which represents an abstract operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation. In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing the result of applying the function and passed expressions to the consumed LazyFrame. At runtime, when LazyFrame::collect is called, the expressions that comprise the LazyFrame’s logical plan are materialized on the actual underlying Series. For instance, let expr = col("x").pow(lit(2)).alias("x2"); would produce an expression representing the abstract operation of squaring the column "x" and naming the resulting column "x2", and to apply this operation to a LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);. (Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding operations on the LazyFrame.)
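
The deferred-evaluation idea described above can be sketched with a toy model in plain Rust. This is not the crate's implementation: the `Expr` enum, the `LazyFrame` struct, the `Columns` alias, and the `eval` helper below are hypothetical stand-ins that only borrow the names `col`, `lit`, `pow`, `alias`, `with_column`, and `collect` from the text, and columns are assumed to hold `f64` values only.

```rust
use std::collections::BTreeMap;

/// Toy expression: a description of work to do, not a computed result.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(f64),
    Pow(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

/// Free functions that build an expression from scratch.
fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}
fn lit(value: f64) -> Expr {
    Expr::Lit(value)
}

impl Expr {
    /// Methods consume the expression and return a new, larger one.
    fn pow(self, exponent: Expr) -> Expr {
        Expr::Pow(Box::new(self), Box::new(exponent))
    }
    fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }
}

/// Hypothetical column storage: name -> f64 values.
type Columns = BTreeMap<String, Vec<f64>>;

/// Toy lazy frame: the data plus a plan of expressions not yet evaluated.
struct LazyFrame {
    data: Columns,
    plan: Vec<Expr>,
}

impl LazyFrame {
    fn new(data: Columns) -> Self {
        LazyFrame { data, plan: Vec::new() }
    }

    /// Consumes the frame and returns a new one; only records the expression.
    fn with_column(mut self, expr: Expr) -> Self {
        self.plan.push(expr);
        self
    }

    /// Evaluates the recorded plan against the data, in order.
    fn collect(self) -> Result<Columns, String> {
        let LazyFrame { data: mut out, plan } = self;
        let height = out.values().next().map_or(0, |v| v.len());
        for expr in &plan {
            let (name, values) = eval(expr, &out, height)?;
            out.insert(name, values);
        }
        Ok(out)
    }
}

/// Evaluates one expression, returning its output name and values.
fn eval(expr: &Expr, cols: &Columns, height: usize) -> Result<(String, Vec<f64>), String> {
    match expr {
        Expr::Col(name) => cols
            .get(name)
            .map(|v| (name.clone(), v.clone()))
            .ok_or_else(|| format!("column not found: {}", name)),
        Expr::Lit(v) => Ok(("literal".to_string(), vec![*v; height])),
        Expr::Pow(base, exponent) => {
            let (name, b) = eval(base, cols, height)?;
            let (_, e) = eval(exponent, cols, height)?;
            Ok((name, b.iter().zip(e.iter()).map(|(x, y)| x.powf(*y)).collect()))
        }
        Expr::Alias(inner, name) => {
            let (_, values) = eval(inner, cols, height)?;
            Ok((name.clone(), values))
        }
    }
}
```

In this sketch, `LazyFrame::new(data).with_column(col("x").pow(lit(2.0)).alias("x2"))` only appends to the plan. No values are computed, and a reference to a missing column goes unnoticed, until `collect` walks the plan.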

There are many, many free functions that this module exports that produce an Expr from scratch; col and lit are two examples. Expressions also have several methods, such as pow and alias, that consume them and produce a new expression.

Several expressions are only available when the necessary feature is enabled. Examples of features that unlock specialized expressions include strings, temporal, and dtype-categorical. These specialized expressions provide implementations of functions that you’d otherwise have to implement by hand.

Because of how abstract and flexible the Expr type is, care must be taken to ensure you only attempt to perform sensible operations with it. For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame. Furthermore, there is nothing stopping you from calling, for example, any with an expression that will yield an f64 column (instead of bool), or col("string") - col("f64"), which would attempt to subtract an f64 Series from a string Series. These kinds of invalid operations will only yield an error at runtime, when collect is called on the LazyFrame.
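
Why such mistakes surface only at runtime can be illustrated with another toy model in plain Rust. Again, this is a sketch under stated assumptions, not the crate's code: the `Column` and `Expr` enums and the `evaluate` and `kind` helpers are hypothetical. Building the expression never inspects column types, so only evaluation can fail.

```rust
use std::collections::BTreeMap;
use std::ops::Sub;

/// Hypothetical typed column: either floats or strings.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    F64(Vec<f64>),
    Str(Vec<String>),
}

/// Toy expression tree; it stores no type information at all.
#[derive(Debug, Clone)]
enum Expr {
    Col(String),
    Sub(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

/// `a - b` only builds a bigger tree; nothing is checked here.
impl Sub for Expr {
    type Output = Expr;
    fn sub(self, rhs: Expr) -> Expr {
        Expr::Sub(Box::new(self), Box::new(rhs))
    }
}

fn kind(column: &Column) -> &'static str {
    match column {
        Column::F64(_) => "f64",
        Column::Str(_) => "string",
    }
}

/// Evaluation is the first point where names and types are known.
fn evaluate(expr: &Expr, frame: &BTreeMap<String, Column>) -> Result<Column, String> {
    match expr {
        Expr::Col(name) => frame
            .get(name)
            .cloned()
            .ok_or_else(|| format!("column not found: {}", name)),
        Expr::Sub(lhs, rhs) => match (evaluate(lhs, frame)?, evaluate(rhs, frame)?) {
            (Column::F64(a), Column::F64(b)) => Ok(Column::F64(
                a.iter().zip(b.iter()).map(|(x, y)| x - y).collect(),
            )),
            (left, right) => Err(format!(
                "cannot subtract {} from {}",
                kind(&right),
                kind(&left)
            )),
        },
    }
}
```

Here `col("string") - col("f64")` compiles and constructs a value without complaint. The mismatch is reported as an `Err` only when `evaluate` is called against actual data, which mirrors the role of collect described above.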

Re-exports

pub use functions::*;

Modules

binary
cat (crate feature dtype-categorical)
dt (crate feature temporal)
function_expr
functions: Functions
python_dsl (crate feature python)
string (crate feature strings)
udf

Structs

AnonymousScanOptions
ArrayNameSpace: Specialized expressions for Series of DataType::Array.
CategoricalNameSpace: Specialized expressions for Categorical dtypes.
ChainedThen: Utility struct for the when-then-otherwise expression.
ChainedWhen: Utility struct for the when-then-otherwise expression.
DatetimeArgs: Arguments used by datetime in order to produce an Expr of Datetime
DistinctOptionsDSL
DslBuilder
DurationArgs: Arguments used by duration in order to produce an Expr of Duration
ExprNameNameSpace: Specialized expressions for modifying the name of existing expressions.
FieldsMapper
FileScanOptions: Generic options for all file types.
FileSinkOptions
FileSinkType
GroupbyOptions
HConcatOptions
JoinOptions
ListNameSpace: Specialized expressions for Series of DataType::List.
LogicalPlanUdfOptions
MetaNameSpace: Specialized expressions for inspecting and manipulating an expression itself (meta operations).
NDJsonReadOptions (crate feature json)
PartitionSinkType
PartitionSinkTypeIR
RollingCovOptions
ScanFlags
ScanSourceIter: An iterator for ScanSources
SinkOptions: Options that apply to all sinks.
SpecialEq: Wrapper type that has special equality properties depending on the inner type specialization
StrptimeOptions
StructNameSpace: Specialized expressions for Struct dtypes.
Then: Utility struct for the when-then-otherwise expression.
UnionArgs
UnionOptions
UnpivotArgsDSL
UserDefinedFunction: Represents a user-defined function
When: Utility struct for the when-then-otherwise expression.

Enums

AggExpr
ArrayFunction
BinaryFunction
BitwiseFunction
BooleanFunction
BusinessFunction
CategoricalFunction
CorrelationMethod
DslPlan
Engine
Excluded
Expr: Expressions that can be used in various contexts.
FileScan
FileType
FunctionExpr
FusedOperator
JoinTypeOptionsIR
LazySerde
ListFunction
NestedType
Operator
PartitionVariant
PartitionVariantIR
PowFunction
RandomMethod
ReshapeDimension: A dimension in a reshape.
ScanSource: A single source to scan from
ScanSourceRef: A reference to a single item in ScanSources
ScanSources: Set of sources to scan from
Selector
SinkType
SinkTypeIR
StringFunction
StructFunction
TemporalFunction
TrigonometricFunction
WindowMapping
WindowType

Statics

DSL_VERSION

Traits

BinaryUdfOutputField
ColumnBinaryUdf: A wrapper trait for any binary closure Fn(Column, Column) -> PolarsResult<Column>
ColumnsUdf: A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
ExprEvalExtension (crate feature cumulative_eval or list_eval)
FunctionOutputField
IntoListNameSpace (crate feature list_eval)
ListNameSpaceExtension (crate feature list_eval)
RenameAliasFn
UdfSchema

Functions

all: Selects all columns. Shorthand for col("*").
all_horizontal: Create a new column with the bitwise-and of the elements in each row.
any_horizontal: Create a new column with the bitwise-or of the elements in each row.
apply_binary: Like map_binary, but used in a group_by-aggregation context.
apply_multiple: Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
arange: Generate a range of integers.
arg_sort_by (crate feature range): Find the indexes that would sort these series in order of appearance.
arg_where (crate feature arg_where): Get the indices where condition evaluates true.
as_struct: Take several expressions and collect them into a StructChunked.
avg: Find the mean of all the values in the column named name. Alias for mean.
binary_expr: Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
business_day_count (crate feature dtype-date)
cast: Casts the column given by Expr to a different type.
coalesce: Folds the expressions from left to right keeping the first non-null values.
col: Create a Column Expression based on a column name.
cols: Select multiple columns by name.
concat_arr: Horizontally concatenate columns into a single array-type column.
concat_expr
concat_list: Concat lists entries.
concat_str (crate features concat_str and strings): Horizontally concat string columns in linear time
cov: Compute the covariance between two columns.
cum_fold_exprs (crate feature dtype-struct): Accumulate over multiple columns horizontally / row wise.
cum_reduce_exprs (crate feature dtype-struct): Accumulate over multiple columns horizontally / row wise.
date_ranges (crate feature temporal): Create a column of date ranges from a start and stop expression.
datetime: Construct a column of Datetime from the provided DatetimeArgs.
datetime_range (crate feature dtype-datetime): Create a datetime range from a start and stop expression.
datetime_ranges (crate feature dtype-datetime): Create a column of datetime ranges from a start and stop expression.
dtype_col: Select multiple columns by dtype.
dtype_cols: Select multiple columns by dtype.
duration: Construct a column of Duration from the provided DurationArgs
first: First column in a DataFrame.
fold_exprs: Accumulate over multiple columns horizontally / row wise.
format_str (crate features concat_str and strings): Format the results of an array of expressions using a format string
index_cols: Select multiple columns by index.
int_range: Generate a range of integers.
int_ranges: Generate a range of integers for each row of the input columns.
is_not_null: A column which is false wherever expr is null, true elsewhere.
is_null: A column which is true wherever expr is null, false elsewhere.
last: Last column in a DataFrame.
len: Return the number of rows in the context.
linear_space: Generate a series of equally-spaced points.
linear_spaces: Create a column of linearly-spaced sequences from ‘start’, ‘end’, and ‘num_samples’ expressions.
lit: Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
map_binary: Apply a closure on the two columns that are evaluated from Expr a and Expr b.
map_list_multiple: Apply a function/closure over multiple columns once the logical plan gets executed.
map_multiple: Apply a function/closure over multiple columns once the logical plan gets executed.
max: Find the maximum of all the values in the column named name. Shorthand for col(name).max().
max_horizontal: Create a new column with the maximum value per row.
mean: Find the mean of all the values in the column named name. Shorthand for col(name).mean().
mean_horizontal: Compute the mean of all values horizontally across columns.
median: Find the median of all the values in the column named name. Shorthand for col(name).median().
min: Find the minimum of all the values in the column named name. Shorthand for col(name).min().
min_horizontal: Create a new column with the minimum value per row.
not: Negates a boolean column.
nth: Nth column in a DataFrame.
pearson_corr: Compute the pearson correlation between two columns.
quantile: Find a specific quantile of all the values in the column named name.
reduce_exprs: Analogous to Iterator::reduce.
repeat: Create a column of length n containing n copies of the literal value.
rolling_corr (crate features rolling_window and cov)
rolling_cov (crate features rolling_window and cov)
spearman_rank_corr (crate features rank and propagate_nans): Compute the spearman rank correlation between two columns. Missing data will be excluded from the computation.
sum: Sum all the values in the column named name. Shorthand for col(name).sum().
sum_horizontal: Sum all values horizontally across columns.
ternary_expr
time_ranges (crate feature dtype-time): Create a column of time ranges from a start and stop expression.
when: Start a when-then-otherwise expression.
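
The row-wise semantics behind entries such as coalesce and fold_exprs can be sketched on plain vectors. This is an illustration of the described behaviour only, not the crate's API: the `Col` alias and the `coalesce` and `fold_horizontal` functions below are hypothetical, all columns are assumed to have equal length, and the fold here combines values element by element, whereas the real function is documented only as accumulating over columns horizontally.

```rust
/// Hypothetical nullable integer column: `None` stands in for null.
type Col = Vec<Option<i64>>;

/// For each row, keep the first non-null value, scanning columns left to right.
fn coalesce(columns: &[Col]) -> Col {
    let height = columns.first().map_or(0, |c| c.len());
    (0..height)
        .map(|row| columns.iter().find_map(|c| c[row]))
        .collect()
}

/// Accumulate across columns: start from `init` and combine it with each
/// column in turn, element by element, using `f`.
fn fold_horizontal<F>(init: Vec<i64>, columns: &[Vec<i64>], f: F) -> Vec<i64>
where
    F: Fn(i64, i64) -> i64,
{
    columns.iter().fold(init, |acc, column| {
        acc.iter()
            .zip(column.iter())
            .map(|(a, b)| f(*a, *b))
            .collect()
    })
}
```

A row that is null in every column stays null under `coalesce`, and `fold_horizontal` with addition and a zero accumulator yields a per-row sum across the columns.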

Type Aliases

FieldsNameMapper (crate feature dtype-struct)
FileCount
GetOutput
OpaqueColumnUdf
