Module dsl


Domain specific language for the Lazy API.

This DSL revolves around the Expr type, which represents an abstract operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation. In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing the result of applying the function and passed expressions to the consumed LazyFrame. At runtime, when LazyFrame::collect is called, the expressions that comprise the LazyFrame’s logical plan are materialized on the actual underlying Series. For instance, let expr = col("x").pow(lit(2)).alias("x2"); would produce an expression representing the abstract operation of squaring the column "x" and naming the resulting column "x2", and to apply this operation to a LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);. (Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding operations on the LazyFrame.)
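The build-then-collect flow described above can be illustrated with a small, self-contained sketch. It deliberately does not use polars: ToyExpr, ToyLazyFrame, and every function below are hypothetical stand-ins written for this illustration, and they only mimic the shape of the real API (the real Expr and LazyFrame are far richer). The point is that building an expression touches no data, and that work only happens in collect.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an expression: it describes work and holds no data.
#[derive(Debug, Clone)]
enum ToyExpr {
    Col(String),
    Lit(f64),
    Pow(Box<ToyExpr>, Box<ToyExpr>),
    Alias(Box<ToyExpr>, String),
}

/// Free functions that build an expression from scratch.
fn col(name: &str) -> ToyExpr {
    ToyExpr::Col(name.to_string())
}

fn lit(value: f64) -> ToyExpr {
    ToyExpr::Lit(value)
}

impl ToyExpr {
    /// Methods consume the expression and return a new, larger one.
    fn pow(self, exponent: ToyExpr) -> ToyExpr {
        ToyExpr::Pow(Box::new(self), Box::new(exponent))
    }

    fn alias(self, name: &str) -> ToyExpr {
        ToyExpr::Alias(Box::new(self), name.to_string())
    }

    /// Name of the column this expression would produce.
    fn output_name(&self) -> String {
        match self {
            ToyExpr::Col(name) | ToyExpr::Alias(_, name) => name.clone(),
            ToyExpr::Lit(_) => "literal".to_string(),
            ToyExpr::Pow(base, _) => base.output_name(),
        }
    }

    /// Evaluate against real columns; this is the only place data is touched.
    fn eval(&self, data: &HashMap<String, Vec<f64>>, len: usize) -> Result<Vec<f64>, String> {
        match self {
            ToyExpr::Col(name) => data
                .get(name)
                .cloned()
                .ok_or_else(|| format!("column not found: {}", name)),
            ToyExpr::Lit(value) => Ok(vec![*value; len]),
            ToyExpr::Pow(base, exponent) => {
                let base = base.eval(data, len)?;
                let exponent = exponent.eval(data, len)?;
                Ok(base.iter().zip(exponent.iter()).map(|(b, e)| b.powf(*e)).collect())
            }
            ToyExpr::Alias(inner, _) => inner.eval(data, len),
        }
    }
}

/// Hypothetical stand-in for a lazy frame: source data plus a recorded plan.
struct ToyLazyFrame {
    data: HashMap<String, Vec<f64>>,
    plan: Vec<ToyExpr>,
}

impl ToyLazyFrame {
    fn new(data: HashMap<String, Vec<f64>>) -> Self {
        ToyLazyFrame { data, plan: Vec::new() }
    }

    /// Consumes the frame and returns a new one; nothing is computed yet.
    fn with_column(mut self, expr: ToyExpr) -> Self {
        self.plan.push(expr);
        self
    }

    /// Runs the recorded plan in order; errors surface only here.
    fn collect(self) -> Result<HashMap<String, Vec<f64>>, String> {
        let ToyLazyFrame { mut data, plan } = self;
        let len = data.values().next().map_or(0, |column| column.len());
        for expr in &plan {
            let column = expr.eval(&data, len)?;
            data.insert(expr.output_name(), column);
        }
        Ok(data)
    }
}
```

With a frame holding a column "x", calling ToyLazyFrame::new(data).with_column(col("x").pow(lit(2.0)).alias("x2")).collect() adds a squared column named "x2". Referencing a column that does not exist still builds without complaint and only fails inside collect.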

This module exports many free functions that produce an Expr from scratch; col and lit are two examples. Expressions also have methods, such as pow and alias, that consume them and produce a new expression.

Several expressions are only available when the corresponding feature is enabled. Examples of features that unlock specialized expressions include strings, temporal, and dtype-categorical. These specialized expressions provide implementations of functions that you’d otherwise have to write by hand.

Because the Expr type is so abstract and flexible, care must be taken to ensure you only attempt sensible operations with it. For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame. Furthermore, nothing stops you from calling, for example, any with an expression that will yield an f64 column (instead of bool), or from writing col("string") - col("f64"), which would attempt to subtract an f64 Series from a string Series. These kinds of invalid operations only yield an error at runtime, when collect is called on the LazyFrame.
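The following self-contained sketch shows why such mistakes surface late. It is not polars code: ToySeries, ToyExpr, evaluate, and sample_frame are hypothetical names invented for this illustration. Building the subtraction always succeeds because the expression carries no type information; the mismatch is only detected when the expression is evaluated against actual columns.

```rust
use std::collections::HashMap;
use std::ops::Sub;

/// Hypothetical stand-in for a typed column.
#[derive(Debug, Clone, PartialEq)]
enum ToySeries {
    F64(Vec<f64>),
    Str(Vec<String>),
}

/// Hypothetical stand-in for an expression; note that it stores no dtype.
#[derive(Debug, Clone)]
enum ToyExpr {
    Col(String),
    Sub(Box<ToyExpr>, Box<ToyExpr>),
}

fn col(name: &str) -> ToyExpr {
    ToyExpr::Col(name.to_string())
}

/// `a - b` only records the operation; it cannot fail at build time.
impl Sub for ToyExpr {
    type Output = ToyExpr;

    fn sub(self, rhs: ToyExpr) -> ToyExpr {
        ToyExpr::Sub(Box::new(self), Box::new(rhs))
    }
}

/// Evaluation is where missing columns and type mismatches are detected.
fn evaluate(expr: &ToyExpr, frame: &HashMap<String, ToySeries>) -> Result<ToySeries, String> {
    match expr {
        ToyExpr::Col(name) => frame
            .get(name)
            .cloned()
            .ok_or_else(|| format!("column not found: {}", name)),
        ToyExpr::Sub(left, right) => match (evaluate(left, frame)?, evaluate(right, frame)?) {
            (ToySeries::F64(a), ToySeries::F64(b)) => Ok(ToySeries::F64(
                a.iter().zip(b.iter()).map(|(x, y)| x - y).collect(),
            )),
            _ => Err("cannot subtract: both operands must be f64".to_string()),
        },
    }
}

/// A small frame with one f64 column and one string column.
fn sample_frame() -> HashMap<String, ToySeries> {
    let mut frame = HashMap::new();
    frame.insert("f64".to_string(), ToySeries::F64(vec![1.5, 2.5]));
    frame.insert(
        "string".to_string(),
        ToySeries::Str(vec!["a".to_string(), "b".to_string()]),
    );
    frame
}
```

Here col("string") - col("f64") constructs without error, and evaluate(&expr, &sample_frame()) returns an Err. In this toy model the only defence is checking at evaluation time, which mirrors the advice above to validate column names and types before relying on a plan.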

Re-exports§

pub use functions::*;

Modules§

anonymous
binary
cat (feature: dtype-categorical)
default_values
deletion
dt (feature: temporal)
file_provider
function_expr
functions
Functions
python_dataset (feature: python)
python_dsl (feature: python)
sink
string (feature: strings)
udf

Structs§

AnonymousScanOptions
ArrayNameSpace
Specialized expressions for Series of DataType::Array.
BaseColumnUdf
CallbackSinkType
CastColumnsPolicy
Used by scans.
CategoricalNameSpace
Specialized expressions for Categorical dtypes.
ChainedThen
Utility struct for the when-then-otherwise expression.
ChainedWhen
Utility struct for the when-then-otherwise expression.
DistinctOptionsDSL
DslBuilder
ExprNameNameSpace
Specialized expressions for modifying the name of existing expressions.
ExtensionNameSpace
Specialized expressions for Extension dtypes.
FileSinkOptions
GroupbyOptions
HConcatOptions
JoinOptions
JoinOptionsIR
ListNameSpace
Specialized expressions for Series of DataType::List.
LogicalPlanUdfOptions
MatchToSchemaPerColumn
MetaNameSpace
Specialized methods for inspecting the structure of existing expressions.
NDJsonReadOptions (feature: json)
PartitionedSinkOptions
PartitionedSinkOptionsIR
PlanSerializationContext
PredicateFileSkip
PythonDatasetProviderVTable
RollingCovOptions
ScanFlags
ScanSourceIter
An iterator for ScanSources
SpecialEq
Wrapper type that has special equality properties depending on the inner type specialization
StrptimeOptions
StructNameSpace
Specialized expressions for Struct dtypes.
TableStatistics
Then
Utility struct for the when-then-otherwise expression.
TimeUnitSet
UnifiedScanArgs
Scan arguments shared across different scan types.
UnifiedSinkArgs
UnionArgs
UnionOptions
UnpivotArgsDSL
UserDefinedFunction
Represents a user-defined function
When
Utility struct for the when-then-otherwise expression.

Enums§

AggExpr
ArrayDataTypeFunction
ArrayFunction
BinaryFunction
BitwiseFunction
BooleanFunction
BusinessFunction
CategoricalFunction
ColumnMapping
CorrelationMethod
DataTypeExpr
DataTypeFunction
DataTypeSelector
DateRangeArgs (feature: dtype-date or dtype-datetime)
DslPlan
Engine
EvalVariant
Excluded
Expr
Expressions that can be used in various contexts.
ExtensionFunction
ExtraColumnsPolicy
FileScanDsl
Note: This is cheaply cloneable.
FileScanIR
Note: This is cheaply cloneable.
FileWriteFormat
FunctionExpr
JoinTypeOptionsIR
LazySerde
ListFunction
MissingColumnsPolicy
MissingColumnsPolicyOrExpr
Operator
PartitionStrategy
PartitionStrategyIR
PowFunction
RandomMethod
RangeFunction
RenameAliasFn
ReshapeDimension
A dimension in a reshape.
RollingFunction
RollingFunctionBy
ScanSource
A single source to scan from
ScanSourceRef
A reference to a single item in ScanSources
ScanSources
Set of sources to scan from
Selector
SinkDestination
SinkTarget
SinkType
SinkTypeIR
StringFunction
StructDataTypeFunction
StructFunction
TemporalFunction
TimeZoneSet
TrigonometricFunction
UpcastOrForbid
WindowMapping

Constants§

DSL_VERSION

Statics§

DATASET_PROVIDER_VTABLE
Injected by polars-python so that the implementation can live there.

Traits§

AnonymousColumnsUdf
AnonymousStreamingAgg
ColumnsUdf
A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
UdfSchema

Functions§

apply_multiple
Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
binary_expr
Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
map_multiple
Apply a function/closure over multiple columns once the logical plan gets executed.
new_column_udf
ternary_expr
when
Start a when-then-otherwise expression.

Type Aliases§

DslNameGenerator (feature: array_to_struct or list_to_struct)
FieldsNameMapper (feature: dtype-struct)
OpaqueColumnUdf
OpaqueStreamingAgg
RenameAliasRustFn