Module polars_lazy::dsl

Domain specific language for the Lazy API.

This DSL revolves around the Expr type, which represents an abstract operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation. In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing the result of applying the function and the passed expressions to the consumed LazyFrame. At runtime, when LazyFrame::collect is called, the expressions that make up the LazyFrame’s logical plan are materialized on the actual underlying Series.

For instance, let expr = col("x").pow(lit(2)).alias("x2"); produces an expression representing the abstract operation of squaring the column "x" and naming the resulting column "x2". To apply this operation to a LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);. (Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding operations on the LazyFrame.)
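
The deferred evaluation described above can be illustrated with a deliberately tiny, self-contained model. This is not polars code. Its Expr, col, lit, LazyFrame, with_column, and collect are hypothetical stand-ins that borrow the names used in the text. The columns are plain Vec<f64>, and the literal is an f64 (lit(2.0)) rather than the integer used in the real example.

```rust
use std::collections::HashMap;

// Toy expression tree (hypothetical, not polars' Expr): each node only
// describes an operation; nothing is computed when it is built.
#[derive(Clone, Debug)]
enum Expr {
    Col(String),
    Lit(f64),
    Pow(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

fn lit(value: f64) -> Expr {
    Expr::Lit(value)
}

impl Expr {
    fn pow(self, exponent: Expr) -> Expr {
        Expr::Pow(Box::new(self), Box::new(exponent))
    }

    fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self), name.to_string())
    }

    // Name of the resulting column: the alias if any, else the root column.
    fn output_name(&self) -> String {
        match self {
            Expr::Col(name) | Expr::Alias(_, name) => name.clone(),
            Expr::Lit(_) => "literal".to_string(),
            Expr::Pow(base, _) => base.output_name(),
        }
    }

    // Evaluate against concrete columns of length `len`; called only by `collect`.
    fn eval(&self, columns: &HashMap<String, Vec<f64>>, len: usize) -> Result<Vec<f64>, String> {
        match self {
            Expr::Col(name) => columns
                .get(name)
                .cloned()
                .ok_or_else(|| format!("column {name:?} not found")),
            Expr::Lit(value) => Ok(vec![*value; len]),
            Expr::Pow(base, exponent) => {
                let base = base.eval(columns, len)?;
                let exponent = exponent.eval(columns, len)?;
                Ok(base.iter().zip(&exponent).map(|(b, e)| b.powf(*e)).collect())
            }
            Expr::Alias(inner, _) => inner.eval(columns, len),
        }
    }
}

// Toy lazy frame: source columns plus the recorded `with_column` steps.
struct LazyFrame {
    columns: HashMap<String, Vec<f64>>,
    len: usize,
    plan: Vec<Expr>,
}

impl LazyFrame {
    fn new(columns: HashMap<String, Vec<f64>>, len: usize) -> Self {
        LazyFrame { columns, len, plan: Vec::new() }
    }

    // Consumes the frame and returns a new one with one more step in its plan.
    fn with_column(mut self, expr: Expr) -> LazyFrame {
        self.plan.push(expr);
        self
    }

    // Only here are the recorded expressions evaluated, in order, so later
    // steps can use columns produced by earlier ones.
    fn collect(mut self) -> Result<HashMap<String, Vec<f64>>, String> {
        for expr in &self.plan {
            let values = expr.eval(&self.columns, self.len)?;
            self.columns.insert(expr.output_name(), values);
        }
        Ok(self.columns)
    }
}

fn main() {
    let mut source = HashMap::new();
    source.insert("x".to_string(), vec![1.0, 2.0, 3.0]);

    let expr = col("x").pow(lit(2.0)).alias("x2");
    // Only records the step; no squaring has happened yet.
    let lazy_df = LazyFrame::new(source, 3).with_column(expr);

    let result = lazy_df.collect().expect("plan should evaluate");
    println!("{:?}", result["x2"]); // [1.0, 4.0, 9.0]
}
```

In this toy, with_column does no work. The squaring happens only inside collect, and a missing column is likewise reported only there.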

This module exports many free functions that produce an Expr from scratch; col and lit are two examples. Expressions also have many methods, such as pow and alias, that consume the expression and produce a new one.
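
As a hedged illustration of this builder style, here is a minimal toy Expr in plain Rust. Free functions create an expression, and methods take it by value and return a larger one. This is a sketch under assumptions, not the real polars Expr. The variants and the Display rendering are hypothetical.

```rust
use std::fmt;

// Toy expression tree (hypothetical, not polars' Expr).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Pow { base: Box<Expr>, exponent: Box<Expr> },
    Alias { expr: Box<Expr>, name: String },
}

// Free functions: build an Expr from scratch.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(value: i64) -> Expr {
    Expr::Literal(value)
}

impl Expr {
    // Methods take `self` by value, so the old expression is moved into
    // the new, larger one and cannot be reused afterwards.
    fn pow(self, exponent: Expr) -> Expr {
        Expr::Pow { base: Box::new(self), exponent: Box::new(exponent) }
    }

    fn alias(self, name: &str) -> Expr {
        Expr::Alias { expr: Box::new(self), name: name.to_string() }
    }
}

// Renders the tree back into the method-chain form it was built from.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "col({name:?})"),
            Expr::Literal(value) => write!(f, "lit({value})"),
            Expr::Pow { base, exponent } => write!(f, "{base}.pow({exponent})"),
            Expr::Alias { expr, name } => write!(f, "{expr}.alias({name:?})"),
        }
    }
}

fn main() {
    let expr = col("x").pow(lit(2)).alias("x2");
    println!("{expr}"); // col("x").pow(lit(2)).alias("x2")
}
```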

Several expressions are only available when the necessary Cargo feature is enabled. Examples of features that unlock specialized expressions include strings, temporal, and dtype-categorical. These specialized expressions provide implementations of functions that you’d otherwise have to implement by hand.
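
Cargo features are normally wired into code through #[cfg(feature = "...")] attributes. The toy sketch below assumes a hypothetical crate that declares features named strings, temporal, and dtype-categorical. It is not taken from polars. It shows why a gated method is missing at compile time rather than failing at runtime.

```rust
// Toy expression type (hypothetical, not polars code) that just records
// its textual form.
#[derive(Debug, PartialEq)]
struct Expr(String);

impl Expr {
    // Always compiled in.
    fn alias(self, name: &str) -> Expr {
        Expr(format!("{}.alias({name:?})", self.0))
    }

    // Compiled in only when built with `--features strings`. Without that
    // feature the method does not exist, so calling it is a compile-time
    // error rather than a runtime one.
    #[cfg(feature = "strings")]
    fn to_uppercase(self) -> Expr {
        Expr(format!("{}.to_uppercase()", self.0))
    }
}

// Reports which of the illustrative features this build was compiled with.
fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "strings") {
        enabled.push("strings");
    }
    if cfg!(feature = "temporal") {
        enabled.push("temporal");
    }
    if cfg!(feature = "dtype-categorical") {
        enabled.push("dtype-categorical");
    }
    enabled
}

fn main() {
    let expr = Expr("col(\"name\")".to_string()).alias("n");
    println!("{:?} built with features {:?}", expr, enabled_features());
}
```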

Because the Expr type is so abstract and flexible, care must be taken to ensure you only attempt sensible operations with it. For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame. Furthermore, nothing stops you from calling, for example, any with an expression that will yield an f64 column (instead of bool), or from writing col("string") - col("f64"), which would attempt to subtract an f64 Series from a string Series. Invalid operations like these only yield an error at runtime, when collect is called on the LazyFrame.
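
To make the runtime-only nature of these errors concrete, here is a toy model in plain Rust (not polars types). In this model, building col("string") - col("f64") always succeeds, because operator overloading merely records the operation. The dtype mismatch and any missing column are detected only by a hypothetical evaluate function that stands in for collect.

```rust
use std::collections::HashMap;
use std::ops::Sub;

// Toy typed columns (hypothetical, not polars' Series).
#[derive(Debug, Clone, PartialEq)]
enum Series {
    F64(Vec<f64>),
    Str(Vec<String>),
}

// Toy expressions: a column reference or a subtraction of two expressions.
#[derive(Debug, Clone)]
enum Expr {
    Col(String),
    Sub(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

// Building `a - b` never inspects dtypes; it only records the operation.
impl Sub for Expr {
    type Output = Expr;
    fn sub(self, rhs: Expr) -> Expr {
        Expr::Sub(Box::new(self), Box::new(rhs))
    }
}

fn dtype(series: &Series) -> &'static str {
    match series {
        Series::F64(_) => "f64",
        Series::Str(_) => "str",
    }
}

// Stand-in for `collect`: column existence and dtypes are checked only here.
fn evaluate(expr: &Expr, frame: &HashMap<String, Series>) -> Result<Series, String> {
    match expr {
        Expr::Col(name) => frame
            .get(name)
            .cloned()
            .ok_or_else(|| format!("column not found: {name}")),
        Expr::Sub(left, right) => match (evaluate(left, frame)?, evaluate(right, frame)?) {
            (Series::F64(a), Series::F64(b)) => {
                Ok(Series::F64(a.iter().zip(&b).map(|(x, y)| x - y).collect()))
            }
            (lhs, rhs) => Err(format!(
                "cannot subtract {} from {}",
                dtype(&rhs),
                dtype(&lhs)
            )),
        },
    }
}

fn main() {
    let mut frame = HashMap::new();
    frame.insert("f64".to_string(), Series::F64(vec![1.0, 2.0]));
    frame.insert("string".to_string(), Series::Str(vec!["a".to_string(), "b".to_string()]));

    // Constructing the ill-typed expression succeeds...
    let bad = col("string") - col("f64");
    // ...and the error only surfaces when it is evaluated.
    println!("{:?}", evaluate(&bad, &frame)); // Err("cannot subtract f64 from str")
}
```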

Re-exports

Modules

Structs

Enums

Traits

Functions

  • Selects all columns. Shorthand for col("*").
  • Create a new column with the bitwise-and of the elements in each row.
  • Create a new column with the bitwise-or of the elements in each row.
  • Like map_binary, but used in a group_by-aggregation context.
  • Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
  • Generate a range of integers.
  • Find the indexes that would sort these series in order of appearance.
  • arg_where (requires crate feature arg_where)
    Get the indices where the condition evaluates to true.
  • Take several expressions and collect them into a StructChunked.
  • Find the mean of all the values in the column named name. Alias for mean.
  • Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
  • Casts the column given by Expr to a different type.
  • Folds the expressions from left to right, keeping the first non-null value.
  • Create a Column Expression based on a column name.
  • Select multiple columns by name.
  • Horizontally concatenate columns into a single array-type column.
  • Concatenate list entries.
  • concat_str (requires crate features concat_str and strings)
    Horizontally concatenate string columns in linear time.
  • Compute the covariance between two columns.
  • cum_fold_exprs (requires crate feature dtype-struct)
    Accumulate over multiple columns horizontally / row wise.
  • cum_reduce_exprs (requires crate feature dtype-struct)
    Accumulate over multiple columns horizontally / row wise.
  • date_ranges (requires crate feature temporal)
    Create a column of date ranges from a start and stop expression.
  • Construct a column of Datetime from the provided DatetimeArgs.
  • datetime_range (requires crate feature dtype-datetime)
    Create a datetime range from a start and stop expression.
  • datetime_ranges (requires crate feature dtype-datetime)
    Create a column of datetime ranges from a start and stop expression.
  • Select multiple columns by dtype.
  • Select multiple columns by dtype.
  • Construct a column of Duration from the provided DurationArgs.
  • First column in a DataFrame.
  • Accumulate over multiple columns horizontally / row wise.
  • format_str (requires crate features concat_str and strings)
    Format the results of an array of expressions using a format string.
  • Select multiple columns by index.
  • Generate a range of integers.
  • Generate a range of integers for each row of the input columns.
  • A column which is false wherever expr is null, true elsewhere.
  • A column which is true wherever expr is null, false elsewhere.
  • Last column in a DataFrame.
  • Return the number of rows in the context.
  • Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
  • Apply a closure on the two columns that are evaluated from Expr a and Expr b.
  • Apply a function/closure over multiple columns once the logical plan gets executed.
  • Apply a function/closure over multiple columns once the logical plan gets executed.
  • Find the maximum of all the values in the column named name. Shorthand for col(name).max().
  • Create a new column with the maximum value per row.
  • Find the mean of all the values in the column named name. Shorthand for col(name).mean().
  • Compute the mean of all values horizontally across columns.
  • Find the median of all the values in the column named name. Shorthand for col(name).median().
  • Find the minimum of all the values in the column named name. Shorthand for col(name).min().
  • Create a new column with the minimum value per row.
  • Negates a boolean column.
  • Nth column in a DataFrame.
  • Compute the Pearson correlation between two columns.
  • Find a specific quantile of all the values in the column named name.
  • Analogous to Iterator::reduce.
  • Create a column of length n containing n copies of the literal value.
  • rolling_corr (requires crate features rolling_window and cov)
  • rolling_cov (requires crate features rolling_window and cov)
  • spearman_rank_corr (requires crate features rank and propagate_nans)
    Compute the Spearman rank correlation between two columns. Missing data will be excluded from the computation.
  • Sum all the values in the column named name. Shorthand for col(name).sum().
  • Sum all values horizontally across columns.
  • time_ranges (requires crate feature dtype-time)
    Create a column of time ranges from a start and stop expression.
  • Start a when-then-otherwise expression.
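
Several of the functions listed above work horizontally, combining values across columns row by row. The sketch below models that idea on plain Vec columns. Here fold_horizontal, sum_horizontal, and coalesce are hypothetical helpers that imitate the described behavior. They are not the library's implementations, and the numeric fold ignores nulls entirely.

```rust
// Thread an accumulator across columns, element by element (row-wise fold).
fn fold_horizontal<F>(init: Vec<f64>, columns: &[Vec<f64>], f: F) -> Vec<f64>
where
    F: Fn(f64, f64) -> f64,
{
    columns.iter().fold(init, |acc, column| {
        acc.iter().zip(column).map(|(a, x)| f(*a, *x)).collect()
    })
}

// Row-wise sum: start from zeros and add each column in turn.
fn sum_horizontal(columns: &[Vec<f64>]) -> Vec<f64> {
    let len = columns.first().map_or(0, Vec::len);
    fold_horizontal(vec![0.0; len], columns, |acc, x| acc + x)
}

// Row-wise coalesce: scanning left to right, keep the first non-null (Some) value.
fn coalesce(columns: &[Vec<Option<i64>>]) -> Vec<Option<i64>> {
    let len = columns.first().map_or(0, Vec::len);
    (0..len)
        .map(|row| columns.iter().find_map(|column| column[row]))
        .collect()
}

fn main() {
    let a = vec![1.0, 2.0, 3.0];
    let b = vec![10.0, 20.0, 30.0];
    println!("{:?}", sum_horizontal(&[a, b])); // [11.0, 22.0, 33.0]

    let x = vec![None, Some(2), None];
    let y = vec![Some(1), Some(5), None];
    println!("{:?}", coalesce(&[x, y])); // [Some(1), Some(2), None]
}
```

Other row-wise reductions follow the same pattern with a different initial value and combining function. For example, a row-wise maximum could start from f64::NEG_INFINITY and use f64::max.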

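The last entry in the function list starts a when-then-otherwise expression. The following is a rough, hypothetical sketch of that shape, not the polars API. It evaluates eagerly on plain Vecs and supports only a single condition. It uses separate builder types so that the order when, then, otherwise is enforced at compile time.

```rust
// Hypothetical builder types (not polars' API). `When` only offers `then`,
// and `Then` only offers `otherwise`, so calling them out of order fails
// to compile.
struct When {
    condition: Vec<bool>,
}

struct Then {
    condition: Vec<bool>,
    value: Vec<i64>,
}

// Starts the chain with a boolean mask.
fn when(condition: Vec<bool>) -> When {
    When { condition }
}

impl When {
    fn then(self, value: Vec<i64>) -> Then {
        Then { condition: self.condition, value }
    }
}

impl Then {
    // Row by row: take `value[i]` where the condition holds, else `fallback[i]`.
    fn otherwise(self, fallback: Vec<i64>) -> Vec<i64> {
        self.condition
            .iter()
            .zip(self.value.iter().zip(&fallback))
            .map(|(&keep, (&then_value, &else_value))| if keep { then_value } else { else_value })
            .collect()
    }
}

fn main() {
    let x = vec![1, 5, 3, 8];
    let above_two: Vec<bool> = x.iter().map(|&v| v > 2).collect();
    // Cap values above 2 at 2, leaving the rest unchanged.
    let capped = when(above_two).then(vec![2; x.len()]).otherwise(x);
    println!("{capped:?}"); // [1, 2, 2, 2]
}
```
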
Type Aliases