//! Lazy API of Polars
//!
//! The lazy API of Polars supports a subset of the eager API. Apart from distributed compute,
//! it is very similar to [Apache Spark](https://spark.apache.org/). You write queries in a
//! domain-specific language. These queries translate to a logical plan, which represents your query steps.
//! Before execution, this logical plan is optimized: the order of operations may change if that improves
//! performance, and implicit type casts may be added so that the query does not fail with a type error
//! (when the mismatch can be resolved).
//!
//! # Lazy DSL
//!
//! The lazy API of Polars replaces the eager [`DataFrame`] with the [`LazyFrame`], through which
//! the lazy API is exposed.
//! The [`LazyFrame`] represents a logical execution plan: a sequence of operations to perform on a concrete data source.
//! These operations are not executed until we call [`collect`].
//! This allows Polars to optimize and reorder the query, which may lead to faster queries or fewer type errors.
//!
//! [`DataFrame`]: polars_core::frame::DataFrame
//! [`LazyFrame`]: crate::frame::LazyFrame
//! [`collect`]: crate::frame::LazyFrame::collect
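//!
//! To see the difference between the plan as written and the plan that would run, the sketch below
//! renders both for one small query. It assumes `LazyFrame::explain(optimized: bool)` is available in
//! your version of the crate; the helper name `show_plans` and the data are made up for illustration.
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper: returns the (unoptimized, optimized) plan descriptions.
//! fn show_plans() -> PolarsResult<(String, String)> {
//!     let lf = df!(
//!         "column_a" => [1, 2, 3, 4, 5],
//!         "column_b" => ["a", "b", "c", "d", "e"]
//!     )?
//!     .lazy()
//!     .filter(col("column_a").gt(lit(2)))
//!     .select([col("column_a")]);
//!
//!     // Neither call executes the query; both only describe the plan.
//!     // Assumption: `explain` exists with this signature in the version in use.
//!     Ok((lf.explain(false)?, lf.explain(true)?))
//! }
//! ```
//!
//! Comparing the two strings is a quick way to check which operations the optimizer moved or merged.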
//!
//! In general, a [`LazyFrame`] requires a concrete data source (a [`DataFrame`], a file on disk, etc.), to which
//! polars-lazy then applies the user-specified sequence of operations.
//! To obtain a [`LazyFrame`] from an existing [`DataFrame`], we call the [`lazy`](crate::frame::IntoLazy::lazy) method on
//! the [`DataFrame`].
//! A [`LazyFrame`] can also be obtained through the lazy versions of file readers, such as [`LazyCsvReader`](crate::frame::LazyCsvReader).
//!
//! The other major component of the Polars lazy API is [`Expr`](crate::dsl::Expr), which represents an operation to be
//! performed on a [`LazyFrame`], such as mapping over a column, filtering, or a group-by aggregation.
//! [`Expr`]s and the functions that produce them can be found in the [dsl module](crate::dsl).
//!
//! [`Expr`]: crate::dsl::Expr
//!
//! Most operations on a [`LazyFrame`] consume the [`LazyFrame`] and return a new [`LazyFrame`] with the updated plan.
//! If you need to use the same [`LazyFrame`] multiple times, you should [`clone`](crate::frame::LazyFrame::clone) it, and optionally
//! [`cache`](crate::frame::LazyFrame::cache) it beforehand.
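//!
//! A minimal sketch of that reuse pattern, with made-up data and a hypothetical helper name
//! `reuse_lazy_frame`: the frame is cloned before the first query because `filter` consumes it,
//! and the second query then uses the original.
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper: runs two independent queries against the same source.
//! fn reuse_lazy_frame() -> PolarsResult<(DataFrame, DataFrame)> {
//!     let lf = df!(
//!         "column_a" => [1, 2, 3, 4, 5],
//!         "column_b" => ["a", "b", "c", "d", "e"]
//!     )?
//!     .lazy();
//!
//!     // `filter` takes `self`, so clone first to keep `lf` usable afterwards.
//!     let small = lf.clone().filter(col("column_a").lt(lit(3))).collect()?;
//!
//!     // The original frame is consumed by the second query.
//!     let scaled = lf
//!         .select([(col("column_a") * lit(10)).alias("scaled")])
//!         .collect()?;
//!
//!     Ok((small, scaled))
//! }
//! ```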
//!
//! ## Examples
//!
//! #### Adding a new column to a lazy DataFrame
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     // Reverse the row order first; the assertion below depends on it.
//!     .reverse()
//!     .with_column(
//!         // Alias the expression so the result is added as a new column;
//!         // without the alias it would keep the name `column_a` and replace that column.
//!         (col("column_a") * lit(10)).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[50, 40, 30, 20, 10])
//!     )
//! );
//! ```
//!
//! #### Modifying a column based on some predicate
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         // value = 100 if x < 3 else x
//!         when(
//!             col("column_a").lt(lit(3))
//!         ).then(
//!             lit(100)
//!         ).otherwise(
//!             col("column_a")
//!         ).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[100, 100, 3, 4, 5])
//!     )
//! );
//! ```
//!
//! #### Groupby + Aggregations
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//!
//! fn example() -> PolarsResult<DataFrame> {
//!     let df = df!(
//!         "date" => ["2020-08-21", "2020-08-21", "2020-08-22", "2020-08-23", "2020-08-22"],
//!         "temp" => [20, 10, 7, 9, 1],
//!         "rain" => [0.2, 0.1, 0.3, 0.1, 0.01]
//!     )?;
//!
//!     df.lazy()
//!         .group_by([col("date")])
//!         .agg([
//!             col("rain").min().alias("min_rain"),
//!             col("rain").sum().alias("sum_rain"),
//!             col("rain").quantile(lit(0.5), QuantileMethod::Nearest).alias("median_rain"),
//!         ])
//!         .sort(["date"], Default::default())
//!         .collect()
//! }
//! ```
//!
//! #### Calling any function
//!
//! Below we lazily call a custom closure that receives a `Column` and returns a new `Column`
//! (wrapped in `Ok(Some(..))`). Because the closure changes the data type of the column, we also declare
//! the output type. This is important because a lazy query must know its types before it executes.
//! Note that inside such custom functions you have access to the whole **eager API** of the Series/ChunkedArrays.
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         col("column_a")
//!         // apply a custom closure; the input column is ignored here
//!         .map(|_s| {
//!             Ok(Some(Column::new("".into(), &[6.0f64, 6.0, 6.0, 6.0, 6.0])))
//!         },
//!         // output type of the closure; it must match the values produced above
//!         GetOutput::from_type(DataType::Float64)).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//! ```
//!
//! #### Joins, filters and projections
//!
//! In the query below we do a lazy join, then filter rows based on the predicate `a < 2`, and aggregate per group.
//! Finally we select the columns `"b"` and `"c_first"`. In an eager API this query would be
//! suboptimal because we would join DataFrames with more columns and rows than needed. In this case
//! the query optimizer can do the selection of the columns (projection) and the filtering of the
//! rows (selection) before the join, thereby reducing the amount of work done by the query.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn example(df_a: DataFrame, df_b: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .left_join(df_b.lazy(), col("b_left"), col("b_right"))
//!         .filter(
//!             col("a").lt(lit(2))
//!         )
//!         .group_by([col("b")])
//!         .agg(
//!             vec![col("b").first().alias("b_first"), col("c").first().alias("c_first")]
//!         )
//!         .select(&[col("b"), col("c_first")])
//! }
//! ```
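//!
//! For a version that can be run end to end, here is a small sketch with made-up data and a
//! hypothetical helper name `join_filter_select`. It joins on a shared key column `"b"`, keeps the
//! rows where `a < 2`, and selects two columns; the optimizer is free to apply the filter and the
//! projection before the join.
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//!
//! // Hypothetical helper: join two small frames, then filter and project.
//! fn join_filter_select() -> PolarsResult<DataFrame> {
//!     let df_a = df!("a" => [1, 2, 3], "b" => ["x", "y", "z"])?;
//!     let df_b = df!("b" => ["x", "y", "z"], "c" => [10, 20, 30])?;
//!
//!     df_a.lazy()
//!         // Both sides use the same key name, so the result has a single "b" column.
//!         .left_join(df_b.lazy(), col("b"), col("b"))
//!         .filter(col("a").lt(lit(2)))
//!         .select([col("b"), col("c")])
//!         .collect()
//! }
//! ```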
//!
//! To aggregate all columns, we can use the wildcard `*`.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn aggregate_all_columns(df_a: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .group_by([col("b")])
//!         .agg(
//!             vec![col("*").first()]
//!         )
//! }
//! ```
#![allow(ambiguous_glob_reexports)]
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
extern crate core;

#[cfg(feature = "dot_diagram")]
mod dot;
pub mod dsl;
pub mod frame;
pub mod physical_plan;
pub mod prelude;

mod scan;
#[cfg(test)]
mod tests;