
Usage

With the lazy API, Polars does not execute a query step by step, line by line. Instead it processes the full query end-to-end. To get the most out of Polars, use the lazy API because:

  • the lazy API allows Polars to optimize the query automatically with the query optimizer
  • the lazy API allows you to work with larger-than-memory datasets using streaming
  • the lazy API can catch schema errors before processing the data

Here we see how to use the lazy API starting from either a file or an existing DataFrame.

Using the lazy API from a file

Ideally, we use the lazy API directly from a file, because the query optimizer can then reduce the amount of data we read from the file (for example, by reading only the columns the query needs).

We create a lazy query from the Reddit CSV data and apply some transformations.

By starting the query with pl.scan_csv we are using the lazy API.


import polars as pl

q1 = (
    pl.scan_csv("docs/assets/data/reddit.csv")
    .with_columns(pl.col("name").str.to_uppercase())
    .filter(pl.col("comment_karma") > 0)
)

A pl.scan_ function is available for a number of file types, including CSV, IPC, Parquet and newline-delimited JSON (pl.scan_csv, pl.scan_ipc, pl.scan_parquet and pl.scan_ndjson).

In this query we tell Polars that we want to:

  • load data from the Reddit CSV file
  • convert the name column to uppercase
  • apply a filter to the comment_karma column

The lazy query will not be executed at this point. See the page on executing lazy queries for more on how to run them.

Using the lazy API from a DataFrame

An alternative way to access the lazy API is to call .lazy on a DataFrame that has already been created in memory.


import polars as pl

q3 = pl.DataFrame({"foo": ["a", "b", "c"], "bar": [0, 1, 2]}).lazy()

By calling .lazy we convert the DataFrame to a LazyFrame.