polars.read_delta

polars.read_delta(
source: str,
*,
version: int | str | datetime | None = None,
columns: list[str] | None = None,
rechunk: bool = False,
storage_options: dict[str, Any] | None = None,
delta_table_options: dict[str, Any] | None = None,
pyarrow_options: dict[str, Any] | None = None,
) → DataFrame

Reads into a DataFrame from a Delta lake table.

Parameters:
source

Path or URI to the root of the Delta lake table.

Note: For the local filesystem, both absolute and relative paths are supported. For the supported object stores (GCS, Azure and S3), a full URI must be provided.
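As a rough illustration of the path rule in the note above, the sketch below separates local paths from object-store URIs using only the standard library. The helper name `is_object_store_uri` and the scheme set are hypothetical and not part of polars; the schemes are simply the ones used in the examples on this page.

```python
from urllib.parse import urlparse

# Schemes taken from the examples on this page (S3, GCS, Azure).
# This set and the helper below are illustrative assumptions, not polars API.
OBJECT_STORE_SCHEMES = {"s3", "gs", "az", "adl", "abfs"}


def is_object_store_uri(source: str) -> bool:
    """Return True if `source` is a full URI for one of the listed object stores."""
    return urlparse(source).scheme in OBJECT_STORE_SCHEMES


print(is_object_store_uri("s3://bucket/path/to/delta-table/"))  # True
print(is_object_store_uri("/path/to/delta-table/"))  # False
print(is_object_store_uri("./relative/delta-table/"))  # False
```

A check like this can catch a cloud location written without its scheme before it is passed as source.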

version

Numerical version or timestamp version of the Delta lake table.

Note: If version is not provided, the latest version of the Delta lake table is read.
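The accepted forms of version can be sketched as follows. The snippet only builds the values and does not read a table. Treating a string version as an ISO 8601 / RFC 3339 timestamp is an assumption here and should be checked against the deltalake documentation.

```python
from datetime import datetime, timezone

# Numerical version: an integer commit version of the table.
version_number = 1

# Timestamp version as a timezone-aware datetime.
version_datetime = datetime(2020, 1, 1, tzinfo=timezone.utc)

# Timestamp version as a string (assumed format: ISO 8601 / RFC 3339).
version_string = version_datetime.isoformat()

print(version_string)  # 2020-01-01T00:00:00+00:00
```

Any of these values could then be passed as the version argument of pl.read_delta.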

columns

Columns to select. Accepts a list of column names.

rechunk

Make sure that all columns are contiguous in memory by aggregating the chunks into a single array.

storage_options

Extra options for the storage backends supported by deltalake. For cloud storage, this may include authentication settings, among others.

More information is available in the delta-rs documentation on loading a table.

delta_table_options

Additional keyword arguments passed when reading the Delta lake table.

pyarrow_options

Keyword arguments passed when converting the Delta lake table to a pyarrow table.

Returns:
DataFrame

Examples

Reads a Delta table from local filesystem. Note: Since version is not provided, the latest version of the delta table is read.

>>> table_path = "/path/to/delta-table/"
>>> pl.read_delta(table_path)  

Use the pyarrow_options parameter to read only certain partitions. Note: This should be preferred over using an equivalent .filter() on the resulting DataFrame, as this avoids reading the data at all.

>>> pl.read_delta(  
...     table_path,
...     pyarrow_options={"partitions": [("year", "=", "2021")]},
... )

Reads a specific version of the Delta table from local filesystem. Note: This will fail if the provided version of the delta table does not exist.

>>> pl.read_delta(table_path, version=1)  

Time travel to an earlier state of a Delta table on the local filesystem using a timestamp version.

>>> from datetime import datetime, timezone
>>> pl.read_delta(
...     table_path, version=datetime(2020, 1, 1, tzinfo=timezone.utc)
... )

Reads a Delta table from AWS S3. See a list of supported storage options for S3 here.

>>> table_path = "s3://bucket/path/to/delta-table/"
>>> storage_options = {
...     "AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID",
...     "AWS_SECRET_ACCESS_KEY": "THE_AWS_SECRET_ACCESS_KEY",
... }
>>> pl.read_delta(table_path, storage_options=storage_options)  
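The literal credentials above are placeholders. One possible way to avoid hard-coding them is to build storage_options from environment variables, as in the sketch below. The helper name `s3_storage_options_from_env` is hypothetical and not part of polars or deltalake; only the two key names come from the example above.

```python
import os
from collections.abc import Mapping


def s3_storage_options_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the S3 credential keys used above from a mapping of environment variables.

    Keys that are not set are left out rather than filled with placeholders.
    """
    keys = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    return {key: env[key] for key in keys if key in env}


# Demonstration with an explicit mapping instead of the real environment.
example_env = {"AWS_ACCESS_KEY_ID": "example-id", "UNRELATED": "ignored"}
print(s3_storage_options_from_env(example_env))  # {'AWS_ACCESS_KEY_ID': 'example-id'}
```

The resulting dictionary could then be passed as storage_options to pl.read_delta in place of the literal one.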

Reads a Delta table from Google Cloud Storage (GCS). See a list of supported storage options for GCS here.

>>> table_path = "gs://bucket/path/to/delta-table/"
>>> storage_options = {"SERVICE_ACCOUNT": "SERVICE_ACCOUNT_JSON_ABSOLUTE_PATH"}
>>> pl.read_delta(table_path, storage_options=storage_options)  

Reads a Delta table from Azure.

The following types of table paths are supported:

  • az://<container>/<path>

  • adl://<container>/<path>

  • abfs://<container>/<path>

See a list of supported storage options for Azure here.

>>> table_path = "az://container/path/to/delta-table/"
>>> storage_options = {
...     "AZURE_STORAGE_ACCOUNT_NAME": "AZURE_STORAGE_ACCOUNT_NAME",
...     "AZURE_STORAGE_ACCOUNT_KEY": "AZURE_STORAGE_ACCOUNT_KEY",
... }
>>> pl.read_delta(table_path, storage_options=storage_options)  

Reads a Delta table with additional Delta-specific options. The example below uses the without_files option, which loads the table without file tracking information.

>>> table_path = "/path/to/delta-table/"
>>> delta_table_options = {"without_files": True}
>>> pl.read_delta(
...     table_path, delta_table_options=delta_table_options
... )