polars.scan_iceberg

polars.scan_iceberg(source: str | Table, *, snapshot_id: int | None = None, storage_options: dict[str, Any] | None = None) → LazyFrame

Lazily read from an Apache Iceberg table.

Parameters:

source

A PyIceberg table, or a direct path to the metadata.

Note: For the local filesystem, both absolute and relative paths are supported. For the supported object stores (GCS, Azure, and S3), the full URI must be provided.

snapshot_id

The snapshot ID to scan from.

storage_options

Extra options for the storage backends supported by PyIceberg. For cloud storage, this may include authentication settings, among other configuration.

More information is available in the PyIceberg configuration documentation.

Returns:

LazyFrame

Examples

Creates a scan for an Iceberg table on the local filesystem or in an object store.

>>> table_path = "file:/path/to/iceberg-table/metadata.json"
>>> pl.scan_iceberg(table_path).collect()  
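The note on source distinguishes local paths (absolute or relative) from object-store locations (full URI required). A minimal sketch of a pre-check for that rule follows. The helper classify_source and the scheme set are hypothetical and not part of Polars or PyIceberg; the set lists the schemes shown in the examples here, with "gs" for GCS added as an assumption.

```python
from urllib.parse import urlparse

# Illustrative scheme set (an assumption, not an exhaustive list of what
# PyIceberg accepts): schemes from the examples here, plus "gs" for GCS.
OBJECT_STORE_SCHEMES = {"s3", "gs", "az", "adl", "abfs", "abfss"}


def classify_source(path: str) -> str:
    """Return "object-store", "local", or "unknown" for a metadata path."""
    scheme = urlparse(path).scheme
    if scheme in OBJECT_STORE_SCHEMES:
        return "object-store"
    if scheme in ("", "file"):
        # Bare absolute/relative paths and file: URIs refer to the local filesystem.
        return "local"
    return "unknown"
```

A path classified as "unknown" (for example a Windows drive path, whose drive letter parses as a scheme) would need separate handling before being passed to pl.scan_iceberg.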

Creates a scan for an Iceberg table from S3. See a list of supported storage options for S3 here.

>>> table_path = "s3://bucket/path/to/iceberg-table/metadata.json"
>>> storage_options = {
...     "s3.region": "eu-central-1",
...     "s3.access-key-id": "THE_AWS_ACCESS_KEY_ID",
...     "s3.secret-access-key": "THE_AWS_SECRET_ACCESS_KEY",
... }
>>> pl.scan_iceberg(
...     table_path, storage_options=storage_options
... ).collect()  
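Hard-coding credentials as in the example above is rarely desirable outside of documentation. One possible alternative, sketched here, builds the same storage_options keys from environment variables. The helper s3_storage_options is hypothetical, and the variable names (AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are the conventional AWS ones, assumed rather than required by Polars.

```python
import os
from collections.abc import Mapping
from typing import Optional

# storage_options key -> environment variable (variable names are an assumption)
_S3_ENV_VARS = {
    "s3.region": "AWS_REGION",
    "s3.access-key-id": "AWS_ACCESS_KEY_ID",
    "s3.secret-access-key": "AWS_SECRET_ACCESS_KEY",
}


def s3_storage_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build S3 storage options from the environment, skipping unset variables."""
    source = os.environ if env is None else env
    return {key: source[var] for key, var in _S3_ENV_VARS.items() if var in source}
```

The result could then be passed as pl.scan_iceberg(table_path, storage_options=s3_storage_options()). Unset variables are omitted rather than passed as empty strings, so any defaults applied by the storage backend stay in effect.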

Creates a scan for an Iceberg table from Azure. Supported options for Azure are available here.

The following types of table paths are supported:

az://<container>/<path>/metadata.json
adl://<container>/<path>/metadata.json
abfs[s]://<container>/<path>/metadata.json

>>> table_path = "az://container/path/to/iceberg-table/metadata.json"
>>> storage_options = {
...     "adlfs.account-name": "AZURE_STORAGE_ACCOUNT_NAME",
...     "adlfs.account-key": "AZURE_STORAGE_ACCOUNT_KEY",
... }
>>> pl.scan_iceberg(
...     table_path, storage_options=storage_options
... ).collect()  

Creates a scan for an Iceberg table from Google Cloud Storage. Supported options for GCS are available here.

>>> table_path = "gs://bucket/path/to/iceberg-table/metadata.json"
>>> storage_options = {
...     "gcs.project-id": "my-gcp-project",
...     "gcs.oauth.token": "ya29.dr.AfM...",
... }
>>> pl.scan_iceberg(
...     table_path, storage_options=storage_options
... ).collect()  

Creates a scan for an Iceberg table with additional options. In the example below, the py-io-impl option selects the FileIO implementation that PyIceberg uses to read the table, here the fsspec-based FsspecFileIO.

>>> table_path = "/path/to/iceberg-table/metadata.json"
>>> storage_options = {"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"}
>>> pl.scan_iceberg(
...     table_path, storage_options=storage_options
... ).collect()  

Creates a scan for an Iceberg table using a specific snapshot ID.

>>> table_path = "/path/to/iceberg-table/metadata.json"
>>> snapshot_id = 7051579356916758811
>>> pl.scan_iceberg(table_path, snapshot_id=snapshot_id).collect()
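A snapshot ID is usually looked up rather than typed by hand. The sketch below picks the newest snapshot committed at or before a given time. The helper snapshot_id_as_of is hypothetical; it assumes each snapshot object exposes snapshot_id and timestamp_ms attributes (as PyIceberg snapshot objects are understood to do), and it is exercised here with stand-in objects rather than a real table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


class SnapshotLike(Protocol):
    # Attribute names are assumed to match those on PyIceberg snapshots.
    snapshot_id: int
    timestamp_ms: int


def snapshot_id_as_of(
    snapshots: Iterable[SnapshotLike], cutoff_ms: int
) -> Optional[int]:
    """Return the ID of the newest snapshot at or before cutoff_ms, else None."""
    eligible = [s for s in snapshots if s.timestamp_ms <= cutoff_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.timestamp_ms).snapshot_id


@dataclass
class FakeSnapshot:
    """Stand-in for a real snapshot, used only for illustration."""
    snapshot_id: int
    timestamp_ms: int


history = [
    FakeSnapshot(101, 1_700_000_000_000),
    FakeSnapshot(102, 1_700_000_600_000),
]
chosen = snapshot_id_as_of(history, cutoff_ms=1_700_000_300_000)  # 101
```

The chosen ID would then be passed as snapshot_id to pl.scan_iceberg. A None result means no snapshot existed at the cutoff and should be handled before scanning.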