polars.DataFrame.write_delta

DataFrame.write_delta(
target: str | Path | deltalake.DeltaTable,
*,
mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'error',
overwrite_schema: bool = False,
storage_options: dict[str, str] | None = None,
delta_write_options: dict[str, Any] | None = None,
) -> None

Write DataFrame as a Delta Lake table.

Parameters:
target

URI of a table or a DeltaTable object.

mode : {'error', 'append', 'overwrite', 'ignore'}

How to handle existing data.

  • If 'error', raise an error if the table already exists (default).

  • If 'append', append the new data to the existing table.

  • If 'overwrite', replace the table with the new data.

  • If 'ignore', write nothing if the table already exists (see the sketch after this parameter list).

overwrite_schema

If True, allows updating the schema of the table.

storage_options

Extra options for the storage backends supported by deltalake. For cloud storage, this may include configuration for authentication, etc.

  • See the supported storage options for S3: https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html#variants

  • See the supported storage options for GCS: https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants

  • See the supported storage options for Azure: https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants

delta_write_options

Additional keyword arguments passed through to deltalake when writing the Delta Lake table. See the deltalake.write_deltalake documentation for the supported write options.
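
For example, mode="ignore" turns the call into a no-op when a table already exists at the target (a minimal sketch; table_path is assumed to point at an existing table):

>>> df.write_delta(table_path, mode="ignore")  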

Raises:
TypeError

If the DataFrame contains unsupported data types.

ArrowInvalidError

If the DataFrame contains data types that could not be cast to their primitive type.

Notes

The Polars data types Null, Categorical, and Time are not supported by the Delta protocol specification and will raise a TypeError.
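
If such a column is present, one workaround is to cast it to a supported type before writing. A minimal sketch for a Categorical column (casting to Utf8 is an illustrative choice, not a requirement):

>>> df_cat = pl.DataFrame(
...     {"label": ["a", "b", "a"]}, schema={"label": pl.Categorical}
... )
>>> df_cat.with_columns(pl.col("label").cast(pl.Utf8)).write_delta(table_path)  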

Some other data types are not supported but have an associated primitive type to which they can be cast. This affects the following data types:

  • Unsigned integers

  • Datetime types with millisecond or nanosecond precision

  • Utf8, Binary, and List ('large' types)

Polars columns are always nullable. To write data to a delta table with non-nullable columns, a custom pyarrow schema must be passed via delta_write_options. See the last example below.

Examples

Write a dataframe to the local filesystem as a Delta Lake table.

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )
>>> table_path = "/path/to/delta-table/"
>>> df.write_delta(table_path)  
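
The written table can be read back with pl.read_delta, which, like write_delta, requires the deltalake package:

>>> pl.read_delta(table_path)  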

Append data to an existing Delta Lake table on the local filesystem. Note that this will fail if the schema of the new data does not match the schema of the existing table.

>>> df.write_delta(table_path, mode="append")  

Overwrite a Delta Lake table as a new version. If the schemas of the new and old data are the same, setting overwrite_schema is not required.

>>> existing_table_path = "/path/to/delta-table/"
>>> df.write_delta(
...     existing_table_path, mode="overwrite", overwrite_schema=True
... )  
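
Because an overwrite creates a new version rather than deleting the old data, earlier versions remain readable. A sketch using the version parameter of pl.read_delta (assuming the table has a prior version 0):

>>> pl.read_delta(existing_table_path, version=0)  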

Write a dataframe as a Delta Lake table to a cloud object store like S3.

>>> table_path = "s3://bucket/prefix/to/delta-table/"
>>> df.write_delta(
...     table_path,
...     storage_options={
...         "AWS_REGION": "THE_AWS_REGION",
...         "AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID",
...         "AWS_SECRET_ACCESS_KEY": "THE_AWS_SECRET_ACCESS_KEY",
...     },
... )  
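
The credentials do not have to be hard-coded; a minimal sketch that forwards them from environment variables instead (assuming those variables are set):

>>> import os
>>> df.write_delta(
...     table_path,
...     storage_options={
...         key: os.environ[key]
...         for key in ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
...     },
... )  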

Write a DataFrame as a Delta Lake table with non-nullable columns.

>>> import pyarrow as pa
>>> existing_table_path = "/path/to/delta-table/"
>>> df.write_delta(
...     existing_table_path,
...     delta_write_options={
...         "schema": pa.schema([pa.field("foo", pa.int64(), nullable=False)])
...     },
... )
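
The schema above marks a single column as non-nullable. To mark every column non-nullable, one approach is to rebuild the DataFrame's own Arrow schema with nullable=False on each field (a sketch; the field types are taken from df.to_arrow()):

>>> arrow_schema = df.to_arrow().schema
>>> non_nullable_schema = pa.schema(
...     [pa.field(f.name, f.type, nullable=False) for f in arrow_schema]
... )
>>> df.write_delta(
...     existing_table_path,
...     mode="overwrite",
...     delta_write_options={"schema": non_nullable_schema},
... )  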