polars.DataFrame.write_delta
- DataFrame.write_delta(
- target: str | Path | deltalake.DeltaTable,
- *,
- mode: Literal['error', 'append', 'overwrite', 'ignore', 'merge'] = 'error',
- overwrite_schema: bool | None = None,
- storage_options: dict[str, str] | None = None,
- delta_write_options: dict[str, Any] | None = None,
- delta_merge_options: dict[str, Any] | None = None,
- )
Write the DataFrame as a Delta Lake table.
- Parameters:
- target
URI of a table or a DeltaTable object.
- mode : {‘error’, ‘append’, ‘overwrite’, ‘ignore’, ‘merge’}
How to handle existing data.
If ‘error’, throw an error if the table already exists (default).
If ‘append’, will add new data.
If ‘overwrite’, will replace table with new data.
If ‘ignore’, will not write anything if table already exists.
If ‘merge’, return a TableMerger object to merge data from the DataFrame with the existing data.
- overwrite_schema
If True, allows updating the schema of the table.
Deprecated since version 0.20.14: Use the parameter delta_write_options instead and pass {"schema_mode": "overwrite"}.
- storage_options
Extra options for the storage backends supported by deltalake. For cloud storage, this may include authentication configuration and similar settings.
- delta_write_options
Additional keyword arguments used when writing a Delta Lake table. See the deltalake documentation for the list of supported write options.
- delta_merge_options
Keyword arguments required to MERGE a Delta Lake table. See the deltalake documentation for the list of supported merge options.
- Raises:
- TypeError
If the DataFrame contains unsupported data types.
- ArrowInvalidError
If the DataFrame contains data types that could not be cast to their primitive type.
- TableNotFoundError
If the Delta table doesn’t exist and a MERGE action is triggered.
Notes
The Polars data types Null and Time are not supported by the Delta protocol specification and will raise a TypeError. Columns using the Categorical data type will be converted to normal (non-categorical) strings when written.
Polars columns are always nullable. To write data to a Delta table with non-nullable columns, a custom pyarrow schema has to be passed to delta_write_options. See the non-nullable columns example below.
Examples
Write a dataframe to the local filesystem as a Delta Lake table.
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )
>>> table_path = "/path/to/delta-table/"
>>> df.write_delta(table_path)
Append data to an existing Delta Lake table on the local filesystem. Note that this will fail if the schema of the new data does not match the schema of the existing table.
>>> df.write_delta(table_path, mode="append")
Overwrite a Delta Lake table as a new version. If the schemas of the new and old data are the same, specifying the schema_mode is not required.
>>> existing_table_path = "/path/to/delta-table/"
>>> df.write_delta(
...     existing_table_path,
...     mode="overwrite",
...     delta_write_options={"schema_mode": "overwrite"},
... )
Write a DataFrame as a Delta Lake table to a cloud object store like S3.
>>> table_path = "s3://bucket/prefix/to/delta-table/"
>>> df.write_delta(
...     table_path,
...     storage_options={
...         "AWS_REGION": "THE_AWS_REGION",
...         "AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID",
...         "AWS_SECRET_ACCESS_KEY": "THE_AWS_SECRET_ACCESS_KEY",
...     },
... )
Write DataFrame as a Delta Lake table with non-nullable columns.
>>> import pyarrow as pa
>>> existing_table_path = "/path/to/delta-table/"
>>> df.write_delta(
...     existing_table_path,
...     delta_write_options={
...         "schema": pa.schema([pa.field("foo", pa.int64(), nullable=False)])
...     },
... )
Merge the DataFrame with an existing Delta Lake table. For all TableMerger methods, check the deltalake docs.
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )
>>> table_path = "/path/to/delta-table/"
>>> (
...     df.write_delta(
...         table_path,
...         mode="merge",
...         delta_merge_options={
...             "predicate": "s.foo = t.foo",
...             "source_alias": "s",
...             "target_alias": "t",
...         },
...     )
...     .when_matched_update_all()
...     .when_not_matched_insert_all()
...     .execute()
... )