polars.LazyFrame.sink_ipc

LazyFrame.sink_ipc(
path: str | Path | IO[bytes] | PartitioningScheme,
*,
compression: IpcCompression | None = 'uncompressed',
compat_level: CompatLevel | None = None,
maintain_order: bool = True,
storage_options: dict[str, Any] | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
retries: int = 2,
sync_on_close: SyncOnCloseMethod | None = None,
mkdir: bool = False,
lazy: bool = False,
engine: EngineType = 'auto',
optimizations: QueryOptFlags = (),
) -> LazyFrame | None

Evaluate the query in streaming mode and write to an IPC file.

This allows streaming results that are larger than RAM to be written to disk.

Parameters:
path

File path, writable binary file-like object, or partitioning scheme to which the output should be written.

compression: {‘uncompressed’, ‘lz4’, ‘zstd’}

Choose “zstd” for a good compression ratio. Choose “lz4” for fast compression and decompression. Defaults to “uncompressed”.

compat_level

Use a specific compatibility level when exporting Polars’ internal data structures.

maintain_order

Maintain the order in which data is processed. Setting this to False can be slightly faster.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, Azure, and Hugging Face. See the supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): accepts an API key under the token parameter ({'token': '...'}) or via the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.

credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
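The following is a hedged sketch of a credential provider. The function name my_credential_provider, the bucket s3://my-bucket, and the credential values are all hypothetical; the AWS-style key names and the expiry being a UNIX timestamp in seconds (or None for no expiry) are assumptions about the expected return shape, not a definitive contract. The actual sink call is left commented out because it needs real cloud access.

```python
from __future__ import annotations

import time


def my_credential_provider() -> tuple[dict[str, str], int | None]:
    """Hypothetical provider returning static AWS-style credentials."""
    # In practice these would come from a secrets manager or an STS call.
    credentials = {
        "aws_access_key_id": "EXAMPLE_KEY_ID",
        "aws_secret_access_key": "EXAMPLE_SECRET",
    }
    # Assumed: expiry as a UNIX timestamp in seconds; None would mean "never expires".
    expiry = int(time.time()) + 3600
    return credentials, expiry


# Usage (not executed here; requires a real bucket and lf defined elsewhere):
# lf.sink_ipc("s3://my-bucket/out.arrow", credential_provider=my_credential_provider)
```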

retries

Number of retries if accessing a cloud instance fails.

sync_on_close: { None, ‘data’, ‘all’ }

Sync to disk before closing a file.

  • None does not sync.

  • data syncs the file contents.

  • all syncs the file contents and metadata.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

mkdir: bool

Recursively create all the directories in the path.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

lazy: bool

Wait to start execution until collect is called on the returned LazyFrame.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

engine

Select the engine used to process the query (optional). If set to "auto" (the default), the query is currently run using the Polars streaming engine. Polars will also attempt to use the engine set by the POLARS_ENGINE_AFFINITY environment variable. If the query cannot be run using the selected engine, it is run using the Polars streaming engine.

Note

The GPU engine is currently not supported.

optimizations

The optimization passes done during query optimization.

This has no effect if lazy is set to True.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

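Returns:
LazyFrame or None

None when lazy is False (the default). When lazy is True, a LazyFrame that performs the write when it is collected.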

Examples

>>> import polars as pl
>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")
>>> lf.sink_ipc("out.arrow")