polars.LazyFrame.sink_csv

LazyFrame.sink_csv(
path: str | Path,
*,
include_bom: bool = False,
include_header: bool = True,
separator: str = ',',
line_terminator: str = '\n',
quote_char: str = '"',
batch_size: int = 1024,
datetime_format: str | None = None,
date_format: str | None = None,
time_format: str | None = None,
float_scientific: bool | None = None,
float_precision: int | None = None,
null_value: str | None = None,
quote_style: CsvQuoteStyle | None = None,
maintain_order: bool = True,
type_coercion: bool = True,
_type_check: bool = True,
predicate_pushdown: bool = True,
projection_pushdown: bool = True,
simplify_expression: bool = True,
slice_pushdown: bool = True,
collapse_joins: bool = True,
no_optimization: bool = False,
storage_options: dict[str, Any] | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
retries: int = 2,
sync_on_close: SyncOnCloseMethod | None = None,
mkdir: bool = False,
lazy: bool = False,
engine: EngineType = 'auto',
) → LazyFrame | None

Evaluate the query in streaming mode and write to a CSV file.

Warning

Streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.

This allows streaming results that are larger than RAM to be written to disk.

Parameters:
path

File path to which the file should be written.

include_bom

Whether to include UTF-8 BOM in the CSV output.

include_header

Whether to include header in the CSV output.

separator

Separate CSV fields with this symbol.

line_terminator

String used to end each row.

quote_char

Byte to use as quoting character.

batch_size

Number of rows that will be processed per thread.

datetime_format

A format string, with the specifiers defined by the chrono Rust crate. If no format is specified, the default fractional-second precision is inferred from the maximum time unit found in the frame's Datetime columns (if any).

date_format

A format string, with the specifiers defined by the chrono Rust crate.

time_format

A format string, with the specifiers defined by the chrono Rust crate.

float_scientific

Whether to use scientific form always (True), never (False), or automatically (None) for Float32 and Float64 datatypes.

float_precision

Number of decimal places to write, applied to both Float32 and Float64 datatypes.

null_value

A string representing null values (defaulting to the empty string).

quote_style{‘necessary’, ‘always’, ‘non_numeric’, ‘never’}

Determines the quoting strategy used.

  • necessary (default): This puts quotes around fields only when necessary. They are necessary when fields contain a quote, separator or record terminator. Quotes are also necessary when writing an empty record (which is indistinguishable from a record with one empty field).

  • always: This puts quotes around every field. Always.

  • never: This never puts quotes around fields, even if that results in invalid CSV data (e.g.: by not quoting strings containing the separator).

  • non_numeric: This puts quotes around all fields that are non-numeric. Namely, when writing a field that does not parse as a valid float or integer, quotes will be used even if they aren't strictly necessary.

maintain_order

Maintain the order in which data is processed. Setting this to False will be slightly faster.

type_coercion

Do type coercion optimization.

predicate_pushdown

Do predicate pushdown optimization.

projection_pushdown

Do projection pushdown optimization.

simplify_expression

Run simplify expressions optimization.

slice_pushdown

Do slice pushdown optimization.

collapse_joins

Collapse a join and filters into a faster join.

no_optimization

Turn off (certain) optimizations.

storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.

credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

retries

Number of retries if accessing a cloud instance fails.

sync_on_close: { None, ‘data’, ‘all’ }

Sync to disk before closing a file.

  • None does not sync.

  • data syncs the file contents.

  • all syncs the file contents and metadata.

mkdir: bool

Recursively create all the directories in the path.

lazy: bool

Wait to start execution until collect is called.

engine

Engine used to process the query (optional). At the moment, if set to "auto" (default), the query is run using the Polars streaming engine. Polars will also attempt to use the engine set by the POLARS_ENGINE_AFFINITY environment variable. If it cannot run the query using the selected engine, the query is run using the Polars streaming engine.

Note

The GPU engine is currently not supported.

Returns:
LazyFrame | None

A LazyFrame if lazy is True, otherwise None.

Examples

>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")  
>>> lf.sink_csv("out.csv")