polars.LazyFrame.sink_csv#

LazyFrame.sink_csv(
path: str | Path,
*,
include_bom: bool = False,
include_header: bool = True,
separator: str = ',',
line_terminator: str = '\n',
quote_char: str = '"',
batch_size: int = 1024,
datetime_format: str | None = None,
date_format: str | None = None,
time_format: str | None = None,
float_scientific: bool | None = None,
float_precision: int | None = None,
null_value: str | None = None,
quote_style: CsvQuoteStyle | None = None,
maintain_order: bool = True,
type_coercion: bool = True,
predicate_pushdown: bool = True,
projection_pushdown: bool = True,
simplify_expression: bool = True,
slice_pushdown: bool = True,
collapse_joins: bool = True,
no_optimization: bool = False,
storage_options: dict[str, Any] | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
retries: int = 2,
) None[source]#

Evaluate the query in streaming mode and write to a CSV file.

Warning

Streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.

This allows streaming results that are larger than RAM to be written to disk.

Parameters:
path

File path to which the file should be written.

include_bom

Whether to include UTF-8 BOM in the CSV output.

include_header

Whether to include header in the CSV output.

separator

Separate CSV fields with this symbol.

line_terminator

String used to end each row.

quote_char

Byte to use as quoting character.

batch_size

Number of rows that will be processed per thread.

datetime_format

A format string, with the specifiers defined by the chrono Rust crate. If no format specified, the default fractional-second precision is inferred from the maximum timeunit found in the frame’s Datetime cols (if any).

date_format

A format string, with the specifiers defined by the chrono Rust crate.

time_format

A format string, with the specifiers defined by the chrono Rust crate.

float_scientific

Whether to use scientific form always (true), never (false), or automatically (None) for Float32 and Float64 datatypes.

float_precision

Number of decimal places to write, applied to both Float32 and Float64 datatypes.

null_value

A string representing null values (defaulting to the empty string).

quote_style{‘necessary’, ‘always’, ‘non_numeric’, ‘never’}

Determines the quoting strategy used.

  • necessary (default): This puts quotes around fields only when necessary. They are necessary when fields contain a quote, delimiter or record terminator. Quotes are also necessary when writing an empty record (which is indistinguishable from a record with one empty field). This is the default.

  • always: This puts quotes around every field. Always.

  • never: This never puts quotes around fields, even if that results in invalid CSV data (e.g.: by not quoting strings containing the separator).

  • non_numeric: This puts quotes around all fields that are non-numeric. Namely, when writing a field that does not parse as a valid float or integer, then quotes will be used even if they aren`t strictly necessary.

maintain_order

Maintain the order in which data is processed. Setting this to False will be slightly faster.

type_coercion

Do type coercion optimization.

predicate_pushdown

Do predicate pushdown optimization.

projection_pushdown

Do projection pushdown optimization.

simplify_expression

Run simplify expressions optimization.

slice_pushdown

Slice pushdown optimization.

collapse_joins

Collapse a join and filters into a faster join

no_optimization

Turn off (certain) optimizations.

storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.

credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

retries

Number of retries if accessing a cloud instance fails.

Returns:
DataFrame

Examples

>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")  
>>> lf.sink_csv("out.csv")