polars.LazyFrame.sink_ndjson
- LazyFrame.sink_ndjson(
- path: str | Path,
- *,
- maintain_order: bool = True,
- type_coercion: bool = True,
- predicate_pushdown: bool = True,
- projection_pushdown: bool = True,
- simplify_expression: bool = True,
- slice_pushdown: bool = True,
- collapse_joins: bool = True,
- no_optimization: bool = False,
- storage_options: dict[str, Any] | None = None,
- credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
- retries: int = 2,
- )
Evaluate the query in streaming mode and write to an NDJSON file.
Warning
Streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.
This allows streaming results that are larger than RAM to be written to disk.
- Parameters:
- path
File path to which the file should be written.
- maintain_order
Maintain the order in which data is processed. Setting this to False will be slightly faster.
- type_coercion
Do type coercion optimization.
- predicate_pushdown
Do predicate pushdown optimization.
- projection_pushdown
Do projection pushdown optimization.
- simplify_expression
Run simplify expressions optimization.
- slice_pushdown
Do slice pushdown optimization.
- collapse_joins
Collapse a join and filters into a faster join.
- no_optimization
Turn off (certain) optimizations.
- storage_options
Options that indicate how to connect to a cloud provider.
The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:
Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.
If storage_options is not provided, Polars will try to infer the information from environment variables.
- credential_provider
Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- retries
Number of retries if accessing a cloud instance fails.
- Returns:
- DataFrame
Examples
>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")
>>> lf.sink_ndjson("out.ndjson")