polars.scan_ndjson

polars.scan_ndjson(
source: str | Path,
*,
infer_schema_length: int | None = 100,
batch_size: int | None = 1024,
n_rows: int | None = None,
low_memory: bool = False,
rechunk: bool = True,
row_count_name: str | None = None,
row_count_offset: int = 0,
) → LazyFrame

Lazily read from a newline-delimited JSON file, or from multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.
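For example, a minimal sketch of a lazy scan where the filter and column selection are pushed down to the scan; the file name logs.ndjson and the columns status, url, and duration_ms are hypothetical:

import polars as pl

# Build a lazy query; nothing is read from disk yet.
lf = pl.scan_ndjson("logs.ndjson")

# The filter and select are pushed down to the scan, so only matching rows
# and the two requested columns are materialized when collect() runs.
result = (
    lf.filter(pl.col("status") == 200)
    .select(["url", "duration_ms"])
    .collect()
)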

Parameters:
source

Path to a file, or a glob pattern matching multiple files.

infer_schema_length

Infer the schema from the first infer_schema_length rows.

batch_size

Number of rows to read in each batch.

n_rows

Stop reading from the JSON file after reading n_rows rows.

low_memory

Reduce memory pressure at the expense of performance.

rechunk

Reallocate to contiguous memory when all chunks/files are parsed.

row_count_name

If not None, this will insert a row count column with the given name into the DataFrame.

row_count_offset

Offset at which to start the row count (only used if row_count_name is set).
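A sketch of how the keyword arguments above can be combined; the path events.ndjson and the chosen values are illustrative only:

import polars as pl

lf = pl.scan_ndjson(
    "events.ndjson",          # hypothetical path
    infer_schema_length=500,  # infer the schema from the first 500 rows
    n_rows=10_000,            # stop after reading 10,000 rows
    low_memory=True,          # trade some performance for lower memory pressure
    row_count_name="row_nr",  # insert a row count column named "row_nr"
    row_count_offset=1,       # start the row count at 1 instead of 0
)

print(lf.collect().head())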