polars.scan_ndjson

polars.scan_ndjson(
source: str | Path,
*,
infer_schema_length: int | None = 100,
batch_size: int | None = 1024,
n_rows: int | None = None,
low_memory: bool = False,
rechunk: bool = True,
row_count_name: str | None = None,
row_count_offset: int = 0,
) → LazyFrame

Lazily read from a newline-delimited JSON file, or from multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.
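For example, a minimal sketch of a lazy scan where the filter and column selection are pushed down to the scan; the file name logs.ndjson and the columns status, url, and duration_ms are hypothetical:

import polars as pl

# Build a lazy query; nothing is read from disk yet.
lf = pl.scan_ndjson("logs.ndjson")

# The filter and select are pushed down to the scan, so only matching rows
# and the two requested columns are materialized when collect() runs.
result = (
    lf.filter(pl.col("status") == 200)
    .select(["url", "duration_ms"])
    .collect()
)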

Parameters:
source

Path to a file, or a glob pattern matching multiple files.

infer_schema_length

Infer the schema from the first infer_schema_length rows.

batch_size

Number of rows to read in each batch.

n_rows

Stop reading from the JSON file after reading n_rows rows.

low_memory

Reduce memory pressure at the expense of performance.

rechunk

Reallocate to contiguous memory when all chunks/files are parsed.

row_count_name

If not None, this will insert a row count column with the given name into the DataFrame.

row_count_offset

Offset at which to start the row count (only used if row_count_name is set).
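A sketch of how the keyword arguments above can be combined; the path events.ndjson and the chosen values are illustrative only:

import polars as pl

lf = pl.scan_ndjson(
    "events.ndjson",          # hypothetical path
    infer_schema_length=500,  # infer the schema from the first 500 rows
    n_rows=10_000,            # stop after reading 10,000 rows
    low_memory=True,          # trade some performance for lower memory pressure
    row_count_name="row_nr",  # insert a row count column named "row_nr"
    row_count_offset=1,       # start the row count at 1 instead of 0
)

print(lf.collect().head())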