polars.scan_ndjson
- polars.scan_ndjson(
- source: str | Path,
- *,
- infer_schema_length: int | None = 100,
- batch_size: int | None = 1024,
- n_rows: int | None = None,
- low_memory: bool = False,
- rechunk: bool = True,
- row_count_name: str | None = None,
- row_count_offset: int = 0,
- ) → LazyFrame
Lazily read from a newline delimited JSON file or multiple files via glob patterns.
This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.
- Parameters:
- source
Path to a file, or a glob pattern matching multiple files.
- infer_schema_length
Infer the schema from the first infer_schema_length rows.
- batch_size
Number of rows to read in each batch.
- n_rows
Stop reading from the JSON file after reading n_rows rows.
- low_memory
Reduce memory pressure at the expense of performance.
- rechunk
Reallocate to contiguous memory when all chunks/files are parsed.
- row_count_name
If not None, insert a row count column with the given name into the DataFrame.
- row_count_offset
Offset at which to start the row count column (only used if row_count_name is set).