polars.scan_lines
- polars.scan_lines(
- source: str | Path | IO[str] | IO[bytes] | bytes | list[str] | list[Path] | list[IO[str]] | list[IO[bytes]],
- *,
- name: str = 'lines',
- n_rows: int | None = None,
- row_index_name: str | None = None,
- row_index_offset: int = 0,
- glob: bool = True,
- storage_options: StorageOptionsDict | None = None,
- credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
- include_file_paths: str | None = None,
- ) -> LazyFrame
Construct a LazyFrame which scans lines into a string column from a file.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- source
Path(s) to a file or directory. When needing to authenticate for scanning cloud locations, see the storage_options parameter.
- name
Name to use for the output column.
- n_rows
Stop reading from the source after reading n_rows lines.
- row_index_name
If not None, insert a row index column with the given name into the LazyFrame.
- row_index_offset
Offset at which to start the row index column (only used if row_index_name is set).
- glob
Expand path given via globbing rules.
- storage_options
Options that indicate how to connect to a cloud provider.
The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:
Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.
If storage_options is not provided, Polars will try to infer the information from environment variables.
- credential_provider
Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- include_file_paths
Include the path of the source file(s) as a column with this name.
Examples
>>> pl.scan_lines(b"Hello\nworld").collect()
shape: (2, 1)
┌───────┐
│ lines │
│ ---   │
│ str   │
╞═══════╡
│ Hello │
│ world │
└───────┘