polars.scan_lines

polars.scan_lines(
source: str | Path | IO[str] | IO[bytes] | bytes | list[str] | list[Path] | list[IO[str]] | list[IO[bytes]],
*,
name: str = 'lines',
n_rows: int | None = None,
row_index_name: str | None = None,
row_index_offset: int = 0,
glob: bool = True,
storage_options: StorageOptionsDict | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
include_file_paths: str | None = None,
) -> LazyFrame

Construct a LazyFrame which scans lines into a string column from a file.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
source

Path(s) to a file or directory. When needing to authenticate for scanning cloud locations, see the storage_options parameter.

name

Name to use for the output column.

n_rows

Stop reading from the source after reading n_rows lines.

row_index_name

If not None, insert a row index column with the given name into the LazyFrame.

row_index_offset

Offset at which to start the row index column (only used if row_index_name is set).

glob

Expand path given via globbing rules.

storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.

credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

include_file_paths

Include the path of the source file(s) as a column with this name.

See also

read_lines

Examples

>>> pl.scan_lines(b"Hello\nworld").collect()
shape: (2, 1)
┌───────┐
│ lines │
│ ---   │
│ str   │
╞═══════╡
│ Hello │
│ world │
└───────┘