
polars.read_parquet(
source: FileSource,
columns: list[int] | list[str] | None = None,
n_rows: int | None = None,
row_index_name: str | None = None,
row_index_offset: int = 0,
parallel: ParallelStrategy = 'auto',
use_statistics: bool = True,
hive_partitioning: bool | None = None,
glob: bool = True,
schema: SchemaDict | None = None,
hive_schema: SchemaDict | None = None,
try_parse_hive_dates: bool = True,
rechunk: bool = False,
low_memory: bool = False,
storage_options: dict[str, Any] | None = None,
credential_provider: CredentialProviderFunction | Literal['auto'] | None = 'auto',
retries: int = 2,
use_pyarrow: bool = False,
pyarrow_options: dict[str, Any] | None = None,
memory_map: bool = True,
include_file_paths: str | None = None,
allow_missing_columns: bool = False,
) → DataFrame

Read into a DataFrame from a parquet file.


Parameters:

source

Path(s) to a file or directory. When needing to authenticate for scanning cloud locations, see the storage_options parameter.

File-like objects are supported (by “file-like object” we refer to objects that have a read() method, such as a file handle returned by the built-in open function, or a BytesIO instance). For file-like objects, the stream position may not be updated accordingly after reading.


columns

Columns to select. Accepts a list of column indices (starting at zero) or a list of column names.


n_rows

Stop reading from the Parquet file after reading n_rows rows. Only valid when use_pyarrow=False.


row_index_name

Insert a row index column with the given name into the DataFrame as the first column. If set to None (default), no row index column is created.


row_index_offset

Start the row index at this offset. Cannot be negative. Only used if row_index_name is set.

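parallel {‘auto’, ‘columns’, ‘row_groups’, ‘none’}

This determines the direction of parallelism: ‘columns’ parallelizes over columns, ‘row_groups’ parallelizes over row groups, and ‘none’ reads without parallelism. ‘auto’ will try to determine the optimal direction.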


use_statistics

Use the statistics stored in the Parquet file to determine whether pages can be skipped from reading.


hive_partitioning

Infer statistics and schema from a Hive-partitioned URL and use them to prune reads. This is unset by default (i.e. None), meaning it is automatically enabled when a single directory is passed, and otherwise disabled.


glob

Expand the path using globbing rules.


schema

Specify the datatypes of the columns. The datatypes must match the datatypes in the file(s). If there are extra columns that are not in the file(s), consider also enabling allow_missing_columns.


Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.


hive_schema

The column names and data types of the columns by which the data is partitioned. If set to None (default), the schema of the Hive partitions is inferred.


Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.


try_parse_hive_dates

Whether to try parsing Hive partition values as date/datetime types.


rechunk

Make sure that all columns are contiguous in memory by aggregating the chunks into a single array.


low_memory

Reduce memory pressure at the expense of performance.


storage_options

Options that indicate how to connect to a cloud provider.

The cloud providers currently supported are AWS, GCP, and Azure. See supported keys here:

  • aws

  • gcp

  • azure

  • Hugging Face (hf://): Accepts an API key under the token parameter: {'token': '...'}, or by setting the HF_TOKEN environment variable.

If storage_options is not provided, Polars will try to infer the information from environment variables.


credential_provider

Provide a function that can be called to provide cloud storage credentials. The function is expected to return a dictionary of credential keys along with an optional credential expiry time.


Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.


retries

Number of retries if accessing a cloud instance fails.


use_pyarrow

Use PyArrow instead of the Rust-native Parquet reader. The PyArrow reader is more stable.


pyarrow_options

Keyword arguments for pyarrow.parquet.read_table.


memory_map

Memory-map the underlying file. This will likely increase performance. Only used when use_pyarrow=True.


include_file_paths

Include the path of the source file(s) as a column with this name. Only valid when use_pyarrow=False.


allow_missing_columns

When reading a list of Parquet files, if a column present in the first file cannot be found in subsequent files, the default behavior is to raise an error. If allow_missing_columns is set to True, a full-null column is returned instead for the files that do not contain the column.
