polars.read_excel

polars.read_excel(
source: FileSource,
*,
sheet_id: int | Sequence[int] | None = None,
sheet_name: str | list[str] | tuple[str] | None = None,
table_name: str | None = None,
engine: ExcelSpreadsheetEngine = 'calamine',
engine_options: dict[str, Any] | None = None,
read_options: dict[str, Any] | None = None,
has_header: bool = True,
columns: Sequence[int] | Sequence[str] | None = None,
schema_overrides: SchemaDict | None = None,
infer_schema_length: int | None = 100,
include_file_paths: str | None = None,
drop_empty_rows: bool = True,
drop_empty_cols: bool = True,
raise_if_empty: bool = True,
) -> DataFrame | dict[str, DataFrame]

Read Excel spreadsheet data into a DataFrame.

Added in version 1.20: Support loading data from named table objects with table_name parameter.

Added in version 1.18: Support loading data from a list (or glob pattern) of multiple workbooks.

Changed in version 1.0: Default engine is now “calamine” (was “xlsx2csv”).

Parameters:
source

Path(s) to a file or a file-like object (by “file-like object” we refer to objects that have a read() method, such as a file handle returned by the builtin open function, or a BytesIO instance). For file-like objects, the stream position may not be updated after reading.

sheet_id

Sheet number(s) to convert; set 0 to load all sheets as DataFrames and return a {sheetname: frame, …} dict. Defaults to 1 if neither this nor sheet_name is specified. Can also take a sequence of sheet numbers.

sheet_name

Sheet name(s) to convert; cannot be used in conjunction with sheet_id. If more than one is given, a {sheetname: frame, …} dict is returned.
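
For example, assuming a workbook that contains sheets named “data” and “misc” (illustrative names), requesting more than one sheet returns a dict of DataFrames keyed by sheet name:

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_name=["data", "misc"],
... )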

table_name

Name of a specific table to read; table names are unique across the workbook, so additionally specifying a sheet id or name is optional. However, if one of those parameters is specified, an error will be raised if the named table is not found in that particular sheet.
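
For example, assuming the workbook contains a named table called “SalesData” (a hypothetical name), it can be read directly without specifying a sheet:

>>> pl.read_excel(
...     source="test.xlsx",
...     table_name="SalesData",
... )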

engine : {‘calamine’, ‘openpyxl’, ‘xlsx2csv’}

Library used to parse the spreadsheet file; defaults to “calamine”.

  • “calamine”: this engine can be used for reading all major types of Excel Workbook (.xlsx, .xlsb, .xls) and is dramatically faster than the other options, using the fastexcel module to bind the Rust-based Calamine parser.

  • “openpyxl”: this engine is significantly slower than both calamine and xlsx2csv, but can provide a useful fallback if you are otherwise unable to read data from your workbook.

  • “xlsx2csv”: converts the data to an in-memory CSV before using the native polars read_csv method to parse the result.

engine_options

Additional options passed to the underlying engine’s primary parsing constructor (given below), if supported:

  • “calamine”: n/a (can only provide read_options)

  • “openpyxl”: load_workbook

  • “xlsx2csv”: Xlsx2csv
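
For example, a sketch passing an openpyxl load_workbook keyword (read_only) through engine_options:

>>> pl.read_excel(
...     source="test.xlsx",
...     engine="openpyxl",
...     engine_options={"read_only": True},
... )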

read_options

Options passed to the underlying engine method that reads the sheet data. Where supported, this allows for additional control over parsing; for example, with the “xlsx2csv” engine these options are forwarded to the read_csv() call that parses the converted data (see Notes below).

has_header

Indicate if the first row of the table data is a header or not. If False, column names will be autogenerated in the following format: column_x, with x being an enumeration over every column in the dataset, starting at 1.
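
For example, assuming a workbook whose first row is data rather than headers, the resulting columns will be named column_1, column_2, and so on:

>>> pl.read_excel(
...     source="test.xlsx",
...     has_header=False,
... )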

columns

Columns to read from the sheet; if not specified, all columns are read. Can be given as a sequence of column names or indices.
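
For example, to read only a subset of columns by name (the column names here are illustrative):

>>> pl.read_excel(
...     source="test.xlsx",
...     columns=["id", "name"],
... )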

schema_overrides

Support type specification or override of one or more columns.

infer_schema_length

The maximum number of rows to scan for schema inference. If set to None, the entire dataset is scanned to determine the dtypes, which can slow parsing for large workbooks. Note that only the “calamine” and “xlsx2csv” engines support this parameter.

include_file_paths

Include the path of the source file(s) as a column with this name.
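
For example, a sketch that reads several workbooks matched by a glob pattern and records each row’s originating file in a column (the pattern and column name are illustrative):

>>> pl.read_excel(
...     source="data/*.xlsx",
...     include_file_paths="source",
... )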

drop_empty_rows

Indicate whether to omit empty rows when reading data into the DataFrame.

drop_empty_cols

Indicate whether to omit empty columns (with no headers) when reading data into the DataFrame (note that empty column identification may vary depending on the underlying engine being used).

raise_if_empty

When there is no data in the sheet, a `NoDataError` is raised. If this parameter is set to False, an empty DataFrame (with no columns) is returned instead.
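
For example, to receive an empty DataFrame rather than an error when a sheet contains no data:

>>> pl.read_excel(
...     source="test.xlsx",
...     raise_if_empty=False,
... )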

Returns:
DataFrame

If reading a single sheet.

dict

If reading multiple sheets, a “{sheetname: DataFrame, …}” dict is returned.

See also

read_ods

Notes

  • Where possible, prefer the default “calamine” engine for reading Excel Workbooks, as it is significantly faster than the other options.

  • When using the xlsx2csv engine, the target Excel sheet is first converted to CSV using xlsx2csv.Xlsx2csv(source).convert() and then parsed with Polars’ read_csv() function. You can pass additional options via read_options to influence this part of the parsing pipeline.

  • If you want to read multiple sheets and set different options (read_options, schema_overrides, etc), you should make separate calls as the options are set globally, not on a per-sheet basis.
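
For instance, a minimal sketch (assuming sheets named “data” and “misc” that need different dtype overrides) issues one call per sheet:

>>> df_data = pl.read_excel(
...     source="test.xlsx",
...     sheet_name="data",
...     schema_overrides={"dt": pl.Date},
... )
>>> df_misc = pl.read_excel(
...     source="test.xlsx",
...     sheet_name="misc",
...     schema_overrides={"qty": pl.Int32},
... )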

Examples

Read the “data” worksheet from an Excel file into a DataFrame.

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_name="data",
... )  

If the correct dtypes can’t be determined, use the schema_overrides parameter to specify them, or increase the inference length with infer_schema_length.

>>> pl.read_excel(
...     source="test.xlsx",
...     schema_overrides={"dt": pl.Date},
...     infer_schema_length=None,
... )  

Using the xlsx2csv engine, read table data from sheet 3 in an Excel workbook as a DataFrame while skipping empty lines in the sheet. As sheet 3 does not have a header row, you can pass the necessary additional settings for this to the read_options parameter; these will be passed to read_csv().

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_id=3,
...     engine="xlsx2csv",
...     engine_options={"skip_empty_lines": True},
...     read_options={"has_header": False, "new_columns": ["a", "b", "c"]},
... )
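
Read every sheet in the workbook at once; the result is a “{sheetname: DataFrame, …}” dict (using the same hypothetical workbook as above):

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_id=0,
... )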