polars.read_excel

polars.read_excel(
source: str | Path | IO[bytes] | bytes,
*,
sheet_id: int | Sequence[int] | None = None,
sheet_name: str | list[str] | tuple[str] | None = None,
engine: ExcelSpreadsheetEngine | None = None,
engine_options: dict[str, Any] | None = None,
read_options: dict[str, Any] | None = None,
schema_overrides: SchemaDict | None = None,
infer_schema_length: int | None = 100,
raise_if_empty: bool = True,
) → DataFrame | dict[str, DataFrame]

Read Excel spreadsheet data into a DataFrame.

Changed in version 0.20.6: Added the “calamine” engine (via the fastexcel module) for Excel Workbooks (.xlsx, .xlsb, .xls).

Changed in version 0.19.4: Added “pyxlsb” engine for Excel Binary Workbooks (.xlsb).

Changed in version 0.19.3: Added “openpyxl” engine, and added schema_overrides parameter.

Parameters:
source

Path to a file or a file-like object (by “file-like object” we refer to objects that have a read() method, such as a file handle returned by the builtin open function, or a BytesIO instance).

sheet_id

Sheet number(s) to convert; set 0 to load all sheets as DataFrames, returned as a {sheetname: frame, …} dict. Defaults to 1 if neither this nor sheet_name is specified. Can also take a sequence of sheet numbers.

sheet_name

Sheet name(s) to convert; cannot be used in conjunction with sheet_id. If more than one is given, a {sheetname: frame, …} dict is returned.

engine

Library used to parse the spreadsheet file; currently defaults to “xlsx2csv” if not explicitly set.

  • “xlsx2csv”: converts the data to an in-memory CSV before using the native polars read_csv method to parse the result. You can pass engine_options and read_options to refine the conversion.

  • “calamine”: this engine can be used for reading all major types of Excel Workbook (.xlsx, .xlsb, .xls) and is dramatically faster than the other options, using the fastexcel module to bind the calamine reader.

  • “openpyxl”: this engine is significantly slower than xlsx2csv but supports additional automatic type inference; potentially useful if you are otherwise unable to parse your sheet with the (default) xlsx2csv engine in conjunction with the schema_overrides parameter.

  • “pyxlsb”: this engine can be used for Excel Binary Workbooks (.xlsb files). Note that you have to use schema_overrides to correctly load date/datetime columns (or these will be read as floats representing offset Julian values). You should now prefer the “calamine” engine for this Workbook type.

engine_options

Additional options passed to the underlying engine’s primary parsing constructor (given below), if supported:

  • “xlsx2csv”: Xlsx2csv

  • “calamine”: n/a (can only provide read_options)

  • “openpyxl”: load_workbook

  • “pyxlsb”: open_workbook

read_options

Options passed to the underlying engine method that reads the sheet data. Where supported, this allows for additional control over parsing. The specific read methods associated with each engine are:

  • “xlsx2csv”: pl.read_csv

  • “calamine”: ExcelReader.load_sheet_by_name

  • “openpyxl”: n/a (can only provide engine_options)

  • “pyxlsb”: n/a (can only provide engine_options)

schema_overrides

Supports type specification or override of one or more of the returned columns.

infer_schema_length

The maximum number of rows to scan for schema inference. If set to None, the entire dataset is scanned to determine the dtypes, which can slow parsing for large workbooks. Note that only the “calamine” and “xlsx2csv” engines support this parameter; for all others it is a no-op.

raise_if_empty

When there is no data in the sheet, a `NoDataError` is raised. If this parameter is set to False, an empty DataFrame (with no columns) is returned instead.

Returns:
DataFrame

If reading a single sheet.

dict

If reading multiple sheets, a “{sheetname: DataFrame, …}” dict is returned.

Notes

  • When using the default xlsx2csv engine the target Excel sheet is first converted to CSV using xlsx2csv.Xlsx2csv(source).convert() and then parsed with Polars’ read_csv() function. You can pass additional options to read_options to influence this part of the parsing pipeline.

  • Where possible, prefer the “calamine” engine for reading Excel Workbooks, as it is significantly faster than the other options, and is intended to become the default engine for all Excel file types in a future release.

  • If you want to read multiple sheets and set different options per sheet (read_options, schema_overrides, etc.), you should make separate calls, as the options apply to the whole call rather than on a per-sheet basis.

Examples

Read the “data” worksheet from an Excel file into a DataFrame.

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_name="data",
... )  

Read table data from sheet 3 in an Excel workbook as a DataFrame while skipping empty lines in the sheet. As sheet 3 does not have a header row and the default engine is xlsx2csv, you can pass the necessary additional settings to the read_options parameter; these are forwarded to read_csv().

>>> pl.read_excel(
...     source="test.xlsx",
...     sheet_id=3,
...     engine_options={"skip_empty_lines": True},
...     read_options={"has_header": False, "new_columns": ["a", "b", "c"]},
... )  

If the correct datatypes can’t be determined, you can use schema_overrides, or consult the read_csv() documentation to see which options you can pass to fix the issue. For example, if using xlsx2csv or calamine, the “infer_schema_length” parameter can be set to None to force reading the entire dataset to infer the best dtypes. If column types are known in advance, and there is no ambiguity in the parsing, schema_overrides is typically the more efficient option.

>>> pl.read_excel(
...     source="test.xlsx",
...     schema_overrides={"dt": pl.Date},
...     infer_schema_length=None,
...     engine="calamine",
... )