polars.read_excel#
- polars.read_excel(
- source: str | Path | IO[bytes] | bytes,
- *,
- sheet_id: int | Sequence[int] | None = None,
- sheet_name: str | list[str] | tuple[str] | None = None,
- engine: ExcelSpreadsheetEngine = 'calamine',
- engine_options: dict[str, Any] | None = None,
- read_options: dict[str, Any] | None = None,
- has_header: bool = True,
- columns: Sequence[int] | Sequence[str] | None = None,
- schema_overrides: SchemaDict | None = None,
- infer_schema_length: int | None = 100,
- drop_empty_rows: bool = True,
- raise_if_empty: bool = True,
Read Excel spreadsheet data into a DataFrame.
Changed in version 1.0: Default engine is now “calamine” (was “xlsx2csv”).
Added in version 0.20.6: Added “calamine” fastexcel engine for Excel Workbooks (.xlsx, .xlsb, .xls).
Added in version 0.19.3: Added “openpyxl” engine, and added
schema_overrides
parameter.- Parameters:
- source
Path to a file or a file-like object (by “file-like object” we refer to objects that have a
read()
method, such as a file handler like the builtinopen
function, or aBytesIO
instance). For file-like objects, stream position may not be updated accordingly after reading.- sheet_id
Sheet number(s) to convert (set
0
to load all sheets as DataFrames) and return a{sheetname:frame,}
dict. (Defaults to1
if neither this norsheet_name
are specified). Can also take a sequence of sheet numbers.- sheet_name
Sheet name(s) to convert; cannot be used in conjunction with
sheet_id
. If more than one is given then a{sheetname:frame,}
dict is returned.- engine{‘calamine’, ‘xlsx2csv’, ‘openpyxl’}
Library used to parse the spreadsheet file; defaults to “calamine”.
“calamine”: this engine can be used for reading all major types of Excel Workbook (
.xlsx
,.xlsb
,.xls
) and is dramatically faster than the other options, using thefastexcel
module to bind the Calamine parser.“xlsx2csv”: converts the data to an in-memory CSV before using the native polars
read_csv
method to parse the result. You can passengine_options
andread_options
to refine the conversion.“openpyxl”: this engine is significantly slower than
xlsx2csv
but supports additional automatic type inference; potentially useful if you are otherwise unable to parse your sheet with thexlsx2csv
engine in conjunction with theschema_overrides
parameter.
- engine_options
Additional options passed to the underlying engine’s primary parsing constructor (given below), if supported:
“calamine”: n/a (can only provide
read_options
)“xlsx2csv”:
Xlsx2csv
“openpyxl”:
load_workbook
- read_options
Options passed to the underlying engine method that reads the sheet data. Where supported, this allows for additional control over parsing. The specific read methods associated with each engine are:
“calamine”:
ExcelReader.load_sheet_by_name
“xlsx2csv”:
pl.read_csv
“openpyxl”: n/a (can only provide
engine_options
)
- has_header
Indicate if the first row of the table data is a header or not. If False, column names will be autogenerated in the following format:
column_x
, withx
being an enumeration over every column in the dataset, starting at 1.- columns
Columns to read from the sheet; if not specified, all columns are read. Can be given as a sequence of column names or indices.
- schema_overrides
Support type specification or override of one or more columns.
- infer_schema_length
The maximum number of rows to scan for schema inference. If set to
None
, the entire dataset is scanned to determine the dtypes, which can slow parsing for large workbooks. Note that only the “calamine” and “xlsx2csv” engines support this parameter.- drop_empty_rows
Indicate whether to omit empty rows when reading data into the DataFrame.
- raise_if_empty
When there is no data in the sheet,`NoDataError` is raised. If this parameter is set to False, an empty DataFrame (with no columns) is returned instead.
- Returns:
- DataFrame
If reading a single sheet.
- dict
If reading multiple sheets, a “{sheetname: DataFrame, …}” dict is returned.
See also
Notes
Where possible, prefer the default “calamine” engine for reading Excel Workbooks, as it is significantly faster than the other options.
When using the
xlsx2csv
engine the target Excel sheet is first converted to CSV usingxlsx2csv.Xlsx2csv(source).convert()
and then parsed with Polars’read_csv()
function. You can pass additional options toread_options
to influence this part of the parsing pipeline.If you want to read multiple sheets and set different options (
read_options
,schema_overrides
, etc), you should make separate calls as the options are set globally, not on a per-sheet basis.
Examples
Read the “data” worksheet from an Excel file into a DataFrame.
>>> pl.read_excel( ... source="test.xlsx", ... sheet_name="data", ... )
If the correct dtypes can’t be determined, use the
schema_overrides
parameter to specify them, or increase the inference length withinfer_schema_length
.>>> pl.read_excel( ... source="test.xlsx", ... schema_overrides={"dt": pl.Date}, ... infer_schema_length=None, ... )
Using the
xlsx2csv
engine, read table data from sheet 3 in an Excel workbook as a DataFrame while skipping empty lines in the sheet. As sheet 3 does not have a header row, you can pass the necessary additional settings for this to theread_options
parameter; these will be passed toread_csv()
.>>> pl.read_excel( ... source="test.xlsx", ... sheet_id=3, ... engine="xlsx2csv", ... engine_options={"skip_empty_lines": True}, ... read_options={"has_header": False, "new_columns": ["a", "b", "c"]}, ... )