Input/output#

CSV#

read_csv(source, *[, has_header, columns, ...])

Read a CSV file into a DataFrame.

read_csv_batched(source, *[, has_header, ...])

Read a CSV file in batches.

scan_csv(source, *[, has_header, separator, ...])

Lazily read from a CSV file or multiple files via glob patterns.

DataFrame.write_csv([file, include_bom, ...])

Write to a comma-separated values (CSV) file.

LazyFrame.sink_csv(path, *[, include_bom, ...])

Evaluate the query in streaming mode and write to a CSV file.

Feather / IPC#

read_ipc(source, *[, columns, n_rows, ...])

Read into a DataFrame from an Arrow IPC (Feather v2) file.

read_ipc_stream(source, *[, columns, ...])

Read into a DataFrame from Arrow IPC record batch stream.

scan_ipc(source, *[, n_rows, cache, ...])

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

read_ipc_schema(source)

Get the schema of an IPC file without reading data.

DataFrame.write_ipc(file[, compression, future])

Write to Arrow IPC binary stream or Feather file.

DataFrame.write_ipc_stream(file[, compression])

Write to Arrow IPC record batch stream.

LazyFrame.sink_ipc(path, *[, compression, ...])

Evaluate the query in streaming mode and write to an IPC file.

Parquet#

read_parquet(source, *[, columns, n_rows, ...])

Read into a DataFrame from a Parquet file.

scan_parquet(source, *[, n_rows, ...])

Lazily read from a local or cloud-hosted Parquet file (or files).

read_parquet_schema(source)

Get the schema of a Parquet file without reading data.

DataFrame.write_parquet(file, *[, ...])

Write to an Apache Parquet file.

LazyFrame.sink_parquet(path, *[, ...])

Evaluate the query in streaming mode and write to a Parquet file.

Database#

read_database(query, connection, *[, ...])

Read the results of a SQL query into a DataFrame, given a connection object.

read_database_uri(query, uri, *[, ...])

Read the results of a SQL query into a DataFrame, given a URI.

DataFrame.write_database(table_name, ...[, ...])

Write a Polars DataFrame to a database.

JSON#

read_json(source, *[, schema, ...])

Read into a DataFrame from a JSON file.

read_ndjson(source, *[, schema, ...])

Read into a DataFrame from a newline delimited JSON file.

scan_ndjson(source, *[, schema, ...])

Lazily read from a newline delimited JSON file or multiple files via glob patterns.

DataFrame.write_json([file, pretty, ...])

Serialize to JSON representation.

DataFrame.write_ndjson([file])

Serialize to newline delimited JSON representation.

LazyFrame.sink_ndjson(path, *[, ...])

Evaluate the query in streaming mode and write to an NDJSON file.

AVRO#

read_avro(source, *[, columns, n_rows])

Read into a DataFrame from Apache Avro format.

DataFrame.write_avro(file[, compression, name])

Write to Apache Avro file.
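A minimal Avro round trip (file name illustrative):

```python
import polars as pl

# Write a frame to Apache Avro and read it back.
df = pl.DataFrame({"a": [1, 2, 3]})
df.write_avro("example.avro")

back = pl.read_avro("example.avro")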

Spreadsheet#

read_excel(source, *[, sheet_id, ...])

Read Excel spreadsheet data into a DataFrame.

read_ods(source, *[, sheet_id, sheet_name, ...])

Read OpenOffice (ODS) spreadsheet data into a DataFrame.

DataFrame.write_excel([workbook, worksheet, ...])

Write frame data to a table in an Excel workbook/worksheet.

Apache Iceberg#

scan_iceberg(source, *[, storage_options])

Lazily read from an Apache Iceberg table.

Delta Lake#

scan_delta(source, *[, version, ...])

Lazily read from a Delta Lake table.

read_delta(source, *[, version, columns, ...])

Read into a DataFrame from a Delta Lake table.

DataFrame.write_delta(target, *[, mode, ...])

Write a DataFrame as a Delta Lake table.

Datasets#

Connect to pyarrow datasets.

scan_pyarrow_dataset(source, *[, ...])

Scan a pyarrow dataset.

BatchedCsvReader#

This reader is returned by calling pl.read_csv_batched.

BatchedCsvReader.next_batches(n)

Read n batches from the reader.