Config file reference

This page describes the configuration options for Polars on-premises. The config file is a standard TOML file divided into sections. Any configuration option can be overridden with an environment variable of the form `PC_CUBLET__section_name__key`.
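
As a rough illustration of the override format, the sketch below pairs one config file setting with the environment variable name that the pattern above would produce. The lowercase section and key names and the plain integer value are assumptions. The pattern as stated does not say how top-level or nested keys are spelled, so those are left out.

```toml
# Setting in the config file:
[scheduler]
n_workers = 4

# Following the PC_CUBLET__section_name__key pattern, the matching
# environment variable would be (casing and value format are assumptions):
#
#   PC_CUBLET__scheduler__n_workers=4
```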

Example configuration files can be found at Example Configurations.

See the sidebar for detailed documentation on the main components and how to configure them together.

Top-level configuration

Key	Type	Description
`cluster_id`	string	Logical ID for the cluster; workers and scheduler that share this ID will form a single cluster. e.g. `prod-eu-1`; must be unique among all clusters.
`instance_id`	string	Unique ID for this node within the cluster, used for addressing and leader selection. e.g. `scheduler`, `worker_0`; must be unique per cluster.
`license`	path	Absolute path to the Polars on-premises license file required to start the process. e.g. `/etc/polars/license.json`.
`memory_limit`	integer	Hard memory budget for all components in this node; enforced via cgroups when delegated. e.g. `1073741824` (1 GiB), `10737418240` (10 GiB).
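
A minimal sketch of the top-level keys for a single worker node, using the example values from the table above. The choice of `worker_0` and a 10 GiB budget is illustrative only.

```toml
# Top-level keys appear before any [section] header.
cluster_id = "prod-eu-1"               # shared by every node in the same cluster
instance_id = "worker_0"               # unique within the cluster
license = "/etc/polars/license.json"   # absolute path to the license file
memory_limit = 10737418240             # 10 GiB, expressed in bytes
```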

`[scheduler]` section

Key	Type	Description
`enabled`	boolean	Whether the scheduler component runs in this process. `true` for the leader node, `false` on pure workers.
`allow_local_sinks`	boolean	Whether workers are allowed to write to a shared/local disk visible to the scheduler. `false` for fully remote/storage-only setups, `true` if you have a shared filesystem.
`n_workers`	integer	Expected number of workers in this cluster; the scheduler waits until that many workers are online before running queries. e.g. `4`.
`anonymous_result_location`	object	Destination for results of queries that do not have an explicit sink. Two backends are currently supported: a locally mounted path (must be reachable on the exact same path on every node, with `allow_local_sinks` enabled) and S3. Either option must be reachable by the scheduler, the workers, and the client. e.g. `/mnt/storage/polars/results` or `s3://bucket/path/to/key`.
`anonymous_result_location.local`	object	Object used for local disk-backed anonymous results.
`anonymous_result_location.local.path`	path	Local path where anonymous results are stored. e.g. `/mnt/storage/polars/results`.
`anonymous_result_location.s3`	object	Object used for S3-backed anonymous results.
`anonymous_result_location.s3.url`	string	S3 bucket url. e.g. `s3://bucket/path/to/key`.
`anonymous_result_location.s3.aws_endpoint_url`	string	Storage option configuration, see `scan_parquet()`.
`anonymous_result_location.s3.aws_region`	string	Storage option configuration. e.g. `eu-west-1`.
`anonymous_result_location.s3.aws_access_key_id`	string	Storage option configuration.
`anonymous_result_location.s3.aws_secret_access_key`	string	Storage option configuration.
`client_service`	object	Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to `0.0.0.0:5051`.
`client_service.bind_addr`	string	Bind address for the client service. e.g. `0.0.0.0:5051`.
`client_service.bind_addr.ip`	string	IP address for the client service bind address. e.g. `192.168.1.1`.
`client_service.bind_addr.port`	integer	Port for the client service bind address. e.g. `5051`.
`client_service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-1`.
`worker_service`	object	Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to `0.0.0.0:5050`.
`worker_service.bind_addr`	string	Bind address for the worker service. e.g. `0.0.0.0:5050`.
`worker_service.bind_addr.ip`	string	IP address for the worker service bind address. e.g. `192.168.1.1`.
`worker_service.bind_addr.port`	integer	Port for the worker service bind address. e.g. `5050`.
`worker_service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
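
The following is a sketch of a `[scheduler]` section for a dedicated scheduler node with a shared filesystem, built from the example values in the table. Writing `bind_addr` as a single `ip:port` string follows the table's examples. Treating the local and S3 result locations as alternatives to each other is an assumption.

```toml
[scheduler]
enabled = true
allow_local_sinks = true   # required for a local anonymous result location
n_workers = 4

# Local, disk-backed results; the path must be identical on all nodes.
anonymous_result_location.local.path = "/mnt/storage/polars/results"

# S3-backed alternative (assumed to replace the local entry above):
# anonymous_result_location.s3.url = "s3://bucket/path/to/key"

client_service.bind_addr = "0.0.0.0:5051"   # used by the polars-cloud client
worker_service.bind_addr = "0.0.0.0:5050"   # used internally by the workers
```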

`[worker]` section

Key	Type	Description
`enabled`	boolean	Whether the worker component runs in this process. `true` on worker nodes, `false` on the dedicated scheduler.
`heartbeat_period`	string	Interval between worker heartbeats sent to the scheduler, used for liveness and load reporting. Accepts either an ISO 8601 duration or the jiff "friendly" duration format (see https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/). e.g. `5 secs` or `PT5S`.
`shuffle_location`	object	Object used for shuffle data storage.
`shuffle_location.local`	object	Object used for local disk-backed shuffle data storage.
`shuffle_location.local.path`	path	Local path where shuffle/intermediate data is stored; fast local SSD is recommended. e.g. `/mnt/storage/polars/shuffle`.
`shuffle_location.shared_filesystem`	object	Object used for shared filesystem-backed shuffle data storage.
`shuffle_location.shared_filesystem.path`	path	Shared filesystem path where shuffle/intermediate data is stored. Must be accessible by all workers on the same path. e.g. `/mnt/storage/polars/shuffle`.
`shuffle_location.s3`	object	Object used for S3-backed shuffle data storage.
`shuffle_location.s3.url`	path	Destination for shuffle/intermediate data. e.g. `s3://bucket/path/to/key`.
`shuffle_location.s3.aws_endpoint_url`	string	Storage option configuration, see `scan_parquet()`.
`shuffle_location.s3.aws_region`	string	Storage option configuration. e.g. `eu-west-1`.
`shuffle_location.s3.aws_access_key_id`	string	Storage option configuration.
`shuffle_location.s3.aws_secret_access_key`	string	Storage option configuration.
`task_service`	object	Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to `0.0.0.0:5052`.
`task_service.bind_addr`	string	Bind address for the task service. e.g. `0.0.0.0:5052`.
`task_service.bind_addr.ip`	string	IP address for the task service bind address. e.g. `192.168.1.1`.
`task_service.bind_addr.port`	integer	Port for the task service bind address. e.g. `5052`.
`task_service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
`task_service.public_addr`	string	Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is `0.0.0.0`. e.g. `192.168.1.1`.
`task_service.public_addr.ip`	string	IP address for the task service public address. e.g. `192.168.1.2`.
`task_service.public_addr.port`	integer	Port for the task service public address. e.g. `5052`.
`task_service.public_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
`shuffle_service`	object	Object used for configuring the bind address of the shuffle service. This is an internal service in the worker used for shuffle data. Defaults to `0.0.0.0:5053`.
`shuffle_service.bind_addr`	string	Bind address for the shuffle service. e.g. `0.0.0.0:5053`.
`shuffle_service.bind_addr.ip`	string	IP address for the shuffle service bind address. e.g. `192.168.1.1`.
`shuffle_service.bind_addr.port`	integer	Port for the shuffle service bind address. e.g. `5053`.
`shuffle_service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
`shuffle_service.public_addr`	string	Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is `0.0.0.0`. e.g. `192.168.1.1`.
`shuffle_service.public_addr.ip`	string	IP address for the shuffle service public address. e.g. `192.168.1.2`.
`shuffle_service.public_addr.port`	integer	Port for the shuffle service public address. e.g. `5053`.
`shuffle_service.public_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
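
A sketch of a `[worker]` section for a worker node with local shuffle storage, using the table's example values. Because both services bind to `0.0.0.0`, a `public_addr` is set for each. The address `192.168.1.2` stands in for this worker's own reachable IP, and the plain string form of the addresses is taken from the table's examples.

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"   # "PT5S" is the ISO 8601 equivalent

# Fast local SSD is recommended for shuffle/intermediate data.
shuffle_location.local.path = "/mnt/storage/polars/shuffle"

# Task service: binding to 0.0.0.0 makes public_addr mandatory.
task_service.bind_addr = "0.0.0.0:5052"
task_service.public_addr = "192.168.1.2"

# Shuffle service: same rule applies.
shuffle_service.bind_addr = "0.0.0.0:5053"
shuffle_service.public_addr = "192.168.1.2"
```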

`[observatory]` section

Key	Type	Description
`enabled`	boolean	Enable sending/receiving profiling data so clients can call `result.await_profile()`. `true` on both scheduler and workers if you want profiles on queries; `false` to disable.
`max_metrics_bytes_total`	integer	How many bytes all the worker host metrics will consume in total. If a system-wide memory limit is specified then this is added to the share that the scheduler takes. For every worker, about 50 bytes of metrics are stored per second.
`database_path`	string	Location to use for storing profiling data. An SQLite database file will be created here, or if a file already exists it will be opened. If this points to a directory, a file in that directory will be created. Polars on-premises will automatically add the `cluster_id` to this file name to ensure uniqueness within the directory.
`service`	object	Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to `0.0.0.0:5049`.
`service.bind_addr`	string	Bind address for the observatory service. e.g. `0.0.0.0:5049`.
`service.bind_addr.ip`	string	IP address for the observatory service bind address. e.g. `192.168.1.1`.
`service.bind_addr.port`	integer	Port for the observatory service bind address. e.g. `5049`.
`service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
`rest_api.enabled`	boolean	Whether to expose the observatory REST API; enabled by default. This is a public service for accessing profiling data and host metrics through a web interface.
`rest_api.service`	object	Object used for configuring the bind address of the observatory REST API service. Defaults to `0.0.0.0:3001`.
`rest_api.service.bind_addr`	string	Bind address for the observatory REST API service. e.g. `0.0.0.0:3001`.
`rest_api.service.bind_addr.ip`	string	IP address for the observatory REST API service bind address. e.g. `192.168.1.1`.
`rest_api.service.bind_addr.port`	integer	Port for the observatory REST API service bind address. e.g. `3001`.
`rest_api.service.bind_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
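
A sketch of an `[observatory]` section on the scheduler node. The `database_path` directory and the 10 MiB metrics budget are hypothetical values chosen for illustration. The sizing comment only applies the "about 50 bytes per worker per second" figure from the table and assumes a 4-worker cluster.

```toml
[observatory]
enabled = true

# Rough sizing: 4 workers x ~50 bytes/s = ~200 bytes/s of host metrics,
# so 10 MiB (10485760 bytes) corresponds to about 52,000 seconds,
# or roughly 14.5 hours of metrics.
max_metrics_bytes_total = 10485760

# Hypothetical directory; a SQLite file including the cluster_id is created in it.
database_path = "/mnt/storage/polars/observatory"

service.bind_addr = "0.0.0.0:5049"            # internal profiling data receiver
rest_api.enabled = true
rest_api.service.bind_addr = "0.0.0.0:3001"   # public REST API
```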

`[monitoring]` section

Key	Type	Description
`enabled`	boolean	Enable sending/receiving monitoring data to the observatory service. If enabled, the node uses the address specified in `observatory_service.public_addr` (in the `[static_leader]` section).
`host_metrics`	object	Object used for configuring the host metrics exporter.
`host_metrics.enabled`	boolean	Enable or disable exporting host metrics from this node.
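
A minimal sketch of a `[monitoring]` section that turns on monitoring and host metrics export for a node. It assumes `observatory_service.public_addr` is configured elsewhere in the same file.

```toml
[monitoring]
enabled = true                # sends data to observatory_service.public_addr
host_metrics.enabled = true   # export host metrics from this node
```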

`[static_leader]` section

Key	Type	Description
`leader_instance_id`	string	ID of the leader node; should match the scheduler’s `instance_id`. Typically `scheduler` to match your scheduler node.
`scheduler_service.public_addr`	string	Address at which the scheduler client service is reachable from this node. e.g. `192.168.1.1`.
`scheduler_service.public_addr.ip`	string	IP address for the scheduler client service public address. e.g. `192.168.1.1`.
`scheduler_service.public_addr.port`	integer	Port for the scheduler client service public address. e.g. `5051`.
`scheduler_service.public_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
`observatory_service.public_addr`	string	Address at which the observatory service is reachable from this node. e.g. `192.168.1.1`.
`observatory_service.public_addr.ip`	string	IP address for the observatory service public address. e.g. `192.168.1.1`.
`observatory_service.public_addr.port`	integer	Port for the observatory service public address. e.g. `5049`.
`observatory_service.public_addr.hostname`	string	Alternative to `ip`, resolved once at startup. e.g. `my-host-2`.
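
A sketch of a `[static_leader]` section as it might appear on a worker node, using the table's example values. It assumes the scheduler node has `instance_id = "scheduler"` and is reachable at `192.168.1.1`. The plain string form of `public_addr` follows the table's examples.

```toml
[static_leader]
leader_instance_id = "scheduler"   # must match the scheduler's instance_id

# Where this node can reach the scheduler's services.
scheduler_service.public_addr = "192.168.1.1"
observatory_service.public_addr = "192.168.1.1"
```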