Polars user guide

Config file reference

This page describes the configuration options for polars-on-premise. The config file is a standard TOML file divided into sections. Any option can be overridden with an environment variable named in the following format: `PC_CUBLET__section_name__key`.
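
As a rough sketch of the override format, the fragment below sets one key in the config file and notes, in a comment, the environment variable name that the pattern above would produce for it. The choice of `flight_port`, the override value, and the lowercase casing of the section and key parts are assumptions made for illustration.

```toml
# Value set in the config file.
[worker]
flight_port = 5052

# Assumed equivalent override, following PC_CUBLET__section_name__key:
#   PC_CUBLET__worker__flight_port=6052
```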

Top-level configuration

Key	Type	Description
`cluster_id`	string	Logical ID for the cluster; workers and scheduler that share this ID will form a single cluster. e.g. `prod-eu-1`; must be unique among all clusters.
`cublet_id`	string	Unique ID for this node ("cublet") within the cluster, used for addressing and leader selection. e.g. `scheduler`, `worker_0`; must be unique per cluster.
`license`	path	Absolute path to the polars-on-premise license file required to start the process. e.g. `/etc/polars/license.json`.
`memory_limit`	integer (bytes)	Hard memory budget for all components in this cublet; enforced via cgroups when delegated. e.g. `1073741824` (1 GiB), `10737418240` (10 GiB).
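
The top-level keys might look like the following for a worker node. This is a sketch that reuses the example values from the table; the pairing of these particular values in one file is an assumption.

```toml
# Top-level keys must appear before any [section] header in a TOML file.
cluster_id = "prod-eu-1"              # shared by every node in the cluster
cublet_id = "worker_0"                # unique within the cluster
license = "/etc/polars/license.json"  # absolute path
memory_limit = 10737418240            # 10 GiB, expressed in bytes
```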

`[scheduler]` section

Key	Type	Description
`enabled`	bool	Whether the scheduler component runs in this process. `true` for the leader node, `false` on pure workers.
`anonymous_result_dst`	string (URI)	Destination for results of queries that don’t have an explicit sink. Currently, only S3 storage is supported. The bucket must be reachable by the scheduler, workers, and clients. e.g. `s3://my-bucket/path/to/dir`
`allow_shared_disk`	bool	Whether workers are allowed to write to a shared/local disk visible to the scheduler. `false` for fully remote/storage-only setups; `true` if you have a shared filesystem.
`n_workers`	int	Expected number of workers in this cluster; scheduler waits for this many to be online before running queries. e.g. `4`
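
A `[scheduler]` section for a dedicated scheduler node could be sketched as follows, using the example values from the table. The bucket path is a placeholder, and `allow_shared_disk = false` assumes a setup without a shared filesystem.

```toml
[scheduler]
enabled = true                                       # this process acts as the scheduler
anonymous_result_dst = "s3://my-bucket/path/to/dir"  # placeholder bucket
allow_shared_disk = false                            # assumes no shared filesystem
n_workers = 4                                        # wait for four workers before running queries
```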

`[worker]` section

Key	Type	Description
`enabled`	bool	Whether the worker component runs in this process. `true` on worker nodes, `false` on the dedicated scheduler.
`worker_ip`	string	Public or routable IP address other workers/scheduler use to reach this worker. e.g. `192.168.1.2`
`flight_port`	int	Port for shuffle traffic between workers. e.g. `5052`
`service_port`	int	Port on which the worker receives task instructions from the scheduler. e.g. `5053`
`heartbeat_interval_secs`	int	Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. e.g. `5`
`shuffle_data_path`	path	Local path where shuffle / intermediate data is stored; fast local SSD is recommended. e.g. `/opt/shuffle-data-path`
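
On a worker node, the `[worker]` section could look roughly like this. All values are the illustrative ones from the table and would need to be adapted to the actual network and disk layout.

```toml
[worker]
enabled = true
worker_ip = "192.168.1.2"                     # address other nodes use to reach this worker
flight_port = 5052                            # shuffle traffic between workers
service_port = 5053                           # task instructions from the scheduler
heartbeat_interval_secs = 5
shuffle_data_path = "/opt/shuffle-data-path"  # ideally on fast local SSD
```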

`[observatory]` section

Key	Type	Description
`enabled`	bool	Enable sending/receiving profiling data so clients can call `result.await_profile()`. `true` on both scheduler and workers if you want profiles on queries; `false` to disable.

`[static_leader]` section

Key	Type	Description
`leader_key`	string	ID of the leader cublet; should match the scheduler’s `cublet_id`. Typically `scheduler` to match your scheduler node.
`public_leader_addr`	string	Host/IP where the leader’s `[service]` is reachable from this node. e.g. `192.168.1.1`
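
The relationship between `leader_key` and the scheduler's `cublet_id` can be sketched as below. The address is the example value from the table, and the commented-out line stands in for the scheduler node's own top-level setting.

```toml
# In the scheduler node's config file (top level), for reference:
# cublet_id = "scheduler"

[static_leader]
leader_key = "scheduler"            # matches the scheduler's cublet_id
public_leader_addr = "192.168.1.1"  # where the leader's [service] is reachable
```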

`[service]` section

Key	Type	Description
`public_address`	string	ID of the leader cublet; should match the scheduler’s `cublet_id`. Typically `scheduler` to match your scheduler node.
`auth`	string	Host/IP where the leader’s `[service]` is reachable from this node. e.g. `192.168.1.1`
`connection`	string	Host/IP where the leader’s `[service]` is reachable from this node. e.g. `192.168.1.1`