Config file reference

This page describes the configuration options for polars-on-premise. The config file is a standard TOML file divided into sections. Any option can be overridden with an environment variable named in the format PC_CUBLET__section_name__key.
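As a rough illustration of the override naming, the fragment below sets one key in a section and notes in a comment the environment variable name that the PC_CUBLET__section_name__key pattern would produce for it. The exact casing of the section and key parts, and how the value is parsed from the environment, are assumptions not confirmed by this page.

```toml
[worker]
# Value read from the config file.
flight_port = 5052

# Assumed environment override for the key above, following the
# PC_CUBLET__section_name__key pattern (casing is an assumption):
#   PC_CUBLET__worker__flight_port=5062
```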

Top-level configuration

| Key | Type | Description |
| --- | --- | --- |
| cluster_id | string | Logical ID for the cluster; workers and scheduler that share this ID will form a single cluster. E.g. prod-eu-1; must be unique among all clusters. |
| cublet_id | string | Unique ID for this node ("cublet") within the cluster, used for addressing and leader selection. E.g. scheduler, worker_0; must be unique per cluster. |
| license | path | Absolute path to the polars-on-premise license file required to start the process. E.g. /etc/polars/license.json. |
| memory_limit | integer (bytes) | Hard memory budget for all components in this cublet; enforced via cgroups when delegated. E.g. 1073741824 (1 GiB), 10737418240 (10 GiB). |
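A minimal sketch of the top-level keys, using the example values from the table above. It assumes that path-typed options are written as TOML strings, since TOML has no dedicated path type.

```toml
# Top-level keys must come before any [section] header in a TOML file.
cluster_id = "prod-eu-1"
cublet_id = "worker_0"

# Assumption: the path is given as a plain TOML string.
license = "/etc/polars/license.json"

# Value is in bytes: 10737418240 bytes = 10 GiB.
memory_limit = 10737418240
```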

[scheduler] section

| Key | Type | Description |
| --- | --- | --- |
| enabled | bool | Whether the scheduler component runs in this process. true for the leader node, false on pure workers. |
| anonymous_result_dst | string (URI) | Destination for results of queries that don't have an explicit sink. Currently only S3 storage is supported. The bucket must be reachable by the scheduler, workers, and clients. E.g. s3://my-bucket/path/to/dir |
| allow_shared_disk | bool | Whether workers are allowed to write to a shared/local disk visible to the scheduler. false for fully remote/storage-only setups; true if you have a shared filesystem. |
| n_workers | int | Expected number of workers in this cluster; the scheduler waits for this many to be online before running queries. E.g. 4 |
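A sketch of a [scheduler] section for a leader node, built from the example values in the table above. The bucket path is the placeholder from the table, not a real location.

```toml
[scheduler]
# Run the scheduler component in this process (leader node).
enabled = true

# Where results of queries without an explicit sink are written (S3 only).
anonymous_result_dst = "s3://my-bucket/path/to/dir"

# No shared filesystem between workers and scheduler in this sketch.
allow_shared_disk = false

# The scheduler waits for 4 workers to be online before running queries.
n_workers = 4
```

On a pure worker node, the same section would set enabled to false.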

[worker] section

| Key | Type | Description |
| --- | --- | --- |
| enabled | bool | Whether the worker component runs in this process. true on worker nodes, false on the dedicated scheduler. |
| worker_ip | string | Public or routable IP address other workers/scheduler use to reach this worker. E.g. 192.168.1.2 |
| flight_port | int | Port for shuffle traffic between workers. E.g. 5052 |
| service_port | int | Port on which the worker receives task instructions from the scheduler. E.g. 5053 |
| heartbeat_interval_secs | int | Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. E.g. 5 |
| shuffle_data_path | path | Local path where shuffle / intermediate data is stored; fast local SSD is recommended. E.g. /opt/shuffle-data-path |
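A sketch of a [worker] section for a worker node, using the example values from the table above. Writing the path as a TOML string is an assumption.

```toml
[worker]
# Run the worker component in this process.
enabled = true

# Address that other workers and the scheduler use to reach this worker.
worker_ip = "192.168.1.2"

# Shuffle traffic between workers.
flight_port = 5052

# Task instructions from the scheduler.
service_port = 5053

# Send a heartbeat to the scheduler every 5 seconds.
heartbeat_interval_secs = 5

# Assumption: path given as a plain TOML string; ideally on fast local SSD.
shuffle_data_path = "/opt/shuffle-data-path"
```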

[observatory] section

| Key | Type | Description |
| --- | --- | --- |
| enabled | bool | Enable sending/receiving profiling data so clients can call result.await_profile(). true on both scheduler and workers if you want profiles on queries; false to disable. |
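A short sketch of enabling profiling. Per the description above, the same setting would go in the config file of the scheduler and of every worker.

```toml
[observatory]
# Enable profiling data so clients can call result.await_profile().
enabled = true
```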

[static_leader] section

| Key | Type | Description |
| --- | --- | --- |
| leader_key | string | ID of the leader cublet; should match the scheduler's cublet_id. Typically scheduler to match your scheduler node. |
| public_leader_addr | string | Host/IP where the leader's [service] is reachable from this node. E.g. 192.168.1.1 |
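A sketch of a [static_leader] section on a worker node, assuming the scheduler node was configured with cublet_id = "scheduler" and is reachable at the example address from the table above.

```toml
[static_leader]
# Must match the cublet_id of the scheduler node (assumed to be "scheduler").
leader_key = "scheduler"

# Host/IP where this node can reach the leader's [service].
public_leader_addr = "192.168.1.1"
```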

[service] section

| Key | Type | Description |
| --- | --- | --- |
| public_address | string | ID of the leader cublet; should match the scheduler's cublet_id. Typically scheduler to match your scheduler node. |
| auth | string | Host/IP where the leader's [service] is reachable from this node. E.g. 192.168.1.1 |
| connection | string | Host/IP where the leader's [service] is reachable from this node. E.g. 192.168.1.1 |