Config file reference
This page describes the configuration options for polars-on-premises. The config file is a
standard TOML file divided into sections. Any configuration value can be overridden with an
environment variable of the form PC_CUBLET__section_name__key.
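The following minimal Python sketch makes the override pattern concrete. It builds the variable name for a key inside a section and reads that variable from the environment. The helper names (`env_var_name`, `lookup_override`) are hypothetical. The sketch only covers keys that live inside a section. Converting the raw string to the key's type is left to polars-on-premises itself and is not modelled here.

```python
import os
from typing import Mapping, Optional

# Prefix taken from the documented PC_CUBLET__section_name__key pattern.
PREFIX = "PC_CUBLET"


def env_var_name(section: str, key: str) -> str:
    """Hypothetical helper: build the override variable name for section.key.

    ("scheduler", "n_workers") -> "PC_CUBLET__scheduler__n_workers"
    """
    return f"{PREFIX}__{section}__{key}"


def lookup_override(section: str, key: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the raw override string, or None if unset.

    Type conversion (int, bool, ...) is done by polars-on-premises itself
    and is deliberately not modelled here.
    """
    return env.get(env_var_name(section, key))


print(env_var_name("worker", "flight_port"))  # PC_CUBLET__worker__flight_port
```

Under this pattern, exporting PC_CUBLET__worker__flight_port=6000 would override flight_port in the [worker] section.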
Top-level configuration
| Key | Type | Description |
|---|---|---|
| cluster_id | string | Logical ID for the cluster; workers and the scheduler that share this ID form a single cluster. Must be unique among all clusters. e.g. prod-eu-1 |
| cublet_id | string | Unique ID for this node ("cublet") within the cluster, used for addressing and leader selection. Must be unique per cluster. e.g. scheduler, worker_0 |
| license | path | Absolute path to the polars-on-premises license file; the process requires it to start. e.g. /etc/polars/license.json |
| memory_limit | integer (bytes) | Hard memory budget for all components in this cublet, enforced via cgroups when delegated. e.g. 1073741824 (1 GiB), 10737418240 (10 GiB) |
[scheduler] section
| Key | Type | Description |
|---|---|---|
| enabled | bool | Whether the scheduler component runs in this process. Set to true on the leader node and false on pure workers. |
| anonymous_result_dst | string (URI) | Destination for the results of queries that don’t have an explicit sink. Currently only S3 storage is supported; the bucket must be reachable by the scheduler, the workers, and the clients. e.g. s3://my-bucket/path/to/dir |
| allow_shared_disk | bool | Whether workers are allowed to write to a shared/local disk that is visible to the scheduler. Set to false for fully remote/storage-only setups, or true if you have a shared filesystem. |
| n_workers | int | Expected number of workers in this cluster; the scheduler waits until this many are online before running queries. e.g. 4 |
[worker] section
| Key | Type | Description |
|---|---|---|
| enabled | bool | Whether the worker component runs in this process. Set to true on worker nodes and false on the dedicated scheduler. |
| worker_ip | string | Public or routable IP address that the scheduler and other workers use to reach this worker. e.g. 192.168.1.2 |
| flight_port | int | Port for shuffle traffic between workers. e.g. 5052 |
| service_port | int | Port on which the worker receives task instructions from the scheduler. e.g. 5053 |
| heartbeat_interval_secs | int | Interval, in seconds, between worker heartbeats to the scheduler; used for liveness and load reporting. e.g. 5 |
| shuffle_data_path | path | Local path where shuffle/intermediate data is stored; a fast local SSD is recommended. e.g. /opt/shuffle-data-path |
[observatory] section
| Key | Type | Description |
|---|---|---|
| enabled | bool | Enables sending and receiving profiling data so that clients can call result.await_profile(). Set to true on both the scheduler and the workers to get query profiles, or false to disable. |
| max_metrics_bytes_total | int | Total number of bytes that all worker host metrics may consume. If a system-wide memory limit is specified, this amount is added to the scheduler’s share. Worker host metrics are not yet available, so this can be set to 0. |
[static_leader] section
| Key | Type | Description |
|---|---|---|
| leader_key | string | ID of the leader cublet; should match the scheduler’s cublet_id. Typically scheduler, to match your scheduler node. |
| public_leader_addr | string | Host/IP at which the leader’s [service] is reachable from this node. e.g. 192.168.1.1 |
[service] section
| Key | Type | Description |
|---|---|---|
| public_address | string | ID of the leader cublet; should match the scheduler’s cublet_id. Typically scheduler to match your scheduler node. |
| auth | string | Host/IP where the leader’s [service] is reachable from this node. e.g. 192.168.1.1 |
| connection | string | Host/IP where the leader’s [service] is reachable from this node. e.g. 192.168.1.1 |