
Config file reference

This page describes the configuration options for Polars on-premises. The config file is a standard TOML file with several sections. Any configuration option can be overridden with an environment variable of the form PC_CUBLET__section_name__key; for example, PC_CUBLET__scheduler__n_workers overrides n_workers in the [scheduler] section.

Top-level configuration

The polars-on-premises binary requires a license, whose path is provided as a configuration option (listed below). The license itself has the following shape:

```json
{
  "params": {
    "expiry": "2026-01-31T23:59:59Z",
    "name": "Company"
  },
  "signature": "..."
}
```
| Key | Type | Description |
| --- | --- | --- |
| `cluster_id` | string | Logical ID for the cluster; workers and the scheduler that share this ID form a single cluster. E.g. `prod-eu-1`; must be unique among all clusters. |
| `instance_id` | string | Unique ID for this node within the cluster, used for addressing and leader selection. E.g. `scheduler`, `worker_0`; must be unique per cluster. |
| `license` | path | Absolute path to the Polars on-premises license file required to start the process. E.g. `/etc/polars/license.json`. |
| `memory_limit` | integer | Hard memory budget for all components on this node; enforced via cgroups when delegated. E.g. `1073741824` (1 GiB), `10737418240` (10 GiB). |

Example:

```toml
cluster_id = "polars-cluster-dev"
instance_id = "scheduler"
license = "/etc/polars/license.json"
memory_limit = 1073741824 # 1 GiB
```

[scheduler] section

For remote Polars queries without a specific output sink, Polars on-premises can automatically add a persistent sink. We call these "anonymous results" sinks. Infrastructure-wise, these sinks are backed by S3-compatible storage accessible from all worker nodes and the Python client. Data written to this location is not deleted automatically, so you need to configure a retention policy for it yourself.

You may configure the credentials using the options listed below; the key names correspond to the storage_options parameter from the scan_parquet() method (e.g. aws_access_key_id, aws_secret_access_key, aws_session_token, aws_region). We currently only support the AWS keys of the storage_options dictionary, but note that you can use any other cloud provider that supports the S3 API, such as MinIO or DigitalOcean Spaces.
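As a sketch, an anonymous-results configuration backed by an S3-compatible store (MinIO in this example) could look as follows; the endpoint URL, bucket, and credentials are placeholders, not working values:

```toml
[scheduler]
enabled = true
n_workers = 4

# Anonymous results written to an S3-compatible store.
# aws_endpoint_url points at the non-AWS S3 endpoint (placeholder value).
anonymous_result_location.s3.url = "s3://bucket/path/to/key"
anonymous_result_location.s3.aws_endpoint_url = "http://minio.internal:9000"
anonymous_result_location.s3.aws_region = "eu-west-1"
anonymous_result_location.s3.aws_access_key_id = "YOURACCESSKEY"
anonymous_result_location.s3.aws_secret_access_key = "YOURSECRETKEY"
```

When targeting AWS S3 itself, `aws_endpoint_url` can be left out and the region keys used as with `scan_parquet()`.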

| Key | Type | Description |
| --- | --- | --- |
| `enabled` | boolean | Whether the scheduler component runs in this process. `true` for the leader node, `false` on pure workers. |
| `allow_local_sinks` | boolean | Whether workers are allowed to write to a shared/local disk visible to the scheduler. `false` for fully remote/storage-only setups, `true` if you have a shared filesystem. |
| `n_workers` | integer | Expected number of workers in this cluster; the scheduler waits for this many workers to be online before running queries. E.g. `4`. |
| `anonymous_result_location` | object | Destination for results of queries that do not have an explicit sink. Currently supported: a locally mounted path (must be reachable at the exact same path on every node, with `allow_local_sinks` enabled) and S3-based storage. Either option must be network-reachable by the scheduler, workers, and client. E.g. `/mnt/storage/polars/results` or `s3://bucket/path/to/key`. |
| `anonymous_result_location.local` | object | Object used for local disk-backed anonymous results. |
| `anonymous_result_location.local.path` | path | Local path where anonymous results are stored. E.g. `/mnt/storage/polars/results`. |
| `anonymous_result_location.s3` | object | Object used for S3-backed anonymous results. |
| `anonymous_result_location.s3.url` | string | S3 bucket URL. E.g. `s3://bucket/path/to/key`. |
| `anonymous_result_location.s3.aws_endpoint_url` | string | Storage option configuration, see `scan_parquet()`. |
| `anonymous_result_location.s3.aws_region` | string | Storage option configuration. E.g. `eu-west-1`. |
| `anonymous_result_location.s3.aws_access_key_id` | string | Storage option configuration. |
| `anonymous_result_location.s3.aws_secret_access_key` | string | Storage option configuration. |
| `client_service` | object | Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to `0.0.0.0:5051`. |
| `client_service.bind_addr` | string | Bind address for the client service. E.g. `0.0.0.0:5051`. |
| `client_service.bind_addr.ip` | string | IP address for the client service bind address. E.g. `192.168.1.1`. |
| `client_service.bind_addr.port` | integer | Port for the client service bind address. E.g. `5051`. |
| `client_service.bind_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-1`. |
| `worker_service` | object | Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to `0.0.0.0:5050`. |
| `worker_service.bind_addr` | string | Bind address for the worker service. E.g. `0.0.0.0:5050`. |
| `worker_service.bind_addr.ip` | string | IP address for the worker service bind address. E.g. `192.168.1.1`. |
| `worker_service.bind_addr.port` | integer | Port for the worker service bind address. E.g. `5050`. |
| `worker_service.bind_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |

Example:

```toml
[scheduler]
enabled = true
allow_local_sinks = false
n_workers = 4
anonymous_result_location.s3.url = "s3://bucket/path/to/key"
anonymous_result_location.s3.aws_secret_access_key = "YOURSECRETKEY"
anonymous_result_location.s3.aws_access_key_id = "YOURACCESSKEY"
client_service.bind_addr = "0.0.0.0:5051"
worker_service.bind_addr.hostname = "my-host-2"
```

Example with mounted local disk as anonymous result destination:

```toml
[scheduler]
enabled = true
allow_local_sinks = true
anonymous_result_location = "/mnt/storage/polars/results"
```

[worker] section

During distributed query execution, data may be shuffled between workers. A local path can be provided, but shuffles can also be configured to use S3-compatible storage (accessible from all worker nodes). You may configure the credentials using the options listed below; the key names correspond to the storage_options parameter from the scan_parquet() method (e.g. aws_access_key_id, aws_secret_access_key, aws_session_token, aws_region).

| Key | Type | Description |
| --- | --- | --- |
| `enabled` | boolean | Whether the worker component runs in this process. `true` on worker nodes, `false` on the dedicated scheduler. |
| `heartbeat_period` | string | Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. Either an ISO 8601 duration or a jiff "friendly" duration (see https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/). E.g. `5 secs`, `PT5S`. |
| `shuffle_location` | object | Object used for shuffle data storage. |
| `shuffle_location.local` | object | Object used for local disk-backed shuffle data storage. |
| `shuffle_location.local.path` | path | Local path where shuffle/intermediate data is stored; a fast local SSD is recommended. E.g. `/mnt/storage/polars/shuffle`. |
| `shuffle_location.s3` | object | Object used for S3-backed shuffle data storage. |
| `shuffle_location.s3.url` | string | Destination for shuffle/intermediate data. E.g. `s3://bucket/path/to/key`. |
| `shuffle_location.s3.aws_endpoint_url` | string | Storage option configuration, see `scan_parquet()`. |
| `shuffle_location.s3.aws_region` | string | Storage option configuration. E.g. `eu-west-1`. |
| `shuffle_location.s3.aws_access_key_id` | string | Storage option configuration. |
| `shuffle_location.s3.aws_secret_access_key` | string | Storage option configuration. |
| `task_service` | object | Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to `0.0.0.0:5052`. |
| `task_service.bind_addr` | string | Bind address for the task service. E.g. `0.0.0.0:5052`. |
| `task_service.bind_addr.ip` | string | IP address for the task service bind address. E.g. `192.168.1.1`. |
| `task_service.bind_addr.port` | integer | Port for the task service bind address. E.g. `5052`. |
| `task_service.bind_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |
| `task_service.public_addr` | string | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. Required when the bind address is `0.0.0.0`. E.g. `192.168.1.1`. |
| `task_service.public_addr.ip` | string | IP address for the task service public address. E.g. `192.168.1.2`. |
| `task_service.public_addr.port` | integer | Port for the task service public address. E.g. `5052`. |
| `task_service.public_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |
| `shuffle_service` | object | Object used for configuring the bind address of the shuffle service. This is an internal service in the worker used for exchanging shuffle data. Defaults to `0.0.0.0:5053`. |
| `shuffle_service.bind_addr` | string | Bind address for the shuffle service. E.g. `0.0.0.0:5053`. |
| `shuffle_service.bind_addr.ip` | string | IP address for the shuffle service bind address. E.g. `192.168.1.1`. |
| `shuffle_service.bind_addr.port` | integer | Port for the shuffle service bind address. E.g. `5053`. |
| `shuffle_service.bind_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |
| `shuffle_service.public_addr` | string | Address at which this service is reachable by other workers. Defaults to the bind address if not set. Required when the bind address is `0.0.0.0`. E.g. `192.168.1.1`. |
| `shuffle_service.public_addr.ip` | string | IP address for the shuffle service public address. E.g. `192.168.1.2`. |
| `shuffle_service.public_addr.port` | integer | Port for the shuffle service public address. E.g. `5053`. |
| `shuffle_service.public_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |

Example:

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"
task_service.bind_addr = "0.0.0.0:1234"
task_service.public_addr.hostname = "my-host-2"
shuffle_service.public_addr.hostname = "my-host-2"
shuffle_location.local.path = "/mnt/storage/polars/shuffle"
```
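Example with S3-backed shuffle storage instead of local disk. A sketch using the same storage_options-style keys documented above; the bucket and credentials are placeholders:

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"
task_service.public_addr.hostname = "my-host-2"
shuffle_service.public_addr.hostname = "my-host-2"

# Shuffle data written to S3-compatible storage (placeholder values).
shuffle_location.s3.url = "s3://bucket/path/to/key"
shuffle_location.s3.aws_region = "eu-west-1"
shuffle_location.s3.aws_access_key_id = "YOURACCESSKEY"
shuffle_location.s3.aws_secret_access_key = "YOURSECRETKEY"
```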

[observatory] section

| Key | Type | Description |
| --- | --- | --- |
| `enabled` | boolean | Enable sending/receiving profiling data so clients can call `result.await_profile()`. `true` on both scheduler and workers if you want profiles on queries; `false` to disable. |
| `max_metrics_bytes_total` | integer | Maximum number of bytes all worker host metrics may consume in total. If a system-wide memory limit is specified, this is added to the share that the scheduler takes. Worker host metrics are not yet available, so this can be set to `0`. |
| `service` | object | Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to `0.0.0.0:5049`. |
| `service.bind_addr` | string | Bind address for the observatory service. E.g. `0.0.0.0:5049`. |
| `service.bind_addr.ip` | string | IP address for the observatory service bind address. E.g. `192.168.1.1`. |
| `service.bind_addr.port` | integer | Port for the observatory service bind address. E.g. `5049`. |
| `service.bind_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |

Example:

```toml
[observatory]
enabled = true
max_metrics_bytes_total = 0
```

[monitoring] section

| Key | Type | Description |
| --- | --- | --- |
| `enabled` | boolean | Enable sending monitoring data to the observatory service. If enabled, the address specified in `observatory_service.public_addr` is used. |

Example:

```toml
[monitoring]
enabled = true
```

[static_leader] section

| Key | Type | Description |
| --- | --- | --- |
| `leader_instance_id` | string | ID of the leader node; should match the scheduler's `instance_id`. Typically `scheduler`, matching your scheduler node. |
| `scheduler_service.public_addr` | string | Address at which the scheduler client service is reachable from this node. E.g. `192.168.1.1`. |
| `scheduler_service.public_addr.ip` | string | IP address for the scheduler client service public address. E.g. `192.168.1.1`. |
| `scheduler_service.public_addr.port` | integer | Port for the scheduler client service public address. E.g. `5051`. |
| `scheduler_service.public_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |
| `observatory_service.public_addr` | string | Address at which the observatory service is reachable from this node. E.g. `192.168.1.1`. |
| `observatory_service.public_addr.ip` | string | IP address for the observatory service public address. E.g. `192.168.1.1`. |
| `observatory_service.public_addr.port` | integer | Port for the observatory service public address. E.g. `5049`. |
| `observatory_service.public_addr.hostname` | string | Alternative to `ip`, resolved once at startup. E.g. `my-host-2`. |

Example:

```toml
[static_leader]
leader_instance_id = "scheduler"
observatory_service.public_addr = "127.0.0.1"
scheduler_service.public_addr = "127.0.0.1"
```
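
Putting the sections together, a minimal two-node cluster could be sketched as follows. This is only an illustration composed from the options documented above; the hostnames, paths, and memory limit are placeholders, not recommendations.

```toml
# Scheduler node.
cluster_id = "polars-cluster-dev"
instance_id = "scheduler"
license = "/etc/polars/license.json"
memory_limit = 10737418240 # 10 GiB

[scheduler]
enabled = true
allow_local_sinks = true
n_workers = 1
anonymous_result_location = "/mnt/storage/polars/results"

[worker]
enabled = false

[static_leader]
leader_instance_id = "scheduler"
scheduler_service.public_addr.hostname = "scheduler-host"
observatory_service.public_addr.hostname = "scheduler-host"
```

```toml
# Worker node.
cluster_id = "polars-cluster-dev"
instance_id = "worker_0"
license = "/etc/polars/license.json"
memory_limit = 10737418240 # 10 GiB

[scheduler]
enabled = false

[worker]
enabled = true
shuffle_location.local.path = "/mnt/storage/polars/shuffle"
task_service.public_addr.hostname = "worker-host-0"
shuffle_service.public_addr.hostname = "worker-host-0"

[static_leader]
leader_instance_id = "scheduler"
scheduler_service.public_addr.hostname = "scheduler-host"
observatory_service.public_addr.hostname = "scheduler-host"
```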