
Config file reference

This page describes the configuration options for Polars on-premises. The config file is a standard TOML file divided into sections. Any option can be overridden with an environment variable of the form PC_CUBLET__section_name__key.
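As a rough sketch of how the override format maps onto the file, assuming the section and key names are used verbatim in the variable name (the exact casing and the handling of nested keys are not specified on this page):

```toml
# Fragment of a config file.
[scheduler]
n_workers = 4

# Following the PC_CUBLET__section_name__key pattern, the value above would be
# overridden by an environment variable named:
#   PC_CUBLET__scheduler__n_workers
# for example PC_CUBLET__scheduler__n_workers=8
# (assumed spelling derived from the pattern; verify against your installation).
```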

Example configuration files can be found at Example Configurations.

See the sidebar for detailed documentation on the main components and how they are configured together.

Top-level configuration

Key Type Description
cluster_id string Logical ID for the cluster; workers and scheduler that share this ID will form a single cluster.
e.g. prod-eu-1; must be unique among all clusters.
instance_id string Unique ID for this node within the cluster, used for addressing and leader selection.
e.g. scheduler, worker_0; must be unique per cluster.
license path Absolute path to the Polars on-premises license file required to start the process.
e.g. /etc/polars/license.json.
memory_limit integer Hard memory budget for all components in this node; enforced via cgroups when delegated.
e.g. 1073741824 (1 GiB), 10737418240 (10 GiB).
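A minimal sketch of the top-level keys, using the example values from the table above. The values are placeholders; in TOML, top-level keys have to appear before the first section header.

```toml
# Top-level keys go before any [section] header.
cluster_id = "prod-eu-1"              # shared by every node in this cluster
instance_id = "worker_0"              # unique per node within the cluster
license = "/etc/polars/license.json"  # absolute path to the license file
memory_limit = 10737418240            # 10 GiB, expressed in bytes
```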

[scheduler] section

Key Type Description
enabled boolean Whether the scheduler component runs in this process.
true for the leader node, false on pure workers.
allow_local_sinks boolean Whether workers are allowed to write to a shared/local disk visible to the scheduler.
false for fully remote/storage-only setups, true if you have a shared filesystem.
n_workers integer Expected number of workers in this cluster; the scheduler waits until that many workers are online before running queries.
e.g. 4.
anonymous_result_location object Destination for results of queries that do not have an explicit sink. Two backends are currently supported: a locally mounted path (which must be reachable on the exact same path and requires allow_local_sinks to be enabled) and S3. Either option must be reachable over the network by the scheduler, the workers, and the client.
e.g. /mnt/storage/polars/results.
e.g. s3://bucket/path/to/key.
anonymous_result_location.local object Object used for local disk-backed anonymous results.
anonymous_result_location.local.path path Local path where anonymous results are stored.
e.g. /mnt/storage/polars/results.
anonymous_result_location.s3 object Object used for S3-backed anonymous results.
anonymous_result_location.s3.url string S3 URL where anonymous results are stored.
e.g. s3://bucket/path/to/key.
anonymous_result_location.s3.aws_endpoint_url string Storage option configuration, see scan_parquet().
anonymous_result_location.s3.aws_region string Storage option configuration.
e.g. eu-west-1.
anonymous_result_location.s3.aws_access_key_id string Storage option configuration.
anonymous_result_location.s3.aws_secret_access_key string Storage option configuration.
client_service object Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to 0.0.0.0:5051.
client_service.bind_addr string Bind address for the client service.
e.g. 0.0.0.0:5051.
client_service.bind_addr.ip string IP address for the client service bind address.
e.g. 192.168.1.1.
client_service.bind_addr.port integer Port for the client service bind address.
e.g. 5051.
client_service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-1.
worker_service object Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to 0.0.0.0:5050.
worker_service.bind_addr string Bind address for the worker service.
e.g. 0.0.0.0:5050.
worker_service.bind_addr.ip string IP address for the worker service bind address.
e.g. 192.168.1.1.
worker_service.bind_addr.port integer Port for the worker service bind address.
e.g. 5050.
worker_service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
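The scheduler keys above could be combined roughly as follows. This is a sketch for a node that runs the scheduler and stores anonymous results on a shared mount; the paths and region are placeholders, and it assumes nested keys are written as TOML dotted keys inside the section.

```toml
[scheduler]
enabled = true
allow_local_sinks = true   # needed for a local anonymous_result_location
n_workers = 4

# Local variant: a mount visible on the same path to scheduler, workers, and client.
anonymous_result_location.local.path = "/mnt/storage/polars/results"

# S3 variant (use instead of the local variant; placeholder bucket and region):
# anonymous_result_location.s3.url = "s3://bucket/path/to/key"
# anonymous_result_location.s3.aws_region = "eu-west-1"

# Bind addresses in string form; these values equal the documented defaults.
client_service.bind_addr = "0.0.0.0:5051"
worker_service.bind_addr = "0.0.0.0:5050"
```

Because the values shown for the two services match the defaults, those lines could be left out; they are included only to show the shape of the keys.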

[worker] section

Key Type Description
enabled boolean Whether the worker component runs in this process.
true on worker nodes, false on the dedicated scheduler.
heartbeat_period string Interval between worker heartbeats sent to the scheduler, used for liveness and load reporting. Accepts either an ISO 8601 duration or the jiff "friendly" duration format (see https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/).
e.g. 5 secs.
e.g. PT5S.
shuffle_location object Object used for shuffle data storage.
shuffle_location.local object Object used for local disk-backed shuffle data storage.
shuffle_location.local.path path Local path where shuffle/intermediate data is stored; fast local SSD is recommended.
e.g. /mnt/storage/polars/shuffle.
shuffle_location.shared_filesystem object Object used for shared filesystem-backed shuffle data storage.
shuffle_location.shared_filesystem.path path Shared filesystem path where shuffle/intermediate data is stored. Must be accessible by all workers on the same path.
e.g. /mnt/storage/polars/shuffle.
shuffle_location.s3 object Object used for S3-backed shuffle data storage.
shuffle_location.s3.url path Destination for shuffle/intermediate data.
e.g. s3://bucket/path/to/key.
shuffle_location.s3.aws_endpoint_url string Storage option configuration, see scan_parquet().
shuffle_location.s3.aws_region string Storage option configuration.
e.g. eu-west-1.
shuffle_location.s3.aws_access_key_id string Storage option configuration.
shuffle_location.s3.aws_secret_access_key string Storage option configuration.
task_service object Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to 0.0.0.0:5052.
task_service.bind_addr string Bind address for the task service.
e.g. 0.0.0.0:5052.
task_service.bind_addr.ip string IP address for the task service bind address.
e.g. 192.168.1.1.
task_service.bind_addr.port integer Port for the task service bind address.
e.g. 5052.
task_service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
task_service.public_addr string Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0.
e.g. 192.168.1.1.
task_service.public_addr.ip string IP address for the task service public address.
e.g. 192.168.1.2.
task_service.public_addr.port integer Port for the task service public address.
e.g. 5052.
task_service.public_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
shuffle_service object Object used for configuring the bind address of the shuffle service. This is an internal service in the worker. Defaults to 0.0.0.0:5053.
shuffle_service.bind_addr string Bind address for the shuffle service.
e.g. 0.0.0.0:5053.
shuffle_service.bind_addr.ip string IP address for the shuffle service bind address.
e.g. 192.168.1.1.
shuffle_service.bind_addr.port integer Port for the shuffle service bind address.
e.g. 5053.
shuffle_service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
shuffle_service.public_addr string Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0.
e.g. 192.168.1.1.
shuffle_service.public_addr.ip string IP address for the shuffle service public address.
e.g. 192.168.1.2.
shuffle_service.public_addr.port integer Port for the shuffle service public address.
e.g. 5053.
shuffle_service.public_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
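A worker node might be configured along these lines. This is a sketch with placeholder addresses and paths. It assumes that the string form and the object form (ip or hostname plus port) of an address are interchangeable, as the table suggests, and it sets public_addr because the bind address is 0.0.0.0.

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"   # jiff friendly format; ISO 8601 would be "PT5S"

# Shuffle data on a fast local disk (placeholder path).
shuffle_location.local.path = "/mnt/storage/polars/shuffle"

# Task service: listen on all interfaces, advertise a routable address.
task_service.bind_addr = "0.0.0.0:5052"
task_service.public_addr.ip = "192.168.1.2"
task_service.public_addr.port = 5052

# Shuffle service: same pattern on its own port.
shuffle_service.bind_addr = "0.0.0.0:5053"
shuffle_service.public_addr.ip = "192.168.1.2"
shuffle_service.public_addr.port = 5053
```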

[observatory] section

Key Type Description
enabled boolean Enable sending/receiving profiling data so clients can call result.await_profile().
true on both scheduler and workers if you want profiles on queries; false to disable.
max_metrics_bytes_total integer How many bytes all the worker host metrics will consume in total. If a system-wide memory limit is specified then this is added to the share that the scheduler takes. For every worker, about 50 bytes of metrics are stored per second.
database_path string Location to use for storing profiling data. An SQLite database file will be created here, or if a file already exists it will be opened. If this points to a directory, a file in that directory will be created. Polars on-premises will automatically add the cluster_id to this file name to ensure uniqueness within the directory.
service object Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to 0.0.0.0:5049.
service.bind_addr string Bind address for the observatory service.
e.g. 0.0.0.0:5049.
service.bind_addr.ip string IP address for the observatory service bind address.
e.g. 192.168.1.1.
service.bind_addr.port integer Port for the observatory service bind address.
e.g. 5049.
service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
rest_api.enabled boolean By default enabled for exposing the observatory REST API. This is a public service for accessing the profiling data and host metrics data through a web interface.
rest_api.service object Object used for configuring the bind address of the observatory REST API service. Defaults to 0.0.0.0:3001.
rest_api.service.bind_addr string Bind address for the observatory REST API service.
e.g. 0.0.0.0:3001.
rest_api.service.bind_addr.ip string IP address for the observatory REST API service bind address.
e.g. 192.168.1.1.
rest_api.service.bind_addr.port integer Port for the observatory REST API service bind address.
e.g. 3001.
rest_api.service.bind_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
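For illustration, an observatory section on the scheduler node could look like the sketch below. The database directory and the metrics budget are hypothetical values chosen for the example; the bind addresses repeat the documented defaults.

```toml
[observatory]
enabled = true
max_metrics_bytes_total = 10485760             # 10 MiB budget for host metrics
database_path = "/var/lib/polars/observatory"  # hypothetical directory for the SQLite file

# Internal service that receives profiling data from all nodes.
service.bind_addr = "0.0.0.0:5049"

# REST API for profiling data and host metrics.
rest_api.enabled = true
rest_api.service.bind_addr = "0.0.0.0:3001"
```

As a sizing aid: at about 50 bytes per worker per second, four workers produce roughly 200 bytes per second, so a 10 MiB budget holds about 14.5 hours of host metrics.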

[monitoring] section

Key Type Description
enabled boolean Enable sending/receiving monitoring data to the observatory service. If enabled, the node uses the address specified in observatory_service.public_addr (see the [static_leader] section).
host_metrics object Object used for configuring the host metrics exporter.
host_metrics.enabled boolean Enable/disable exporting host metrics from this node.
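A short sketch of this section with host metrics export turned on, assuming host_metrics.enabled is written as a dotted key inside the section:

```toml
[monitoring]
enabled = true                # send monitoring data to the observatory service
host_metrics.enabled = true   # also export host metrics from this node
```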

[static_leader] section

Key Type Description
leader_instance_id string ID of the leader node; should match the scheduler’s instance_id.
Typically scheduler to match your scheduler node.
scheduler_service.public_addr string Address at which the scheduler client service is reachable from this node.
e.g. 192.168.1.1.
scheduler_service.public_addr.ip string IP address for the scheduler client service public address.
e.g. 192.168.1.1.
scheduler_service.public_addr.port integer Port for the scheduler client service public address.
e.g. 5051.
scheduler_service.public_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
observatory_service.public_addr string Address at which the observatory service is reachable from this node.
e.g. 192.168.1.1.
observatory_service.public_addr.ip string IP address for the observatory service public address.
e.g. 192.168.1.1.
observatory_service.public_addr.port integer Port for the observatory service public address.
e.g. 5049.
observatory_service.public_addr.hostname string Alternative to ip, resolved once at startup.
e.g. my-host-2.
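Putting the keys above together, a node could point at a statically configured leader as in this sketch. The addresses are the placeholder values from the table; the object form is assumed to be an alternative to the string form.

```toml
[static_leader]
leader_instance_id = "scheduler"   # must match the scheduler node's instance_id

# String form of the public addresses.
scheduler_service.public_addr = "192.168.1.1"
observatory_service.public_addr = "192.168.1.1"

# Object form, shown as an alternative for one of the two services:
# scheduler_service.public_addr.hostname = "my-host-2"
# scheduler_service.public_addr.port = 5051
```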