Config file reference
This page describes the configuration options for Polars on-premises. The config file is a
standard TOML file divided into sections. Any configuration value can be overridden with an
environment variable of the form PC_CUBLET__section_name__key.
Example configuration files can be found at Example Configurations.
See the sidebar for detailed documentation on the main components and how to configure them together.
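As a hedged illustration of the override format mentioned above, the sketch below pairs a setting in the config file with the environment variable that would presumably replace it. The variable name is obtained by substituting the section and key into PC_CUBLET__section_name__key; the exact casing, and how nested or top-level keys are spelled, are assumptions that this page does not confirm.

```toml
# Value as written in the config file.
[scheduler]
n_workers = 4

# Assumed environment-variable equivalent, which would override the value above:
#   PC_CUBLET__scheduler__n_workers=8
```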
Top-level configuration
| Key | Type | Description |
|---|---|---|
| cluster_id | string | Logical ID for the cluster; workers and the scheduler that share this ID form a single cluster. Must be unique among all clusters. e.g. prod-eu-1. |
| instance_id | string | Unique ID for this node within the cluster, used for addressing and leader selection. Must be unique per cluster. e.g. scheduler, worker_0. |
| license | path | Absolute path to the Polars on-premises license file required to start the process. e.g. /etc/polars/license.json. |
| memory_limit | integer | Hard memory budget in bytes for all components in this node; enforced via cgroups when delegated. e.g. 1073741824 (1 GiB), 10737418240 (10 GiB). |
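A minimal sketch of the top-level keys, using the example values from the table above. In TOML, top-level keys have to appear before the first [section] header; the values themselves are placeholders.

```toml
# Top-level keys go before any [section] header.
cluster_id = "prod-eu-1"              # shared by the scheduler and all workers of one cluster
instance_id = "worker_0"              # unique per node within the cluster
license = "/etc/polars/license.json"  # absolute path to the license file
memory_limit = 10737418240            # 10 GiB, expressed in bytes
```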
[scheduler] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Whether the scheduler component runs in this process. true for the leader node, false on pure workers. |
| allow_local_sinks | boolean | Whether workers are allowed to write to a shared/local disk visible to the scheduler. false for fully remote/storage-only setups, true if you have a shared filesystem. |
| n_workers | integer | Expected number of workers in this cluster; the scheduler waits until that many workers are online before running queries. e.g. 4. |
| anonymous_result_location | object | Destination for results of queries that do not have an explicit sink. Currently supported are a locally mounted path (must be reachable on the exact same path, with allow_local_sinks enabled) and S3. Either option must be reachable over the network by the scheduler, the workers, and the client. e.g. /mnt/storage/polars/results or s3://bucket/path/to/key. |
| anonymous_result_location.local | object | Object used for local disk-backed anonymous results. |
| anonymous_result_location.local.path | path | Local path where anonymous results are stored. e.g. /mnt/storage/polars/results. |
| anonymous_result_location.s3 | object | Object used for S3-backed anonymous results. |
| anonymous_result_location.s3.url | string | S3 bucket URL. e.g. s3://bucket/path/to/key. |
| anonymous_result_location.s3.aws_endpoint_url | string | Storage option configuration, see scan_parquet(). |
| anonymous_result_location.s3.aws_region | string | Storage option configuration. e.g. eu-west-1. |
| anonymous_result_location.s3.aws_access_key_id | string | Storage option configuration. |
| anonymous_result_location.s3.aws_secret_access_key | string | Storage option configuration. |
| client_service | object | Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to 0.0.0.0:5051. |
| client_service.bind_addr | string | Bind address for the client service. e.g. 0.0.0.0:5051. |
| client_service.bind_addr.ip | string | IP address for the client service bind address. e.g. 192.168.1.1. |
| client_service.bind_addr.port | integer | Port for the client service bind address. e.g. 5051. |
| client_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-1. |
| worker_service | object | Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to 0.0.0.0:5050. |
| worker_service.bind_addr | string | Bind address for the worker service. e.g. 0.0.0.0:5050. |
| worker_service.bind_addr.ip | string | IP address for the worker service bind address. e.g. 192.168.1.1. |
| worker_service.bind_addr.port | integer | Port for the worker service bind address. e.g. 5050. |
| worker_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
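The [scheduler] keys above could be combined as in the following sketch for a dedicated scheduler node that stores anonymous results in S3. Writing the nested objects as dotted keys and giving bind_addr as a single "ip:port" string are assumptions based on the key names and examples in the table; the bucket URL and region are placeholders.

```toml
[scheduler]
enabled = true
allow_local_sinks = false   # results go to S3, so no shared disk is needed
n_workers = 4               # wait for 4 workers before running queries

# S3-backed anonymous results (dotted keys are one assumed way to write the nested object).
anonymous_result_location.s3.url = "s3://bucket/path/to/key"
anonymous_result_location.s3.aws_region = "eu-west-1"

# Bind addresses, here set explicitly to the documented defaults.
client_service.bind_addr = "0.0.0.0:5051"
worker_service.bind_addr = "0.0.0.0:5050"
```

For a local result location, the table suggests replacing the two S3 lines with anonymous_result_location.local.path and setting allow_local_sinks to true.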
[worker] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Whether the worker component runs in this process. true on worker nodes, false on the dedicated scheduler. |
| heartbeat_period | string | Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. Either an ISO 8601 duration or a jiff friendly duration (see https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/). e.g. 5 secs or PT5S. |
| shuffle_location | object | Object used for shuffle data storage. |
| shuffle_location.local | object | Object used for local disk-backed shuffle data storage. |
| shuffle_location.local.path | path | Local path where shuffle/intermediate data is stored; fast local SSD is recommended. e.g. /mnt/storage/polars/shuffle. |
| shuffle_location.shared_filesystem | object | Object used for shared filesystem-backed shuffle data storage. |
| shuffle_location.shared_filesystem.path | path | Shared filesystem path where shuffle/intermediate data is stored. Must be accessible by all workers on the same path. e.g. /mnt/storage/polars/shuffle. |
| shuffle_location.s3 | object | Object used for S3-backed shuffle data storage. |
| shuffle_location.s3.url | string | Destination for shuffle/intermediate data. e.g. s3://bucket/path/to/key. |
| shuffle_location.s3.aws_endpoint_url | string | Storage option configuration, see scan_parquet(). |
| shuffle_location.s3.aws_region | string | Storage option configuration. e.g. eu-west-1. |
| shuffle_location.s3.aws_access_key_id | string | Storage option configuration. |
| shuffle_location.s3.aws_secret_access_key | string | Storage option configuration. |
| task_service | object | Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to 0.0.0.0:5052. |
| task_service.bind_addr | string | Bind address for the task service. e.g. 0.0.0.0:5052. |
| task_service.bind_addr.ip | string | IP address for the task service bind address. e.g. 192.168.1.1. |
| task_service.bind_addr.port | integer | Port for the task service bind address. e.g. 5052. |
| task_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| task_service.public_addr | string | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0. e.g. 192.168.1.1. |
| task_service.public_addr.ip | string | IP address for the task service public address. e.g. 192.168.1.2. |
| task_service.public_addr.port | integer | Port for the task service public address. e.g. 5052. |
| task_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| shuffle_service | object | Object used for configuring the bind address of the shuffle service. This is an internal service in the worker for shuffle data. Defaults to 0.0.0.0:5053. |
| shuffle_service.bind_addr | string | Bind address for the shuffle service. e.g. 0.0.0.0:5053. |
| shuffle_service.bind_addr.ip | string | IP address for the shuffle service bind address. e.g. 192.168.1.1. |
| shuffle_service.bind_addr.port | integer | Port for the shuffle service bind address. e.g. 5053. |
| shuffle_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| shuffle_service.public_addr | string | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0. e.g. 192.168.1.1. |
| shuffle_service.public_addr.ip | string | IP address for the shuffle service public address. e.g. 192.168.1.2. |
| shuffle_service.public_addr.port | integer | Port for the shuffle service public address. e.g. 5053. |
| shuffle_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
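A hedged sketch of a [worker] section for a worker node that binds its services on all interfaces. Because the table states that public_addr is required when the bind address is 0.0.0.0, both services set it here. The dotted-key layout and the string form of the addresses are assumptions; the IP address is a placeholder for this worker's reachable address.

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"   # jiff friendly format; "PT5S" is the ISO 8601 equivalent

# Local disk-backed shuffle storage; fast local SSD is recommended.
shuffle_location.local.path = "/mnt/storage/polars/shuffle"

# Binding to 0.0.0.0 means a public address must be given explicitly.
task_service.bind_addr = "0.0.0.0:5052"
task_service.public_addr = "192.168.1.2"

shuffle_service.bind_addr = "0.0.0.0:5053"
shuffle_service.public_addr = "192.168.1.2"
```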
[observatory] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Enable sending/receiving profiling data so clients can call result.await_profile(). true on both scheduler and workers if you want profiles on queries; false to disable. |
| max_metrics_bytes_total | integer | How many bytes all the worker host metrics will consume in total. If a system-wide memory limit is specified, this is added to the share that the scheduler takes. For every worker, about 50 bytes of metrics are stored per second. |
| database_path | string | Location to use for storing profiling data. An SQLite database file will be created here, or opened if the file already exists. If this points to a directory, a file will be created in that directory. Polars on-premises automatically adds the cluster_id to this file name to ensure uniqueness within the directory. |
| service | object | Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to 0.0.0.0:5049. |
| service.bind_addr | string | Bind address for the observatory service. e.g. 0.0.0.0:5049. |
| service.bind_addr.ip | string | IP address for the observatory service bind address. e.g. 192.168.1.1. |
| service.bind_addr.port | integer | Port for the observatory service bind address. e.g. 5049. |
| service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| rest_api.enabled | boolean | Whether to expose the observatory REST API; enabled by default. This is a public service for accessing the profiling data and host metrics data through a web interface. |
| rest_api.service | object | Object used for configuring the bind address of the observatory REST API service. Defaults to 0.0.0.0:3001. |
| rest_api.service.bind_addr | string | Bind address for the observatory REST API service. e.g. 0.0.0.0:3001. |
| rest_api.service.bind_addr.ip | string | IP address for the observatory REST API service bind address. e.g. 192.168.1.1. |
| rest_api.service.bind_addr.port | integer | Port for the observatory REST API service bind address. e.g. 3001. |
| rest_api.service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
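The sketch below shows one possible [observatory] section on the scheduler node. The database directory is a hypothetical path, and the metrics budget is an illustrative figure sized from the rate quoted above: at about 50 bytes per second per worker, 4 workers produce roughly 200 bytes per second, or about 17.3 MB per day, so a 50 MiB budget would hold around three days of host metrics. The dotted-key layout is an assumption.

```toml
[observatory]
enabled = true
max_metrics_bytes_total = 52428800              # 50 MiB; about 3 days for 4 workers at ~50 B/s each
database_path = "/var/lib/polars/observatory"   # hypothetical directory; the file name gets the cluster_id added

# Internal service receiving profiling data (documented default).
service.bind_addr = "0.0.0.0:5049"

# Public REST API for profiling and host metrics data (documented default).
rest_api.enabled = true
rest_api.service.bind_addr = "0.0.0.0:3001"
```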
[monitoring] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Enable sending/receiving monitoring data to the observatory service. If enabled, it will use the address specified in observatory_service.public_addr in the [static_leader] section. |
| host_metrics | object | Object used for configuring the host metrics exporter. |
| host_metrics.enabled | boolean | Enable/disable exporting host metrics from this node. |
[static_leader] section
| Key | Type | Description |
|---|---|---|
| leader_instance_id | string | ID of the leader node; should match the scheduler’s instance_id. Typically scheduler, to match your scheduler node. |
| scheduler_service.public_addr | string | Address at which the scheduler client service is reachable from this node. e.g. 192.168.1.1. |
| scheduler_service.public_addr.ip | string | IP address for the scheduler client service public address. e.g. 192.168.1.1. |
| scheduler_service.public_addr.port | integer | Port for the scheduler client service public address. e.g. 5051. |
| scheduler_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| observatory_service.public_addr | string | Address at which the observatory service is reachable from this node. e.g. 192.168.1.1. |
| observatory_service.public_addr.ip | string | IP address for the observatory service public address. e.g. 192.168.1.1. |
| observatory_service.public_addr.port | integer | Port for the observatory service public address. e.g. 5049. |
| observatory_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
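To show how the [monitoring] and [static_leader] sections relate, here is a hedged sketch for a worker node that reports to a scheduler at 192.168.1.1 (a placeholder address). The monitoring data is sent to the observatory address given under [static_leader]. The dotted-key layout and the plain-IP string form follow the examples in the tables and are otherwise assumptions.

```toml
[monitoring]
enabled = true
host_metrics.enabled = true   # export host metrics from this node

[static_leader]
leader_instance_id = "scheduler"                  # matches the scheduler node's instance_id
scheduler_service.public_addr = "192.168.1.1"     # where the scheduler is reachable from this node
observatory_service.public_addr = "192.168.1.1"   # also used by [monitoring] when enabled
```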