Config file reference
This page describes the configuration options for Polars on-premises. The config file is a
standard TOML file divided into sections. Any configuration option can be overridden using an
environment variable of the form PC_CUBLET__section_name__key.
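As a rough sketch of how this pattern maps onto a concrete option, the fragment below sets n_workers in the [scheduler] section and notes in a comment the variable name the pattern would produce. The casing of the section and key parts, and how nested or top-level keys are spelled, are assumptions; this page does not specify them.

```toml
# Value as written in the config file.
[scheduler]
n_workers = 4

# Assumed environment-variable equivalent, following PC_CUBLET__section_name__key
# with section_name = scheduler and key = n_workers (casing is an assumption):
#   PC_CUBLET__scheduler__n_workers=8
# When set, the variable's value would override the value from the file.
```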
Top-level configuration
The polars-on-premises binary requires a license, whose path is provided through the license
configuration option listed below. The license file has the following shape:
{ "params": { "expiry": "2026-01-31T23:59:59Z", "name": "Company" }, "signature": "..." }
| Key | Type | Description |
|---|---|---|
| cluster_id | string | Logical ID for the cluster; workers and the scheduler that share this ID form a single cluster. Must be unique among all clusters. e.g. prod-eu-1. |
| instance_id | string | Unique ID for this node within the cluster, used for addressing and leader selection. Must be unique within the cluster. e.g. scheduler, worker_0. |
| license | path | Absolute path to the Polars on-premises license file required to start the process. e.g. /etc/polars/license.json. |
| memory_limit | integer | Hard memory budget in bytes for all components on this node; enforced via cgroups when delegated. e.g. 1073741824 (1 GiB), 10737418240 (10 GiB). |
Example:
cluster_id = "polars-cluster-dev"
instance_id = "scheduler"
license = "/etc/polars/license.json"
memory_limit = 1073741824 # 1 GiB
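For comparison, a worker node joining the same cluster might use top-level settings like the following. This is a sketch: the instance_id value is taken from the examples in the table above, and the 10 GiB budget is only an illustrative choice.

```toml
# Hypothetical top-level settings for a worker node in the same cluster.
cluster_id = "polars-cluster-dev"     # identical on every node of the cluster
instance_id = "worker_0"              # unique within the cluster
license = "/etc/polars/license.json"
memory_limit = 10737418240            # 10 GiB = 10 * 1024^3 bytes
```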
[scheduler] section
For remote Polars queries without a specific output sink, Polars on-premises can automatically add a persistent sink. We call these "anonymous results" sinks. Infrastructure-wise, they are backed by S3-compatible storage (or by a shared local mount, see anonymous_result_location below) that is accessible from all worker nodes and the Python client. Data written to this location is not deleted automatically, so you need to configure a retention policy for it yourself.
You may configure the credentials using the options listed below; the key names correspond to the
storage_options parameter from the scan_parquet() method
(e.g. aws_access_key_id, aws_secret_access_key, aws_session_token, aws_region). We
currently only support the AWS keys of the storage_options dictionary, but note that you can use
any other cloud provider that supports the S3 API, such as MinIO or DigitalOcean Spaces.
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Whether the scheduler component runs in this process. true for the leader node, false on pure workers. |
| allow_local_sinks | boolean | Whether workers are allowed to write to a shared/local disk visible to the scheduler. false for fully remote/storage-only setups, true if you have a shared filesystem. |
| n_workers | integer | Expected number of workers in this cluster; the scheduler waits for all of them to be online before running queries. e.g. 4. |
| anonymous_result_location | object | Destination for results of queries that do not have an explicit sink. Two kinds of destination are currently supported: a locally mounted path (must be reachable on the exact same path, and requires allow_local_sinks to be enabled) and S3-based storage. Either kind must be reachable by the scheduler, the workers, and the client. e.g. /mnt/storage/polars/results or s3://bucket/path/to/key. |
| anonymous_result_location.local | object | Object used for local disk-backed anonymous results. |
| anonymous_result_location.local.path | path | Local path where anonymous results are stored. e.g. /mnt/storage/polars/results. |
| anonymous_result_location.s3 | object | Object used for S3-backed anonymous results. |
| anonymous_result_location.s3.url | string | S3 bucket URL. e.g. s3://bucket/path/to/key. |
| anonymous_result_location.s3.aws_endpoint_url | string | Storage option configuration, see scan_parquet(). |
| anonymous_result_location.s3.aws_region | string | Storage option configuration. e.g. eu-west-1. |
| anonymous_result_location.s3.aws_access_key_id | string | Storage option configuration. |
| anonymous_result_location.s3.aws_secret_access_key | string | Storage option configuration. |
| client_service | object | Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to 0.0.0.0:5051. |
| client_service.bind_addr | string | Bind address for the client service. e.g. 0.0.0.0:5051. |
| client_service.bind_addr.ip | string | IP address for the client service bind address. e.g. 192.168.1.1. |
| client_service.bind_addr.port | integer | Port for the client service bind address. e.g. 5051. |
| client_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-1. |
| worker_service | object | Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to 0.0.0.0:5050. |
| worker_service.bind_addr | string | Bind address for the worker service. e.g. 0.0.0.0:5050. |
| worker_service.bind_addr.ip | string | IP address for the worker service bind address. e.g. 192.168.1.1. |
| worker_service.bind_addr.port | integer | Port for the worker service bind address. e.g. 5050. |
| worker_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
Example:
[scheduler]
enabled = true
allow_local_sinks = false
n_workers = 4
anonymous_result_location.s3.url = "s3://bucket/path/to/key"
anonymous_result_location.s3.aws_secret_access_key = "YOURSECRETKEY"
anonymous_result_location.s3.aws_access_key_id = "YOURACCESSKEY"
client_service.bind_addr = "0.0.0.0:5051"
worker_service.bind_addr.hostname = "my-host-2"
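Since any provider that supports the S3 API can be used, the same keys can point at a non-AWS endpoint through aws_endpoint_url. The fragment below is a minimal sketch for a MinIO-style deployment; the endpoint URL, region value, and credentials are hypothetical placeholders, not values defined by Polars on-premises.

```toml
[scheduler]
enabled = true
allow_local_sinks = false
n_workers = 4
anonymous_result_location.s3.url = "s3://bucket/path/to/key"
# Hypothetical endpoint of an S3-compatible service such as MinIO.
anonymous_result_location.s3.aws_endpoint_url = "http://minio.example.internal:9000"
# Placeholder region; S3-compatible services often accept an arbitrary value.
anonymous_result_location.s3.aws_region = "us-east-1"
anonymous_result_location.s3.aws_access_key_id = "YOURACCESSKEY"
anonymous_result_location.s3.aws_secret_access_key = "YOURSECRETKEY"
```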
Example with mounted local disk as anonymous result destination:
[scheduler]
enabled = true
allow_local_sinks = true
anonymous_result_location = "/mnt/storage/polars/results"
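The bind address keys in the table can also be written in their split form (ip, port, hostname) instead of a single "ip:port" string. The sketch below assumes that ip and port, or hostname and port, may be combined in this way; the table lists the keys but does not state which combinations are accepted. A given bind_addr can use only one of the two forms, because TOML does not allow a key to be both a string and a table.

```toml
[scheduler]
enabled = true
# Split form: listen on a single interface rather than on 0.0.0.0.
client_service.bind_addr.ip = "192.168.1.1"
client_service.bind_addr.port = 5051
# hostname is documented as an alternative to ip, resolved once at startup.
# Combining it with port is an assumption.
worker_service.bind_addr.hostname = "my-host-1"
worker_service.bind_addr.port = 5050
```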
[worker] section
During distributed query execution, data may be shuffled between workers. A local path can be
provided, but shuffles can also be configured to use S3-compatible storage (accessible from all
worker nodes). You may configure the credentials using the options listed below; the key names
correspond to the
storage_options parameter from the scan_parquet() method
(e.g. aws_access_key_id, aws_secret_access_key, aws_session_token, aws_region).
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Whether the worker component runs in this process. true on worker nodes, false on the dedicated scheduler. |
| heartbeat_period | string | Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. Either an ISO 8601 duration or a jiff "friendly" duration (see https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/). e.g. 5 secs or PT5S. |
| shuffle_location | object | Object used for shuffle data storage. |
| shuffle_location.local | object | Object used for local disk-backed shuffle data storage. |
| shuffle_location.local.path | path | Local path where shuffle/intermediate data is stored; fast local SSD is recommended. e.g. /mnt/storage/polars/shuffle. |
| shuffle_location.s3 | object | Object used for S3-backed shuffle data storage. |
| shuffle_location.s3.url | path | Destination for shuffle/intermediate data. e.g. s3://bucket/path/to/key. |
| shuffle_location.s3.aws_endpoint_url | string | Storage option configuration, see scan_parquet(). |
| shuffle_location.s3.aws_region | string | Storage option configuration. e.g. eu-west-1. |
| shuffle_location.s3.aws_access_key_id | string | Storage option configuration. |
| shuffle_location.s3.aws_secret_access_key | string | Storage option configuration. |
| task_service | object | Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to 0.0.0.0:5052. |
| task_service.bind_addr | string | Bind address for the task service. e.g. 0.0.0.0:5052. |
| task_service.bind_addr.ip | string | IP address for the task service bind address. e.g. 192.168.1.1. |
| task_service.bind_addr.port | integer | Port for the task service bind address. e.g. 5052. |
| task_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| task_service.public_addr | string | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0. e.g. 192.168.1.1. |
| task_service.public_addr.ip | string | IP address for the task service public address. e.g. 192.168.1.2. |
| task_service.public_addr.port | integer | Port for the task service public address. e.g. 5052. |
| task_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| shuffle_service | object | Object used for configuring the bind address of the shuffle service. This is an internal service in the worker used for shuffle data. Defaults to 0.0.0.0:5053. |
| shuffle_service.bind_addr | string | Bind address for the shuffle service. e.g. 0.0.0.0:5053. |
| shuffle_service.bind_addr.ip | string | IP address for the shuffle service bind address. e.g. 192.168.1.1. |
| shuffle_service.bind_addr.port | integer | Port for the shuffle service bind address. e.g. 5053. |
| shuffle_service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| shuffle_service.public_addr | string | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is 0.0.0.0. e.g. 192.168.1.1. |
| shuffle_service.public_addr.ip | string | IP address for the shuffle service public address. e.g. 192.168.1.2. |
| shuffle_service.public_addr.port | integer | Port for the shuffle service public address. e.g. 5053. |
| shuffle_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
Example:
[worker]
enabled = true
heartbeat_period = "5 secs"
task_service.bind_addr = "0.0.0.0:1234"
task_service.public_addr.hostname = "my-host-2"
shuffle_service.public_addr.hostname = "my-host-2"
shuffle_location.local.path = "/mnt/storage/polars/shuffle"
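A variant of the worker section that stores shuffle data in S3-compatible storage instead of on local disk could look like the sketch below. It uses only keys from the table above; the bucket, region, and credentials are placeholders, and the heartbeat is written in the ISO 8601 form for contrast with the friendly form used in the previous example.

```toml
[worker]
enabled = true
heartbeat_period = "PT5S"   # ISO 8601 spelling of a five-second interval
# The services bind to 0.0.0.0 by default, so a public address is required.
task_service.public_addr.hostname = "my-host-2"
shuffle_service.public_addr.hostname = "my-host-2"
# S3-backed shuffle storage; must be accessible from all worker nodes.
shuffle_location.s3.url = "s3://bucket/path/to/key"
shuffle_location.s3.aws_region = "eu-west-1"
shuffle_location.s3.aws_access_key_id = "YOURACCESSKEY"
shuffle_location.s3.aws_secret_access_key = "YOURSECRETKEY"
```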
[observatory] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Enable sending/receiving profiling data so clients can call result.await_profile(). true on both scheduler and workers if you want profiles on queries; false to disable. |
| max_metrics_bytes_total | integer | Total number of bytes that all worker host metrics may consume. If a system-wide memory limit is specified, this amount is added to the share that the scheduler takes. Worker host metrics are not yet available, so this option can be set to 0. |
| service | object | Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to 0.0.0.0:5049. |
| service.bind_addr | string | Bind address for the observatory service. e.g. 0.0.0.0:5049. |
| service.bind_addr.ip | string | IP address for the observatory service bind address. e.g. 192.168.1.1. |
| service.bind_addr.port | integer | Port for the observatory service bind address. e.g. 5049. |
| service.bind_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
Example:
[observatory]
enabled = true
max_metrics_bytes_total = 0
[monitoring] section
| Key | Type | Description |
|---|---|---|
| enabled | boolean | Enable sending/receiving monitoring data to the observatory service. If enabled, it uses the address specified in observatory_service.public_addr in the [static_leader] section. |
Example:
[monitoring]
enabled = true
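Because monitoring relies on observatory_service.public_addr, the two settings usually need to be configured together. The sketch below shows that pairing on one node; the IP address is a placeholder, and the port matches the observatory service default listed on this page.

```toml
[monitoring]
enabled = true

[static_leader]
leader_instance_id = "scheduler"
# Where this node sends monitoring data (placeholder IP, default port).
observatory_service.public_addr.ip = "192.168.1.1"
observatory_service.public_addr.port = 5049
```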
[static_leader] section
| Key | Type | Description |
|---|---|---|
| leader_instance_id | string | ID of the leader node; should match the scheduler’s instance_id. Typically scheduler, to match your scheduler node. |
| scheduler_service.public_addr | string | Address at which the scheduler client service is reachable from this node. e.g. 192.168.1.1. |
| scheduler_service.public_addr.ip | string | IP address for the scheduler client service public address. e.g. 192.168.1.1. |
| scheduler_service.public_addr.port | integer | Port for the scheduler client service public address. e.g. 5051. |
| scheduler_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
| observatory_service.public_addr | string | Address at which the observatory service is reachable from this node. e.g. 192.168.1.1. |
| observatory_service.public_addr.ip | string | IP address for the observatory service public address. e.g. 192.168.1.1. |
| observatory_service.public_addr.port | integer | Port for the observatory service public address. e.g. 5049. |
| observatory_service.public_addr.hostname | string | Alternative to ip, resolved once at startup. e.g. my-host-2. |
Example:
[static_leader]
leader_instance_id = "scheduler"
observatory_service.public_addr = "127.0.0.1"
scheduler_service.public_addr = "127.0.0.1"
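To show how the sections fit together in one file, here is a sketch of a complete config for a pure worker node. It combines only keys documented on this page; the IP addresses are placeholders, and whether every section shown is required on a worker is an assumption.

```toml
# Top-level keys must come before the first [section] header in TOML.
cluster_id = "polars-cluster-dev"
instance_id = "worker_0"
license = "/etc/polars/license.json"
memory_limit = 10737418240

[scheduler]
enabled = false   # pure worker: no scheduler component in this process

[worker]
enabled = true
heartbeat_period = "5 secs"
# Placeholder address of this worker as seen from the scheduler.
task_service.public_addr.ip = "192.168.1.2"
shuffle_service.public_addr.ip = "192.168.1.2"
shuffle_location.local.path = "/mnt/storage/polars/shuffle"

[observatory]
enabled = true
max_metrics_bytes_total = 0

[static_leader]
leader_instance_id = "scheduler"
# Placeholder address of the scheduler node.
scheduler_service.public_addr = "192.168.1.1"
observatory_service.public_addr = "192.168.1.1"
```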