
Google Kubernetes Engine (GKE)

Initial configuration

This page assumes you have already set up a Polars cluster at least once, either through the Polars Cloud onboarding or the getting started guide.

Data access using Workload Identity

With GKE Workload Identity, you can securely access private Google Cloud Storage (GCS) buckets without managing service account keys or other credentials. In most setups this comes down to three steps: enabling Workload Identity Federation for GKE on the cluster, creating a Kubernetes service account, and creating an IAM policy binding that grants it access to the bucket. See the guide in the official GKE documentation. Then configure both the scheduler and the workers to use that Kubernetes service account in your Helm release:

helm upgrade --install polars polars-inc/polars \
  --set scheduler.serviceAccount.name=<YOUR_SERVICE_ACCOUNT_NAME> \
  --set worker.serviceAccount.name=<YOUR_SERVICE_ACCOUNT_NAME> \
# ...
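To check that Workload Identity is in effect, you can query the GKE metadata server from inside a scheduler or worker pod. The sketch below uses only the Python standard library. The endpoint path and the `Metadata-Flavor: Google` header follow the standard Google Cloud metadata-server conventions, while the helper names are hypothetical. The exact identity string returned depends on how Workload Identity is configured, for example whether the Kubernetes service account impersonates an IAM service account.

```python
import urllib.request

# Metadata-server endpoint that reports the identity the pod runs as.
METADATA_EMAIL_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_metadata_request(url: str = METADATA_EMAIL_URL) -> urllib.request.Request:
    """Build a metadata-server request; requests without this header are rejected."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def current_identity(timeout: float = 2.0) -> str:
    """Return the identity reported by the metadata server (only works inside GCP)."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `print(current_identity())` in a Python session inside the pod (for example via `kubectl exec`) should print the identity that GCS requests will be made as; outside Google Cloud the request fails because `metadata.google.internal` does not resolve.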

Assuming you already have a bucket set up (see the quick-start), you can then scan from or sink to the bucket directly:

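import polars as pl

# Replace the placeholders with your bucket path and GCP project.
path = "gs://YOUR_BUCKET_NAME/PATH/TO/DATA/"
storage_options = {
    "project": "YOUR_PROJECT_NAME",
}
# No keys are passed: credentials come from Workload Identity.
q = (
    pl.scan_parquet(path, storage_options=storage_options)
    # ...
)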
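If several queries read from the same bucket, it can help to build the URI and storage options in one place. The helpers below are hypothetical (they are not part of Polars) and simply reproduce the `gs://` path and `storage_options` dictionary from the example above; only `pl.scan_parquet` is a Polars API.

```python
# Hypothetical helpers: keep the bucket path and storage options in one place
# so that every query against the same bucket uses consistent settings.


def gcs_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and path segments into a gs:// URI with a trailing slash."""
    segments = [bucket.strip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "gs://" + "/".join(segments) + "/"


def gcs_storage_options(project: str) -> dict[str, str]:
    """Storage options in the same shape as the scan example."""
    return {"project": project}


def scan_bucket(bucket: str, prefix: str, project: str):
    """Lazily scan Parquet data under gs://<bucket>/<prefix>/ (requires Polars)."""
    import polars as pl  # imported here so the pure helpers work without Polars

    return pl.scan_parquet(
        gcs_uri(bucket, prefix),
        storage_options=gcs_storage_options(project),
    )
```

For example, `scan_bucket("YOUR_BUCKET_NAME", "PATH/TO/DATA", "YOUR_PROJECT_NAME")` builds a LazyFrame equivalent to the `pl.scan_parquet` call above.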

You can also use Google Cloud Storage as the anonymous results location by setting the following Helm values:

anonymousResults:
  gcs:
    enabled: true
    endpoint: "gs://YOUR_BUCKET_NAME/PATH/TO/DATA/"
    options:
    - name: project
      value: "YOUR_PROJECT_NAME"
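The same values can also be passed on the command line, in the `--set` style used for the service accounts earlier. The sketch below flattens a nested values dictionary into Helm `--set` arguments, using Helm's `key[index].field` syntax for list entries. The function name is hypothetical, and it does not escape commas or other characters that `--set` treats specially, so a values file remains the safer choice for anything non-trivial.

```python
def to_helm_set_args(values, prefix=""):
    """Flatten nested Helm values into a flat list of "--set key=value" arguments."""
    args = []
    if isinstance(values, dict):
        for key, val in values.items():
            path = f"{prefix}.{key}" if prefix else key
            args.extend(to_helm_set_args(val, path))
    elif isinstance(values, list):
        for i, item in enumerate(values):
            args.extend(to_helm_set_args(item, f"{prefix}[{i}]"))
    else:
        # --set infers booleans from lowercase true/false, so avoid Python's "True".
        text = str(values).lower() if isinstance(values, bool) else str(values)
        args.extend(["--set", f"{prefix}={text}"])
    return args


# Mirrors the anonymousResults values shown above.
anonymous_results = {
    "anonymousResults": {
        "gcs": {
            "enabled": True,
            "endpoint": "gs://YOUR_BUCKET_NAME/PATH/TO/DATA/",
            "options": [{"name": "project", "value": "YOUR_PROJECT_NAME"}],
        }
    }
}
```

The resulting list can be appended to the `helm upgrade --install polars polars-inc/polars` arguments, for example when invoking Helm through `subprocess.run`.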

Accessing GCS from private nodes

In clusters where nodes have no external IP addresses, GCS is unreachable unless the nodes are given a route to Google APIs. Cloud NAT provides one, but traffic through a NAT gateway is billed by the volume of data processed, so reading from and writing to GCS through it adds cost. Enabling Private Google Access (PGA) on your GKE node subnet instead gives internally-addressed nodes a direct path to Google APIs (including GCS) and requires no changes to your Polars deployment. If both Cloud NAT and PGA are configured, PGA takes precedence for traffic to Google APIs, so that traffic incurs no Cloud NAT data processing charges.

To avoid network charges, keep the GCS bucket in the same region as the GKE cluster: cross-region traffic is billed regardless of whether Private Google Access is enabled.

See the related guide in the official GCP documentation.
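Whether a bucket and a cluster share a region can be checked before running large jobs. The sketch below compares the two location strings, for example a bucket's `location` as reported by the `google-cloud-storage` client library and the cluster location shown by `gcloud container clusters list`. The helper names are hypothetical, and treating multi-region or dual-region buckets (such as `EU`) as a mismatch is a conservative assumption, not a statement about their pricing.

```python
def location_region(location: str) -> str:
    """Normalize a GCS bucket location or GKE cluster location to a region name.

    Bucket locations are reported in upper case (e.g. "EUROPE-WEST4"), while a
    zonal cluster's location is a zone such as "europe-west4-a".
    """
    parts = location.strip().lower().split("-")
    # Zones are "<region>-<letter>": drop a trailing single-letter zone suffix.
    if len(parts) >= 3 and len(parts[-1]) == 1 and parts[-1].isalpha():
        parts = parts[:-1]
    return "-".join(parts)


def same_region(bucket_location: str, cluster_location: str) -> bool:
    """True only for an exact regional match; multi- and dual-regions return False."""
    return location_region(bucket_location) == location_region(cluster_location)
```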