Google Kubernetes Engine (GKE)
Initial configuration
This page assumes that you have already set up a Polars cluster once, either through the Polars Cloud onboarding or by following the getting started guide.
Data access using Workload Identity
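With GKE Workload Identity, you can securely access private Google Cloud Storage buckets without managing service account keys or other static credentials. In most scenarios, setup comes down to three steps: enabling Workload Identity Federation for GKE on the cluster, creating a Kubernetes service account, and creating an IAM policy binding that grants that service account access to the bucket. See the guide in the official GKE documentation. Then point both the scheduler and the workers at the Kubernetes service account through the Helm values: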
helm upgrade --install polars polars-inc/polars \
--set scheduler.serviceAccount.name=<YOUR_SERVICE_ACCOUNT_NAME> \
--set worker.serviceAccount.name=<YOUR_SERVICE_ACCOUNT_NAME> \
# ...
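To make the service-account and IAM-binding steps concrete, here is a minimal sketch of the two objects involved. It assumes the variant of Workload Identity in which the Kubernetes service account (KSA) impersonates a Google service account (GSA). Newer Workload Identity Federation setups can also grant IAM roles to the KSA directly; the GKE guide covers both. The helper names and the example names (`polars-sa`, `polars`, the GSA email) are hypothetical.

```python
import json
from typing import Any, Dict


def ksa_manifest(ksa_name: str, namespace: str, gsa_email: str) -> Dict[str, Any]:
    """Build a Kubernetes ServiceAccount linked to a Google service account (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            # Tells GKE which Google service account this KSA may impersonate.
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member to grant roles/iam.workloadIdentityUser on the GSA (hypothetical helper)."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


# Example: print the manifest as JSON, which `kubectl apply -f -` also accepts.
# print(json.dumps(ksa_manifest("polars-sa", "polars", "gsa@PROJECT.iam.gserviceaccount.com"), indent=2))
```

The member string is what you would pass to `gcloud iam service-accounts add-iam-policy-binding GSA_EMAIL --role roles/iam.workloadIdentityUser --member ...`. The KSA name then becomes `<YOUR_SERVICE_ACCOUNT_NAME>` in the Helm command.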
Assuming you have already set up a bucket (see the Cloud Storage quick-start), you can then scan from or sink to the bucket directly:
import polars as pl

path = "gs://YOUR_BUCKET_NAME/PATH/TO/DATA/"
storage_options = {
    "project": "YOUR_PROJECT_NAME",
}
q = (
    pl.scan_parquet(path, storage_options=storage_options)
    # ...
)
You can also use Google Cloud Storage as the anonymous results location by setting the following Helm values:
anonymousResults:
  gcs:
    enabled: true
    endpoint: "gs://YOUR_BUCKET_NAME/PATH/TO/DATA/"
    options:
      - name: project
        value: "YOUR_PROJECT_NAME"
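If you generate Helm values programmatically, the following sketch builds the same `anonymousResults` structure and writes it to a file. The function names are hypothetical. The sketch writes JSON, which is valid YAML, on the assumption that you pass the file to Helm with `-f` alongside your other values.

```python
import json
from typing import Any, Dict


def anonymous_results_values(endpoint: str, project: str) -> Dict[str, Any]:
    """Build the anonymousResults values for a GCS results location (hypothetical helper)."""
    if not endpoint.startswith("gs://"):
        raise ValueError(f"expected a gs:// URL, got {endpoint!r}")
    return {
        "anonymousResults": {
            "gcs": {
                "enabled": True,
                "endpoint": endpoint,
                # Storage options are given as a list of name/value pairs.
                "options": [{"name": "project", "value": project}],
            }
        }
    }


def write_values_file(values: Dict[str, Any], path: str) -> None:
    # JSON is a subset of YAML, so Helm can read this file via `-f <path>`.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh, indent=2)
```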