Supporting Python Primitives
In addition to the core building blocks (Compute, Image, and Modules), Kubetorch provides supporting primitives that extend functionality for real-world ML workflows.
These primitives are not strictly required to get started, but they unlock important capabilities like persistent storage, shared caches, and secure credential management.
Volume
The kt.Volume class enables persistent storage for your workloads, allowing data to persist beyond individual pod lifecycles. Kubetorch automatically manages Kubernetes PersistentVolumeClaims (PVCs) while providing a simple Python interface for storage configuration. Kubetorch also integrates with Kubernetes policy engines such as Kyverno.
Volumes can be created and managed through the CLI for quick setup and configuration, or defined programmatically in Python for dynamic workflows.
$ kt volumes create my-data --size 50Gi   # Standard volume (ReadWriteOnce)
$ kt volumes list                         # List all Kubetorch volumes
import kubetorch as kt

kt.fn(my_fn_obj).to(
    compute=kt.Compute(
        # ...compute_kwargs...
        volumes=[
            kt.Volume(name="my-data", size="5Gi"),  # Standard volume (ReadWriteOnce)
            kt.Volume(
                name="shared-data",
                size="10Gi",
                storage_class="juicefs-sc-shared",
                access_mode="ReadWriteMany",
            ),  # Shared volume (ReadWriteMany, requires JuiceFS or similar)
            "previously-created-volume",  # Reference existing volume by name
        ],
    )
)
See the Python API or CLI docs for more info.
Once created, you can set global defaults in your local Kubetorch config to automatically mount volumes to all new services, unless explicitly overridden in the Compute configuration.
kt config set volumes my-data
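For example, a service that should not use the global default can specify its own volumes explicitly. This is a minimal sketch, assuming an explicit volumes= list in kt.Compute takes precedence over the config default; my_fn_obj and the other compute kwargs are placeholders, as in the earlier example.

import kubetorch as kt

# `kt config set volumes my-data` mounts my-data on all new services by default.
# Passing volumes= explicitly overrides that default for this service only.
compute = kt.Compute(
    # ...compute_kwargs...
    volumes=[kt.Volume(name="scratch-data", size="1Gi")],
)
remote_fn = kt.fn(my_fn_obj).to(compute=compute)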
Use Cases
- Persistent Datasets: Keep large datasets or model caches mounted once and reuse them across multiple training jobs. This avoids repeated downloads and ensures consistency across services.
- UV Cache: Speed up package installation by caching dependencies globally. With a shared cache, package managers like uv or pip reuse wheels across services instead of rebuilding them at every launch, cutting cold-start times from minutes to seconds.
- Shared Model Checkpoints: Use a ReadWriteMany volume (e.g. JuiceFS) for storing model checkpoints that need to be read and updated by multiple pods in parallel. This enables distributed training and evaluation jobs to share the same artifacts without extra synchronization steps, as sketched below.
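For the shared-checkpoints case, here is a minimal sketch built from the kt.Volume and kt.Compute pieces shown above. The train_and_save/evaluate functions, the /mnt/shared-checkpoints mount path, and the direct-call usage at the end are illustrative assumptions, not guarantees about the Kubetorch API.

import os
import kubetorch as kt

CHECKPOINT_DIR = "/mnt/shared-checkpoints"  # assumed mount path; depends on your volume configuration

def train_and_save(step: int) -> str:
    """Write a checkpoint to the shared volume and return its path."""
    path = os.path.join(CHECKPOINT_DIR, f"ckpt_{step}.pt")
    # ... run a training step and serialize the model to `path` ...
    return path

def evaluate(ckpt_path: str) -> dict:
    """Read a checkpoint from the shared volume and score it."""
    # ... load the checkpoint and compute metrics ...
    return {"checkpoint": ckpt_path}

# A single ReadWriteMany volume mounted by both services, so the evaluator can
# read checkpoints as soon as the trainer writes them.
shared = kt.Volume(
    name="checkpoints",
    size="100Gi",
    storage_class="juicefs-sc-shared",  # RWX-capable storage class (e.g. JuiceFS)
    access_mode="ReadWriteMany",
)

trainer = kt.fn(train_and_save).to(compute=kt.Compute(volumes=[shared]))
evaluator = kt.fn(evaluate).to(compute=kt.Compute(volumes=[shared]))

# Calling the deployed functions like local ones (assumed usage pattern).
ckpt = trainer(step=100)
metrics = evaluator(ckpt)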
Secrets
The kt.Secret class lets you define and manage secrets for your workloads. It builds on top of Kubernetes Secrets, while providing a simple Python and CLI interface to create, load, and reuse them across services.
Secrets can reference cloud credentials, API keys, or any other sensitive data your workloads need. Once created in the cluster, they can be referenced by name in multiple workloads — just like using environment variables locally.
There are three main ways to define a secret:
- Provider credentials: Automatically pull credentials for common providers (AWS, GCP, Hugging Face, etc.) from your local environment.
- Environment Variables: Supply key–value pairs directly.
- File Paths: Point to a file containing sensitive values (e.g. service account JSON, kubeconfig).
A secret can be created in one of two ways:
- Inside kt.Compute: The secret is constructed at launch time, when you call kt_fn_or_cls.to(compute).
- Using the CLI: Run the kt secret create <args> command to create a secret based on the supplied arguments.
Once created, secrets are stored in Kubernetes and can be reused by name in future workloads.
$ kt secret create --provider aws                                   # create from builtin provider
$ kt secret create my-env-secret --from-env FOO,BAR                 # create from env vars
$ kt secret create my-file-secret --from-file /path/to/creds.json   # create from a file
$ kt secrets delete my-env-secret
import kubetorch as kt

# Create inline from environment variables
custom_secret = kt.Secret(
    name="my-secret",
    env_vars={"API_KEY": "supersecretvalue"}
)
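To attach a secret when launching a service, you pass it to the compute configuration. The sketch below assumes kt.Compute accepts a secrets=[...] argument, analogous to volumes=[...] above, that takes kt.Secret objects or names of previously created secrets; check the Python API docs for the exact parameter name.

import kubetorch as kt

custom_secret = kt.Secret(name="my-secret", env_vars={"API_KEY": "supersecretvalue"})

def my_task():
    import os
    # Inside the service, the secret's values are available just as they would be
    # locally (e.g. as environment variables), per the description above.
    return "API_KEY" in os.environ

remote_task = kt.fn(my_task).to(
    compute=kt.Compute(
        # ...compute_kwargs...
        secrets=[custom_secret, "aws"],  # Secret objects or existing secret names (assumed kwarg)
    )
)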
For more details, refer to the Python API or CLI docs.