Compute API
Kubetorch offers significant flexibility in kubetorch.Compute(), allowing you to request the right resources for your workloads and to control how that compute behaves. How best to use these arguments should be a discussion between practitioners and the platform engineering owner of the cluster.
Arguments
The available arguments are listed below, broken into a few categories:
Compute Resources
cpus (str, optional): Number of CPU cores (e.g., "1.0" or "500m").
memory (str, optional): Memory in bytes or using binary/decimal units.
disk_size (str, optional): Ephemeral storage. Uses the same format as memory.
gpus (str, optional): Number of whole GPUs to request.
runtime_class (str, optional): Runtime class for GPU type (e.g., "nvidia-t4").
gpu_type (str, optional): Node selector for GPU type (e.g., "NVIDIA-T4"). Requires the GPU product discovery plugin to be installed on the cluster.
gpu_memory (str, optional): Amount of GPU memory to request (still allocates a whole GPU).
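To make the string formats above concrete, here is a small, hypothetical helper (not part of kubetorch, which parses these values internally) showing how CPU quantities like "500m" and memory quantities like "4Gi" map to plain numbers:

```python
def parse_cpus(value: str) -> float:
    """Convert a Kubernetes-style CPU quantity ("1.0", "500m") to cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000.0  # millicores
    return float(value)


def parse_memory(value: str) -> int:
    """Convert a memory quantity like "4Gi" (binary) or "500M" (decimal) to bytes."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
             "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
    # Check two-character suffixes ("Gi") before one-character ones ("G").
    for suffix in sorted(units, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * units[suffix])
    return int(value)  # plain bytes


print(parse_cpus("500m"))   # 0.5
print(parse_memory("4Gi"))  # 4294967296
```

The same quantity syntax applies to disk_size, which uses the same format as memory.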
Compute Environment Setup
image (Image, optional): Container image configuration, taking a Kubetorch Image, which is composed of a base image (optional) and small changes to it (e.g., pip installs).
env_vars (Dict, optional): Environment variables to inject.
secrets (List[Union[str, Secret]], optional): Secrets to mount or expose.
freeze (bool, optional): Disallow further syncing of code or configuration on the compute, relying only on the code and environment defined in the Image (useful for prod).
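One common pattern is deriving the freeze flag from the deployment environment so that code syncing is disabled only in production. A minimal sketch, where the ENVIRONMENT variable name is an assumption of this example rather than something kubetorch reads itself:

```python
def should_freeze(env: dict) -> bool:
    """Freeze code/config syncing only when running in production."""
    # ENVIRONMENT is an assumed, user-defined variable; "DEV" is the default.
    return env.get("ENVIRONMENT", "DEV") == "PROD"


print(should_freeze({"ENVIRONMENT": "PROD"}))  # True
print(should_freeze({}))                       # False
```

The resulting boolean can be passed directly as freeze=... to kt.Compute(), as in the example at the end of this page.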
Kubernetes Management
namespace (str, optional): Kubernetes namespace (default is "default").
kubeconfig_path (str, optional): Path to a local kubeconfig file.
labels (Dict, optional): Labels to apply to the pod.
annotations (Dict, optional): Annotations to apply to the pod.
image_pull_policy (str, optional): Kubernetes image pull policy.
inactivity_ttl (str, optional): Auto-destroy TTL after inactivity (e.g., "5m", "1h").
gpu_anti_affinity (bool, optional): Avoid GPU nodes when no GPUs are requested.
launch_timeout (int, optional): How long to wait for setup before failing (the total of autoscaling wait time, Docker pull time, and image setup).
service_account_name (str, optional): Kubernetes service account name.
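The inactivity_ttl strings follow a simple duration syntax ("5m", "1h"). A hypothetical helper (kubetorch parses these internally; this is only to illustrate the format) converting such a duration to seconds:

```python
def ttl_seconds(ttl: str) -> int:
    """Convert a duration like "5m" or "1h" to seconds."""
    multipliers = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(ttl[:-1]) * multipliers[ttl[-1]]


print(ttl_seconds("5m"))  # 300
print(ttl_seconds("1h"))  # 3600
```

Note the asymmetry with launch_timeout, which is a plain int of seconds rather than a duration string.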
Example
As an example:
import os

import kubetorch as kt

gpus = kt.Compute(
    gpus=1,
    cpus=3,
    memory="12Gi",
    inactivity_ttl="24h",
    launch_timeout=60,
    freeze=(os.environ.get("ENVIRONMENT", "DEV") == "PROD"),
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3").pip_install(["vllm"]),
).distribute("pytorch", workers=4, scale_to_zero_grace_period=60)