Compute API

Kubetorch offers significant flexibility in kubetorch.Compute() so you can request the right resources for your workloads and control how that compute behaves. How best to use these arguments should be a discussion between practitioners and the platform engineering owners of the cluster.

Arguments

The available arguments fall into a few categories:

Compute Resources

  • cpus (str, optional): Number of CPU cores (e.g., "1.0" or "500m").
  • memory (str, optional): Memory in bytes or using binary/decimal units (e.g., "4Gi" or "4G").
  • disk_size (str, optional): Ephemeral storage. Uses the same format as memory.
  • gpus (str, optional): Number of whole GPUs to request.
  • runtime_class (str, optional): Runtime class for GPU type (e.g., "nvidia-t4").
  • gpu_type (str, optional): Node selector for GPU type (e.g., "NVIDIA-T4"). Requires GPU product discovery plugin to be installed on the cluster.
  • gpu_memory (str, optional): Amount of GPU memory to request (still allocates whole GPU).
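The resource arguments above compose as follows. This is a minimal sketch: the specific values are illustrative, and the GPU fields assume the cluster has GPU nodes and the product discovery plugin noted above.

```python
import kubetorch as kt

# Request fractional CPU, memory in binary units, and a single T4 GPU.
# Quantities follow the Kubernetes resource-quantity format.
compute = kt.Compute(
    cpus="500m",           # half a CPU core
    memory="4Gi",          # 4 GiB of memory
    disk_size="20Gi",      # ephemeral storage; same format as memory
    gpus="1",              # one whole GPU
    gpu_type="NVIDIA-T4",  # requires the GPU product discovery plugin
)
```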

Compute Environment Setup

  • image (Image, optional): Container image configuration, specified as a Kubetorch Image composed of an optional base image plus small changes on top of it (e.g., pip installs).
  • env_vars (Dict, optional): Environment variables to inject.
  • secrets (List[Union[str, Secret]], optional): Secrets to mount or expose.
  • freeze (bool, optional): Disallow further syncing of code or configuration on the compute, and rely only on the code/environment as defined in the Image (useful for prod).
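The environment arguments above can be combined like so. A sketch only: the image ID, package list, environment variables, and secret name are illustrative, not required values.

```python
import kubetorch as kt

# Environment setup: a base image with incremental pip installs,
# injected environment variables, and a frozen config for production.
compute = kt.Compute(
    cpus="2",
    image=kt.Image(image_id="python:3.11").pip_install(["numpy"]),
    env_vars={"LOG_LEVEL": "info"},
    secrets=["huggingface"],  # name of a secret to mount or expose
    freeze=True,              # rely only on the Image; no further syncing
)
```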

Kubernetes Management

  • namespace (str, optional): Kubernetes namespace (default is "default").
  • kubeconfig_path (str, optional): Path to local kubeconfig file.
  • labels (Dict, optional): Labels to apply to the pod.
  • annotations (Dict, optional): Annotations to apply to the pod.
  • image_pull_policy (str, optional): Kubernetes image pull policy.
  • inactivity_ttl (str, optional): Auto-destroy TTL after inactivity (e.g., "5m", "1h").
  • gpu_anti_affinity (bool, optional): Avoid GPUs when none are requested.
  • launch_timeout (int, optional): How long to wait for setup before failing (covers autoscaling wait time, image pull, and image setup).
  • service_account_name (str, optional): Kubernetes service account name.
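The management arguments above might be used as follows. The namespace, labels, and annotations here are illustrative placeholders, not required values.

```python
import kubetorch as kt

# Kubernetes management: target a namespace, label and annotate the pod,
# and auto-destroy the compute after an hour of inactivity.
compute = kt.Compute(
    cpus="1",
    namespace="ml-team",                  # illustrative namespace
    labels={"team": "ml", "env": "dev"},
    annotations={"owner": "alice"},
    inactivity_ttl="1h",                  # auto-destroy after 1h idle
    launch_timeout=300,                   # seconds to wait for setup
    gpu_anti_affinity=True,               # keep CPU-only work off GPU nodes
)
```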

Example

As an example:

```python
import os

import kubetorch as kt

compute = kt.Compute(
    gpus=1,
    cpus=3,
    memory="12Gi",
    inactivity_ttl="24h",
    launch_timeout=60,
    freeze=(os.environ.get("ENVIRONMENT", "DEV") == "PROD"),
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3").pip_install(["vllm"]),
).distribute("pytorch", workers=4, scale_to_zero_grace_period=60)
```