Core Python Primitives

Compute

The kt.Compute class allows you to define the resources and environment needed for your workloads, while controlling how the compute is managed and scaled based on demand. This includes specifying hardware requirements that can be either generic or tailored to your specific Kubernetes infrastructure and setup.
- Resource Requirements: Specify any resource requests that your infrastructure supports, including the number of CPUs/GPUs, specific GPU types, memory, or disk size.
- Base Environment: Highly customizable runtime dependencies through the Image class. Use a pre-built image or customize it at launch time with additional installations, environment variables, and setup commands.
- Distribution & Scaling: Support for distributed computing patterns, including PyTorch and Ray distribution types, autoscaling configurations with replica and concurrency controls, and resource management that can automatically scale down or tear down idle services to optimize cluster utilization (see the sketch after the example below).
- Kubernetes Configuration: Fine-grained control over Kubernetes-specific configurations such as namespace, labels/annotations, secrets, service accounts, and other RBAC features.
- Production Controls: Freeze settings to prevent code syncs and updates for stable production deployments, ensuring consistent behavior across environments.
Example:

gpus = kt.Compute(
    gpus=1,
    cpus=4,
    memory="12Gi",
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3").pip_install(["transformers"]),
).distribute("pytorch", workers=4)
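As a follow-on to the Distribution & Scaling controls above, here is a minimal sketch of Ray distribution and autoscaling configuration. The .distribute call mirrors the example; the autoscale method and its parameter names are illustrative assumptions, not a confirmed API:

import kubetorch as kt

# Ray distribution uses the same .distribute interface as the PyTorch
# example above.
ray_compute = kt.Compute(cpus=8, memory="16Gi").distribute("ray", workers=2)

# Hypothetical autoscaling knobs (replica and concurrency controls);
# the method name and arguments here are illustrative, so consult the
# Compute reference for the actual parameters.
serving = kt.Compute(cpus=2).autoscale(min_replicas=0, max_replicas=8, concurrency=16)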
Image

The kt.Image class enables you to define and customize the containerized environment for your workloads. You can specify a pre-built Docker image as your foundation and layer on additional setup steps that run at launch time, eliminating the need to rebuild images for every code change.
These additional steps are as follows:
- pip_install: Run pip install for the given packages.
- set_env_vars: Set environment variables.
- sync_package: Sync over a locally installed package and add it to the Python path.
- rsync: Rsync over local files or folders.
- run_bash: Run the specified commands.
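The chained example below covers pip_install, set_env_vars, and rsync; as a hedged sketch, the remaining steps can be chained the same way (the base image, package name, and command here are illustrative):

(
    kt.Image(image_id="python:3.11")  # illustrative base image
    .sync_package("my_local_pkg")  # hypothetical locally installed package
    .run_bash("apt-get update && apt-get install -y ffmpeg")  # illustrative setup command
)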
The Image object is passed into the Compute object to define the containerized environment. When you launch your ML workflow, the pre-built image acts as the base image of the underlying Knative service, and the additional setup steps run before your application code. The image can be updated at any time and propagated to future deployments; we detect and run any differing setup steps on the order of seconds, without needing to recompile or rebuild an image.
Example:

import os

kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3") \
    .pip_install("transformers") \
    .set_env_vars({"HF_TOKEN": os.environ["HF_TOKEN"]}) \
    .rsync("/data_folder")
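To illustrate the launch-time diffing described above, here is a sketch of updating an image between deployments. It assumes the library is imported as kt, that setup methods return the Image for chaining (as the chained example suggests), and it uses kt.fn, which is described in the Modules section below; the workload and the added dependency are illustrative:

import kubetorch as kt

def train():  # placeholder workload
    ...

image = kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3").pip_install(["transformers"])
remote_train = kt.fn(train).to(kt.Compute(gpus=1, image=image))

# Later: add a dependency and redeploy. Only the differing setup step
# (the new pip install) runs on the service, with no image rebuild.
image = image.pip_install(["datasets"])  # illustrative extra dependency
remote_train = kt.fn(train).to(kt.Compute(gpus=1, image=image))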
Modules

The kt.fn and kt.cls classes are wrappers around your locally defined Python function or class. Once wrapped, these objects can be sent .to(compute), which launches a Knative service (taking into account the compute requirements) and syncs over the files necessary to run the function remotely. Once a service is launched, you can continue to update your Python code locally and redeploy to your compute with .to, which re-syncs your updates in seconds and returns a new function/class that is immediately ready to use.
The returned object is a callable that behaves just like the original Python method, except that it runs remotely on your infrastructure rather than locally. Because the same code runs locally and remotely, we ensure reproducibility and bridge the gap between research and production.
When you call the object, it makes an HTTP call to the deployed service. Log streaming, flexible error handling, and observability are all built into the system. Debugging is made simple by fast iteration loops and the ability to SSH into and directly interact with your compute.
Example:

import kubetorch as kt

def sum(a: int, b: int):
    return a + b

if __name__ == "__main__":
    compute = kt.Compute(cpus=1)
    remote_sum = kt.fn(sum).to(compute)
    results = remote_sum(1, 3)
    print(results)  # prints 4
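kt.cls works analogously for classes, which is useful when state should live on the remote service between calls. A hedged sketch, assuming kt.cls mirrors the kt.fn interface above and that the returned object proxies method calls to a remote instance; the Counter class is illustrative:

import kubetorch as kt

class Counter:  # illustrative stateful class
    def __init__(self):
        self.count = 0

    def increment(self, by: int = 1) -> int:
        self.count += by
        return self.count

if __name__ == "__main__":
    compute = kt.Compute(cpus=1)
    remote_counter = kt.cls(Counter).to(compute)
    # Each call is an HTTP request to the deployed service, where the
    # instance (and its count) lives.
    print(remote_counter.increment(5))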
Deployment Modes
Kubetorch supports three deployment modes for modules.
(1) Deployment: Traditional Kubernetes deployments provide reliable, long-running services with built-in health checks and restart policies. This mode is ideal for production workloads that need consistent availability and can handle multiple replicas for load distribution. Deployments are perfect for stateless services, APIs, and background processing tasks that require guaranteed uptime.
(2) Ray Cluster: Distributed computing mode using Ray for parallel and distributed workloads. RayCluster is optimized for ML workloads that benefit from parallel execution across multiple nodes.
(3) Knative Service: Serverless deployment mode that can automatically scale from zero to handle varying traffic loads. Knative services are ideal for intermittent workloads, development environments, and applications with unpredictable usage patterns.
The deployment mode is automatically selected based on your compute configuration, or can be explicitly controlled through Compute class parameters such as autoscaling settings and distribution configuration.
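To make the selection concrete, here is a sketch of how the configurations described above appear to map to modes. The comments reflect our reading of the descriptions in this section, not a confirmed specification:

import kubetorch as kt

# Default compute for a kt.fn(...).to(...) deployment launches a Knative
# service (mode 3), which can scale down between calls.
serverless = kt.Compute(cpus=1)

# A Ray distribution type implies the RayCluster mode (2) for parallel,
# multi-node ML workloads.
ray_compute = kt.Compute(cpus=8).distribute("ray", workers=4)

# Long-running, consistently available services correspond to a
# traditional Deployment (mode 1); the exact trigger is a Compute
# configuration detail.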