System Overview
Kubetorch enables ML research and development on Kubernetes, across training, inference, RL, evals, data processing, and more, in a deceptively simple and unopinionated package. It does this without disintermediating you from the underlying infrastructure, so you can always punch through abstractions to finely control behavior or use any tooling you like from the rich Kubernetes ecosystem.
Most "ML platforms" provide opinionated point-solutions for individual stages of ML - e.g. a training launcher, notebooks, inference endpoints, etc. - but these abstractions inevitably go stale in 1-2 years as ML evolves. Kubetorch is different because it precisely solves the fundamental gaps in Kubernetes for ML, which unlocks the complete set of activities an ML team needs in both development and production, while allowing teams to continue using their existing code, tools, automation, and workflows.
How It Works
In general, ML on Kubernetes entails deploying Python code onto compute (pods of many shapes and configurations) and calling it. Kubetorch allows you to "burst out" any subroutine in Python to your compute cluster, calling your code on remote containers as if they were a local process pool.
Just like the general-purpose nature of PyTorch - running Python on GPUs - this basic structure applies from development through production for online (e.g. inference), offline (e.g. training), and everything in between (e.g. RL):
- Need to debug a distributed training job? Just run your training loop on n containers with m GPUs each
- Need to migrate from OpenAI to self-hosted? Just drop in vLLM and run it on some autoscaling GPUs from inside your FastAPI app
- Need to preprocess some data before training? Launch some Ray or Spark compute, run your preprocessing on it, and stream it out to your training job or feature store
- Need to launch code sandboxes for an agent? Just launch n containers with .1 CPU each and gVisor and run your agent inside
- Need to fine-tune your agent with GRPO? Just launch your existing inference service and some sandboxes to generate rollouts, launch some training GPUs to back-propagate through the trajectories, sync the weights with Kubetorch GPU data sync, and repeat.
- Need to do embarrassingly-parallel processing? Send your processing function to autoscaling compute, call it async thousands or millions of times, and asyncio.gather the results.
Kubetorch does this via a highly accelerated deployment system for Kubernetes, allowing you to deploy, call, scale, and tear down subroutines so fast that you can do so on the fly inside Python. It provides both low-level deployment primitives that you can use directly and higher-level Python APIs that deliver an elegant, programmatic experience anyone can use:
```python
import kubetorch as kt
from my_repo.training_main import train

train_compute = kt.Compute(cpus="2", gpus="1")
remote_train = kt.fn(train).to(train_compute)
result = remote_train(lr=0.05, batch_size=4)
```
In this simple, high-level example, Kubetorch:
- Packages your function and dependencies into resource manifests (pods + service endpoint) and a dockerfile
- Launches your compute resources, or patches them if already present
- Syncs your code and any container updates to the compute, distributing to all pods via fast data sync
- Routes your call through to the compute via the service endpoint
- Returns the result, with logs and exceptions propagated back to your client
Critically:
- Subsequent `.to()` calls after the initial launch distribute code quickly and only differentially apply your changes, so a one-line code change or additional pip install redeploys in seconds, not minutes.
- This code runs identically from outside the cluster during development and from diverse environments like CI, orchestration, or inside the cluster in production.
- This is regular Python, with no local runtime (which can fail) or special initiation sequence. It can be dropped deep inside existing code for a subroutine to wield scalable cluster compute.
- This example represents a relatively simple flow launching one service and calling it, but wildly complex sequences and flows can be composed together - launching services from inside other services, passing service stubs around, branching resources depending on dev/CI/prod flags, sharing services from multiple different workloads, and more.
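To make the composition point concrete, here is a hedged sketch of the embarrassingly-parallel pattern from the earlier list. It assumes the stub returned by `.to()` behaves like an ordinary Python callable (as in the example above) and uses only the standard library for the async fan-out; `process_record` and the resource sizing are illustrative assumptions, not part of the documented API:

```python
import asyncio

import kubetorch as kt
from my_repo.processing import process_record  # hypothetical per-record function

# Deploy the function onto (auto)scaling compute; sizing is illustrative.
compute = kt.Compute(cpus="1")
remote_process = kt.fn(process_record).to(compute)

async def process_all(records):
    # Each remote call is synchronous from the client's point of view, so wrap
    # it in a thread to fan out with asyncio; a native async stub, if one
    # exists, would slot in here instead.
    tasks = [asyncio.to_thread(remote_process, r) for r in records]
    return await asyncio.gather(*tasks)

records = [{"id": i} for i in range(1000)]
results = asyncio.run(process_all(records))
```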
Caveats on Customization
It's important to understand that the above is a very high-level, abstracted example. You're probably looking at it and wondering, "but can I control x?" The answer is generally yes:
- You can fully customize the image the code serves on - bring your own image, bring your own unbuilt dockerfile (including local dependencies, which will sync up automatically), and/or lightly add dependencies through our kt.Image Python API. You can use accelerated serving engines (and their images) like vLLM, Triton, TensorRT, TorchServe, etc.
- You are not limited to Kubetorch's built-in compute types. "Compute" in Kubetorch is simply any Kubernetes resource which bears pods. You can bring existing pod-bearing manifests (e.g. Kubeflow PyTorchJob, LeaderWorkerSet, KServe, JobSet, etc.) and use them with the Kubetorch APIs.
- You can fully customize the manifest YAML of the compute or services Kubetorch launches, both with simple passthrough arguments in kt.Compute (e.g. labels, annotations, secrets, volumes, etc.), and by manipulating the compute manifest directly.
- Kubetorch does not need to launch and manage the compute. You can launch your own arbitrary compute resource, start the Kubetorch server on the pods, and dispatch work to them like you would any other Kubetorch compute resource.
- You do not need to use the Python APIs to utilize Kubetorch's fast deployment. You can run `kt apply` as a drop-in replacement for `kubectl apply` (with added support for passing an unbuilt dockerfile) to deploy resources with fast differential deployment and data/code broadcast.
- You can easily export Kubetorch resources to be runnable directly on Kubernetes without Kubetorch.
- You can use any networking, observability, scheduling, DevOps, security, or other Kubernetes tooling you like, just as you would with regular Kubernetes resources.
What's Inside
Helm Charts
Kubetorch installs cleanly into your cluster via Helm:
```bash
helm install kubetorch kubetorch/kubetorch -n kubetorch --create-namespace
```
The base installation includes:
- Compute lifecycle controller and routing layer
- Data store and sync services
- Lightweight, ephemeral log and metrics streaming services
Optional Integrations
Kubetorch integrates with existing cluster infrastructure for additional capabilities:
| Integration | Purpose |
|---|---|
| Knative | Support for scale-to-zero autoscaling as a Compute type |
| KubeRay | Support for Ray workloads as a Compute type |
| GPU schedulers (e.g., Kueue, NVIDIA KAI) | Gang scheduling, workload prioritization |
| Persistent storage | Shared datasets, checkpointing |
| Custom ingress | Domain routing, external access |
See the installation guide for details.
Python Client and CLI
Users interact with the Pythonic interface through three core primitives:
| Primitive | Maps To | Purpose |
|---|---|---|
| `kt.Compute` | Pod pool + K8s Service | Define resources (CPUs, GPUs, memory) and scaling |
| `kt.Image` | Container environment | Define dependencies, synced live to pods |
| `kt.fn` / `kt.cls` / `kt.app` | Deployed application | Your code serving inside the compute pods |
These primitives are covered in detail in Core Python Primitives.
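To see how the primitives compose, and as a rough illustration of the dev/CI/prod branching mentioned earlier, here is a hedged sketch that sticks to the `kt.Compute` and `kt.fn` parameters shown in the example above; the `ENVIRONMENT` variable name and the resource sizes are illustrative assumptions, not Kubetorch conventions:

```python
import os

import kubetorch as kt
from my_repo.training_main import train

# Size the compute pool off an environment flag. ENVIRONMENT and the sizes
# below are assumptions for this sketch, not a Kubetorch convention.
if os.environ.get("ENVIRONMENT") == "prod":
    compute = kt.Compute(cpus="8", gpus="4")
else:
    compute = kt.Compute(cpus="2", gpus="1")

remote_train = kt.fn(train).to(compute)
result = remote_train(lr=0.05, batch_size=4)
```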
What Kubetorch Precisely Solves
Kubernetes' design around scalable web systems is fundamentally mismatched with ML in four key ways:
- Static application specification - Kubernetes wants applications to be neatly pre-packaged as YAML and container images, deployed as a roughly static application (maybe with some autoscaling). ML, by comparison, includes complex, heterogeneous workloads which must be able to deploy and use new containers/resources programmatically. For example, you can't run an RL training by simply `kubectl apply`ing a Kubeflow PyTorchJob, a vLLM inference service, and some code sandboxes. Nor, conversely, can you catch a CUDA OOM and reduce your batch size or increase your training resources from inside a Kubeflow PyTorchJob or Airflow pipeline.
- Hard in-vs-out of cluster boundary - Kubernetes' auth and networking models are built on a hard assumption that "human work" happens strictly outside the cluster and production work happens strictly inside. Unfortunately, ML teams don't have GPUs and distributed compute attached to their laptops, and need to do development and iteration using cluster resources.
- Slow packaging and deployment - Kubernetes has relatively slow packaging and deployment, essentially tearing down and rebuilding application resources from scratch upon any change. In ML, cold starts are often expensive - pulling massive images, loading models and data, etc. - and a one-line code change can mean 20-40 minutes of needlessly tearing down and recreating the resources in full. Queuing systems for GPU sharing exacerbate this, because tearing down the application usually also means re-entering the allocation queue.
- No built-in data layer - Transferring data in and out of the cluster, or distributing data scalably within it, is essentially unsolved in Kubernetes. These activities happen constantly in ML, including code distribution, weight syncs, data syncs, and more, often to thousands of pods at once.
Architecture Layers
Kubetorch is built as four layers, each solving one of the problems above. The layers build upon one another but can be used independently:

We'll review these layers from the bottom up, as each builds upon the one below it.
Layer 1: K8s-Native Data Sync
The foundation solves Kubernetes' lack of a built-in data layer. Kubetorch provides a Kubernetes-native data store enabling:
- Differential transmission: Fast sync of filesystem data (code, files) to/from the cluster, sending only what changed
- Peer-to-peer broadcast: Scalable distribution within the cluster (e.g., 1000 nodes receiving a code update simultaneously during redeployment)
- GPU data sync: Direct GPU memory transfers for distributed training
This layer makes code syncing fast enough that you don't need to rebuild containers for every change. You can use it within Kubetorch programs, or on its own. See Data Store for details.
Layer 2: Fast Differential Packaging & Deployment
This layer solves Kubernetes' slow packaging and deployment cycles. Built on the data store, it enables on-the-fly resource deployment with intelligent caching:
- Diff-aware updates: If you change one line in a Dockerfile or make a one-line code change, redeploying applies only that diff to existing resources, without tearing down containers or rebuilding images
- No rebuild cycles: Eliminates the 20-40 minute repackage/redeploy wait that plagues ML development on K8s
- Dynamic launching: Resources can be created, updated, or torn down in seconds, which makes it practical to launch compute on the fly from within a program
- Rational queue behavior: Kubetorch holds resources in place while you're using them, so you don't need to constantly create super-priority carveouts in your queueing system (e.g. Kueue) for high-priority development.
This solves the "execute by deployment" problem where ML practitioners wait endlessly because they can't test locally without GPUs or distributed compute.
Layer 3: Compute Controller Proxy
This layer solves Kubernetes' hard in-vs-out of cluster boundary for both compute lifecycle activities, which are generally authorized outside the cluster, and service calling, which is generally authorized inside the cluster. It's a controller and proxy service providing centralized, secure compute launch and call access from inside or outside the cluster:
- Location-agnostic: Deploy, call, scale, and tear down applications whether inside or outside the cluster
- Dynamic compute flexibility: Workloads can spawn additional compute, scale themselves, or coordinate with other services - despite Kubernetes' model of privileged "human work" happening outside the cluster
- Proxy layer: Route calls to applications inside the cluster from external clients, so development doesn't require being inside the cluster. Composite applications like inference pipelines can call live endpoints even during local development. If you're familiar with Ray's or Spark's serialization limitations, which require all code to be executed from the head node, this layer eliminates that limitation in Kubetorch.
This restores the local development experience to ML and allows programs to run identically in development, CI, and production.
The controller launches and manages standard Kubernetes resources and manifests via the Kubernetes APIs and scheduling, so it works natively with any scheduling, admission, autoscaling, or lifecycle controls or tooling you already have in place.
Layer 4: Pythonic Compute Framework
This layer solves Kubernetes' static application specification model. It is a Pythonic compute framework that is unobtrusive to your application code and replaces sequences of kubectl commands with programmatic compute control:
- Native Python: Replace tribal wisdom of complex build and deployment sequences with readable, version-controlled code
- Dynamic orchestration: Build RL training loops, multi-stage pipelines, or inference services that programmatically launch and tear down compute for maximum resource efficiency.
- Zero-cost abstractions: You can always control the compute resources in Kubetorch down to the underlying manifests and YAML, so you're not limited by simplistic abstractions like "cpus=n, gpus=m".
- Scalable, stable, and flexible: Kubetorch inherits Kubernetes' scalability and flexibility directly - it uses K8s scheduling, autoscaling, networking, and resource management rather than creating limited facsimiles of them
- Platform compatible: Platform teams can leverage the massive K8s ecosystem, including existing cloud integrations, DevOps tooling, observability stacks, security policies, and networking configurations
- Durable execution: Applications can manage their own compute lifecycle, surviving failures and scaling on demand. Catch a CUDA OOM and resize your resources from inside your code.
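As an example of that last bullet, here is a hedged sketch of catching an OOM and resizing, using only the `kt.Compute`/`kt.fn` surface shown earlier. It assumes the exception propagated back to the client preserves its original type; the resource sizes and retry logic are illustrative, not a prescribed Kubetorch recipe:

```python
import kubetorch as kt
import torch

from my_repo.training_main import train

def train_with_resize():
    # Start on a modest pool; if training OOMs, relaunch on larger compute
    # with a smaller batch size and retry.
    remote_train = kt.fn(train).to(kt.Compute(cpus="4", gpus="1"))
    try:
        return remote_train(lr=0.05, batch_size=64)
    except torch.cuda.OutOfMemoryError:
        # Exceptions are propagated back from the remote pods (assumed here to
        # retain their type), so ordinary Python error handling can drive the
        # resize-and-retry.
        remote_train = kt.fn(train).to(kt.Compute(cpus="8", gpus="2"))
        return remote_train(lr=0.05, batch_size=32)
```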
System Flow
This section describes how Kubetorch's components interact at runtime. Understanding this flow is helpful for debugging, platform integration, and advanced customization.

1. Compute Launch
Compute pods are launched in one of three ways:
- Standard: The Kubetorch client calls the controller's `/deploy` endpoint, which applies K8s manifests
- CLI: `kt apply` applies manifests directly with fast differential deployment
- BYO (Bring Your Own): User creates their own K8s resources, starting the Kubetorch server on them
Each pod runs the Kubetorch HTTP server, which handles user code serving/execution, redeployment (including image updates), log and metrics streaming, peer-to-peer data syncing, and distributed coordination.
2. Pod Registration
On startup, each pod establishes a persistent WebSocket connection to the Kubetorch controller:
Pod → WebSocket → Controller: {pod_name, pod_ip, namespace, service_name, request_metadata: true}
The controller tracks all connected pods by their service name (compute pool). This connection enables push-based updates without polling for fast redeployment at scale.
3. Pool Registration and Metadata Delivery
When a user deploys code via `.to()`:
- Code Sync: Local code and dependencies are rsynced to the Kubetorch data store
- Pool Registration: Client calls the controller with module info, dockerfile configuration, and pool metadata
- Broadcast: Controller stores the pool configuration and broadcasts a reload message to all connected pods for that service via WebSocket
For BYO compute, pods connect before the pool is registered. They receive a "waiting" status and begin serving once the user registers the pool via `.to()`.
4. Pod Update
When pods receive a reload message:
- Apply Metadata: Set environment variables (module name, file path, init args, etc.)
- Image Setup: Pull code updates from the data store, run any new pip installs or dockerfile steps
- Reload Callable: Clear the callable cache, terminate existing subprocess workers, create fresh workers
- Acknowledge: Send acknowledgment back to controller
The controller waits for all pods to acknowledge before returning success to the client. This ensures the entire pool is ready before calls are made.
5. Service Calls
When users call a deployed function or method:
Client → Controller Proxy (external) or K8s Service DNS (internal) → Pod
- External calls (from outside the cluster): Route through the controller's proxy layer
- Internal calls (from inside the cluster): Route directly via Kubernetes service DNS
The pod's HTTP server receives the call and dispatches it to subprocess workers via the configured execution supervisor. Execution modes include:
- Standard: Single-pod execution with subprocess isolation
- Load-balanced: K8s Service distributes calls across replicas
- Distributed: Coordinator broadcasts to all workers (SPMD pattern for PyTorch, JAX, etc.)
Results, logs, and exceptions propagate back to the client.
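To make the distributed (SPMD) mode above concrete, here is a hedged sketch of the kind of worker function a broadcast call would fan out to. It assumes, purely for illustration, that the standard `torch.distributed` environment variables (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) are set on each worker; how Kubetorch actually wires up the process group is not specified here:

```python
import torch
import torch.distributed as dist

def allreduce_step(tensor_size: int = 1024):
    # Assumes RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are present in the
    # environment (an assumption for this sketch, not documented behavior).
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()

    # SPMD: every replica runs the same body on its own rank; a collective
    # all-reduce is the canonical example of the pattern.
    if torch.cuda.is_available():
        device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")
    else:
        device = torch.device("cpu")
    x = torch.ones(tensor_size, device=device)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    dist.destroy_process_group()
    return x[0].item()  # equals the world size after the sum
```

When a function like this is deployed in distributed mode, a single client call is broadcast to every worker, and each worker executes the same body on its own rank.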