System Overview
Kubetorch is a system designed for fast, secure installation. It runs directly in your VPC / Kubernetes cluster(s),
and provides a unified point of entry for all of your Python code. End users and systems install the
kubetorch Python library and use it to interact with the remote compute. In Kubernetes, a simple Helm installation
is all that is needed to start executing code.

Python Client
For users, Kubetorch consists of the following key primitives, all defined in Python. These are described more in depth in Python Primitives.
- Compute: resource requirements and environment specifications on which to run application code.
- Image: environment specifications to set up on the compute at start time, including pre-built Docker images and additional setup steps.
- Function/Class: a wrapper around your Python function or class, to be synced onto your compute. Once deployed, it returns a callable object that works just like the original function or class, except that it executes remotely instead of locally.
You can string together the primitives to dispatch your function or class to your specified compute with the image setup:
import kubetorch as kt

def sum(a: int, b: int):
    return a + b

if __name__ == "__main__":
    compute = kt.Compute(cpus=1)
    remote_sum = kt.fn(sum).to(compute)
    results = remote_sum(1, 3)  # calls the remote copy of sum
    print(results)  # prints 4
Kubetorch launches compute resources following the specified requirements (CPUs, GPUs, memory), executes image setup steps,
and deploys a Kubernetes service with your local code synced over. The returned object is a callable that behaves identically
to the original Python function or class, but executes remotely on the compute, with built-in logging, error handling, and
request tracing. On subsequent .to() calls, if the compute is already running, only local code changes are synced over and
setup updates are applied, enabling rapid iteration cycles on the order of seconds, without needing to rebuild Docker images
or redeploy Kubernetes resources.
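As a sketch of the same flow for a class with a custom image, the example below assumes a kt.Image constructor and a kt.cls wrapper analogous to kt.fn above; the exact names and arguments are illustrative and may differ from the actual API (see Python Primitives):

import kubetorch as kt

class Calculator:
    def add(self, a: int, b: int):
        return a + b

if __name__ == "__main__":
    # Assumed API for illustration: kt.Image with a base image and pip installs,
    # and kt.cls as the class analogue of kt.fn.
    img = kt.Image(image_id="python:3.11").pip_install(["numpy"])
    compute = kt.Compute(cpus=1, image=img)
    remote_calc = kt.cls(Calculator).to(compute)
    print(remote_calc.add(1, 3))  # prints 4, executed remotely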
You can also specify additional custom features for your workflow, such as volumes, secrets, autoscaling, and distributed execution. You can find more information about these in the various concepts pages.
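Purely as an illustration of how such options typically attach to the compute definition, the sketch below uses hypothetical method names (autoscale, distribute) and arguments; the real entry points and parameters are documented in the respective concepts pages:

import kubetorch as kt

# Hypothetical illustration only: the method names and arguments below are assumptions,
# not the confirmed Kubetorch API; consult the autoscaling and distributed concepts pages.
compute = kt.Compute(gpus=1)
compute = compute.autoscale(min_replicas=0, max_replicas=4)  # assumed autoscaling configuration
compute = compute.distribute("pytorch", workers=2)           # assumed distributed training setup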
Helm Chart
Kubetorch is installed onto your Kubernetes cluster by you or a Platform team owner via Helm. The
base install is simple and takes about 5 minutes to deploy resources into the kubetorch
namespace.
While end users do not need to think about Kubernetes or write YAML, the platform remains fully transparent to platform owners and advanced users.
Base Installation
The default Helm chart includes everything needed to run Kubetorch out of the box:
- Core Kubetorch control plane and runtime
- Integrated proxy and sync services
- Log streaming (ephemeral, in-cluster only)
- Support for distributed training, job orchestration, and Python-native deployments
Optional Add-Ons
Kubetorch can seamlessly integrate with additional components to unlock richer capabilities. These are not installed by default, so you can adapt Kubetorch to your cluster’s existing stack.
- Autoscaling: We typically recommend installing Knative to enable autoscaling (especially for inference use cases)
- Ray: Through KubeRay, Kubetorch supports launching and using Ray clusters for distributed programs, alongside embarrassingly parallel workloads and PyTorch Distributed
See the installation guide for more info.
Advanced Features
Kubetorch also supports deeper integrations that extend the platform’s orchestration, storage, and observability layers. These are typically configured in collaboration with the Runhouse team to ensure best practices and compatibility with enterprise infrastructure.
- Queueing: Integrate with a queueing system (e.g. NVIDIA KAI Scheduler) for GPU sharing, gang scheduling, workload prioritization, and advanced queueing behavior
- Filesystem and Data Layer: Mount distributed or cloud-native persistent storage systems for shared datasets and checkpointing
- Ingress: Configure a custom ingress controller or domain routing strategy (e.g., through NGINX or Gateway API) to manage how internal and external traffic reaches Kubetorch services
- Idle Timeout (TTL): Define workload inactivity timeouts to automatically scale down or tear down idle jobs and free up compute resources
- Persistent Logs: Retain logs long term, beyond the ephemeral in-cluster log streaming included by default
For guidance on enabling or customizing these advanced capabilities, please get in touch with the Runhouse team.