Installation Guide

This guide will help you get a working Kubetorch setup with its default settings. You will:

  • Install the Python client with pip or uv
  • Download and install the Kubetorch Helm chart on your cluster
  • Optionally enable features like autoscaling or Ray with additional installs

After that, we recommend running our hello world to ensure everything is working. For advanced configuration and other cloud-specific options, please contact the Kubetorch team at hello@run.house.

Python Client Installation

Kubetorch provides a Python client for interacting with your cluster; it should be installed both for local development and inside your Docker images. You can install it with either pip or uv, which offers faster resolution and reproducible lockfiles.

Installing with pip

This works anywhere Python is available and is the simplest option if you just need to get started quickly.

pip install "kubetorch[client]"

Installing with uv

uv pip install "kubetorch[client]"
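
Either way, you can quickly confirm the installation. This is a minimal check, assuming the package installs an importable kubetorch module:

# Confirm the client is installed and importable
python -c "import kubetorch"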

Note

If you are running Kubetorch from a Mac, update rsync first:

brew install rsync

macOS ships with an older version of rsync that lacks modern features Kubetorch requires for code and data syncing.
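
To confirm you are picking up the Homebrew build rather than the system one (the bundled rsync in /usr/bin is typically a dated build), check which binary is first on your PATH:

# Homebrew's rsync should win on PATH and report a 3.x version
which rsync
rsync --version | head -n 1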

Kubernetes Installation

You can install Kubetorch on an existing cluster, or create a new cluster with Kubetorch preinstalled (see New Kubernetes Cluster below).

Kubetorch Helm charts are hosted publicly on GitHub Container Registry (GHCR), so you can pull or install them directly; no authentication, token, or helm registry login is required.

Install Kubetorch

You can install Kubetorch in several ways:

Option 1: Pull the chart locally

Download and extract the chart:

helm pull oci://ghcr.io/run-house/charts/kubetorch --version <VERSION> --untar

This creates a local directory named kubetorch. Update values.yaml if needed, then install:

helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace
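
Once the install completes, you can sanity-check the release (exact pod names depend on the chart version):

# Check the release status and wait for Kubetorch pods to become Ready
helm status kubetorch -n kubetorch
kubectl get pods -n kubetorch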

Option 2: Install from OCI

Skip downloading and install directly from OCI:

helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version <VERSION> -n kubetorch --create-namespace
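
If you want to review the chart's defaults before overriding anything, Helm (v3.8+) can render them straight from the registry:

# Print the chart's default values.yaml without downloading it
helm show values oci://ghcr.io/run-house/charts/kubetorch --version <VERSION>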

Option 3: Install with Helmfile

If you prefer Helmfile, define the release in helmfile.yaml:

releases:
  - name: kubetorch
    namespace: kubetorch
    chart: oci://ghcr.io/run-house/charts/kubetorch
    version: <VERSION>
    values:
      - ./values.yaml # Adjust the path as needed

Then sync your releases:

helmfile sync
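
After the sync completes, you can confirm the release with plain Helm:

# List releases in the kubetorch namespace to confirm the deploy
helm list -n kubetorch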

Install Knative (Optional)

Autoscaling Kubetorch services requires Knative to be present on your cluster. You can skip this step if you are not planning to use autoscaling.

If Knative isn't already installed, you can add the Operator by running:

helm repo add knative-operator https://knative.github.io/operator
helm repo update
helm install knative-operator --create-namespace --namespace knative-operator \
  knative-operator/knative-operator

Note

If your Kubernetes cluster is on a version below 1.31.0, install a Knative Operator version below 1.18.0 using the --version flag.
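
For example (1.17.0 here is illustrative; choose the latest operator release below 1.18.0):

# Pin the Knative Operator below 1.18.0 for Kubernetes clusters < 1.31.0
helm install knative-operator --create-namespace --namespace knative-operator \
  knative-operator/knative-operator --version 1.17.0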

Next, we'll create a KnativeServing custom resource that configures and enables Knative Serving in the knative-serving namespace by applying the YAML in the Helm chart:

kubectl create namespace knative-serving
kubectl apply -f ./kubetorch/knative/serving.yaml
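
You can then watch the Serving components come up; the KnativeServing resource should eventually report Ready:

# Knative Serving pods and the KnativeServing resource should become Ready
kubectl get pods -n knative-serving
kubectl get knativeserving -n knative-serving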

Install Ray (Optional)

Kubetorch supports Ray out of the box. To enable Ray, install the KubeRay Operator by running the following commands:

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install both CRDs and KubeRay operator v1.4.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.0
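
To confirm the operator and its CRDs are in place (the label selector below assumes the chart's default naming):

# The operator pod should be Running and the ray.io CRDs registered
kubectl get pods -l app.kubernetes.io/name=kuberay-operator
kubectl get crd | grep ray.io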

For more information on installation and usage, see the KubeRay Operator documentation.

New Kubernetes Cluster

If you want to create a new Kubernetes cluster with Kubetorch installed, please use the Terraform script provided to you by the Kubetorch team.

This script will:

  • Create a new Kubernetes cluster
  • Install the Kubetorch Helm chart
  • Set up all necessary dependencies (including log streaming)

Additional Configuration

The following sections are optional and generally not necessary for a minimal working setup.

DNS Resolver

By default, Kubetorch uses the kube-dns resolver, the default on EKS and GKE. If your cluster uses a different DNS resolver (such as coredns), set the resolver field in the nginx section of values.yaml to point at your DNS resolver service:

nginx:
  resolver: "coredns.kube-system.svc.cluster.local"

Or if running with Helm directly:

helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version <VERSION> -n kubetorch --create-namespace \
  --set nginx.resolver="coredns.kube-system.svc.cluster.local"
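
If you are not sure which resolver your cluster runs, listing the services in kube-system usually makes it clear:

# Look for a kube-dns or coredns service to identify your resolver
kubectl get svc -n kube-system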

Code & Data Sync

Kubetorch provides a built-in mechanism for syncing your local code and data into the cluster. This sync service is deployed automatically with the Kubetorch stack and is required for running workloads.

You can configure concurrency limits, timeouts, and resource allocations (CPU, memory, ephemeral storage) to match your workload needs in the values.yaml of the Helm chart:

rsync:
  image: ghcr.io/run-house/kubetorch-rsync:v5
  maxConnections: 500          # Maximum concurrent rsync connections (increase for many worker pods)
  timeout: 600                 # Connection timeout in seconds
  maxVerbosity: 0              # Log verbosity (0-4, use 0 for production, higher for debugging)
  maxConnectionsPerModule: 0   # Per-module limit (0 = unlimited, inherits global limit)
  cpu:
    request: 2
    limit: 4
  memory:
    request: 4Gi
    limit: 8Gi
  ephemeralStorage:            # Adjust based on expected node disk size
    request: xxGi              # <-- update this based on your expected workload size
    limit: xxGi                # <-- typically 2–3× the request
  cleanupCron:
    enabled: false             # Set to true to enable pod cleanup
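
When tuning these limits, it can help to watch the sync pod's actual consumption; this assumes metrics-server is installed on the cluster:

# Observe live CPU/memory usage of the sync service to right-size requests
kubectl top pods -n kubetorch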

By default, the sync service uses ephemeral storage, meaning files will not persist if the pod restarts. If you need persistence across restarts, you can attach a PersistentVolumeClaim (e.g. JuiceFS, EBS).