Kubetorch Examples

Kubetorch is the easiest way to execute ML workloads on Kubernetes at any scale. These examples cover a range of ML applications, from training and inference to hyperparameter optimization and batch data processing. We have many other examples; just send us a ping if you'd like to see anything specific!

Simply write regular, undecorated Python programs, define the compute resources and environment you need, and dispatch them to run on your remote cluster, either with .to() or by adding a decorator and running kubetorch deploy.
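As a rough sketch of the .to() workflow (this assumes the kubetorch package is installed and your kubeconfig points at a cluster where Kubetorch is deployed; exact class and argument names may differ in your version, so treat the specifics as illustrative rather than authoritative):

    import kubetorch as kt

    def train(epochs: int = 1):
        # Ordinary, undecorated Python -- no Kubernetes-specific code here.
        print(f"training for {epochs} epoch(s)")

    # Define the compute resources the function should run on.
    compute = kt.Compute(cpus=1)  # illustrative; e.g. request GPUs for training

    # Dispatch: .to() syncs the code to the cluster and returns a callable proxy.
    remote_train = kt.fn(train).to(compute)

    # Calling the proxy runs the function on Kubernetes, with logs streaming back.
    remote_train(epochs=3)

Re-running the script after a local code change hot-syncs the new code rather than rebuilding an image, which is what keeps the iteration loop to a few seconds.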

Kubetorch is a generational improvement over existing systems such as Kubeflow and custom CD pipelines.

  • Platform engineers who like Kubernetes rely on standard observability, auth, quota management, and logging.
  • ML/AI engineers and researchers who prefer Python work entirely in Python, as if local, from defining compute requirements to deploying code to streaming logs back.
  • In development, iteration loops are ~2 seconds as code changes are hot-synced to Kubernetes, eliminating the 20-30 minute delays of constant Docker image rebuilds.
  • In production, dispatch and execution happen identically for perfectly reproducible execution across research and production (and back to research).
  • Kubetorch is completely unopinionated. You can adopt any distributed framework (Ray, Spark, PyTorch Distributed, Dask, etc), use any orchestrator, use any model registry, and add any cloud.

Installation

Kubetorch is deployed onto your own Kubernetes clusters via Helm chart, and any end users (or systems) with a kubeconfig can use the Kubetorch Python client to deploy workloads to Kubernetes. If you do not use Kubernetes today, we have Terraform examples that provide reasonable defaults.

We are currently in a private beta. If you are interested in trying it out, shoot us a quick note at support@run.house, and we will share the required deployment resources with you.