Kubetorch Examples

Kubetorch streamlines machine learning workloads on Kubernetes by eliminating the traditional barriers between research and production. It provides a unified Python-first interface that scales seamlessly from local development to production clusters.

Our examples demonstrate how Kubetorch can be used in the ML development lifecycle across different use cases. If you'd like to see specific examples not covered here, feel free to send us a ping.

Training

In the research and development phase, Kubetorch enables fast iteration loops, under two seconds per code update, at any scale. Even if you aren't working on extremely large models, the ability to scale to multi-node is valuable for speeding up training via data parallelism or parallelized hyperparameter optimization.
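As a minimal sketch of a parallelized hyperparameter sweep (the `kubetorch` import and the `kt.Compute` / `kt.fn` names are illustrative assumptions about a Kubetorch-like interface, not confirmed API, and the training function is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

import kubetorch as kt  # assumed import name; illustrative only

def train(lr: float, epochs: int = 3) -> float:
    # Stand-in training loop that returns a dummy "loss".
    loss = 1.0
    for _ in range(epochs):
        loss *= 1 - lr
    return loss

# Assumed API: request GPU compute and dispatch the local function to it.
compute = kt.Compute(gpus=1)
remote_train = kt.fn(train).to(compute)

# Fan out a small sweep; concurrent calls can each land on their own replica.
lrs = (0.1, 0.01, 0.001)
with ThreadPoolExecutor() as pool:
    results = dict(zip(lrs, pool.map(remote_train, lrs)))

best_lr, best_loss = min(results.items(), key=lambda kv: kv[1])
print(f"best lr={best_lr}, loss={best_loss:.4f}")
```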

There is also no gap between research and reproducible production training. The code you ran locally slots as-is into CI or orchestrators (or whatever "production" means for you), rather than requiring a multi-week process of translating a research notebook into Airflow and Docker.

Fault-Tolerance

Kubetorch gives you direct programmatic control over compute and execution, and preserves the execution environment in the face of a fault, making it easy to write control flows that recover from common errors like node preemption and CUDA out-of-memory (OOM) errors. This eliminates the manual intervention and over-provisioning typical of traditional approaches.
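A minimal sketch of such a control flow, in plain Python: `run_step` stands in for whatever training call you dispatch, and the exact exceptions you observe will depend on your setup.

```python
import torch

def run_resilient(run_step, batch_size: int, max_retries: int = 5):
    """Retry transient faults and back off memory pressure on CUDA OOM."""
    for _ in range(max_retries):
        try:
            return run_step(batch_size)
        except torch.cuda.OutOfMemoryError:
            # Halve the batch and free cached blocks before retrying.
            torch.cuda.empty_cache()
            batch_size = max(1, batch_size // 2)
        except ConnectionError:
            # e.g. a preempted node: simply re-dispatch, since the
            # execution environment is preserved across the fault.
            continue
    raise RuntimeError(f"Training failed after {max_retries} attempts")
```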

Inference / Batch Processing

Kubetorch enables a range of online and offline inference mechanisms, with a simple Pythonic deployment API that includes autoscaling and scale-to-zero built in. As with training, Kubetorch provides two-second iteration cycles for inference services, replacing the 15-30 minute redeploy cycles of YAML-based approaches. For composite applications like RAG, teams can iterate on each component independently and quickly, using services identical to those in production.
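A hypothetical sketch of that deployment flow (again, `kt.Compute`, `.autoscale`, and `kt.fn` are assumed names for a Kubetorch-like interface, not confirmed API, and the model call is a placeholder):

```python
import kubetorch as kt  # assumed import name; illustrative only

def predict(text: str) -> str:
    # Placeholder for a real model call.
    return text.upper()

# Assumed API: a GPU-backed service that scales with traffic, down to zero.
compute = kt.Compute(gpus=1).autoscale(min_scale=0, max_scale=4)
service = kt.fn(predict).to(compute)

print(service("hello"))  # callable like a local function once deployed
```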

Reinforcement Learning

RL workloads require heterogeneous compute and images for their training, inference, and evaluation components. Existing frameworks struggle with compute heterogeneity (Slurm) or image heterogeneity (Ray). Kubetorch lets you define resource allocation per component (image, compute, distribution type), deploy each to Kubernetes, and orchestrate them all asynchronously from a single driver.
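A hypothetical sketch of that pattern, with per-component compute and async orchestration from one driver. The `kt.Compute` / `kt.fn` names and the `image` argument are assumptions, the container image is only an example, and the three components are stand-ins.

```python
import asyncio

import kubetorch as kt  # assumed import name; illustrative only

def generate(prompts):   # inference component (stand-in)
    return [p + " ..." for p in prompts]

def score(completions):  # evaluation component (stand-in)
    return [float(len(c)) for c in completions]

def train_step(batch):   # training component (stand-in)
    return {"loss": 0.0, "examples": len(batch)}

async def main():
    # Assumed API: each component declares its own image and compute.
    gen = kt.fn(generate).to(kt.Compute(gpus=1, image="vllm/vllm-openai:latest"))
    ev = kt.fn(score).to(kt.Compute(cpus=4))
    trn = kt.fn(train_step).to(kt.Compute(gpus=8))

    # Drive the loop asynchronously from a single driver process.
    completions = await asyncio.to_thread(gen, ["a prompt"])
    rewards = await asyncio.to_thread(ev, completions)
    result = await asyncio.to_thread(trn, list(zip(completions, rewards)))
    print(result)

asyncio.run(main())
```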

Installation

We are currently in a private beta. If you are interested in trying it out, shoot us a quick note at support@run.house, and we will share the required deployment resources with you.