Runhouse
Kubetorch Examples
Hello, World
Training: PyTorch DDP
Inference: vLLM
Training
MNIST Torchvision
Automated Re-Training (Airflow)
Supervised Fine Tuning (Llama3)
Ray (Tune - HPO)
Ray (Train, Data - DLRM)
Lightning (ImageNet)
TensorFlow
XGBoost on GPU
PyTorch DDP (ResNet)
Fault Tolerance
Training Pod Preemption Recovery
Find Batch Size
Fail to Larger Compute
Reinforcement Learning
Basic GRPO with Kubetorch
Async GRPO
TRL with a Code Sandbox
VERL Training
Launch Code Sandboxes
Inference
DeepSeek - vLLM
OpenAI OSS - Transformers
Triton Inference Server
Batch Embeddings
RAG App (Composite AI System)
Examples Archive
A full list of current and past Runhouse examples:
ResNet Lightning Training
LangChain RAG App on AWS EC2
Use Code Sandboxes
Using Airflow with Runhouse for Model Training
Embarrassingly Parallel GPU Jobs - Batch Embeddings
Using Argo with Runhouse for Model Training
RL with TRL and SWE-ReX for Code Sandboxes
Inference: Llama3 with vLLM
Accelerating XGBoost with a GPU
Retry on Larger Compute after OOM
Dask Distributed Processing and LightGBM Training
TensorFlow Multi-Node Distributed Training
Ray Hyperparameter Optimization
Llama 2 7B Model with TGI on AWS EC2
Training: PyTorch Multi-Node Distributed
Llama 2 Fine Tuning with LoRA on AWS EC2
Stable Diffusion XL 1.0 on AWS EC2
Triton Embedding Inference
Using Kubeflow with Runhouse for Model Training
Launching a Single Training Pipeline Multi-Cloud with Airflow
RAG App with Vector Embedding and LLM Generation
Train ModernBERT with Medical Knowledge
PyTorch MNIST Training on Airflow with Kubetorch
Llama 3 8B Chat Model Inference on AWS EC2
Automatically Find Batch Size for PyTorch Distributed
Ray DLRM Training
Embedding Batch Inference
Llama 2 7B Model on AWS Inferentia
DeepSeek R1 Inference with vLLM (Llama 70B Distill)
Mistral 7B Model with TGI on AWS EC2
Stable Diffusion XL 1.0 on AWS Inferentia
Llama 2 Chat Model Inference on AWS EC2
Mistral 7B Model on AWS Inferentia
OpenAI GPT OSS-120B Inference
Run Llama 3 8B with vLLM on GCP
Launch Reinforcement Learning (RL) with verl
Train a Torch Model for Image Classification
Fine-Tune Llama 3 with LoRA on AWS EC2
Pod Preemption Recovery in Distributed Training
Simple Synchronous GRPO Training
PyTorch DDP (ResNet)
Simple Async GRPO