
Getting past the AI hackathon cold start problem — Remote cluster auto-setup using Runhouse

Reproducible, cloud-agnostic GPU setup and remote code deployment using Runhouse, a walkthrough of Hugging Face’s Keras Dreambooth Event

Caroline Chen

Founding Engineer @ 🏃‍♀️Runhouse🏠

April 11, 2023

AI hackathons are a great way to get involved with the AI community and try out new technologies and tools for custom use cases; oftentimes, these events even partner with cloud providers to give participants free compute! However, as eager as we all are to dive straight into the ML code to begin model training and producing results, there's always a messy, time-consuming hurdle to get past first: setting up our remote dev environment.

Just last month, I participated in a Hugging Face community event for fine-tuning Stable Diffusion (a text-to-image model) using Keras Dreambooth, where some free LambdaLabs compute was supplied. While we were provided with some very comprehensive starter code for actually running DreamBooth, there was quite a long list of instructions for getting set up with the remote LambdaLabs cloud compute before we could run any code! We needed to become familiar with cloud-provider-specific UI (which varies across hackathons) to launch and SSH into instances, manually install requirements and set environment variables, and shuttle data and code between local and remote hardware.

Runhouse facilitates this process by providing a local, cloud-agnostic Python interface for launching and setting up remote hardware and data, in a dependable and reproducible way. Runhouse helps to bridge the gap between local and remote code development in ML so that you never need to manually SSH into a cluster, or type the same setup code twice.

More concretely, Runhouse lets you:

  • Automate reproducible hardware setup (cloud or on-prem instance), all in local Python code, without ever needing to launch an instance through cloud UI or SSH into the instance
  • Seamlessly sync local data and code to/from remote clusters or storage. For example, access local training images on a remote cluster for preprocessing.
  • Run locally defined functions on remote clusters. Plus, if you have a Runhouse account (free!), save and reuse these functions across local and remote environments without needing to redefine them; a short sketch of this follows the list.
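
To make that last point concrete, here is a rough sketch of saving and reloading a Runhouse function. The exact save/load calls may differ across Runhouse versions, and preproc is a hypothetical function name used only for illustration:

import runhouse as rh

def preproc(texts):
    # any ordinary local function
    return [t.lower() for t in texts]

gpu = rh.cluster(name="rh-100", instance_type="A100:1", provider="lambda")
preproc_gpu = rh.function(fn=preproc).to(system=gpu)

# With a (free) Runhouse account, save the function's metadata under a name...
preproc_gpu.save("preproc")

# ...then reload it later, from any environment, without redefining it
preproc_reloaded = rh.function(name="preproc")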

To see how Runhouse achieves this, let's jump into some sample code from the Hugging Face Keras DreamBooth event, and see how, in just a few extra lines of local code, Runhouse lets you consistently and automatically set up and deploy code to your remote compute environment. The full example notebook can be found on my GitHub.

Runhouse Installation and Setup

To install Runhouse:

pip install runhouse

To sync cloud credentials so that you can launch AWS, GCP, Azure, or LambdaLabs clusters, run the following and follow its instructions:

sky check

No additional setup is necessary if you're using an on-prem or local cluster.

Hardware Auto-Setup

Runhouse lets you launch remote cloud instances and dependably configure reproducible environments (dependencies and environment variables), all through local Python code. No more manually launching clusters from the cloud UI, or typing "erase data on instance" into a dialog box to terminate your instance.

To spin up an on-demand cluster and create a local reference to the cluster:

import runhouse as rh

gpu = rh.cluster(name="rh-100", instance_type="A100:1", provider="lambda")
gpu.up_if_not()

Now that the cluster is up and running, setting up the environment is as easy as gpu.run(list_of_cli_commands). Runhouse uses gRPC to run this on your cluster, so that you don’t need to SSH into the machine yourself.

command = "conda install -y -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0; \
mkdir -p $CONDA_PREFIX/etc/conda/activate.d; \
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh; \
python3 -m pip install tensorflow"

gpu.run([command])
gpu.restart_grpc_server()  # restart the server to pick up the env variables set above

Dataset Setup on Remote Cluster

Now, to use our local dataset of images for fine-tuning on the cluster, we can use Runhouse to shuttle these images over to the remote cluster, using rh.folder(folder_on_local).to(gpu, folder_on_gpu):

# sync images to the cluster using Runhouse
rh.folder(path=instance_images_root).to(system=gpu, path=instance_images_root)
rh.folder(path=class_images_root).to(system=gpu, path=class_images_root)

Now that the images are on the cluster, we'd like to assemble them into a dataset on the cluster as well. To do so, we take our local function assemble_dataset, wrap it in a Runhouse function, and send it to our GPU. The new function is called just as we would call the original, but the magic is that it runs on our remote hardware rather than in our local environment.

assemble_dataset_gpu = rh.function(fn=assemble_dataset).to(system=gpu)

save_data_path = '~/.keras/datasets/train_dataset'
train_dataset_path = assemble_dataset_gpu(new_instance_image_paths, class_image_paths, embedded_text, save_data_path)

In addition to folders, Runhouse also lets you sync tables and blobs between file storage (like S3) and local or remote hardware.
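
As a rough sketch of what the blob flow can look like (the exact arguments may vary by Runhouse version, and train_config is a placeholder for any local Python object):

import pickle
import runhouse as rh

# serialize a local object, persist it in S3, then pull it onto the cluster
config_blob = rh.blob(data=pickle.dumps(train_config), name="train-config")
config_s3 = config_blob.to(system="s3")  # write to blob storage
config_gpu = config_s3.to(system=gpu)    # sync onto the remote cluster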

Dreambooth Training

Similarly, we move our Dreambooth training code into a train_dreambooth function, so that we can wrap it as a Runhouse function to send to our cluster. As before, we use the Runhouse function as if it were a local function, but all the training happens on our remote GPU.

def train_dreambooth(resolution, max_prompt_length, use_mp, optimizer_params, train_dataset_path, ckpt_path):
    # code to train dreambooth
    ...

# set up libdevice.10.bc to be discoverable by tensorflow
gpu.run(['cp /usr/lib/cuda/nvvm/libdevice/libdevice.10.bc .'])

train_dreambooth_gpu = rh.function(fn=train_dreambooth, system=gpu)
ckpt_path_gpu = train_dreambooth_gpu(resolution, max_prompt_length, use_mp, optimizer_params, train_dataset_path, ckpt_path)

Model Inference

Below, we define an inference function run_inference and show how we can perform inference remotely as well, while returning the results to the local side for us to visualize.

def run_inference(model_path, resolution, max_prompt_length, prompt, num_imgs=3):
    import keras_cv
    from keras_cv.models.stable_diffusion.diffusion_model import DiffusionModel

    # load in and instantiate the model
    sd_dreambooth_model = keras_cv.models.StableDiffusion(img_width=resolution, img_height=resolution, jit_compile=True)
    sd_diffuser_model = DiffusionModel(resolution, resolution, max_prompt_length)
    sd_diffuser_model.load_weights(model_path)
    sd_dreambooth_model._diffusion_model = sd_diffuser_model

    # run inference
    generated_img = sd_dreambooth_model.text_to_image(prompt, batch_size=num_imgs)
    return generated_img

prompt = f"A photo of two {unique_id} {class_label}"

run_inference_gpu = rh.function(run_inference).to(system=gpu)
inference_images = run_inference_gpu(ckpt_path_gpu, resolution, max_prompt_length, prompt)

While the inference is run remotely, the resulting inference_images is a local variable which we can use to view the results locally! (Alternatively, if you prefer to leave the results on your cluster, you can use .remote() to get an object ref to the images on the cluster).
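
For instance, a quick way to view the results locally is a small matplotlib loop; this is just a sketch, assuming inference_images comes back as an array of images:

import matplotlib.pyplot as plt

# inference_images was computed remotely, but is now an ordinary local array
fig, axes = plt.subplots(1, len(inference_images), figsize=(12, 4))
for ax, img in zip(axes, inference_images):
    ax.imshow(img)
    ax.axis("off")
plt.show()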

Terminating the Instance

To terminate the instance once you're done, simply run gpu.teardown() in Python, or sky down rh-100 (using the cluster name we set earlier) from the CLI.

Resources & More Information

The full tutorial notebook running the Keras Dreambooth code from the event can be found here. For more examples of using Runhouse for Hugging Face autosetup, check out our HF Transformers and Accelerate integrations.

