Dev and Prod Workflows

Moving between research/development and production is straightforward with Kubetorch. Even though Kubetorch lets you run code identically anywhere, there are two principal concerns to optimize for:

  • Local development: Fast iteration and experimentation, where local edits to your repo, package installs, and setup commands all propagate quickly to your remote compute running at scale on Kubernetes.
  • Production: Execution is exactly reproducible, with all program code, dependencies, and environment setup baked into your Docker image.

Development with Fast Iteration Loops

During development, the goal is near-instant feedback on changes. As we note in Dev Workflow, you can resync local changes to remote in less than 2 seconds.
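As a minimal sketch of that loop (using the API shown later in this section; the compute settings and module names here are illustrative):

    import kubetorch as kt
    from my_repo import train_fn  # edit train_fn locally between runs

    compute = kt.Compute(cpus=".1", image=kt.Image())
    # Each call to .to() re-syncs the current local code onto the
    # running pod in under 2 seconds -- no rebuild or relaunch needed.
    remote_train = kt.fn(train_fn).to(compute)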

You also need to modify the environment, such as installing new libraries, without rebuilding Docker images. Kubetorch’s Compute class accepts an image argument for execution. In iterative development, you can start with a base image and add installs or setup commands interactively. Chaining commands to kt.Image() lets you extend images without rebuilding, repulling, or relaunching pods, keeping your iteration loops fast.
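For instance, a sketch of extending an image interactively (the image and package names are placeholders; the methods are the same ones used in the production examples below):

    import kubetorch as kt

    # Start from a base Docker image and layer changes on top; chained
    # calls extend the environment without rebuilding or relaunching.
    image = (
        kt.Image()
        .from_docker("base_image")     # placeholder base image
        .pip_install("new_library")    # add a library mid-iteration
        .set_env_vars({"DEBUG": "1"})  # adjust the environment
    )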

Research to Production

With Kubetorch, any program you develop locally is instantly production-ready, with no translation or handoff required. Kubetorch separates the “what” of a program (its code, compute requirements, and image) from the “how” and “when” (scheduling, retries, and notifications). Development and production share the same underlying code, and production is simply wherever you choose to run it.

This can be as simple as running your program on a scheduler in GitHub Actions or GitLab CI. Alternatively, you can package your program into a Docker image with Kubetorch, which can run anywhere with access to your Kubetorch Kubernetes cluster.

[Diagram: Research to Production with Kubetorch]

There are a few other production considerations that we think teams should account for as best practice.

Branching by Environment

Teams using Kubetorch frequently use environment variables to specify whether code is running in production mode. In the example below, we assume the env var "ENVIRONMENT" is set to "PROD" in the production environment or CI.

    import os

    prod = os.environ.get("ENVIRONMENT") == "PROD"

Image Definition

In production, you likely want to ensure that all dependencies, environment variables, and setup steps have already been baked into the image itself. As an example, running pip installs that depend on public package registries would not guarantee reproducibility, even if you pin the version.

    import kubetorch as kt

    if prod:
        base_image = kt.Image(image_id="prod_docker_image")
    else:
        base_image = (
            kt.Image()
            .from_docker("prod_docker_image")
            .sync_package("path/to/local/other_package")
            .pip_install("other_package")
            .pip_install("pip_package==0.2.0")
            .set_env_vars({"var": "VAR"})
        )

Compute Definition and Flag

In the Compute construction, there is a freeze flag to differentiate between development and production settings. The freeze flag signals to freeze the state of the compute config: no local code changes are synced over, and no updates are made to the base Docker image. This guarantees that exactly what is already in the Docker image will run. As noted earlier, in the development case, with freeze=False or unset, calling .to(compute) syncs the local train_fn onto the pod in under 2 seconds.

    compute = kt.Compute(cpus=".1", image=base_image, freeze=prod)

Creating and running the function is identical in either case.
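Concretely (mirroring the summary below, and assuming train_fn takes no required arguments):

    # Identical in dev and prod: with freeze=False, local changes to
    # train_fn are synced over first; with freeze=True, the frozen
    # image runs exactly as-is.
    remote_train = kt.fn(train_fn).to(compute)
    remote_train()  # call the function on the remote compute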

Summary

Aggregating all the steps described above, here is the simple code branching between research and production:

    import os

    import kubetorch as kt
    from my_repo import train_fn

    if __name__ == "__main__":
        prod = os.environ.get("ENVIRONMENT") == "PROD"
        if prod:
            base_image = kt.Image(image_id="prod_docker_image")
        else:
            base_image = (
                kt.Image(image_id="prod_docker_image")
                .sync_package("path/to/local/other_package")
                .pip_install("other_package")
                # ... any other differing setup steps to override
            )
        compute = kt.Compute(cpus=".1", image=base_image, freeze=prod)
        remote_train = kt.fn(train_fn).to(compute)

Production to Research

Making research-to-production easy also makes production-to-research simple. This is important for two reasons: debugging and enhancements.

For debugging, reproducing a production error locally is straightforward. Check out the production code, trigger the bug, and make changes immediately. Once fixed, push the updates back through CI.

For enhancements, any team member, even a new intern, can check out the code and start improving training or pipelines locally. Everything is captured in code, so there are no hidden steps or rework required.