Dev and Prod Workflows
Switching between research/development and production mode is straightforward with Kubetorch. You simply adjust a single flag on your compute to signal whether or not to freeze its contents at runtime, leaving the rest of the setup code the same (see the preview sketch after the list below).
- Production: running functions and code that are pre-installed on your Docker image
- Local development: making local edits to your repo and syncing those changes over to override the version on the cluster, for development and experimentation
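As a preview, the toggle is a single boolean threaded into the compute definition; this sketch uses only the calls walked through in the steps below:

```python
import kubetorch as kt

# freeze=True  -> production: run the code already baked into the Docker image
# freeze=False -> development: sync local edits over the image's version
compute = kt.Compute(
    cpus=".1",
    image=kt.Image(image_id="prod_docker_image"),
    freeze=True,
)
```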
Imports and Environment
Import kubetorch, as well as the function that you would like to run remotely, in this case, `train_fn` from `my_repo`.
```python
import kubetorch as kt

from my_repo import train_fn
```
We also define here a boolean variable `prod`, which indicates whether we are in production mode. Here we assume that the env var `"ENVIRONMENT"` will be set to `"PROD"` in the production environment.
```python
import os

prod = os.environ.get("ENVIRONMENT") == "PROD"
```
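Note that any value other than `"PROD"` (including the variable being unset) leaves `prod` as `False`, so development mode is the safe default. A quick illustration of how the flag resolves (the values here are purely for demonstration):

```python
import os

os.environ.pop("ENVIRONMENT", None)   # e.g., a dev laptop where the variable is unset
print(os.environ.get("ENVIRONMENT") == "PROD")  # False -> development mode

os.environ["ENVIRONMENT"] = "PROD"    # e.g., set by the production deployment
print(os.environ.get("ENVIRONMENT") == "PROD")  # True -> production mode
```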
Image Definition
Now, define the Kubetorch image, which wraps your production Docker image. In the next step, we pass this image into the Kubetorch compute, so that when the service is launched, the compute uses that image.
In the simplest case, in which the only code that differs between dev and prod is `train_fn` itself, we can reuse the same image. The flag introduced later will then signal whether or not to sync over the local function.
```python
base_image = kt.Image(image_id="prod_docker_image")
```
If there are additional differences in the development case, such as local dependencies or different package versions, these can also be handled in the image definition.
```python
if prod:
    base_image = kt.Image(image_id="prod_docker_image")
else:
    base_image = (
        kt.Image()
        .from_docker("prod_docker_image")
        .sync_package("path/to/local/other_package")
        .pip_install("other_package")
        .pip_install("pip_package==0.2.0")
        .set_env_vars({"var": "VAR"})
    )
```
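As a concrete sketch, a development image might additionally pin a debugging tool and enable verbose logging with the same chained calls; the package and env var names below are placeholders, not part of any example repo:

```python
# Hypothetical dev-only extras, built only from the Image methods shown above.
dev_image = (
    kt.Image()
    .from_docker("prod_docker_image")
    .pip_install("debugpy==1.8.0")         # placeholder: dev-only debugger pin
    .set_env_vars({"LOG_LEVEL": "DEBUG"})  # placeholder: verbose logging in dev
)
```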
Compute Definition and Flag
Next, define the compute, passing in the image defined above. In the Compute construction, there is a flag `freeze` to differentiate between a development and a production setting. The `freeze` flag signals to freeze the state of the compute config and not sync over any local code changes or updates when setting up the remote function.
```python
compute = kt.Compute(cpus=".1", image=base_image, freeze=prod)
```
Define and Run the Function
Now that the compute is defined, creating and running the function is straightforward:
```python
remote_train = kt.fn(train_fn).to(compute)
remote_train()
```
In the production case, if `freeze=True` is set in the compute, the compute is frozen and will use the `train_fn` defined in the Docker container. In the development case, with `freeze=False` (the default), when we call `.to(compute)`, the local `train_fn`, which differs from the one in Docker, will be synced over onto the pod to be used in the remote function call.
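In both modes, the returned `remote_train` behaves like an ordinary Python callable. A minimal usage sketch, assuming `train_fn` accepts an `epochs` keyword argument (hypothetical, since its signature isn't shown here):

```python
# Hypothetical call: `epochs` is a made-up parameter for illustration.
result = remote_train(epochs=3)
print(result)
```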
Summary
To summarize the above in a script that lets you easily reuse code and toggle between dev and prod:
```python
import os

import kubetorch as kt

from my_repo import train_fn

prod = os.environ.get("ENVIRONMENT") == "PROD"

if prod:
    base_image = kt.Image(image_id="prod_docker_image")
else:
    base_image = (
        kt.Image(image_id="prod_docker_image")
        .sync_package("path/to/local/other_package")
        .pip_install("other_package")
        # ... any other differing setup steps to override
    )

compute = kt.Compute(cpus=".1", image=base_image, freeze=prod)
remote_train = kt.fn(train_fn).to(compute)
```