Kubetorch offers a rich set of commands that give you insight into running workloads at both the individual service and cluster level.
For more details on the inputs, you can run `kt <method> --help`.
Run a comprehensive health check for a deployed service.
Checks:
- Deployment pod comes up and becomes ready (if not scaled to zero)
- Rsync has succeeded
- Service is marked as ready and service pod(s) are ready to serve traffic
- GPU support is configured (if applicable)
- Log streaming is configured (if applicable)
If a step fails, the command dumps `kubectl describe` output and pod logs for the relevant pods.
Manage Kubetorch configuration settings.
Examples:
$ kt config set username johndoe
$ kt config set volumes "volume_name_one, volume_name_two"
$ kt config set volumes volume_name_one
$ kt config unset username
$ kt config get username
$ kt config list
Start an interactive debugging session that connects to the debug server inside the service's pod. Before running this command, you must either call a method on the service with debug=True or add a breakpoint() call to your code to enable debugging.
Debug modes:
- "pdb" (default): Standard PDB over a WebSocket PTY (works over SSH and inside the cluster)
- "pdb-ui": Web-based PDB UI (requires running locally)
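The breakpoint() call mentioned above is Python's built-in debugger hook (PEP 553). As a minimal, Kubetorch-independent sketch (the function name train_step is hypothetical), this shows where such a call sits in service code; here the hook is disabled via the standard PYTHONBREAKPOINT environment variable so the snippet runs non-interactively:

```python
import os

# Disable the debugger hook so this sketch runs non-interactively.
# In real service code you would leave breakpoint() active so the
# debug session can attach to it.
os.environ["PYTHONBREAKPOINT"] = "0"

def train_step(x):
    # Hypothetical service method; with the hook enabled, execution
    # would pause here and a debugger could take over.
    breakpoint()
    return x * 2

print(train_step(21))  # prints 42
```

Python re-reads PYTHONBREAKPOINT on every breakpoint() call, so the hook can be toggled without restarting the process.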
Deploy a Python file or module to Kubetorch. This deploys all functions and modules in the file or module that are decorated with @kt.compute.
Show basic info for calling the service depending on whether an ingress is configured.
List all Kubetorch services.
Examples:
$ kt list
$ kt list -t dev-branch
Port forward a local port to the specified Kubetorch service.
Examples:
$ kt port-forward my-service
$ kt port-forward my-service 32300
$ kt port-forward my-service -n custom-namespace
$ kt port-forward my-service -p my-pod
This allows you to access the service locally using `curl http://localhost:<port>`.
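As a self-contained sketch of calling a forwarded port, the snippet below stands up a throwaway local HTTP server to play the role of the port-forwarded service (the response body "ok" is hypothetical), then hits it the same way `curl http://localhost:<port>` would:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for the port-forwarded Kubetorch service; the real service
# would be reachable on the port chosen with `kt port-forward`.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("localhost", 0), Handler)  # port 0 = pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Equivalent of `curl http://localhost:<port>` against the forwarded port
body = urlopen(f"http://localhost:{port}").read()
print(body.decode())  # prints ok
server.shutdown()
```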
Build and deploy a Kubetorch app that runs the provided CLI command. For the app to be deployed, the file being run must be a Python file that constructs a kt.app at the top of the file.
Examples:
$ kt run python train.py --epochs 5
$ kt run fastapi run my_app.py --name fastapi-app
Manage secrets used in Kubetorch services.
Examples:
$ kt secrets                            # list secrets in the default namespace
$ kt secrets list -n my_namespace       # list secrets in the `my_namespace` namespace
$ kt secrets -A                         # list secrets in all namespaces
$ kt secrets create --provider aws      # create a secret with the AWS credentials in the `default` namespace
$ kt secrets create my_secret -v ENV_VAR_1 -v ENV_VAR_2 -n my_namespace  # create a secret using env vars
$ kt secrets delete my_secret -n my_namespace  # delete the secret `my_secret` from the `my_namespace` namespace
$ kt secrets delete aws                 # delete the secret `aws` from the `default` namespace
SSH into a remote service. By default, this connects to the first running pod; for Ray clusters, the head node is prioritized.
Examples:
$ kt ssh my_service
Delete a service and all its associated resources (deployments, configmaps, etc.).
Examples:
$ kt teardown my-service -y  # force teardown of the resources corresponding to the service
$ kt teardown --all          # teardown all resources corresponding to your username
$ kt teardown --prefix test  # teardown resources with prefix "test"
Manage volumes used in Kubetorch services.
Examples:
$ kt volumes
$ kt volumes -A
$ kt volumes create my-vol
$ kt volumes create my-vol -c gp3-csi -s 20Gi
$ kt volumes delete my-vol
$ kt volumes ssh my-vol
Launch a JupyterLab notebook server on a new or existing Kubetorch service. The notebook service will continue running after you exit, and you can reconnect to it until the service is torn down.
Examples:
$ kt notebook tune-hpo  # Launch a notebook into a new or existing service named "tune-hpo"
$ kt notebook --cpus 4 --memory 8Gi  # Launch with specific resources
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --image nvcr.io/nvidia/pytorch:23.10-py3  # Launch with a GPU and custom image
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --no-restart  # Don't restart kernels on reconnect