Kubetorch provides a rich set of commands that give you insight into running workloads at both the individual-service and cluster level.
For more details on the inputs, you can run `kt <command> --help`.
Run a comprehensive health check for a deployed service.
Checks:
- Deployment pod comes up and becomes ready (if not scaled to zero)
- Data store connection has succeeded
- Service is marked as ready and service pod(s) are ready to serve traffic
- GPU support configured (if applicable)
- Log streaming configuration (if applicable)
If a step fails, the command will dump `kubectl describe` output and pod logs for the relevant pods.
Manage Kubetorch configuration settings.
Examples:
$ kt config set username johndoe
$ kt config set volumes "volume_name_one, volume_name_two"
$ kt config set volumes volume_name_one
$ kt config unset username
$ kt config get username
$ kt config list
Start an interactive debugging session on the pod, which will connect to the debug server inside the service. Before running this command, you must call a method on the service with debug=True or add a breakpoint() call into your code to enable debugging.
Debug modes:
- "pdb" (default): Standard PDB over WebSocket PTY (works over SSH and inside cluster)
- "pdb-ui": Web-based PDB UI (requires running locally)
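As a minimal sketch of the `breakpoint()` approach described above: the function name and logic below are hypothetical, and the only requirement shown is the standard-library `breakpoint()` call placed where you want execution to pause on the service.

```python
# Illustrative sketch: `train_step` is a hypothetical user function.
def train_step(batch_size: int) -> int:
    # When this line runs on the service, execution pauses so you can
    # attach an interactive debugging session from your terminal.
    breakpoint()
    return batch_size * 2
```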
Deploy a Python file or module to Kubetorch. This will deploy all functions and modules decorated with @kt.compute in the file or module.
Show basic info for calling the service depending on whether an ingress is configured.
List all Kubetorch resources.
Examples:
$ kt list
$ kt list -t dev-branch
$ kt list --pods # Show pod names
Port forward a local port to the specified Kubetorch service.
Examples:
$ kt port-forward my-service
$ kt port-forward my-service 32300
$ kt port-forward my-service -n custom-namespace
$ kt port-forward my-service -p my-pod
This allows you to access the service locally using `curl http://localhost:<port>`.
Build and deploy a kubetorch app that runs the provided CLI command. For the app to be deployed, the file being run must be a Python file that constructs a kt.app at the top of the file.
Examples:
$ kt run python train.py --epochs 5
$ kt run fastapi run my_app.py --name fastapi-app
Apply a Kubernetes resource from a YAML manifest file and an optional Dockerfile via Kubetorch fast deployment.
Automatically injects kubetorch server into the pod, applies the manifest, syncs Dockerfile dependencies (if provided), and runs the original manifest command. Supports hot-reloading on subsequent applies.
Only runtime instructions are supported: FROM, RUN, CMD, ENTRYPOINT, ENV, COPY. Build-time instructions (ARG, WORKDIR, EXPOSE, etc.) and multiline instructions (backslash continuations) are currently not supported. If you need these features, run docker build separately and reference the built image in your manifest.
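For example, a Dockerfile restricted to the supported runtime instructions might look like the following (the image, package, and file names are illustrative):

```dockerfile
FROM python:3.11-slim
ENV PIP_NO_CACHE_DIR=1
# Single-line RUN only -- backslash continuations are not supported
RUN pip install requests
# Absolute paths throughout, since WORKDIR is not supported
COPY app.py /app/app.py
CMD python /app/app.py
```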
If your app starts an HTTP server, use --port to specify the port. This enables HTTP proxying through the kubetorch server at the /http endpoint.
Examples:
$ kt apply deployment.yaml
$ kt apply deployment.yaml --dockerfile Dockerfile
$ kt apply fastapi-deployment.yaml --port 8000 --health-check /health
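The manifest itself is an ordinary Kubernetes resource. A minimal `deployment.yaml` of the kind shown above might look like this (the names, labels, image, and command are all illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: python:3.11-slim
          # Original manifest command; run after the kubetorch server is injected
          command: ["python", "-m", "http.server", "8000"]
```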
Manage secrets used in Kubetorch services.
Examples:
$ kt secrets # list secrets in the default namespace
$ kt secrets list -n my_namespace # list secrets in `my_namespace` namespace
$ kt secrets -A # list secrets in all namespaces (note: requires cluster-wide RBAC)
$ kt secrets create --provider aws # create a secret with the aws credentials in `default` namespace
$ kt secrets create my_secret -v ENV_VAR_1 -v ENV_VAR_2 -n my_namespace # create a secret using env vars
$ kt secrets delete my_secret -n my_namespace # delete a secret called `my_secret` from `my_namespace` namespace
$ kt secrets delete aws # delete a secret called `aws` from `default` namespace
SSH into a remote service. By default, this will SSH into the first running pod. For Ray clusters, the head node is prioritized.
Examples:
$ kt ssh my_service
Delete a service and all its associated resources (deployments, configmaps, etc).
Examples:
$ kt teardown my-service -y # force teardown resources corresponding to service
$ kt teardown --all # teardown all resources corresponding to username
$ kt teardown --prefix test # teardown resources with prefix "test"
Manage volumes used in Kubetorch services.
Examples:
$ kt volumes
$ kt volumes -A
$ kt volumes create my-vol
$ kt volumes create my-vol -c gp3-csi -s 20Gi
$ kt volumes create my-vol --pv existing-pv-name
$ kt volumes delete my-vol
$ kt volumes ssh my-vol
Launch a JupyterLab notebook server on a new or existing Kubetorch service. The notebook service will continue running after you exit, and you can reconnect to it until the service is torn down.
Examples:
$ kt notebook tune-hpo # Launch notebook into new or existing service with name "tune-hpo"
$ kt notebook --cpus 4 --memory 8Gi # Launch with specific resources
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --image nvcr.io/nvidia/pytorch:23.10-py3 # Launch with GPU and custom image
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --no-restart # Don't restart kernels on reconnect
Store files or directories in the cluster using a key-value interface.
Retrieve files or directories from the cluster using a key-value interface.
List files and directories in the cluster store.
Delete files or directories from the cluster store.
List registered workloads.
Start the Kubetorch server.
Used in BYO-compute deployments where the server must be launched inside a user-provided pod. Handles remote execution and optional self-registration with the Kubetorch controller.
Examples:
$ kubetorch server start --workload my-workers --controller-url http://kubetorch-controller:8080
$ export KT_SERVICE=my-workers
$ export KT_CONTROLLER_URL=http://kubetorch-controller.kubetorch.svc.cluster.local:8080
$ kubetorch server start