Command Line Interface

Kubetorch offers a rich set of commands that give you insight into running workloads at both the individual workload and the cluster level. For more details on the inputs to any command, run kt <command> --help.

kubetorch.cli.kt_check(name: str = <typer.models.ArgumentInfo object>, namespace: str = <typer.models.OptionInfo object>)

Run a comprehensive health check for a deployed service.

Checks:

  • Deployment pod comes up and becomes ready (if not scaled to zero)

  • Rsync has succeeded

  • Service is marked as ready and service pod(s) are ready to serve traffic

  • GPU support configured (if applicable)

  • Log streaming configuration (if applicable)

If a step fails, the command will dump kubectl describe output and pod logs for the relevant pods.
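
Example (a usage sketch inferred from the signature above; the -n namespace flag matches the convention used by the other subcommands):

$ kt check my-service
$ kt check my-service -n my-namespace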

kubetorch.cli.kt_config(action: str = <typer.models.ArgumentInfo object>, key: str = <typer.models.ArgumentInfo object>, value: str = <typer.models.ArgumentInfo object>)

Manage Kubetorch configuration settings.

Examples:

$ kt config set username johndoe
$ kt config set volumes "volume_name_one, volume_name_two"
$ kt config set volumes volume_name_one
$ kt config unset username
$ kt config get username
$ kt config list

kubetorch.cli.kt_debug(pod: str = <typer.models.ArgumentInfo object>, namespace: str = <typer.models.OptionInfo object>, port: int = <typer.models.OptionInfo object>, mode: str = <typer.models.OptionInfo object>, pod_ip: str = <typer.models.OptionInfo object>)

Start an interactive debugging session on the pod, connecting to the debug server running inside the service. Before running this command, you must either call a method on the service with debug=True or add a breakpoint() call to your code to enable debugging.

Debug modes:

  • “pdb” (default): Standard PDB over a WebSocket PTY (works over SSH and inside the cluster)

  • “pdb-ui”: Web-based PDB UI (requires running locally)
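
Example (a usage sketch based on the signature above; the --mode flag spelling is an assumption derived from the mode parameter):

$ kt debug my-service-pod
$ kt debug my-service-pod --mode pdb-ui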

kubetorch.cli.kt_deploy(target: str = <typer.models.ArgumentInfo object>)

Deploy a Python file or module to Kubetorch. This will deploy all functions and classes decorated with @kt.compute in the file or module.
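
Example (assuming train.py defines functions decorated with @kt.compute):

$ kt deploy train.py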

kubetorch.cli.kt_describe(name: str = <typer.models.ArgumentInfo object>, namespace: str = <typer.models.OptionInfo object>)

Show basic information for calling the service; the output depends on whether an ingress is configured.
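
Example (inferred from the signature above):

$ kt describe my-service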

kubetorch.cli.kt_list(namespace: str = <typer.models.OptionInfo object>, sort_by_updated: bool = <typer.models.OptionInfo object>, tag: str = <typer.models.OptionInfo object>)

List all Kubetorch services.

Examples:

$ kt list
$ kt list -t dev-branch

kubetorch.cli.kt_port_forward(name: str = <typer.models.ArgumentInfo object>, local_port: int = <typer.models.ArgumentInfo object>, remote_port: int = <typer.models.ArgumentInfo object>, namespace: str = <typer.models.OptionInfo object>, pod: str = <typer.models.OptionInfo object>)

Port forward a local port to the specified Kubetorch service.

Examples:

$ kt port-forward my-service
$ kt port-forward my-service 32300
$ kt port-forward my-service -n custom-namespace
$ kt port-forward my-service -p my-pod

This allows you to access the service locally using curl http://localhost:<port>.
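
For instance (the local port and the /health endpoint here are illustrative; substitute your service's actual routes):

$ kt port-forward my-service 32300
$ curl http://localhost:32300/health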

kubetorch.cli.kt_run(ctx: ~typer.models.Context, name: str = <typer.models.OptionInfo object>, run_async: bool = <typer.models.OptionInfo object>, file: int = <typer.models.OptionInfo object>)

Build and deploy a Kubetorch app that runs the provided CLI command. For the app to be deployed, the file being run must be a Python file that constructs a kt.app at the top of the file.

Examples:

$ kt run python train.py --epochs 5
$ kt run fastapi run my_app.py --name fastapi-app

kubetorch.cli.kt_secrets(action: ~kubetorch.cli_utils.SecretAction = <typer.models.ArgumentInfo object>, name: str = <typer.models.ArgumentInfo object>, prefix: str = <typer.models.OptionInfo object>, namespace: str = <typer.models.OptionInfo object>, all_namespaces: bool = <typer.models.OptionInfo object>, yes: bool = <typer.models.OptionInfo object>, path: str = <typer.models.OptionInfo object>, provider: str = <typer.models.OptionInfo object>, env_vars: ~typing.List[str] = <typer.models.OptionInfo object>, show_values: bool = <typer.models.OptionInfo object>)

Manage secrets used in Kubetorch services.

Examples:

$ kt secrets                          # list secrets in the default namespace
$ kt secrets list -n my_namespace     # list secrets in `my_namespace` namespace
$ kt secrets -A                       # list secrets in all namespaces
$ kt secrets create --provider aws    # create a secret with the aws credentials in `default` namespace
$ kt secrets create my_secret -v ENV_VAR_1 -v ENV_VAR_2 -n my_namespace    # create a secret using env vars
$ kt secrets delete my_secret -n my_namespace    # delete a secret called `my_secret` from `my_namespace` namespace
$ kt secrets delete aws               # delete a secret called `aws` from `default` namespace

kubetorch.cli.kt_ssh(name: str = <typer.models.ArgumentInfo object>, namespace: str = <typer.models.OptionInfo object>, pod: str = <typer.models.OptionInfo object>)

SSH into a remote service. By default, this connects to the first running pod; for Ray clusters, the head node is prioritized.

Examples:

$ kt ssh my_service
kubetorch.cli.kt_teardown(name: str = <typer.models.ArgumentInfo object>, yes: bool = <typer.models.OptionInfo object>, teardown_all: bool = <typer.models.OptionInfo object>, prefix: str = <typer.models.OptionInfo object>, namespace: str = <typer.models.OptionInfo object>, force: bool = <typer.models.OptionInfo object>, exact_match: bool = <typer.models.OptionInfo object>)

Delete a service and all its associated resources (deployments, configmaps, etc).

Examples:

$ kt teardown my-service -y    # force teardown of resources corresponding to the service
$ kt teardown --all            # teardown all resources corresponding to your username
$ kt teardown --prefix test    # teardown resources with prefix "test"

kubetorch.cli.kt_volumes(action: ~kubetorch.cli_utils.VolumeAction = <typer.models.ArgumentInfo object>, name: str = <typer.models.ArgumentInfo object>, storage_class: str = <typer.models.OptionInfo object>, mount_path: str = <typer.models.OptionInfo object>, size: str = <typer.models.OptionInfo object>, access_mode: str = <typer.models.OptionInfo object>, namespace: str = <typer.models.OptionInfo object>, all_namespaces: bool = <typer.models.OptionInfo object>)

Manage volumes used in Kubetorch services.

Examples:

$ kt volumes
$ kt volumes -A
$ kt volumes create my-vol
$ kt volumes create my-vol -c gp3-csi -s 20Gi
$ kt volumes delete my-vol
$ kt volumes ssh my-vol

kubetorch.cli.kt_notebook(name: str = <typer.models.ArgumentInfo object>, cpus: str = <typer.models.OptionInfo object>, memory: str = <typer.models.OptionInfo object>, gpus: str = <typer.models.OptionInfo object>, image: str = <typer.models.OptionInfo object>, namespace: str = <typer.models.OptionInfo object>, local_port: int = <typer.models.OptionInfo object>, inactivity_ttl: str = <typer.models.OptionInfo object>, restart_kernels: bool = <typer.models.OptionInfo object>)

Launch a JupyterLab notebook server on a new or existing Kubetorch service. The notebook service will continue running after you exit, and you can reconnect to it until the service is torn down.

Examples:

$ kt notebook tune-hpo    # Launch notebook into a new or existing service named "tune-hpo"
$ kt notebook --cpus 4 --memory 8Gi    # Launch with specific resources
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --image nvcr.io/nvidia/pytorch:23.10-py3    # Launch with GPU and custom image
$ kt notebook --gpus 1 --cpus 8 --memory 16Gi --no-restart    # Don't restart kernels on reconnect