Autoscaling and Distribution
Kubetorch provides comprehensive support for both autoscaling and distributed computing workflows. These features are configured by calling compute.autoscale(args) or compute.distribute(args) on your compute resources.
Autoscaling
Kubetorch supports automatic scaling of your services based on concurrency or request rate, using Knative Serving. Critically, this also controls scaling down to zero; you may want to keep min_scale above zero or set a generous scale_to_zero_grace_period so that interactive development is not disrupted and your service stays warm for hot reloads. This enables efficient resource utilization and cost optimization for both interactive development and production workloads.
Configurable Features
- Scaling metric: Scale on the number of concurrent requests per pod, or on requests per second
- Scale-to-zero grace period: Automatically scale down to zero replicas after an idle period
- Replica limits: Set minimum and maximum replica counts (see the second example below)
- Target utilization: Control when scaling is triggered
Example
import kubetorch as kt

compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=1,
    scale_to_zero_grace_period=30,
    concurrency=100,
)
remote_fn = kt.fn(my_function).to(compute)
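For a service that should scale to zero when idle but never grow beyond a fixed replica count, a variant like the following is possible. This is a sketch: max_scale is an assumed parameter name, chosen to parallel min_scale and the replica limits feature above; check the API page for the exact signature.

import kubetorch as kt

# Sketch of a capped, scale-to-zero configuration; max_scale is an assumed name
compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=0,                      # allow scaling down to zero when idle
    max_scale=10,                     # assumed upper bound on replicas
    scale_to_zero_grace_period=300,   # wait 5 minutes after the last request before scaling to zero
    concurrency=50,                   # target concurrent requests per pod
)
remote_fn = kt.fn(my_function).to(compute)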
Note that autoscaling requires Knative to be installed in your cluster. For more API details, refer to the API page.
Distributed
Kubetorch provides built-in support for distributed computing frameworks by creating a single Kubernetes Deployment with multiple replicas. It handles pod-to-pod communication, environment setup, and cluster management.
Supported Frameworks
- PyTorch DDP: Multi-GPU and multi-node PyTorch training with automatic process group initialization
- Ray: Ray cluster management using KubeRay, including head/worker node setup, cluster lifecycle management, and code synchronization (see the sketch after this list)
- JAX: JAX distributed computing with automatic coordinator setup and process management
- TensorFlow: TensorFlow distributed training with cluster configuration and worker coordination
- Generic SPMD: Framework-agnostic Single Program Multiple Data execution for custom distributed workloads
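To make the Ray entry above concrete, here is a minimal sketch. It assumes the same .distribute(framework, workers=N) pattern shown for PyTorch below also accepts "ray", and that the function runs with access to the KubeRay-managed cluster; the framework string, CPU value, and cluster address are illustrative assumptions.

import kubetorch as kt
import ray

# Assumption: "ray" follows the same .distribute(framework, workers=N) pattern as "pytorch"
compute = kt.Compute(cpus="2").distribute("ray", workers=3)

def parallel_squares(n=100):
    # Connect to the Ray cluster Kubetorch stands up via KubeRay (address is assumed)
    ray.init(address="auto", ignore_reinit_error=True)

    @ray.remote
    def square(x):
        return x * x

    # Fan tasks out across the Ray workers and gather the results
    return sum(ray.get([square.remote(i) for i in range(n)]))

remote_squares = kt.fn(parallel_squares).to(compute)
print(remote_squares(n=10))  # 0^2 + 1^2 + ... + 9^2 = 285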
Key Features
- Automatic environment setup: Framework-specific environment variables and initialization (illustrated in the sketch after this list)
- Pod-to-pod communication: Headless services enable direct worker communication
- Quorum management: Configurable timeouts for worker readiness
- Process coordination: Tree-based communication topology for efficient scaling
- Code synchronization: Automatic code deployment and reloading across workers
- Error handling: Comprehensive exception propagation from any worker
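As a rough illustration of the environment setup and result gathering described above, the sketch below uses the generic SPMD mode. The "spmd" framework string and the RANK / WORLD_SIZE environment variable names are assumptions for illustration; the actual names Kubetorch injects may differ, so consult the API page.

import os
import kubetorch as kt

# Assumption: generic SPMD uses the same .distribute(framework, workers=N) pattern
compute = kt.Compute(cpus="1").distribute("spmd", workers=8)

def report_shard():
    # Hypothetical environment variables; the actual names may differ
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return f"worker {rank} of {world_size} processed its shard"

remote_report = kt.fn(report_shard).to(compute)
print(remote_report())  # Results (or exceptions) are gathered from every worker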
Example
import kubetorch as kt
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# PyTorch DDP with 4 workers
compute = kt.Compute(
    gpus=1,
    memory="8Gi",
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3"),
).distribute("pytorch", workers=4)

def train_model(epochs=10):
    # PyTorch distributed setup is automatic
    torch.distributed.init_process_group(backend="nccl")
    rank = torch.distributed.get_rank()
    world_size = torch.distributed.get_world_size()

    # Your training logic here (MyModel is a placeholder for your model class)
    model = MyModel().cuda()
    model = DDP(model)
    # ... training loop

remote_train = kt.fn(train_model).to(compute)
results = remote_train(epochs=20)  # Returns results from all 4 workers
This distributed computing support makes Kubetorch suitable for training, large-scale data processing, and any workload that benefits from parallel execution across multiple nodes. For more API details, refer to the API page.