Autoscaling and Distribution

Kubetorch provides comprehensive support for both autoscaling and distributed computing workflows. These features can be configured using compute.distribute(args) or compute.autoscale(args) on your compute resources.

Autoscaling

Kubetorch supports automatic scaling of your services based on concurrency or request rate, using Knative Serving. Critically, this also controls scaling down to zero; you may want to keep min_scale above zero or set a generous scale_to_zero_grace_period so that interactive development is not disrupted and your service stays warm for hot reloads. This enables efficient resource utilization and cost optimization for both interactive development and production workloads.

Configurable Features

  • Scaling approach: Based on the number of concurrent requests per pod, or by requests per second
  • Scale-to-zero grace period: Automatically scale down to zero replicas after idle periods
  • Scale bounds: Set minimum and maximum replica counts (see the sketch after this list)
  • Target utilization: Control when to trigger scaling
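
As a minimal sketch of how the scale bounds and concurrency target fit together: min_scale and concurrency appear in the example further below, while max_scale is an assumed keyword mirroring min_scale, so check the API page for the exact parameter name.

import kubetorch as kt

# Keep at least one replica warm, cap the service at ten replicas, and
# scale on concurrent requests per pod. max_scale is an assumption here.
compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=1,
    max_scale=10,
    concurrency=50,
)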

Example

import kubetorch as kt

compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=1,
    scale_to_zero_grace_period=30,
    concurrency=100,
)
remote_fn = kt.fn(my_function).to(compute)
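
As a follow-up usage sketch (assuming my_function accepts a single integer argument), a burst of concurrent calls is what drives scale-up:

from concurrent.futures import ThreadPoolExecutor

# Once in-flight requests per pod exceed the concurrency target, Knative
# adds replicas; after the burst the service drains back toward min_scale.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(remote_fn, range(500)))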

Note that autoscaling requires Knative to be installed in your cluster. For more API details, refer to the API page.

Distributed

Kubetorch provides built-in support for distributed computing frameworks by creating a single Kubernetes Deployment with multiple replicas. It handles pod-to-pod communication, environment setup, and cluster management.

Supported Frameworks

  • PyTorch DDP: Multi-GPU and multi-node PyTorch training with automatic process group initialization
  • Ray: Ray cluster management using KubeRay, including head/worker node setup, cluster lifecycle management, and code synchronization (see the sketch after this list)
  • JAX: JAX distributed computing with automatic coordinator setup and process management
  • TensorFlow: TensorFlow distributed training with cluster configuration and worker coordination
  • Generic SPMD: Framework-agnostic Single Program Multiple Data execution for custom distributed workloads
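
As a rough sketch of a non-PyTorch framework, a Ray cluster follows the same pattern as the PyTorch example further below; the "ray" framework string and the way the dispatched function connects to the cluster are assumptions here, so confirm the details on the API page.

import kubetorch as kt
import ray

# Hypothetical sketch: a Ray cluster with one head and three workers,
# managed via KubeRay.
compute = kt.Compute(cpus="2").distribute("ray", workers=3)

def ray_job():
    # Assumes the KubeRay cluster is already running, so we attach to it
    # rather than starting a local Ray instance.
    ray.init(address="auto", ignore_reinit_error=True)
    return ray.cluster_resources()

remote_job = kt.fn(ray_job).to(compute)
print(remote_job())  # e.g. total CPUs/GPUs visible to the Ray cluster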

Key Features

  • Automatic environment setup: Framework-specific environment variables and initialization (see the sketch after this list)
  • Pod-to-pod communication: Headless services enable direct worker communication
  • Quorum management: Configurable timeouts for worker readiness
  • Process coordination: Tree-based communication topology for efficient scaling
  • Code synchronization: Automatic code deployment and reloading across workers
  • Error handling: Comprehensive exception propagation from any worker
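
To make the automatic environment setup concrete for the generic SPMD case, here is a minimal sketch; the "spmd" framework string and the RANK/WORLD_SIZE variable names are assumptions based on common torchrun-style conventions, not confirmed Kubetorch behavior.

import os
import kubetorch as kt

def spmd_worker():
    # Hypothetical sketch: each replica reads its coordinates from
    # environment variables assumed to be injected by Kubetorch, then
    # takes its own shard of the work.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    shard = [i for i in range(100) if i % world_size == rank]
    return rank, len(shard)

compute = kt.Compute(cpus="1").distribute("spmd", workers=4)  # framework string assumed
results = kt.fn(spmd_worker).to(compute)()  # one (rank, shard size) per worker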

Example

import kubetorch as kt
import torch
import torch.distributed
from torch.nn.parallel import DistributedDataParallel as DDP

# PyTorch DDP with 4 workers
compute = kt.Compute(
    gpus=1,
    memory="8Gi",
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3"),
).distribute("pytorch", workers=4)

def train_model(epochs=10):
    # PyTorch distributed setup is automatic
    torch.distributed.init_process_group(backend="nccl")
    rank = torch.distributed.get_rank()
    world_size = torch.distributed.get_world_size()

    # Your training logic here
    model = MyModel().cuda()
    model = DDP(model)
    # ... training loop

remote_train = kt.fn(train_model).to(compute)
results = remote_train(epochs=20)  # Returns results from all 4 workers

This distributed computing support makes Kubetorch suitable for training, large-scale data processing, and any workload that benefits from parallel execution across multiple nodes. For more API details, refer to the API page.