Autoscaling and Distribution
Kubetorch provides comprehensive support for both autoscaling and distributed computing workflows. These features are configured by calling compute.autoscale(args) or compute.distribute(args) on your compute resources.
Autoscaling
Kubetorch supports automatic scaling of your services based on concurrency or request rate, using Knative Serving. Critically, this also controls scaling down to zero; you may want to keep min_scale above zero or set a generous scale_to_zero_grace_period so that interactive development is not disrupted and your service stays warm for hot reloads. This enables efficient resource utilization and cost optimization for both interactive development and production workloads.
Configurable Features
- Scaling metric: Scale on the number of concurrent requests per pod, or on requests per second
- Scale-to-zero grace period: Automatically scale down to zero replicas after an idle period
- Replica limits: Set minimum and maximum replica counts (see the second example below)
- Target utilization: Control when scaling is triggered
Example
import kubetorch as kt

compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=1,
    scale_to_zero_grace_period=30,
    concurrency=100,
)
remote_fn = kt.fn(my_function).to(compute)
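For a service that should scale to zero when idle but never grow beyond a fixed replica count, a variant like the following is possible. This is a sketch: max_scale is an assumed parameter name, chosen to parallel min_scale and the replica limits feature above; check the API page for the exact signature.

import kubetorch as kt

# Sketch of a capped, scale-to-zero configuration; max_scale is an assumed name
compute = kt.Compute(cpus="0.5").autoscale(
    min_scale=0,                      # allow scaling down to zero when idle
    max_scale=10,                     # assumed upper bound on replicas
    scale_to_zero_grace_period=300,   # wait 5 minutes after the last request before scaling to zero
    concurrency=50,                   # target concurrent requests per pod
)
remote_fn = kt.fn(my_function).to(compute)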
Note that autoscaling requires Knative to be installed in your cluster. For more API details, refer to the API page.
Distributed
Kubetorch provides built-in support for distributed computing frameworks by creating a single Kubernetes Deployment with multiple replicas. It handles pod-to-pod communication, environment setup, and cluster management.
Supported Frameworks
- PyTorch DDP: Multi-GPU and multi-node PyTorch training with automatic process group initialization
- Ray: Ray cluster management using KubeRay, including head/worker node setup, cluster lifecycle management, and code synchronization (see the sketch after this list)
- JAX: JAX distributed computing with automatic coordinator setup and process management
- TensorFlow: TensorFlow distributed training with cluster configuration and worker coordination
- Generic SPMD: Framework-agnostic Single Program Multiple Data execution for custom distributed workloads
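To make the Ray entry above concrete, here is a minimal sketch. It assumes the same .distribute(framework, workers=N) pattern shown for PyTorch below also accepts "ray", and that the function runs with access to the KubeRay-managed cluster; the framework string, CPU value, and cluster address are illustrative assumptions.

import kubetorch as kt
import ray

# Assumption: "ray" follows the same .distribute(framework, workers=N) pattern as "pytorch"
compute = kt.Compute(cpus="2").distribute("ray", workers=3)

def parallel_squares(n=100):
    # Connect to the Ray cluster Kubetorch stands up via KubeRay (address is assumed)
    ray.init(address="auto", ignore_reinit_error=True)

    @ray.remote
    def square(x):
        return x * x

    # Fan tasks out across the Ray workers and gather the results
    return sum(ray.get([square.remote(i) for i in range(n)]))

remote_squares = kt.fn(parallel_squares).to(compute)
print(remote_squares(n=10))  # 0^2 + 1^2 + ... + 9^2 = 285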
Key Features
- Automatic environment setup: Framework-specific environment variables and initialization (illustrated in the sketch after this list)
- Pod-to-pod communication: Headless services enable direct worker communication
- Quorum management: Configurable timeouts for worker readiness
- Process coordination: Tree-based communication topology for efficient scaling
- Code synchronization: Automatic code deployment and reloading across workers
- Error handling: Comprehensive exception propagation from any worker
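As a rough illustration of the environment setup and result gathering described above, the sketch below uses the generic SPMD mode. The "spmd" framework string and the RANK / WORLD_SIZE environment variable names are assumptions for illustration; the actual names Kubetorch injects may differ, so consult the API page.

import os
import kubetorch as kt

# Assumption: generic SPMD uses the same .distribute(framework, workers=N) pattern
compute = kt.Compute(cpus="1").distribute("spmd", workers=8)

def report_shard():
    # Hypothetical environment variables; the actual names may differ
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return f"worker {rank} of {world_size} processed its shard"

remote_report = kt.fn(report_shard).to(compute)
print(remote_report())  # Results (or exceptions) are gathered from every worker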
Example
import kubetorch as kt
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# PyTorch DDP with 4 workers
compute = kt.Compute(
    gpus=1,
    memory="8Gi",
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:23.10-py3"),
).distribute("pytorch", workers=4)

def train_model(epochs=10):
    # PyTorch distributed setup is automatic
    torch.distributed.init_process_group(backend="nccl")
    rank = torch.distributed.get_rank()
    world_size = torch.distributed.get_world_size()

    # Your training logic here (MyModel is a placeholder for your model class)
    model = MyModel().cuda()
    model = DDP(model)
    # ... training loop

remote_train = kt.fn(train_model).to(compute)
results = remote_train(epochs=20)  # Returns results from all 4 workers
This distributed computing support makes Kubetorch suitable for training, large-scale data processing, and any workload that benefits from parallel execution across multiple nodes. For more API details, refer to the API page.