Ray Hyperparameter Tuning

In this example, we show how to run a basic hyperparameter tuning job with Ray Tune on remote compute. You simply write your Ray Tune program as you normally would, then send it to the remote cluster. Kubetorch handles all the complexity of launching and setting up the remote Ray cluster for you.

import time

import kubetorch as kt
import ray
from ray import tune

Define a Ray Tune program

You can think of this as "any regular Ray Tune program" that you would write, entirely agnostic of Kubetorch. It consists of two pieces:

  • A dummy objective function that train_function() uses to score each hyperparameter configuration.
  • A Ray Tune Tuner that runs train_function() over a search space of hyperparameters.
def dummy_objective_function(width, height, step):
    """Dummy objective function for hyperparameter optimization."""
    # Simulate some computation time
    time.sleep(0.1)
    # Return a score based on the parameters
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


def ray_tune_hpo(num_samples=4, max_concurrent_trials=2):
    """Ray Tune hyperparameter optimization function for testing."""
    # Initialize Ray (connects to the existing cluster launched by Kubetorch)
    ray.init(address="auto")

    def train_function(config):
        """Training function for Ray Tune."""
        for step in range(3):  # Short training for testing
            score = dummy_objective_function(
                config["width"], config["height"], step
            )
            # Report the score to Tune
            tune.report(dict(score=score, step=step))

    # Define the search space
    search_space = {
        "width": tune.uniform(0, 10),
        "height": tune.uniform(-10, 10),
    }

    # Create and run the tuner
    tuner = tune.Tuner(
        train_function,
        tune_config=tune.TuneConfig(
            metric="score",
            mode="max",
            max_concurrent_trials=max_concurrent_trials,
            num_samples=num_samples,
        ),
        param_space=search_space,
    )
    results = tuner.fit()
    best_result = results.get_best_result()

    # Return a summary of the HPO run
    return {
        "best_score": best_result.metrics["score"],
        "best_config": best_result.config,
        "num_trials": len(results),
        "status": "completed",
    }
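
The Tuner above samples num_samples configurations from the search space and keeps the one with the highest reported score. If you want Tune to stop weak trials early, its ASHAScheduler can be passed to the same TuneConfig. The snippet below is a sketch of one possible extension, not part of the original example; max_t=3 matches the three steps that train_function reports:

from ray.tune.schedulers import ASHAScheduler

# Possible extension (an assumption, not in the original example): early-stop
# low-scoring trials with ASHA. time_attr defaults to "training_iteration",
# which advances on each tune.report() call inside train_function.
tune_config = tune.TuneConfig(
    metric="score",
    mode="max",
    num_samples=8,
    scheduler=ASHAScheduler(max_t=3, grace_period=1),
)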

Define Compute and Execution

In code, we define the compute our Ray Tune program will run on, dispatch our function to that compute, and execute it normally, as if it were local, passing through the values we want in the remote function call. You must have KubeRay installed on your cluster; the installation instructions with Kubetorch are here.

def find_minimum():
    ray_compute = kt.Compute(
        cpus="2",
        memory="3Gi",
        image=kt.Image(image_id="rayproject/ray"),
    ).distribute("ray", workers=2)
    remote_fn = kt.fn(ray_tune_hpo).to(ray_compute)

    # Run Ray Tune HPO with small parameters for testing
    results = remote_fn(num_samples=2, max_concurrent_trials=1)
    return results


if __name__ == "__main__":
    res = find_minimum()
    print(res)
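
Because remote_fn returns the plain summary dictionary built in ray_tune_hpo, the result can be consumed like any local Python value. A minimal sketch (the formatting below is illustrative, not part of the original example):

res = find_minimum()
best = res["best_config"]  # the {"width": ..., "height": ...} dict from the best trial
# Pretty-print the fields returned by ray_tune_hpo
print(
    f"Best score {res['best_score']:.3f} with width={best['width']:.2f}, "
    f"height={best['height']:.2f} over {res['num_trials']} trials"
)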