A cluster is the most basic form of compute in Runhouse: a group of instances or VMs connected with Ray. Clusters fall into two broad categories:
Static Clusters: Any machine you have SSH access to, set up with IP addresses and SSH credentials.
On-Demand Clusters: Any cloud instance spun up automatically for you with your cloud credentials.
Runhouse provides various APIs for interacting with remote clusters, such as terminating an on-demand cloud cluster or running remote CLI or Python commands from your local dev environment.
Let’s start with a simple example using AWS. First, install Runhouse with the AWS dependencies:
! pip install "runhouse[aws]"
Make sure your AWS credentials are set up:
! aws configure
! sky check
We can start by using the rh.cluster factory function to create our cluster. Because we specify an instance_type, Runhouse sets up an On-Demand Cluster in AWS EC2 for us.
Each cluster must be given a unique name at construction. The name is used to save and load previously saved clusters, and to refer to the cluster in various CLI commands.
Here our instance_type is CPU:2, which gives the type and count of compute we need (a GPU example would be A10G:2). Alternatively, we could name a specific instance type, such as p3.2xlarge or g4dn.xlarge (both are AWS instance types).
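To make the distinction concrete, the sketch below separates a TYPE:COUNT spec such as CPU:2 from a named instance type such as p3.2xlarge. It is a hypothetical illustration written for this guide: parse_instance_type and ComputeRequest are not part of the Runhouse API, and Runhouse's own parsing may differ.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical illustration only, NOT Runhouse's parser: it shows how a
# "TYPE:COUNT" spec ("CPU:2", "A10G:2") differs from a named cloud instance type.
_SPEC = re.compile(r"^(?P<kind>[A-Za-z0-9]+):(?P<count>\d+)$")


class ComputeRequest(NamedTuple):
    kind: Optional[str]           # e.g. "CPU" or "A10G"; None for named instance types
    count: Optional[int]          # how many CPUs/accelerators; None for named instance types
    instance_type: Optional[str]  # e.g. "p3.2xlarge"; None for TYPE:COUNT specs


def parse_instance_type(value: str) -> ComputeRequest:
    """Split a TYPE:COUNT spec into its parts, or treat the value as a named instance type."""
    match = _SPEC.match(value)
    if match:
        return ComputeRequest(match["kind"], int(match["count"]), None)
    return ComputeRequest(None, None, value)


print(parse_instance_type("CPU:2"))
print(parse_instance_type("A10G:2"))
print(parse_instance_type("g4dn.xlarge"))
```

Either form is passed to rh.cluster as the same instance_type string; the sketch only shows how the two shapes can be told apart.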
import runhouse as rh

aws_cluster = rh.cluster(name="test-cluster", instance_type="CPU:2")
aws_cluster.up_if_not()  # launch the cluster only if it is not already up
Next, we define a basic function to send to our cluster. For more information about the Functions & Modules you can put on a cluster, see Functions & Modules.
def run_home(name: str):
    return f"Run home {name}!"

# Send the function to the cluster; calls on remote_function run there
remote_function = rh.function(run_home).to(aws_cluster)
After running .to, your function is set up on the cluster and can be called from anywhere. When you call remote_function, it executes remotely on your AWS instance.
remote_function("in cluster!")
INFO | 2024-03-06 15:18:58.439252 | Calling run_home.call
INFO | 2024-03-06 15:18:59.490122 | Time to call run_home.call: 1.05 seconds
'Run home in cluster!!'
If you would like to launch on-demand clusters in an existing VPC, you can set this up by configuring SkyPilot. If no VPC is set, clusters launch in the default VPC of the cluster’s region. If you do set a VPC name, clusters launch only in regions that contain a VPC with that name.
To configure the VPC, create or update the file ~/.sky/config.yaml. For example, on Amazon Web Services, add:
aws:
  vpc_name: my-vpc-name
And for Google Cloud you need:
gcp:
  vpc_name: my-vpc-name
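The VPC launch rule described above can be modeled as a filter over regions. This is a toy sketch, not SkyPilot's implementation; candidate_regions is a hypothetical helper, and the region names and the region-to-VPC mapping are made-up data.

```python
from typing import Dict, List, Optional, Set

# Toy model of the launch rule, NOT SkyPilot's code.
# region_vpcs maps each region to the VPC names that exist there (hypothetical data).


def candidate_regions(region_vpcs: Dict[str, Set[str]], vpc_name: Optional[str]) -> List[str]:
    """Return the regions a cluster may launch in, sorted by name."""
    if vpc_name is None:
        # No VPC configured: any region works, using that region's default VPC.
        return sorted(region_vpcs)
    # VPC configured: only regions containing a VPC with that name qualify.
    return sorted(region for region, vpcs in region_vpcs.items() if vpc_name in vpcs)


regions = {
    "us-east-1": {"default", "my-vpc-name"},
    "us-west-2": {"default"},
    "eu-west-1": {"default", "my-vpc-name"},
}
print(candidate_regions(regions, None))           # all three regions
print(candidate_regions(regions, "my-vpc-name"))  # ['eu-west-1', 'us-east-1']
```

The practical consequence is that setting vpc_name can shrink the set of regions available for a launch, so the named VPC should exist in every region you expect to use.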
If you need support for more advanced enterprise configurations, please email support@run.house for more information. More documentation is also available at SkyPilot’s advanced config page.
In the previous example, the cluster brought up in EC2 is accessible only to the original user, who has SSH credentials to the machine. However, you can also set up a cluster with ports exposed to the open Internet, and access objects and functions on it via curl.
tls_cluster = rh.cluster(
    name="tls-cluster",
    instance_type="CPU:2",
    open_ports=[443],  # expose HTTPS port to public
    server_connection_type="tls",  # specify how runhouse communicates with this cluster
    den_auth=False,  # no authentication required to hit this cluster (NOT recommended)
).up_if_not()
WARNING | 2024-03-06 15:19:05.297411 | /Users/rohinbhasin/work/runhouse/runhouse/resources/hardware/on_demand_cluster.py:317: UserWarning: Server is insecure and must be inside a VPC or have den_auth enabled to secure it. warnings.warn(
remote_tls_function = rh.function(run_home).to(tls_cluster)
remote_tls_function("Marvin")
INFO | 2024-03-06 15:26:05.482586 | Calling run_home.call
INFO | 2024-03-06 15:26:06.550625 | Time to call run_home.call: 1.07 seconds
'Run home Marvin!'
tls_cluster.address
'54.172.178.196'
! curl "https://54.172.178.196/run_home/call?name=Marvin" -k
{"data":"\"Run home Marvin!\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}
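The curl call follows a URL pattern of cluster address, function name, then /call with the keyword arguments as a query string. The sketch below builds that URL and unwraps the JSON response using only the standard library. It assumes, from the sample output above, that "data" is itself a JSON-encoded string when "serialization" is "json"; call_url and decode_response are hypothetical helpers, not Runhouse APIs, and raising on a non-null "error" is a choice made for this example.

```python
import json
from urllib.parse import urlencode


def call_url(address: str, fn_name: str, **kwargs: str) -> str:
    """Build https://<address>/<fn_name>/call?<kwargs>, mirroring the curl call above."""
    return f"https://{address}/{fn_name}/call?{urlencode(kwargs)}"


def decode_response(body: str):
    """Unwrap the JSON envelope returned by the cluster (format assumed from the sample)."""
    payload = json.loads(body)
    if payload.get("error"):
        # Example choice: surface server-side errors as a Python exception.
        raise RuntimeError(payload["error"])
    data = payload["data"]
    if payload.get("serialization") == "json":
        data = json.loads(data)  # "data" holds a JSON-encoded value
    return data


print(call_url("54.172.178.196", "run_home", name="Marvin"))
# https://54.172.178.196/run_home/call?name=Marvin

sample = '{"data":"\\"Run home Marvin!\\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}'
print(decode_response(sample))  # Run home Marvin!
```

urlencode also takes care of escaping, so an argument such as "a b" is sent as name=a+b.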
This cluster is exposed to the open Internet, so anyone can hit it. If you do want to share functions and apps publicly, we recommend setting den_auth=True when setting up your cluster; users then need to run runhouse login before they can hit the cluster. We’ll enable it now:
tls_cluster.enable_den_auth()
! curl "https://54.172.178.196/run_home/call?name=Marvin" -k
{"data":null,"error":raise PermissionError(\nPermissionError: No Runhouse token provided. Try running $ runhouse login or visiting https://run.house/login to retrieve a token. If calling via HTTP, please provide a valid token in the Authorization header.\n"","output_type":"exception","serialization":null}
If we send our Runhouse Den token in the Authorization header, the request succeeds:
! curl "https://54.172.178.196/run_home/call?name=Marvin" -k -H "Authorization: Bearer <YOUR TOKEN HERE>"
{"data":"\"Run home Marvin!\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}
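The same authenticated request can be built from Python with the standard library. This is a sketch: build_authed_request and insecure_context are hypothetical helpers, the token is a placeholder, and the unverified SSL context mirrors curl's -k flag, which you should only use when the cluster's certificate is not otherwise trusted.

```python
import ssl
import urllib.request


def build_authed_request(url: str, token: str) -> urllib.request.Request:
    """Attach the Den token as a Bearer token in the Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def insecure_context() -> ssl.SSLContext:
    """Equivalent of curl's -k: skip certificate and hostname verification."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


req = build_authed_request(
    "https://54.172.178.196/run_home/call?name=Marvin", "<YOUR TOKEN HERE>"
)
print(req.get_header("Authorization"))  # Bearer <YOUR TOKEN HERE>

# Sending the request requires a reachable cluster:
# with urllib.request.urlopen(req, context=insecure_context()) as resp:
#     print(resp.read().decode())
```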
To connect to existing machines, for example ones within a VPC, provide their IP addresses and SSH credentials (the SSH user and the path to a private key):
# Connect to an existing machine using a private key
cluster = rh.cluster(
    name="cpu-cluster-existing",
    ips=['<ip of the cluster>'],
    ssh_creds={'ssh_user': '<user>', 'ssh_private_key': '<path_to_key>'},
)
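From your local environment, you can also run CLI commands on a cluster with .run, and Python commands with .run_python:
tls_cluster.run(['pip install numpy && pip freeze | grep numpy'])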
Warning: Permanently added '54.172.178.196' (ED25519) to the list of known hosts.
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (1.26.4)
numpy==1.26.4
[(0, 'Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (1.26.4)\nnumpy==1.26.4\n', "Warning: Permanently added '54.172.178.196' (ED25519) to the list of known hosts.\r\n")]
tls_cluster.run_python(['import numpy', 'print(numpy.__version__)'])
1.26.4
[(0, '1.26.4\n', '')]
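Judging from the outputs above, .run and .run_python return one (return_code, stdout, stderr) tuple per command. Under that assumption, a small helper can turn failures into exceptions and keep only the stdout text. check_outputs is a hypothetical helper written for this guide, not part of Runhouse.

```python
from typing import List, Tuple

# Assumed result shape, based on the sample outputs: (return_code, stdout, stderr)
CommandResult = Tuple[int, str, str]


def check_outputs(results: List[CommandResult]) -> List[str]:
    """Return each command's stripped stdout, raising if any command exited non-zero."""
    outputs = []
    for code, stdout, stderr in results:
        if code != 0:
            raise RuntimeError(f"command failed with exit code {code}: {stderr.strip()}")
        outputs.append(stdout.strip())
    return outputs


print(check_outputs([(0, "1.26.4\n", "")]))  # ['1.26.4']
```

With the tls_cluster above, wrapping the call as check_outputs(tls_cluster.run_python(['import numpy', 'print(numpy.__version__)'])) would then yield the version string directly, provided the return shape matches what is shown here.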