Folders


The Runhouse Folder makes it easy to send folders and files between your local environment, a cluster, or your cloud storage (using your own credentials), without needing to learn any provider-specific APIs.

Installation and Setup

To install base runhouse:

!pip install runhouse

Runhouse supports sending folders to and from cloud storage such as S3, GCS, and Azure. To install the provider-specific libraries that are used under the hood, install "runhouse[aws/gcp/azure]". This tutorial demonstrates S3 and GCS, so we install "runhouse[aws, gcp]".

!pip install "runhouse[aws, gcp]"

If you would like to use s3 or gcs, please make sure to also set up your credentials locally. You can see the instructions for this by running sky check.

import runhouse as rh

Folder Setup

Here we define a simple folder structure in our current directory: a sample-folder containing 5 files, 0.txt through 4.txt.

import os

folder_name = "sample-folder"
os.makedirs(folder_name, exist_ok=True)

# Create 0.txt through 4.txt, each containing its own index.
for i in range(5):
    with open(f"{folder_name}/{i}.txt", "w") as f:
        f.write(str(i))

local_path = f"{os.getcwd()}/{folder_name}"
local_path
'/Users/caroline/Documents/runhouse/notebooks/docs/sample-folder'

Cluster Setup

Launch a basic cluster, as the tutorial will demonstrate sending the local folder to the cluster. You can learn more about clusters in the Cluster tutorial.

cluster = rh.cluster(
    name="rh-cluster",
    instance_type="CPU:2+",
    provider="aws",
)
cluster.up_if_not()

Runhouse Folder

Construct a Runhouse folder object with rh.folder, passing in the path of the folder you’d like it to represent. Optionally pass in a system=<cluster>/s3/gcs/azure that the folder lives on.

Here, we construct a Runhouse folder object that represents the sample-folder that we created earlier.

local_folder = rh.folder(path=local_path)

To list the folder's contents with full paths, call .ls(); for relative paths, call .ls(full_paths=False).

local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
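The listings in this tutorial come back in different orders on different systems, so it is probably safest not to rely on the order of .ls(). Below is a minimal sketch of sorting a listing numerically, assuming file names of the form <number>.txt as in the sample folder. numeric_stem is a hypothetical helper written for this example, not part of Runhouse.

```python
from pathlib import PurePosixPath

# Listing copied from the local_folder.ls(full_paths=False) output above.
names = ['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']


def numeric_stem(name: str) -> int:
    """Sort key: the integer before the extension, e.g. '10.txt' -> 10."""
    return int(PurePosixPath(name).stem)


ordered = sorted(names, key=numeric_stem)
print(ordered)  # ['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
```

Sorting on the integer rather than the raw string keeps 10.txt after 2.txt if the folder grows.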

To: Cluster

To send the folder to a cluster, call .to(system=cluster), and optionally pass in a path. If no path is provided, one is generated automatically. The path can be retrieved by calling .path on the resulting object.

cluster_folder = local_folder.to(system=cluster, path=folder_name)
INFO | 2024-03-06 04:35:08.517625 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: rh-cluster, with path: sample-folder
cluster_folder.ls()
['sample-folder/3.txt',
 'sample-folder/0.txt',
 'sample-folder/4.txt',
 'sample-folder/2.txt',
 'sample-folder/1.txt']
cluster_folder.path
'sample-folder'
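The difference between passing a path and letting one be generated can be sketched with a small wrapper. This example relies only on the .to(system=..., path=...) call and the .path attribute shown above. FakeFolder is a hypothetical stand-in so the sketch runs without Runhouse or a live cluster, and the generated path format it produces is an assumption modelled on the log lines in this tutorial; with a real folder you would pass local_folder and cluster instead.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFolder:
    """Hypothetical stand-in for a Runhouse folder (not the real class)."""
    path: str
    system: str = "file"

    def to(self, system, path: Optional[str] = None) -> "FakeFolder":
        # Assumption: mimic the documented behaviour of generating a path
        # when the caller does not supply one.
        new_path = path or f"/runhouse-folder/{uuid.uuid4().hex}"
        return FakeFolder(path=new_path, system=str(system))


def send(folder, system, path=None):
    """Send `folder` to `system`; return the new folder and where it landed."""
    kwargs = {"system": system}
    if path is not None:
        kwargs["path"] = path
    remote = folder.to(**kwargs)
    return remote, remote.path


local = FakeFolder(path="/tmp/sample-folder")
_, explicit_path = send(local, "rh-cluster", path="sample-folder")
_, generated_path = send(local, "rh-cluster")
print(explicit_path)   # the path we asked for
print(generated_path)  # a generated path, different on every run
```

Returning the path alongside the folder makes it easy to record where a folder ended up when you did not choose the location yourself.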

To: S3/GCS

Sending to S3 or GCS is similar: call .to(system="s3") or .to(system="gs").

gs_folder = local_folder.to(system="gs")
INFO | 2024-03-06 04:35:38.607986 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: gs, with path: /runhouse-folder/bd489bb276734f7f8c23e401e6bb2b51
gs_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']

Similarly, for s3:

s3_folder = local_folder.to(system="s3")
INFO | 2024-03-06 04:36:04.390441 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: s3, with path: /runhouse-folder/dae8c16b71a744cb976da0dace7c4db2
s3_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']

To: Here

To send a folder back to your local environment, pass "here" as the system: .to("here").

new_local_folder = s3_folder.to("here", path="new-sample-folder")
INFO | 2024-03-06 04:38:01.269441 | Copying folder from s3://runhouse-folder/dae8c16b71a744cb976da0dace7c4db2 to: file, with path: new-sample-folder
new_local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
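After a round trip like this, one way to sanity-check the copy is to compare the relative listings of the original and the new folder. The sketch below uses only .ls(full_paths=False) as shown in this tutorial. ListingStub and same_files are hypothetical names introduced for illustration; with real Runhouse objects you would call same_files(local_folder, new_local_folder).

```python
class ListingStub:
    """Hypothetical stand-in exposing only .ls(), so the sketch runs offline."""

    def __init__(self, root, names):
        self.root = root
        self.names = list(names)

    def ls(self, full_paths=True):
        if full_paths:
            return [f"{self.root}/{name}" for name in self.names]
        return list(self.names)


def same_files(folder_a, folder_b) -> bool:
    """True when both folders list the same relative names, in any order."""
    names_a = sorted(folder_a.ls(full_paths=False))
    names_b = sorted(folder_b.ls(full_paths=False))
    return names_a == names_b


original = ListingStub("sample-folder", ["4.txt", "3.txt", "2.txt", "0.txt", "1.txt"])
round_trip = ListingStub("new-sample-folder", ["0.txt", "1.txt", "2.txt", "3.txt", "4.txt"])
print(same_files(original, round_trip))  # True
```

This compares file names only, not file contents, so it catches missing or extra files but not corrupted ones.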

And more...

Folders can be sent between any pair of local, cluster, or cloud storage locations. This includes sending between two different clusters, and duplicating a folder to a second location within the same cloud storage.
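Because each .to() call returns a new folder object, transfers can be chained. The following is a minimal sketch of passing a folder through a sequence of systems. relay is a hypothetical helper, and HopStub is a stand-in that only records the requested system so the example runs without credentials; it assumes "here" can be passed as system= in the same way as "gs" and "s3".

```python
class HopStub:
    """Hypothetical stand-in that records which system a folder was sent to."""

    def __init__(self, system="file"):
        self.system = system

    def to(self, system, path=None):
        # A real folder would copy its files; the stub only notes the target.
        return HopStub(system=system)


def relay(folder, systems):
    """Send `folder` through each system in turn; return the folder at each hop."""
    hops = []
    current = folder
    for system in systems:
        current = current.to(system=system)
        hops.append(current)
    return hops


hops = relay(HopStub(), ["gs", "s3", "here"])
print([hop.system for hop in hops])  # ['gs', 's3', 'here']
```

Keeping every intermediate folder object lets you inspect or clean up each copy afterwards.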