A Blob is a Runhouse primitive that represents an entity for storing data and lives inside a Folder.

Blob Factory Method

runhouse.blob(data: Any = None, name: str | None = None, path: str | Path | None = None, system: str | None = None, env: str | Env | None = None, data_config: Dict | None = None, load: bool = True, dryrun: bool = False)[source]

Returns a Blob object, which can be used to interact with the resource at the given path.

Parameters:
  • data – Blob data. The data to persist either on the cluster or in the filesystem.

  • name (Optional[str]) – Name to give the blob object, to be reused later on.

  • path (Optional[str or Path]) – Path to the blob object. Specifying a path will force the blob to be saved to the filesystem rather than persisted in the cluster’s object store.

  • system (Optional[str or Cluster]) – File system or cluster name. If providing a file system this must be one of: [file, github, sftp, ssh, s3, gs, azure]. We are working to add additional file system support. If providing a cluster, this must be a cluster object or name, and whether the data is saved to the object store or filesystem depends on whether a path is specified.

  • env (Optional[Env or str]) – Environment for the blob. If left empty, defaults to base environment. (Default: None)

  • data_config (Optional[Dict]) – The data config to pass to the underlying fsspec handler (in the case of saving to the filesystem).

  • load (bool) – Whether to try to load the Blob object from RNS. (Default: True)

  • dryrun (bool) – Whether to create the Blob if it doesn’t exist, or load a Blob object as a dryrun. (Default: False)
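Since data_config is forwarded to the underlying fsspec handler, its valid keys depend on the target filesystem's backend. A minimal sketch, assuming an S3-backed blob and fsspec's s3fs storage options (the `anon` key and the commented call are illustrative, not a fixed API):

```python
# Hypothetical fsspec storage options for an S3-backed blob.
# "anon" is an s3fs option for anonymous (credential-free) access;
# the set of valid keys depends on the fsspec backend behind `system`.
data_config = {"anon": True}

# Forwarded to fsspec when the blob is saved to a filesystem, e.g.:
# rh.blob(data=serialized_data, path="my_blob.pickle", system="s3",
#         data_config=data_config)
```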


Returns:

The resulting blob.

Return type:

Blob

>>> import runhouse as rh
>>> import json
>>> from pathlib import Path
>>>
>>> data = list(range(50))
>>> serialized_data = json.dumps(data)
>>>
>>> # Local blob with name and no path (saved to Runhouse object store)
>>> rh.blob(name="@/my-blob", data=data)
>>>
>>> # Remote blob with name and no path (saved to cluster's Runhouse object store)
>>> rh.blob(name="@/my-blob", data=data, system=my_cluster)
>>>
>>> # Remote blob with name, filesystem, and no path (saved to filesystem with default path)
>>> rh.blob(name="@/my-blob", data=serialized_data, system="s3")
>>>
>>> # Remote blob with name and path (saved to remote filesystem)
>>> rh.blob(name='@/my-blob', data=serialized_data, path='/runhouse-tests/my_blob.pickle', system='s3')
>>>
>>> # Local blob with path and no system (saved to local filesystem)
>>> rh.blob(data=serialized_data, path=str(Path.cwd() / "my_blob.pickle"))
>>> # Loading a blob
>>> my_local_blob = rh.blob(name="~/my_blob")
>>> my_s3_blob = rh.blob(name="@/my_blob")
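The filesystem-backed examples above hand rh.blob a pre-serialized payload. That serialization step is plain stdlib and can be sanity-checked without Runhouse; a quick sketch showing both json (as in the examples) and pickle (a common alternative for a ".pickle" path):

```python
import json
import pickle

data = list(range(50))

# JSON string, as passed with system="s3" in the examples above
serialized_json = json.dumps(data)
assert json.loads(serialized_json) == data

# pickle bytes, another common choice when saving to a ".pickle" path
serialized_pickle = pickle.dumps(data)
assert pickle.loads(serialized_pickle) == data
```

Which format to use is up to the caller; the blob simply persists whatever bytes or string it is given.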

Blob Class

class runhouse.Blob(name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, **kwargs)[source]
__init__(name: str | None = None, system: Cluster | str | None = None, env: Env | None = None, dryrun: bool = False, **kwargs)[source]

Runhouse Blob object


To build a Blob, please use the factory method blob().


exists_in_system()[source]

Check whether the blob exists in the file system.


>>> blob = rh.blob(data)
>>> blob.exists_in_system()

resolved_state()[source]

Return the resolved state of the blob, which is the data.

Primarily used to define the behavior of the fetch method.


>>> blob = rh.blob(data)
>>> blob.resolved_state()

rm()[source]

Delete the blob from wherever it’s stored.


>>> blob = rh.blob(data)
>>> blob.rm()
to(system: str | Cluster, env: str | Env | None = None, path: str | None = None, data_config: dict | None = None)[source]

Return a copy of the blob on the destination system, and optionally at the destination path.


>>> local_blob = rh.blob(data)
>>> s3_blob ="s3")
>>> cluster_blob =

write()[source]

Save the underlying blob to its cluster’s store.


>>> rh.blob(data).write()