A Blob is a Runhouse primitive that represents an entity for storing data and lives inside of a Folder.
Returns a Blob object, which can be used to interact with the resource at the given path.
data – Blob data. The data to persist either on the cluster or in the filesystem.
name (Optional[str]) – Name to give the blob object, to be reused later on.
path (Optional[str]) – Path to the blob object. Specifying a path will force the blob to be saved to the filesystem rather than persisted in the cluster’s object store.
system (Optional[str or Cluster]) – File system or cluster name. If providing a file system, this must be one of: [file, github, sftp, ssh, s3, gs, azure]. We are working to add additional file system support. If providing a cluster, this must be a cluster object or name, and whether the data is saved to the object store or filesystem depends on whether a path is specified.
env (Optional[Env or str]) – Environment for the blob. If left empty, defaults to base environment. (Default: None)
data_config (Optional[Dict]) – The data config to pass to the underlying fsspec handler (in the case of saving to the filesystem).
load (bool) – Whether to try to load the Blob object from RNS. (Default: True)
dryrun (bool) – Whether to create the Blob if it doesn’t exist, or load a Blob object as a dryrun. (Default: False)
The resulting blob.
Example
>>> import runhouse as rh
>>> import json
>>> from pathlib import Path
>>>
>>> data = list(range(50))
>>> serialized_data = json.dumps(data)
>>>
>>> # Local blob with name and no path (saved to Runhouse object store)
>>> rh.blob(name="@/my-blob", data=data)
>>>
>>> # Remote blob with name and no path (saved to cluster's Runhouse object store)
>>> rh.blob(name="@/my-blob", data=data, system=my_cluster)
>>>
>>> # Remote blob with name, filesystem, and no path (saved to filesystem with default path)
>>> rh.blob(name="@/my-blob", data=serialized_data, system="s3")
>>>
>>> # Remote blob with name and path (saved to remote filesystem)
>>> rh.blob(name='@/my-blob', data=serialized_data, path='/runhouse-tests/my_blob.pickle', system='s3')
>>>
>>> # Local blob with path and no system (saved to local filesystem)
>>> rh.blob(data=serialized_data, path=str(Path.cwd() / "my_blob.pickle"))
>>>
>>> # Loading a blob
>>> my_local_blob = rh.blob(name="~/my_blob")
>>> my_s3_blob = rh.blob(name="@/my_blob")
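The storage rule implied by the path and system parameters can be sketched in plain Python. This is a minimal illustration, not Runhouse's implementation: storage_target and FILE_SYSTEMS are hypothetical names, and the rule is inferred from the parameter descriptions and the example comments above.

```python
from typing import Any, Optional

# File system names listed in the `system` parameter description.
FILE_SYSTEMS = {"file", "github", "sftp", "ssh", "s3", "gs", "azure"}


def storage_target(path: Optional[str] = None, system: Any = None) -> str:
    """Hypothetical helper (not part of Runhouse): where would the data land?"""
    if path is not None:
        # Specifying a path forces the blob onto the filesystem.
        return "filesystem"
    if isinstance(system, str) and system in FILE_SYSTEMS:
        # A file system without a path is saved under a default path.
        return "filesystem"
    # No path, and either no system or a cluster: use the object store.
    return "object store"


print(storage_target())                     # object store
print(storage_target(system="s3"))          # filesystem
print(storage_target(system="my-cluster"))  # object store
print(storage_target(path="/runhouse-tests/my_blob.pickle", system="s3"))  # filesystem
```

Here "my-cluster" stands in for any cluster name; a cluster object passed as system would follow the same branch.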
Check whether the blob exists in the file system.
Example
>>> blob = rh.blob(data)
>>> blob.exists_in_system()
Return the resolved state of the blob, which is the data.
Primarily used to define the behavior of the fetch method.
Example
>>> blob = rh.blob(data)
>>> blob.resolved_state()
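Because the resolved state is the data itself, a blob created from a JSON string would presumably hand back that string rather than the original list. The sketch below shows the decoding step with the standard library only. It assumes the stored value is returned unchanged, and the stored variable is a stand-in for the return value of resolved_state(), not a Runhouse call.

```python
import json

data = list(range(50))
serialized_data = json.dumps(data)  # what the filesystem examples pass as `data`

# Assumption: the resolved state is the stored value, unchanged.
# `stored` is a hypothetical stand-in for blob.resolved_state().
stored = serialized_data

# The caller decodes the JSON string to recover the original list.
restored = json.loads(stored)
assert restored == data
```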
Delete the blob from wherever it’s stored.
Example
>>> blob = rh.blob(data)
>>> blob.rm()
Return a copy of the blob on the destination system and, optionally, at the given path.
Example
>>> local_blob = rh.blob(data)
>>> s3_blob = local_blob.to("s3")
>>> cluster_blob = local_blob.to(my_cluster)
Save the underlying blob to its cluster’s store.
Example
>>> rh.blob(data).write()