Returns a Module object, which can be used to instantiate and interact with the class remotely.
Any callable public method of the module is intercepted and executed remotely over RPC, with the exception of
certain functions Python doesn’t make interceptable (e.g. __call__, __init__) and methods of the Module
class (e.g. to, fetch, etc.). Properties and private methods are not intercepted and are
executed locally.
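The interception rule described above can be illustrated with a small toy, which is not Runhouse's implementation. Every name here (ToyRemoteProxy, _rpc, Counter) is hypothetical, and the "RPC" simply runs the method in-process while recording that it was routed.

```python
import inspect


class ToyRemoteProxy:
    """Hypothetical stand-in for a remote-backed module (not Runhouse code)."""

    def __init__(self):
        self.calls = []  # names of methods that were routed through the fake RPC

    def _rpc(self, name, *args, **kwargs):
        # Pretend to ship the call elsewhere; here it just runs in-process.
        self.calls.append(name)
        method = object.__getattribute__(self, name)
        return method(*args, **kwargs)

    def __getattribute__(self, name):
        attr = object.__getattribute__(self, name)
        is_public = not name.startswith("_")
        is_base_method = name in vars(ToyRemoteProxy)  # the proxy's own methods
        if is_public and inspect.ismethod(attr) and not is_base_method:
            rpc = object.__getattribute__(self, "_rpc")
            return lambda *args, **kwargs: rpc(name, *args, **kwargs)
        return attr  # properties, private methods, plain attributes: local


class Counter(ToyRemoteProxy):
    def __init__(self):
        super().__init__()
        self.count = 0

    def increment(self, by=1):  # public method: intercepted
        self.count += by
        return self.count

    def _reset(self):  # private method: runs locally
        self.count = 0

    @property
    def doubled(self):  # property: runs locally
        return self.count * 2


counter = Counter()
counter.increment(3)
counter._reset()
counter.increment(2)
value = counter.doubled
print(counter.calls)
```

Implicit special-method calls such as obj() or construction look the method up on the type and bypass __getattribute__, which is one reason a scheme like this cannot intercept __call__ or __init__.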
Any method which executes remotely may be called normally, e.g. model.forward(x), or asynchronously,
e.g. key = model.forward.run(x) (which returns a key to retrieve the result with
cluster.get(key)), or with run_obj = model.train.remote(x), which runs synchronously but returns
a remote object to avoid passing heavy results back over the network.
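To make the three calling styles concrete, here is a minimal in-memory sketch. ToyStore, ToyRemoteRef and ToyRemoteMethod are hypothetical names, not Runhouse classes, and the toy run() executes eagerly rather than asynchronously. It only shows what each style hands back: the value, a key, or a handle.

```python
import uuid


class ToyStore:
    """Hypothetical in-memory stand-in for a cluster's object store."""

    def __init__(self):
        self._objects = {}

    def put(self, value):
        key = uuid.uuid4().hex
        self._objects[key] = value
        return key

    def get(self, key):
        return self._objects[key]


class ToyRemoteRef:
    """Handle to a result that stays in the store until explicitly fetched."""

    def __init__(self, store, key):
        self.store = store
        self.key = key

    def fetch(self):
        return self.store.get(self.key)


class ToyRemoteMethod:
    """Wraps a function and exposes the three calling styles."""

    def __init__(self, fn, store):
        self.fn = fn
        self.store = store

    def __call__(self, *args, **kwargs):
        # Normal call: the result itself travels back to the caller.
        return self.fn(*args, **kwargs)

    def run(self, *args, **kwargs):
        # Key-returning call: the caller gets a key to look the result up later.
        return self.store.put(self.fn(*args, **kwargs))

    def remote(self, *args, **kwargs):
        # Handle-returning call: the result stays in the store behind a reference.
        return ToyRemoteRef(self.store, self.store.put(self.fn(*args, **kwargs)))


store = ToyStore()
forward = ToyRemoteMethod(lambda xs: [x * 2 for x in xs], store)

direct = forward([1, 2])      # value returned directly
key = forward.run([1, 2])     # key, resolved with store.get(key)
ref = forward.remote([1, 2])  # handle, resolved with ref.fetch()
print(direct, store.get(key), ref.fetch())
```

The handle-returning style matters most when the result is large, since only a small reference crosses the network until the data is actually needed.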
Setting attributes, both public and private, will be executed remotely, with the new values only being
set in the remote module and not the local one. This excludes any methods or attributes of the Module class
proper (e.g. system or name), which will be set locally.
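The split between remotely set and locally set attributes can be sketched with __setattr__. This is an illustrative toy: ToyModule, _LOCAL_ATTRS and the _remote_state dictionary are assumptions standing in for the real remote instance, not part of the Runhouse API.

```python
class ToyModule:
    # Names that belong to the wrapper itself and are always set locally.
    _LOCAL_ATTRS = {"name", "system", "_remote_state"}

    def __init__(self, name, system):
        self._remote_state = {}  # hypothetical stand-in for the remote instance
        self.name = name
        self.system = system

    def __setattr__(self, key, value):
        if key in ToyModule._LOCAL_ATTRS:
            object.__setattr__(self, key, value)  # wrapper attribute: local
        else:
            self._remote_state[key] = value       # everything else: "remote"


module = ToyModule(name="my_model", system="my_gpu")
module.weights = [0.1, 0.2]  # public attribute, stored on the remote side only
module._step = 5             # private attribute, also stored remotely
module.name = "renamed"      # wrapper attribute, stored locally
print(module._remote_state, vars(module)["name"])
```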
Attributes and private properties can be fetched with the remote property, and the full resource can be
fetched using .fetch(), e.g. model.remote.weights, model.remote.__dict__, model.fetch().
When a module is sent to a cluster, its public attributes are serialized, sent over, and repopulated in the remote instance. This means that any changes to the module’s attributes will not be reflected in the remote instance.
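The snapshot behaviour can be sketched with pickle. This is a simplified illustration under assumptions, not Runhouse's serialization path: Settings and send are hypothetical names, and the "remote" instance is just a second local object.

```python
import pickle


class Settings:
    def __init__(self):
        self.batch_size = 8
        self._cache = {"warm": True}  # private: not sent in this sketch


def send(obj):
    """Serialize public attributes and repopulate a fresh instance with them."""
    public = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
    payload = pickle.dumps(public)        # what would travel over the network
    remote = object.__new__(type(obj))    # new instance, __init__ not re-run
    vars(remote).update(pickle.loads(payload))
    return remote


local = Settings()
remote = send(local)
local.batch_size = 32  # changed after sending: the remote copy keeps its snapshot
print(local.batch_size, remote.batch_size)
```

Because the remote copy is built from a snapshot, later changes on one side need an explicit remote set or fetch to become visible on the other.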
cls – The class to instantiate.
name (Optional[str]) – Name to give the module object, to be reused later on.
system (Optional[str or Cluster]) – File system or cluster name. If providing a file system this must be one of:
[file, github, sftp, ssh, s3, gs, azure].
We are working to add additional file system support. If providing a cluster, this must be a cluster object
or name, and whether the data is saved to the object store or filesystem depends on whether a path is
specified.
env (Optional[str or Env]) – Environment in which the module should live on the cluster, if system is a cluster.
dryrun (bool) – Whether to create the Module if it doesn’t exist, or load a Module object as a dryrun.
(Default: False)
The resulting module.
>>> import runhouse as rh
>>> import transformers
>>>
>>> # Sample rh.Module class
>>> class Model(rh.Module):
>>>     def __init__(self, model_id, device="cpu", system=None, env=None):
>>>         # Note that the code here will be run in your local environment prior to being sent
>>>         # to a cluster. For loading large models/datasets that are only meant to be used remotely,
>>>         # we recommend using lazy initialization (see tokenizer and model attributes below).
>>>         super().__init__(system=system, env=env)
>>>         self.model_id = model_id
>>>         self.device = device
>>>
>>>     @property
>>>     def tokenizer(self):
>>>         # Lazily initialize the tokenizer remotely only when it is needed
>>>         if not hasattr(self, '_tokenizer'):
>>>             self._tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)
>>>         return self._tokenizer
>>>
>>>     @property
>>>     def model(self):
>>>         # Lazily initialize the model remotely only when it is needed
>>>         if not hasattr(self, '_model'):
>>>             self._model = transformers.AutoModel.from_pretrained(self.model_id).to(self.device)
>>>         return self._model
>>>
>>>     def predict(self, x):
>>>         # Move the tokenized inputs to the model's device and unpack them as keyword arguments
>>>         x = self.tokenizer(x, return_tensors="pt").to(self.device)
>>>         return self.model(**x)
>>>
>>> # Creating rh.Module instance
>>> model = Model(model_id="bert-base-uncased", device="cuda", system="my_gpu", env="my_env")
>>> model.predict("Hello world!") # Runs on system in env
>>> tok = model.remote.tokenizer # Returns remote tokenizer
>>> id = model.local.model_id # Returns local model_id, if any
>>> model_id = model.model_id # Returns local model_id (not remote)
>>> model.fetch() # Returns full remote module, including model and tokenizer
>>>
>>> other_model = Model(model_id="bert-base-uncased", device="cuda").to("my_gpu", "my_env")
>>>
>>> # Another method: Create a module instance from an existing non-Module class using rh.module()
>>> RemoteModel = rh.module(cls=BERTModel, system="my_gpu", env="my_env")
>>> remote_model = RemoteModel(model_id="bert-base-uncased", device="cuda")
>>> remote_model.predict("Hello world!") # Runs on system in env
>>>
>>> # You can also call remote class methods
>>> model_size = RemoteModel.get_model_size("bert-base-uncased")
>>> # Loading a module
>>> my_local_module = rh.module(name="~/my_module")
>>> my_s3_module = rh.module(name="@/my_module")
Helper method to allow for access to remote state, both public and private. Fetching functions is not advised. system.get(module.name).resolved_state() is roughly equivalent to module.fetch().
Example
>>> my_module.fetch("my_property")
>>> my_module.fetch("_my_private_property")
>>> MyRemoteClass = rh.module(my_class).to(system)
>>> MyRemoteClass(*args).fetch() # Returns a my_class instance, populated with the remote state
>>> my_blob.fetch() # Returns the data of the blob, due to overloaded ``resolved_state`` method
>>> class MyModule(rh.Module):
>>>     # ...
>>>
>>> MyModule(*args).to(system).fetch() # Returns the full remote module, including private and public state
Async version of fetch. Can’t be a property like fetch because __getattr__ can’t be awaited.
Example
>>> await my_module.fetch_async("my_property")
>>> await my_module.fetch_async("_my_private_property")
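As a rough illustration of why the async variant is an explicit coroutine method, the sketch below awaits two fetches concurrently. ToyAsyncModule and its _remote_state dictionary are hypothetical, and asyncio.sleep(0) stands in for a network round trip.

```python
import asyncio


class ToyAsyncModule:
    """Hypothetical module whose remote state lives in a local dict."""

    def __init__(self):
        self._remote_state = {"my_property": 1, "_my_private_property": 2}

    async def fetch_async(self, key):
        await asyncio.sleep(0)  # stand-in for the network round trip
        return self._remote_state[key]


async def main():
    module = ToyAsyncModule()
    # Both fetches are scheduled together and awaited as a group.
    return await asyncio.gather(
        module.fetch_async("my_property"),
        module.fetch_async("_my_private_property"),
    )


results = asyncio.run(main())
print(results)
```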
Check if the module already exists on the cluster, and if so return the module object. If not, put the module on the cluster and return the remote module.
Example
>>> remote_model = Model().get_or_to(my_cluster, name="remote_model")
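The check-then-put behaviour amounts to a get-or-create lookup keyed by name. The sketch below is a toy under assumptions: ToyCluster and the free function get_or_to are hypothetical, and the cluster is a plain dictionary.

```python
class ToyCluster:
    """Hypothetical cluster that stores objects by name."""

    def __init__(self):
        self.objects = {}


def get_or_to(module, cluster, name):
    # Reuse the object already stored under this name, if there is one.
    if name in cluster.objects:
        return cluster.objects[name]
    # Otherwise place the new module on the cluster and return it.
    cluster.objects[name] = module
    return module


cluster = ToyCluster()
first = get_or_to({"version": 1}, cluster, name="remote_model")
second = get_or_to({"version": 2}, cluster, name="remote_model")
print(second)
```

The second call returns the object stored by the first, so the newly constructed module is discarded rather than overwriting the existing one.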
Helper property to allow for access to local properties, both public and private.
Example
>>> my_module.local.my_property
>>> my_module.local._my_private_property
>>> my_module.local.size = 14
Update the resource in the object store.
Helper property to allow for access to remote properties, both public and private. Returning functions is not advised.
Example
>>> my_module.remote.my_property
>>> my_module.remote._my_private_property
>>> my_module.remote.size = 14
Rename the module.
Specify that the module should resolve to a particular state when passed into a remote method. This is
useful if you want to revert the module’s state to some “Runhouse-free” state once it is passed into a
Runhouse-unaware function. For example, if you call a Runhouse-unaware function with .remote(),
you will be returned a Blob which wraps your data. If you want to pass that Blob into another function
that operates on the original data (e.g. a function that takes a numpy array), you can call
my_second_fn(my_blob.resolve()), and my_blob will be replaced with the contents of its .data on the
cluster before being passed into my_second_fn.
Resolved state is defined by the resolved_state method. By default, modules created with the
rh.module factory constructor will be resolved to their original non-module-wrapped class (or best attempt).
Modules which are defined as a subclass of Module will be returned as-is, as they have no other
“original class.”
Example
>>> my_module = rh.module(my_class)
>>> my_remote_fn(my_module.resolve()) # my_module will be replaced with the original class `my_class`
>>> my_result_blob = my_remote_fn.remote(args)
>>> my_other_remote_fn(my_result_blob.resolve()) # my_result_blob will be replaced with its data
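The resolve mechanism can be pictured as a flag that the calling machinery checks before invoking the function. The following is a minimal sketch, not Runhouse's implementation: ToyBlob, call_remote and the _resolve flag are hypothetical names.

```python
class ToyBlob:
    """Hypothetical wrapper around some data."""

    def __init__(self, data):
        self.data = data
        self._resolve = False

    def resolve(self):
        # Mark this wrapper to be unwrapped when passed into a call.
        self._resolve = True
        return self

    def resolved_state(self):
        return self.data


def call_remote(fn, *args):
    # Replace any argument marked for resolution with its resolved state.
    unwrapped = [
        arg.resolved_state() if getattr(arg, "_resolve", False) else arg
        for arg in args
    ]
    return fn(*unwrapped)


def total(values):
    # A wrapper-unaware function that expects plain data.
    return sum(values)


result = call_remote(total, ToyBlob([1, 2, 3]).resolve())
received = call_remote(lambda v: type(v).__name__, ToyBlob([1]))
print(result, received)
```

Without resolve() the function receives the wrapper itself, which is what a wrapper-aware function would want. With resolve() it receives the underlying data.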
Return the resolved state of the module. By default, this is the original class of the module if it was
created with the module factory constructor.
Register the resource and save to local working_dir config and RNS config store.
Async version of property setter.
Example
>>> await my_module.set_async("my_property", my_value)
>>> await my_module.set_async("_my_private_property", my_value)