Serverless ML in Your Own Cloud

Define training pipelines in regular Python and dispatch them to arbitrary compute. Iterative, debuggable, DSL-free, and infrastructure-agnostic.

IDEs, Orchestrators, or CI
Any cloud & Kubernetes
Deploy updates instantly
Built-in infra observability

Dispatch execution to any compute

Developers request compute that is launched from Kubernetes, elastic compute, bare metal, or a mixture.

import runhouse as rh

# Define a cluster that you want to launch
# This can be provisioned from elastic compute
# or use existing Kubernetes clusters or VMs
my_cluster = rh.cluster(
    name="rh-a10x",
    instance_type="A10G:1",
    memory="32+",
    provider="aws",
).up_if_not()

# Save and reuse the cluster across multiple pipeline steps,
# pipelines, or simply to ensure reproducibility.
my_cluster.save()

# Later... load your saved clusters for future use
my_cluster = rh.cluster(name="rh-a10x").up_if_not()

Flexible and debuggable ML pipelines

Deploy code updates in under 5 seconds and stream logs back for fast, iterative development.

# Define your model class using normal code
class MyModelClass:
    def train(self):
        ...

    def predict(self):
        ...

    def save(self):
        ...

# Send your class to remote
RemoteClass = rh.module(MyModelClass).to(my_cluster)

# Instantiate and call an instance of the remote class
RemoteModel = RemoteClass()
RemoteModel.train()

Break the barrier between research and production

Manage your ML lifecycle using software development best practices on regular code. Deploy with no extra translation.

$ git add MyModelClass.py
$ git commit -m "Refactor the train method"
$ git push
$ echo "Develop code, not orchestrator pipelines"

High-quality telemetry, out of the box

Automatically persist logs, track GPU/CPU/memory utilization, and audit resource access.

# API route to fetch logs for a resource
@router.get(
    "/{uri}/logs",
    response_description="Resource logs retrieved",
)
@send_event
async def resource_logs_preview(...):
    ...

# API route to fetch cluster status and metrics
@router.get(
    "/{uri}/cluster/status",
    response_description="Cluster status retrieved",
)
@send_event
async def load_cluster_status(...):
    ...

Runhouse

Effortlessly program powerful ML systems across arbitrary compute in regular Python.

Works with your stack

Easily integrate within your existing pipelines, code, and development workflows.

$ pip install runhouse

Loved by research and infra teams alike

Runhouse is built for end-to-end ML development. Dispatch work quickly during local development in notebooks or IDEs, then run the same code as-is inside Kubernetes, CI, or your favorite orchestrator. No more push and pray.
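As a hedged sketch of that workflow (assuming the runhouse package and cloud credentials are configured; the cluster name reuses rh-a10x from the example above, and the training routine is a hypothetical stand-in):

```python
# Sketch of local-to-remote dispatch. The training function is plain Python;
# only the dispatch step touches Runhouse, so the same code runs unchanged
# in a notebook, CI, or an orchestrator task.
def train_model(lr: float = 1e-3) -> dict:
    """A stand-in training routine -- regular, debuggable Python."""
    return {"lr": lr, "status": "trained"}

def dispatch(lr: float = 3e-4) -> dict:
    """Send the same function to remote compute (requires cloud credentials)."""
    import runhouse as rh  # imported here so the local path has no dependency
    cluster = rh.cluster(name="rh-a10x").up_if_not()
    remote_train = rh.function(train_model).to(cluster)
    return remote_train(lr=lr)  # executes remotely; logs stream back locally

# Locally, it is still just a function call:
result = train_model()
print(result["status"])  # trained
```

The point of the split is that nothing about train_model is Runhouse-specific: you debug it locally, then dispatch the identical function.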

Graphic showing a laptop with a Python logo connecting to modules and functions running on cloud compute

Runs inside your own infrastructure

Execution stays inside your cloud(s), with total flexibility to cost-optimize or scale to new providers.
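One way this flexibility can look in practice (a sketch, not the official API surface: the provider names and instance-type strings below are assumptions patterned on the A10G:1 example above, to be adjusted for your own accounts):

```python
# Hypothetical provider matrix: the training code never changes, only the
# cluster spec does, so cost-optimizing or adding a provider is a config edit.
CLUSTER_SPECS = {
    "aws": {"instance_type": "A10G:1", "provider": "aws"},
    "gcp": {"instance_type": "L4:1", "provider": "gcp"},
    "kubernetes": {"instance_type": None, "provider": "kubernetes"},
}

def cluster_for(target: str):
    """Build a cluster handle for the chosen provider (needs credentials)."""
    import runhouse as rh  # deferred so the spec logic stays dependency-free
    spec = {k: v for k, v in CLUSTER_SPECS[target].items() if v is not None}
    return rh.cluster(name=f"rh-{target}", **spec)

# The spec table itself is plain data you can inspect or diff:
print(sorted(CLUSTER_SPECS))  # ['aws', 'gcp', 'kubernetes']
```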

List of cloud provider logos: AWS, Google, Azure, Kubernetes, AWS Lambda, and SageMaker

Use Cases

Training

Offline · Online · Distributed · HPO

Fine-tuning with LoRA

Inference

Online · Batch · LLMs · Multi-step

Call Llama3 on AWS EC2

Composite Systems

RAG · Multi-task · Evaluation

FastAPI RAG app with LanceDB and vLLM

Data Processing

Batch · Online · Data Apps

Parallel GPU Batch Embeddings

ML that Runs

An ML platform that improves developer experience while increasing development velocity.

Line illustration showing researchers and engineers connected to an AI app or code via lines before and after changes

Without Runhouse:

Research happens on siloed compute, sampled data, and notebook code to enable iterative development. Production is reached only after a slow translation into orchestrator pipelines, which become difficult to debug when errors arise.

Diagram showing a block with "Code development using regular SDLC" above a smaller "Compute" block and an arrow with "Runhouse manages dispatch" between

With Runhouse: Fast Software Development

Code is written and executed identically in research and production. Errors can be debugged on a branch from local IDEs and merged into production using a standard development lifecycle.

Operationalize your ML as a living stack.

Try it out
Screenshot of a search interface with a search bar and listed resources

Search & Sharing

Runhouse Den provides you with an interface to explore your ML artifacts. Easily share with your teammates, search through your resources, and view details all in one place.

Screenshot of a chart showing GPU memory usage

Observability

View cluster status, automatically persist logs, track GPU/CPU/memory utilization, and enable more efficient debugging for your team. Gain insights with trends and simple dashboards.

Screenshot of a resource page and user access list

Auth & Access Control

Den makes it easy to control access to your services. Provide individual teammates with read or write access, or choose "public" visibility to expose public API endpoints.

Start building on a solid ML foundation.

Book a demo