
Actors for Kubernetes
Actors as stateful workers have many advantages for ML execution and are especially critical for RL. Their long-lived state, fault-tolerant architecture, and ability to be orchestrated in heterogeneous workloads offer distinct benefits relative to jobs. Ray has long been the go-to actor framework for ML, offering Pythonic APIs for quickly deploying and using actors. Kubetorch offers a Kubernetes-native way to do the same without the complexity and brittleness introduced by framework layering.

This article examines why actors matter for ML and RL specifically, explains why Ray has become popular, and introduces how Kubetorch solves some of Ray’s sharp edges if your team is Kubernetes-first.
Actors?
Actors are a common programming paradigm with an extensive background that we will not cover in detail. Briefly, when we describe actors, we refer to objects running in independent compute entities, each with its own lifecycle and private state, and with the ability to communicate with other actors or even spawn additional ones.
In machine learning specifically, actors can be applied to almost all use cases, across training, batch processing, and inference. But actors have become especially popular recently because they are critical for reinforcement learning in generative model post-training, a use case that essentially can't be run any other way.
Here are a few key benefits to call out relative to job-style execution.
Imperative control over actors gives you rich programmability for distributed execution. You issue calls when you want, intervene with arbitrary logic, and debug naturally.
- Debugging: It’s significantly more pleasant to build your application by making interactive calls to your actor, executing step-by-step, and inspecting live state rather than firing off a job and doing log archaeology upon failure.
- Scheduling: Invoke execution directly, and write custom logic to load balance or adjust execution. Take hyperparameter optimization: each trial is farmed out one-by-one from a control process that can adapt as results come in, versus feeding a config into a framework and ceding all runtime control.
Long-lived state allows for heavy artifacts to persist across different stages of execution.
- Iteration: Long-lived actors let you iterate on your programs within the same environment without reloading heavy artifacts like weights or datasets. By contrast, jobs restart cold and reload everything from scratch just to add a print statement.
- Persistence across a Pipeline: A pipeline can reuse the same artifacts across multiple pipeline steps, like running batch inference or evals on the same actor after training, or even letting a dev manually try inference without launching a separate service.
Fault Tolerance: If an actor dies, there is no cascading failure since every actor lives in a separate process. The control process used to interact with the actors might also live separately and recover gracefully. For instance, if you hit a fault during distributed training, you can decide what to do next without teardown: if a CUDA OOM, reduce the batch size; if nodes were preempted, reform the process group with a smaller world size; else dump state and checkpoint.
Heterogeneous Orchestration (RL!): You simply need actors for RL. An RL training loop involves generation (sampling rollouts from the current policy), reward scoring, and training (updating the policy). These have different compute profiles and scaling needs, but must exchange data iteratively. With actors, once you have defined each RL component, you simply place them on workers and orchestrate training from a driver / control process outside the actors. How would you do this with torchrun, without building so much custom scaffolding that you reinvent actors?
Ray: Why It’s Important and What’s Missing
If you have experimented with actors for ML or with RL, you’ve probably tried Ray. This section is not a Ray tutorial (or Ray criticism, for that matter), but we identify why Ray Core has become so critical and where teams have faced challenges.
Very briefly, if you are on a Ray cluster head node and you run something like the below, you have created an actor. Beyond all of the aforementioned benefits of using actors for development, a Pythonic ray.remote call is significantly simpler than having to wrestle with MLOps systems. It is fast to spin up and easy to iterate with.
@ray.remote(num_cpus=2, num_gpus=4)
class Trainer:
    ...
From there, you schedule it as a Ray worker process (and optionally scale it up to many actors) and make calls out to the actors you launched.
workers = [Trainer.remote(rank=i) for i in range(4)]  # 4 workers
futures = [w.train.remote(...) for w in workers]
results = ray.get(futures)
Because of these ergonomics, Ray won popularity (though certainly not universal adoption) among sophisticated ML teams. Once researchers get onto the head node of a Ray cluster, they can launch actors and start working effectively, writing regular code, regardless of the type of workload or scale they want to operate at. (It’s worth clarifying that Ray abstractions like Ray Data, Tune (HPO), and Train are libraries built on top of Ray and show its flexibility, but they aren’t what we mean here by actor programming.)
More recently, Ray has become extremely popular for modern RL for post-training, as the foundation for popular frameworks like VERL, SkyRL, or AReal, as well as the custom implementations AI labs use, which rely on Ray’s actor model for execution.
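To make the pattern concrete, here is a minimal, illustrative sketch of an actor-based RL loop on Ray Core. The class and method names are ours (not from any of the frameworks above), and `num_steps` and `sample_prompts` are placeholders:

import ray

ray.init()

@ray.remote(num_gpus=1)
class Generator:
    def generate(self, prompts): ...       # sample rollouts from the current policy
    def update_weights(self, weights): ...

@ray.remote(num_cpus=4)
class RewardScorer:
    def score(self, rollouts): ...          # e.g., run unit tests or an LLM judge

@ray.remote(num_gpus=8)
class Trainer:
    def train_step(self, rollouts, rewards): ...  # policy update
    def get_weights(self): ...

generator = Generator.remote()
scorer = RewardScorer.remote()
trainer = Trainer.remote()

for _ in range(num_steps):  # num_steps and sample_prompts are placeholders
    rollouts = generator.generate.remote(sample_prompts())
    rewards = scorer.score.remote(rollouts)        # Ray resolves the future for us
    ray.get(trainer.train_step.remote(rollouts, rewards))
    generator.update_weights.remote(trainer.get_weights.remote())

The point is the shape of the program: three heterogeneous, long-lived components, each holding state on its own resources, orchestrated imperatively from a driver.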
Why You Need Ray for Actors
Why Ray specifically? Because teams need to coerce the compute primitives of their existing infrastructure into a Ray cluster to make them programmable. Most ML platforms today run on top of Kubernetes or Slurm (or on vendor platforms that are Kubernetes-like in accepting a Docker container or Slurm-like in accepting a bash script entrypoint). These are not ergonomic systems for ML devs.
Kubernetes offers no friendly framework (until Kubetorch) for deploying and using actors. If I have a class I want to run as an actor, I must wrap it with a FastAPI app, build a Docker image, deploy it as a service, and then make calls via requests to share data (a sketch of this manual pattern is below). This is heavy, and ML researchers don’t want to become Kubernetes experts writing YAML and wrestling with Docker. With Ray, spinning up a process is extremely fast and done by decorating Python, and the orchestration also lives as regular code.
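For illustration, the manual pattern looks roughly like the following; the class, module, and endpoint names are hypothetical, and this is before you even write the Dockerfile or YAML:

# app.py -- manually wrapping a class as a "service actor" (names hypothetical)
from fastapi import FastAPI
from my_model import InferenceModel  # hypothetical user class

app = FastAPI()
model = InferenceModel()  # loaded once at startup; this is the actor's "state"

@app.post("/generate")
def generate(payload: dict):
    return {"output": model.generate(payload["prompt"])}

# ...then write a Dockerfile, build and push the image, author Deployment and
# Service YAML, kubectl apply, and call it over HTTP with `requests`.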
Slurm was traditionally where researchers worked, but it is an even worse fit. As a workload scheduler for HPC jobs, it is architecturally unsuited for actors and a much better fit for homogeneous SPMD execution like regular distributed training or heavyweight pre-training. It is similarly impractical to run multiple turns of RL training with any other system that operates by launching a job with a bash command, like SageMaker Training or Kubeflow PyTorchJob.
Challenges with Ray as a Platform
But making Ray your platform on top of existing compute introduces a number of new problems. You are now operating within a layered cluster-in-cluster structure, with likely a Kubernetes cluster underneath. This leads to some widely experienced pains, whether for RL or just ML development more broadly.
- Duplicate, conflicting primitives: You now have Kubernetes and Ray schedulers, Kubernetes and Ray observability, Kubernetes and Ray autoscaling, etc., and these clash. For instance, preemptions at the Kubernetes level can interrupt Ray’s operations.
- Less robust components: Adding layers always leads to more failures. While Ray’s primitives are pretty good, Kubernetes has frankly been battle-hardened at next-level scale and has a significantly wider base of development. For instance, teams running scaled workloads often see the head node or object store fall over.
- Mono-image cluster: Ray pushes users to operate within a largely homogeneous environment and image across all workers, due to the serialization model of the Ray object store and the communication between actors. This is challenging for heterogeneous workloads like RL, which have a greater need for custom environments. It’s common for reward evaluation to be run in separate Docker images outside the Ray cluster and exposed as services altogether (i.e., manual actors).
- Observability and debugging: Ray makes your existing Kubernetes-native management and observability ecosystem less useful. For instance, capturing Ray logs at the platform level or understanding GPU utilization within a Ray cluster from Grafana are both surprisingly hard tasks.
- Platform engineering support: Ray is more exotic than regular Kubernetes and outside of traditional platform team expertise. Teams need to dedicate ML-specific platform staff without the benefit of core platform scaffolding.
Kubetorch: Kubernetes-Native Actors for ML
The ML world has increasingly coalesced around Kubernetes, so we designed Kubetorch from the ground up to be a Kubernetes-native framework for ML actors. The name Kubetorch does not mean it is only usable with PyTorch; it alludes to a design goal of “eager execution on Kubernetes” with actors, much like PyTorch’s eager execution gave ML practitioners imperative control over GPU execution.
A simple way to conceptualize Kubetorch on Kubernetes is to disaggregate the pieces of what Ray does into what is already well-served by Kubernetes and what was described as missing above:
- Well-Served by Kubernetes: Scheduling, resource management, lifecycle, and service discovery
- Where Kubetorch Is Needed: Simple and Pythonic deployment of application logic, orchestration of distributed execution, and fast actor-to-actor data synchronization
Kubernetes already offers a robust and well-understood featureset with rich tooling. Creating actors on Kubernetes should not require extra layering or rebuilding core primitives. With Kubetorch, you have a simple way to command Kubernetes resources in Python as actors at scale. Here are a few examples and code snippets that illustrate the functionality:
Simple and Fast Actor Deployments
With Kubetorch, you can take regular functions and classes and deploy them as actors; the APIs are Pythonic, launch or update in seconds, and support arbitrary Kubernetes resource flavors.
For instance, we might run the following snippet on our devbox, in Argo, or as a KT app, to define compute and launch an actor for generation/inference. Launching compute is as simple as specifying GPUs, but because these ultimately become Kubernetes pods, I have all the richness and configurability of Kubernetes to define exactly the compute I need (with integration into any queueing or policies I might have).
The returned inference_actor acts exactly like an instance of my InferenceActor class and can be called “locally,” while execution propagates to the remote resources on Kubernetes. With this, I can make repeated async calls to many replicas for batch inference or use it to generate rollouts for my RL training loop.
inference_compute = kt.Compute(
    gpus=1,
    image=kt.Image(image_id="ghcr.io/team-base-image")
        .run_bash("uv pip install -r async_grpo_code/requirements-inference.txt"),
).autoscale(min_scale=1)

inference_actor = kt.cls(InferenceActor).to(inference_compute)
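Once deployed, the handle is used like a normal Python object. The method name and arguments below are hypothetical, shown only to illustrate the calling convention:

# Runs on the GPU pod on Kubernetes, but reads like a local method call.
# `generate` and its signature are hypothetical methods of InferenceActor.
outputs = inference_actor.generate(prompts=["What is an actor?"], max_new_tokens=128)
print(outputs)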
Distributed, Imperative Execution
Call deployed actors in Python from any local or orchestrator process; the calls propagate to the actors on Kubernetes and trigger execution there. These executions can be a single actor, embarrassingly parallel actors, SPMD (like torchrun) execution on a group of actors, or even a Ray program as an actor.
In the following illustrative example, calls to the training actor propagate to all replicas simultaneously, while calls to the inference actor are round-robined across replicas and can be made in parallel.
inference_compute = kt.Compute(...).autoscale(min_scale=2, max_scale=8)
inference_actors = kt.cls(InferenceActor).to(inference_compute)

train_compute = kt.Compute(...).distribute("pytorch", workers=8)
train_actors = kt.cls(TrainerActor).to(train_compute)

# In actual practice, all these steps would be called async
results = await asyncio.gather(
    *[inference_actors.generate_batch(b) for b in batches]
)  # parallel calls to inference actors

train_actors.train_epoch(
    concat([r.rollouts for r in results]),
    concat([r.rewards for r in results]),
)  # distributed training call to all ranks

train_actors.put_weights()
inference_actors.update_weights()
Cluster-Global Data Syncs
Kubernetes does not expose simple primitives for sharing data and heavy artifacts across pods, such as moving model checkpoints around during RL training. Kubetorch exposes a data store that allows for fast key-based transfer disk-to-disk or GPU-to-GPU.
The following code moves weights across the fastest path between a source GPU, which has put a key for the most recently updated LoRA checkpoint, and a destination inference worker, which receives the tensors directly in GPU memory and then hot-loads them into vLLM.
# Reload LoRA weights from trainer GPU memory
dest = {
    name: torch.empty(info["shape"], dtype=torch.bfloat16, device="cuda")
    for name, info in metadata.items()
}
kt.get(key=key, dest=dest, verbose=True)
self.load_lora_from_tensors(dest, new_version=new_version)
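For context, the `metadata` mapping iterated over above could be produced on the trainer side from the LoRA adapter's state dict. This small helper is purely illustrative and not part of the Kubetorch API:

# Hypothetical helper: describe each LoRA tensor so the receiver can
# pre-allocate matching GPU buffers (the `dest` dict in the snippet above).
def build_weight_metadata(lora_state_dict):
    return {
        name: {"shape": tuple(tensor.shape)}
        for name, tensor in lora_state_dict.items()
    }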
Each actor can also launch and call other actors directly, as in this basic example of launching a code-evaluation sandbox.
class InferenceActor:
    ...

    async def launch_sandbox(self):
        """Start 0-30 sandbox actors to test code for reward"""
        cpus = kt.Compute(
            cpus="0.25",
            image=kt.Image("ghcr.io/whatever").pip_install(["swe-rex"]),
        ).autoscale(max_scale=30)
        self.agent = await kt.cls(CodeSandbox).to_async(cpus)
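From there, the inference actor can call into the sandbox during reward computation. The `run_tests` method below is a hypothetical method of `CodeSandbox` (not a Kubetorch API), shown only to illustrate the call pattern:

# Inside InferenceActor, after launch_sandbox() has run (sketch only);
# `run_tests`, `completion`, and `unit_tests` are hypothetical.
result = await self.agent.run_tests(code=completion, tests=unit_tests)
reward = 1.0 if result.get("passed") else 0.0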
Want to give it a try? We’d love feedback and collaboration as we all set out to do significantly more RL.
TL;DR: Why Should I Use Kubetorch?
Actors are great because they give you imperative control over heterogeneous program execution, which is nice for training and batch workloads and essential for RL training. Pythonic interfaces into compute are great because ML engineers don't want to wrestle with Docker and YAML. Until Kubetorch, the only real way to Pythonically launch and interact with actors on Kubernetes was by launching a Ray cluster with KubeRay and working there.
But Kubernetes is great because it has a rich ecosystem of tooling and robust primitives that have been battle-tested at scale. It's familiar to many platform engineers, and you probably already do a lot of your regular deployment or ML work on Kubernetes. Having to effectively leave the Kubernetes ecosystem and migrate to Ray to access actors is far too disruptive.
Kubetorch bridges this gap. Your ML platform now simply builds on top of your regular Kubernetes clusters with standard primitives (and keeps your platform engineers happy), using the queuing, quota, management, observability, etc., tooling you want. At the same time, the ML dev can write powerful distributed programs in regular code and wield simple Pythonic APIs to interact with heterogeneous resources.
Learn More
- A simple example: GitHub
- Get access to 1-click cluster: https://www.run.house/kubetorch/get-started
- Just install it: https://github.com/run-house/kubetorch