Serialization
When calling remote functions, Kubetorch serializes function arguments for transport over HTTP.
You can control the serialization format using the `serialization` parameter.
Serialization Formats
Kubetorch supports two serialization formats: JSON (default) and Pickle.
JSON (Default)
JSON is the default and works well for most common Python types.
```python
import kubetorch as kt

remote_fn = kt.fn(my_function).to(kt.Compute(cpus="0.5"))

# Default JSON serialization
result = remote_fn(arg1, arg2)

# Explicitly specify JSON
result = remote_fn(arg1, arg2, serialization="json")
```
Supported types:
- Basic types: `str`, `int`, `float`, `bool`, `None`
- Collections: `list`, `dict`, `tuple` (converted to a list; see the sketch below)
- Nested combinations of the above
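Note the tuple caveat in particular: a tuple argument arrives at your function as a list. A minimal sketch of that round trip, using the standard `json` module to stand in for the transport:

```python
import json

# Tuples survive encoding but come back as lists, so a remote
# function that receives (1, 2, 3) actually sees [1, 2, 3].
payload = {"point": (1, 2, 3)}
decoded = json.loads(json.dumps(payload))
print(decoded["point"])        # [1, 2, 3]
print(type(decoded["point"]))  # <class 'list'>
```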
Limitations:
- Cannot serialize custom Python objects
- Cannot serialize functions, classes, or complex types
- No support for circular references
Pickle
Pickle supports arbitrary Python objects, making it ideal for complex data structures. Data is pickled and then base64-encoded for HTTP transport.
```python
import kubetorch as kt

remote_fn = kt.fn(my_function).to(kt.Compute(cpus="0.5"))

# Use pickle serialization
result = remote_fn(my_custom_object, serialization="pickle")
```
Supported types:
- All JSON-supported types
- Custom Python classes and objects
- NumPy arrays, Pandas DataFrames
- Most Python built-in types
- Functions and lambda expressions (with limitations)
Limitations:
- Larger payload size due to base64 encoding (see the sketch after this list)
- Security considerations (see below)
- Some objects may not be picklable (e.g., open file handles, database connections)
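To get a feel for the size overhead, here is a small sketch using the standard `pickle` and `base64` modules (exact numbers vary with the object):

```python
import base64
import pickle

obj = list(range(10_000))
raw = pickle.dumps(obj)
encoded = base64.b64encode(raw)

# Base64 expands binary data by roughly 4/3 (~33%).
print(f"pickled: {len(raw)} bytes, base64: {len(encoded)} bytes")
```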
Dependency Version Mismatches
Warning: Pickle serialization can fail if the client and server have different versions of a library.
For example, if your local environment has `numpy==1.26.0` and the remote has `numpy==2.0.0`,
an array pickled locally may fail to deserialize remotely due to incompatible internal representations.
To avoid this:
- Keep local and remote dependencies in sync
- Use JSON serialization when possible
- Convert complex objects to simpler types before sending (sketched below)
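For the last point, a minimal sketch of converting a NumPy array to a JSON-safe nested list and rebuilding it on the other side:

```python
import numpy as np

arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# Client side: tolist() yields nested lists of plain Python
# floats, which serialize as ordinary JSON.
payload = arr.tolist()

# Server side: rebuild the array from the JSON payload.
restored = np.array(payload, dtype=np.float32)
assert (restored == arr).all()
```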
Setting Serialization Mode
Per-Call
```python
result = remote_fn(data, serialization="pickle")
```
On the Function
```python
remote_fn.serialization = "pickle"

# All calls now use pickle
result = remote_fn(data)

# Override for a specific call
result = remote_fn(data, serialization="json")
```
Restricting Allowed Serialization
For security, you can restrict which serialization methods are allowed on a service.
At Compute Level
```python
import kubetorch as kt

compute = kt.Compute(
    cpus="1",
    allowed_serialization=["json"],  # Only allow JSON
)
remote_fn = kt.fn(my_function).to(compute)
```
Via Environment Variable
```bash
# Allow only JSON (more secure)
export KT_ALLOWED_SERIALIZATION="json"

# Allow only pickle
export KT_ALLOWED_SERIALIZATION="pickle"

# Allow both
export KT_ALLOWED_SERIALIZATION="json,pickle"
```
Precedence
1. `allowed_serialization` on Compute (highest)
2. `KT_ALLOWED_SERIALIZATION` environment variable
3. Default: `["json"]`
Error Handling
If a client attempts to use a disallowed serialization method, the server returns a 400 error:
```
HTTPException: Serialization format 'pickle' not allowed. Allowed formats: ['json']
```
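If your client may hit this, one defensive pattern is to fall back to JSON. A hedged sketch, reusing `remote_fn` and `data` from the earlier examples; the exact exception type the Kubetorch client raises is an assumption here:

```python
# Hypothetical fallback pattern; adjust the exception type to
# whatever your Kubetorch client version actually raises.
try:
    result = remote_fn(data, serialization="pickle")
except Exception as err:
    if "not allowed" in str(err):
        # Server rejected pickle; retry with JSON-safe arguments.
        result = remote_fn(data, serialization="json")
    else:
        raise
```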
Security Considerations
Pickle can execute arbitrary code during deserialization (demonstrated below). Only use pickle when:
- You trust the source of the serialized data
- You're working in a controlled environment
- You need to transfer complex Python objects
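The risk is concrete: a classic illustration of why unpickling untrusted data is dangerous:

```python
import pickle


class Malicious:
    # __reduce__ tells pickle how to rebuild an object; an attacker
    # can make it invoke any callable when the payload is loaded.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling!",))


payload = pickle.dumps(Malicious())
pickle.loads(payload)  # Prints the message; could just as easily run anything
```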
For security-sensitive applications, restrict to JSON-only:
```python
compute = kt.Compute(cpus="1", allowed_serialization=["json"])
```
How It Works
When you call a remote function:
1. Client: Arguments are serialized using the specified format
2. Transport: Data is sent via HTTP POST
3. Server:
   - Validates the serialization method against `allowed_serialization`
   - Deserializes the arguments
   - Executes your function
   - Serializes the result
4. Client: Deserializes and returns the result
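Conceptually, the client-side encoding step looks roughly like this (a simplified sketch, not Kubetorch's actual wire format):

```python
import base64
import json
import pickle


def encode_args(args, serialization="json"):
    """Simplified sketch of the client-side encoding step."""
    if serialization == "json":
        return json.dumps(args)
    if serialization == "pickle":
        # Pickle produces bytes, which are base64-encoded so they
        # can travel as text in an HTTP request body.
        return base64.b64encode(pickle.dumps(args)).decode("ascii")
    raise ValueError(f"Unsupported serialization: {serialization}")


print(encode_args([1, 2, 3]))                          # JSON text
print(encode_args([1, 2, 3], serialization="pickle"))  # base64 text
```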
Examples
Sending NumPy Arrays
```python
import kubetorch as kt
import numpy as np


def process_array(arr):
    return arr.sum()


remote_fn = kt.fn(process_array).to(kt.Compute(cpus="0.5"))

data = np.random.rand(1000, 1000)

# Must use pickle for NumPy arrays
result = remote_fn(data, serialization="pickle")
```
Sending Custom Objects
```python
import kubetorch as kt
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    epochs: int
    learning_rate: float
    batch_size: int


def train(config: TrainingConfig):
    return {"epochs": config.epochs, "lr": config.learning_rate}


remote_fn = kt.fn(train).to(kt.Compute(cpus="0.5"))

config = TrainingConfig(epochs=10, learning_rate=0.001, batch_size=32)

# Must use pickle for custom objects
result = remote_fn(config, serialization="pickle")
```
Converting to JSON-Compatible Format
If you want to avoid pickle, convert objects to dicts:
```python
import kubetorch as kt
from dataclasses import dataclass, asdict


@dataclass
class TrainingConfig:
    epochs: int
    learning_rate: float


def train(config_dict: dict):
    return {"epochs": config_dict["epochs"]}


remote_fn = kt.fn(train).to(kt.Compute(cpus="0.5"))

config = TrainingConfig(epochs=10, learning_rate=0.001)

# Convert to dict for JSON serialization
result = remote_fn(asdict(config))  # Uses JSON (default)
```