Serialization
When calling remote functions, Kubetorch serializes function arguments for transport over HTTP.
You can control the serialization format using the `serialization` parameter.
Serialization Formats
Kubetorch supports two serialization formats: JSON (default) and Pickle.
JSON (Default)
JSON is the default and works well for most common Python types.
```python
import kubetorch as kt

remote_fn = kt.fn(my_function).to(kt.Compute(cpus="0.5"))

# Default JSON serialization
result = remote_fn(arg1, arg2)

# Explicitly specify JSON
result = remote_fn(arg1, arg2, serialization="json")
```
Supported types:
- Basic types: `str`, `int`, `float`, `bool`, `None`
- Collections: `list`, `dict`, `tuple` (converted to a list; see the sketch below)
- Nested combinations of the above
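Note the tuple caveat in particular: a tuple argument arrives at your function as a list. A minimal sketch of that round trip, using the standard `json` module to stand in for the transport:

```python
import json

# Tuples survive encoding but come back as lists, so a remote
# function that receives (1, 2, 3) actually sees [1, 2, 3].
payload = {"point": (1, 2, 3)}
decoded = json.loads(json.dumps(payload))
print(decoded["point"])        # [1, 2, 3]
print(type(decoded["point"]))  # <class 'list'>
```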
Limitations:
- Cannot serialize custom Python objects
- Cannot serialize functions, classes, or complex types
- No support for circular references
Pickle
Pickle supports arbitrary Python objects, making it ideal for complex data structures. Data is pickled and then base64-encoded for HTTP transport.
```python
import kubetorch as kt

remote_fn = kt.fn(my_function).to(kt.Compute(cpus="0.5"))

# Use pickle serialization
result = remote_fn(my_custom_object, serialization="pickle")
```
Supported types:
- All JSON-supported types
- Custom Python classes and objects
- NumPy arrays, Pandas DataFrames
- Most Python built-in types
- Functions and lambda expressions (with limitations)
Limitations:
- Larger payload size due to base64 encoding (see the sketch after this list)
- Security considerations (see below)
- Some objects may not be picklable (e.g., open file handles, database connections)
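To get a feel for the size overhead, here is a small sketch using the standard `pickle` and `base64` modules (exact numbers vary with the object):

```python
import base64
import pickle

obj = list(range(10_000))
raw = pickle.dumps(obj)
encoded = base64.b64encode(raw)

# Base64 expands binary data by roughly 4/3 (~33%).
print(f"pickled: {len(raw)} bytes, base64: {len(encoded)} bytes")
```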
Dependency Version Mismatches
Warning: Pickle serialization can fail if the client and server have different versions of a library.
For example, if your local environment has `numpy==1.26.0` and the remote has `numpy==2.0.0`,
an array pickled locally may fail to deserialize remotely due to incompatible internal representations.
To avoid this:
- Keep local and remote dependencies in sync
- Use JSON serialization when possible
- Convert complex objects to simpler types before sending (sketched below)
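For the last point, a minimal sketch of converting a NumPy array to a JSON-safe nested list and rebuilding it on the other side:

```python
import numpy as np

arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# Client side: tolist() yields nested lists of plain Python
# floats, which serialize as ordinary JSON.
payload = arr.tolist()

# Server side: rebuild the array from the JSON payload.
restored = np.array(payload, dtype=np.float32)
assert (restored == arr).all()
```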
Setting Serialization Mode
Per-Call
```python
result = remote_fn(data, serialization="pickle")
```
On the Function
```python
remote_fn.serialization = "pickle"

# All calls now use pickle
result = remote_fn(data)

# Override for a specific call
result = remote_fn(data, serialization="json")
```
Restricting Allowed Serialization
For security, you can restrict which serialization methods are allowed on a service.
At Compute Level
```python
import kubetorch as kt

compute = kt.Compute(
    cpus="1",
    allowed_serialization=["json"],  # Only allow JSON
)
remote_fn = kt.fn(my_function).to(compute)
```
Via Environment Variable
```bash
# Allow only JSON (more secure)
export KT_ALLOWED_SERIALIZATION="json"

# Allow only pickle
export KT_ALLOWED_SERIALIZATION="pickle"

# Allow both
export KT_ALLOWED_SERIALIZATION="json,pickle"
```
Precedence
1. `allowed_serialization` on Compute (highest)
2. `KT_ALLOWED_SERIALIZATION` environment variable
3. Default: `["json"]`
Error Handling
If a client attempts to use a disallowed serialization method, the server returns a 400 error:
```
HTTPException: Serialization format 'pickle' not allowed. Allowed formats: ['json']
```
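If your client may hit this, one defensive pattern is to fall back to JSON. A hedged sketch, reusing `remote_fn` and `data` from the earlier examples; the exact exception type the Kubetorch client raises is an assumption here:

```python
# Hypothetical fallback pattern; adjust the exception type to
# whatever your Kubetorch client version actually raises.
try:
    result = remote_fn(data, serialization="pickle")
except Exception as err:
    if "not allowed" in str(err):
        # Server rejected pickle; retry with JSON-safe arguments.
        result = remote_fn(data, serialization="json")
    else:
        raise
```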
Security Considerations
Pickle can execute arbitrary code during deserialization (demonstrated below). Only use pickle when:
- You trust the source of the serialized data
- You're working in a controlled environment
- You need to transfer complex Python objects
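The risk is concrete: a classic illustration of why unpickling untrusted data is dangerous:

```python
import pickle


class Malicious:
    # __reduce__ tells pickle how to rebuild an object; an attacker
    # can make it invoke any callable when the payload is loaded.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling!",))


payload = pickle.dumps(Malicious())
pickle.loads(payload)  # Prints the message; could just as easily run anything
```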
For security-sensitive applications, restrict to JSON-only:
```python
compute = kt.Compute(cpus="1", allowed_serialization=["json"])
```
How It Works
When you call a remote function:
1. Client: Arguments are serialized using the specified format
2. Transport: Data is sent via HTTP POST
3. Server:
   - Validates the serialization method against `allowed_serialization`
   - Deserializes the arguments
   - Executes your function
   - Serializes the result
4. Client: Deserializes and returns the result
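Conceptually, the client-side encoding step looks roughly like this (a simplified sketch, not Kubetorch's actual wire format):

```python
import base64
import json
import pickle


def encode_args(args, serialization="json"):
    """Simplified sketch of the client-side encoding step."""
    if serialization == "json":
        return json.dumps(args)
    if serialization == "pickle":
        # Pickle produces bytes, which are base64-encoded so they
        # can travel as text in an HTTP request body.
        return base64.b64encode(pickle.dumps(args)).decode("ascii")
    raise ValueError(f"Unsupported serialization: {serialization}")


print(encode_args([1, 2, 3]))                          # JSON text
print(encode_args([1, 2, 3], serialization="pickle"))  # base64 text
```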
Examples
Sending NumPy Arrays
```python
import kubetorch as kt
import numpy as np


def process_array(arr):
    return arr.sum()


remote_fn = kt.fn(process_array).to(kt.Compute(cpus="0.5"))

data = np.random.rand(1000, 1000)

# Must use pickle for NumPy arrays
result = remote_fn(data, serialization="pickle")
```
Sending Custom Objects
```python
import kubetorch as kt
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    epochs: int
    learning_rate: float
    batch_size: int


def train(config: TrainingConfig):
    return {"epochs": config.epochs, "lr": config.learning_rate}


remote_fn = kt.fn(train).to(kt.Compute(cpus="0.5"))

config = TrainingConfig(epochs=10, learning_rate=0.001, batch_size=32)

# Must use pickle for custom objects
result = remote_fn(config, serialization="pickle")
```
Converting to JSON-Compatible Format
If you want to avoid pickle, convert objects to dicts:
```python
import kubetorch as kt
from dataclasses import dataclass, asdict


@dataclass
class TrainingConfig:
    epochs: int
    learning_rate: float


def train(config_dict: dict):
    return {"epochs": config_dict["epochs"]}


remote_fn = kt.fn(train).to(kt.Compute(cpus="0.5"))

config = TrainingConfig(epochs=10, learning_rate=0.001)

# Convert to dict for JSON serialization
result = remote_fn(asdict(config))  # Uses JSON (default)
```