Monitoring and Observability
Kubetorch supports real-time monitoring and observability for your running workloads. While a Kubetorch function or class method executes, its logs, metrics, or both can be streamed back to your client.
Log Streaming
Kubetorch automatically captures:
- Standard logging output
- Anything printed to stdout/stderr
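As a minimal illustration of the three output channels listed above, the sketch below defines a function that writes to each one. The name noisy_step is hypothetical and the function runs locally as ordinary Python. The logging.basicConfig call is there only so the INFO record is emitted in a plain local run; whether remote workloads need similar setup is not something this page states.

```python
import logging
import sys

# Local-only setup so INFO records are emitted when this file is run directly.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def noisy_step(x: int) -> int:
    """Hypothetical example function that writes to all three channels."""
    doubled = x * 2
    logger.info("starting step with x=%d", x)           # standard logging output
    print(f"intermediate value: {doubled}")              # stdout
    print("demo message on stderr", file=sys.stderr)     # stderr
    return doubled
```

A function like this would be wrapped with kt.fn(...) and sent to compute in the same way as my_function in the examples on this page.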
Configuration Options
Log streaming is controlled using the stream_logs flag, and follows a strict precedence order (highest → lowest):
(1) Workflow Call Argument
import kubetorch as kt

remote_fn = kt.fn(my_function).to(
    compute=kt.Compute(cpus="0.5")
)

result = remote_fn(*args, **kwargs, stream_logs=True)
(2) Environment Variable
Used only if no call argument was provided.
export KT_STREAM_LOGS="True" # or "False"
If the stream_logs argument is not provided (None), Kubetorch checks the KT_STREAM_LOGS environment variable.
(3) Kubetorch Config
Used only when neither the workflow call argument nor the environment variable is set. If no value is configured, stream_logs defaults to True.
kt config set stream_logs <value> # e.g., true or false
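The precedence order above can be sketched as a small resolver. This is an illustration of the documented ordering, not Kubetorch's actual implementation: resolve_stream_logs and parse_bool are hypothetical helpers, and the set of strings accepted as true is an assumption.

```python
import os
from typing import Mapping, Optional


def parse_bool(text: str) -> bool:
    """Assumed parsing rule: 'true', '1' and 'yes' (any case) count as True."""
    return text.strip().lower() in {"true", "1", "yes"}


def resolve_stream_logs(
    call_arg: Optional[bool] = None,
    config_value: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper mirroring the documented precedence order."""
    env = os.environ if env is None else env

    # (1) An explicit call argument wins over everything else.
    if call_arg is not None:
        return call_arg

    # (2) Otherwise consult the KT_STREAM_LOGS environment variable.
    raw = env.get("KT_STREAM_LOGS")
    if raw is not None:
        return parse_bool(raw)

    # (3) Otherwise use the value stored in the Kubetorch config, if any.
    if config_value is not None:
        return config_value

    # Documented default when nothing is set.
    return True


# The call argument overrides an environment variable that says otherwise.
print(resolve_stream_logs(call_arg=False, env={"KT_STREAM_LOGS": "True"}))  # False
```

Passing env as a plain mapping keeps the sketch easy to test without touching the real process environment.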
Metrics Streaming
Kubetorch can stream live resource metrics from your compute environment directly to your client.
When enabled, the streamed metrics include:
- CPU utilization
- Memory usage
- GPU utilization and memory (when applicable)
Configuration Options
Metrics streaming is controlled using the stream_metrics flag, and follows a strict precedence order (highest → lowest):
(1) Workflow Call Argument
Using boolean values:
Passing stream_metrics=<value> overrides all other settings.
If stream_metrics is not provided, its value is considered None.
import kubetorch as kt

compute = kt.Compute(cpus="0.5")
remote_fn = kt.fn(my_function).to(compute)

# No argument -> relies on env/config/defaults (stream_metrics=None)
result0 = remote_fn(*args, **kwargs)

# Enable streaming with default behavior (30s interval, resource scope)
result1 = remote_fn(*args, **kwargs, stream_metrics=True)

# Disable metrics streaming
result2 = remote_fn(*args, **kwargs, stream_metrics=False)
Custom configuration:
The kt.MetricsConfig class allows granular control over metrics collection:
| Field | Description | Default Value | Supported Values |
|---|---|---|---|
| Interval | Time between two consecutive metrics outputs, in seconds. | 30 seconds | Integer (seconds) |
| Scope | Metrics aggregation level. | resource | resource or pod |
import kubetorch as kt

compute = kt.Compute(cpus="0.5")
remote_fn = kt.fn(my_function).to(compute)

# Custom interval
cfg_interval = kt.MetricsConfig(interval=35)
result1 = remote_fn(*args, **kwargs, stream_metrics=cfg_interval)

# Pod-level metrics
cfg_pod = kt.MetricsConfig(scope="pod")
result2 = remote_fn(*args, **kwargs, stream_metrics=cfg_pod)

# Both interval + scope
cfg_both = kt.MetricsConfig(interval=35, scope="pod")
result3 = remote_fn(*args, **kwargs, stream_metrics=cfg_both)
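To make the accepted stream_metrics values concrete, the following sketch shows one way the call argument could be interpreted. MetricsSettings is a hypothetical stand-in that mirrors only the two documented kt.MetricsConfig fields and their defaults, and describe_stream_metrics is an illustrative helper, not part of Kubetorch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MetricsSettings:
    """Hypothetical stand-in for kt.MetricsConfig (documented fields only)."""
    interval: int = 30        # seconds between consecutive metrics outputs
    scope: str = "resource"   # documented values: "resource" or "pod"


def describe_stream_metrics(value: Union[bool, MetricsSettings, None]) -> str:
    """Summarize how a stream_metrics call argument would be interpreted."""
    if value is None:
        # Not provided: the environment variable and config are consulted next.
        return "deferred to environment variable or config"
    if value is False:
        return "disabled"

    # True means "stream with the default settings".
    settings = MetricsSettings() if value is True else value
    if settings.scope not in ("resource", "pod"):
        raise ValueError(f"unsupported scope: {settings.scope!r}")
    return f"enabled: every {settings.interval}s, scope={settings.scope}"


print(describe_stream_metrics(True))
# enabled: every 30s, scope=resource
print(describe_stream_metrics(MetricsSettings(interval=35, scope="pod")))
# enabled: every 35s, scope=pod
```

Checking the boolean cases with identity comparisons keeps True and False distinct from a settings object, which matches the three kinds of value shown in the examples above.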
(2) Environment Variable
Used only if no workflow call argument was provided.
export KT_STREAM_METRICS="True" # or "False"
If stream_metrics is None, Kubetorch checks the KT_STREAM_METRICS environment variable.
(3) Kubetorch Config
Used only when neither the workflow call argument nor the environment variable is set. If no value is configured, stream_metrics defaults to True.
kt config set stream_metrics <value> # e.g., true or false