Monitoring and Observability

Kubetorch supports real-time monitoring and observability for your running workloads. While a Kubetorch function or class method executes, logs, metrics, or both can be streamed back to your client.

Log Streaming

Kubetorch automatically captures:

  • Standard logging output
  • Anything printed to stdout/stderr
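As an illustration of these output channels, here is a minimal sketch of a function that writes to all of them. The function name my_function and the logger name are placeholders, and the code uses only the Python standard library; nothing in it is Kubetorch-specific.

```python
import logging
import sys

# Without this, INFO records are dropped by the default WARNING level.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_workload")  # hypothetical logger name


def my_function(n: int) -> int:
    """Emit output through logging, stdout, and stderr, then return a result."""
    logger.info("starting work on %d items", n)        # standard logging output
    print(f"processed {n} items")                       # written to stdout
    print("warning: demo message", file=sys.stderr)     # written to stderr
    return n * 2
```

When a function like this runs remotely with log streaming enabled, all three messages would be expected to appear on the client.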

Configuration Options

Log streaming is controlled by the stream_logs flag and follows a strict precedence order (highest → lowest):

(1) Workflow Call Argument

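import kubetorch as kt
# my_function, args, and kwargs are placeholders for your own code
remote_fn = kt.fn(my_function).to(
    compute=kt.Compute(cpus="0.5")
)
# The call argument takes precedence over the environment variable and config
result = remote_fn(*args, **kwargs, stream_logs=True)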

(2) Environment Variable

Used only if no call argument was provided.

export KT_STREAM_LOGS="True" # or "False"

If the stream_logs argument is not provided (None), Kubetorch checks the KT_STREAM_LOGS environment variable.

(3) Kubetorch Config

Used only when neither the workflow call argument nor the environment variable is set. If the config value is not specified either, stream_logs defaults to True.

kt config set stream_logs <value> # e.g., true or false
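The precedence order above can be sketched as a small resolution function. This is an illustration of the documented order, not Kubetorch's internal implementation: the names resolve_stream_logs and _parse_bool are hypothetical, and the rule for turning the environment string into a boolean is an assumption.

```python
import os
from typing import Optional


def _parse_bool(text: str) -> bool:
    """Interpret an environment string as a boolean (assumed parsing rule)."""
    return text.strip().lower() in {"true", "1", "yes"}


def resolve_stream_logs(
    call_arg: Optional[bool] = None,
    config_value: Optional[bool] = None,
    env_var: str = "KT_STREAM_LOGS",
) -> bool:
    """Return the effective flag: call argument > env var > config > True."""
    if call_arg is not None:        # (1) explicit call argument wins
        return call_arg
    env_text = os.environ.get(env_var)
    if env_text is not None:        # (2) environment variable
        return _parse_bool(env_text)
    if config_value is not None:    # (3) value stored via `kt config set`
        return config_value
    return True                     # documented default
```

With KT_STREAM_LOGS set to "False", resolve_stream_logs(call_arg=True) returns True because the call argument wins, while resolve_stream_logs() returns False because the environment variable is consulted next.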

Metrics Streaming

Kubetorch can stream live resource metrics from your compute environment directly to your client.

When enabled, the streamed metrics include:

  • CPU utilization
  • Memory usage
  • GPU utilization and memory (when applicable)

Configuration Options

Metrics streaming is controlled by the stream_metrics flag and follows a strict precedence order (highest → lowest):

(1) Workflow Call Argument

Using boolean values:

Passing stream_metrics=<value> overrides all other settings. If stream_metrics is not provided, its value is considered None.

import kubetorch as kt
compute = kt.Compute(cpus="0.5")
remote_fn = kt.fn(my_function).to(compute)
# No argument -> relies on env/config/defaults (stream_metrics=None)
result0 = remote_fn(*args, **kwargs)
# Enable streaming with default behavior (30s interval, resource scope)
result1 = remote_fn(*args, **kwargs, stream_metrics=True)
# Disable metrics streaming
result2 = remote_fn(*args, **kwargs, stream_metrics=False)

Custom configuration:

The kt.MetricsConfig class allows granular control over metrics collection:

Field     Description                                                Default Value  Supported Values
Interval  Time between two consecutive metrics outputs, in seconds.  30 seconds     Integer (seconds)
Scope     Metrics aggregation level.                                 resource       resource or pod

import kubetorch as kt
compute = kt.Compute(cpus="0.5")
remote_fn = kt.fn(my_function).to(compute)
# Custom interval
cfg_interval = kt.MetricsConfig(interval=35)
result1 = remote_fn(*args, **kwargs, stream_metrics=cfg_interval)
# Pod-level metrics
cfg_pod = kt.MetricsConfig(scope="pod")
result2 = remote_fn(*args, **kwargs, stream_metrics=cfg_pod)
# Both interval + scope
cfg_both = kt.MetricsConfig(interval=35, scope="pod")
result3 = remote_fn(*args, **kwargs, stream_metrics=cfg_both)
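To make the relationship between the boolean and custom forms concrete, the sketch below maps a stream_metrics value to effective settings using the documented defaults (30-second interval, resource scope). MetricsSettings is a hypothetical stand-in for kt.MetricsConfig, and normalize_stream_metrics is an invented helper; neither is part of Kubetorch. The value is assumed to have already been resolved through the precedence order, so None is not handled here.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MetricsSettings:
    """Hypothetical stand-in for kt.MetricsConfig with the documented defaults."""
    interval: int = 30        # seconds between two consecutive metrics outputs
    scope: str = "resource"   # aggregation level: "resource" or "pod"

    def __post_init__(self) -> None:
        # Only the two documented scope values are accepted.
        if self.scope not in ("resource", "pod"):
            raise ValueError("scope must be 'resource' or 'pod'")


def normalize_stream_metrics(
    value: Union[bool, MetricsSettings],
) -> Optional[MetricsSettings]:
    """Return effective settings, or None when metrics streaming is disabled."""
    if isinstance(value, MetricsSettings):
        return value                              # custom configuration, used as given
    return MetricsSettings() if value else None   # True -> defaults, False -> disabled
```

Under these assumptions, True yields a 30-second interval at resource scope, False disables streaming, and a custom object such as MetricsSettings(interval=35, scope="pod") passes through unchanged.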

(2) Environment Variable

Used only if no workflow call argument was provided.

export KT_STREAM_METRICS="True" # or "False"

If stream_metrics is None, Kubetorch checks the KT_STREAM_METRICS environment variable.

(3) Kubetorch Config

Used only when neither the workflow call argument nor the environment variable is set. If the config value is not specified either, stream_metrics defaults to True.

kt config set stream_metrics <value> # e.g., true or false