Logs and Metrics
Kubetorch provides comprehensive logging and metrics capabilities for your remote workloads. You can view logs via the CLI, stream them in real time during function calls, and monitor resource utilization metrics.
Viewing Logs with the CLI
The `kt logs` command lets you view logs for any running service:

```bash
# View logs from all pods
kt logs my-service

# Follow logs in real time
kt logs my-service -f

# View logs from a specific pod (by index)
kt logs my-service -p 0

# Tail the last N lines
kt logs my-service -t 100

# Specify namespace
kt logs my-service -n my-namespace
```
| Option | Description |
|---|---|
| `-f, --follow` | Follow logs in real time (like `tail -f`) |
| `-t, --tail` | Number of lines to show (default: 100) |
| `-p, --pod` | Specific pod name or index (0-based) |
| `-n, --namespace` | Kubernetes namespace |
Log Streaming
Kubetorch can stream logs back to your client in real time. This includes:
- Standard logging output from your code
- Anything printed to stdout/stderr
- Kubernetes events during service launch (pod scheduling, image pulling, container starting, etc.)
Log streaming happens in two contexts:
- During `.to()` deployment - See K8s events and startup logs as your service launches
- During function/method calls - See your application logs in real time (a minimal sketch follows this list)
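For example, here is a minimal sketch of the second context. It assumes streaming is enabled (the default; see the precedence chain below), and `process_batch` is a hypothetical function:

```python
import logging

import kubetorch as kt

logger = logging.getLogger(__name__)

def process_batch(n):
    logger.info("processing %d items", n)  # standard logging output streams back
    print("halfway done")  # anything on stdout/stderr is captured too
    return n * 2

remote_fn = kt.fn(process_batch).to(kt.Compute(cpus="0.5"))
result = remote_fn(100)  # log lines and prints appear in your local terminal
```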
Enabling Log Streaming
Log streaming is controlled via a precedence chain (highest to lowest):
1. Compute-Level Configuration
Set logging behavior for all calls to a service using `LoggingConfig`:

```python
import kubetorch as kt

compute = kt.Compute(
    cpus="0.5",
    logging_config=kt.LoggingConfig(
        stream_logs=True,
        level="info",
    ),
)

remote_fn = kt.fn(my_function).to(compute)
result = remote_fn(*args)  # Logs will stream
```
2. Per-Call Override
Override the compute-level setting for individual calls:
```python
# Disable streaming for this call only
result = remote_fn(*args, stream_logs=False)

# Enable streaming even if disabled at compute level
result = remote_fn(*args, stream_logs=True)
```
3. Environment Variable
If not set at compute or call level, Kubetorch checks the environment:
```bash
export KT_STREAM_LOGS="True"  # or "False"
```
4. Global Config
If none of the above are set, Kubetorch falls back to the global config (default: True):
```bash
kt config set stream_logs True
```
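Putting the chain together: a per-call argument beats the compute-level setting, which in turn beats the environment variable and the global config. A minimal sketch of the top two levels, using a hypothetical `greet` function:

```python
import kubetorch as kt

def greet(name):
    print(f"hello, {name}")
    return name

compute = kt.Compute(
    cpus="0.5",
    logging_config=kt.LoggingConfig(stream_logs=False),  # compute-level: streaming off
)
remote_fn = kt.fn(greet).to(compute)

remote_fn("world")                    # no streaming: the compute-level setting applies
remote_fn("world", stream_logs=True)  # streams: the per-call override wins
```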
LoggingConfig Options
The `kt.LoggingConfig` class provides fine-grained control over logging behavior:

```python
import kubetorch as kt

logging_config = kt.LoggingConfig(
    stream_logs=True,     # Enable log streaming
    level="info",         # Log level: "debug", "info", "warning", "error"
    include_events=True,  # Include K8s events during startup
    include_name=True,    # Prepend service name to log lines
    grace_period=2.0,     # Seconds to continue streaming after call completes
)

compute = kt.Compute(cpus="0.5", logging_config=logging_config)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `stream_logs` | bool | None | Whether to stream logs; if None, falls back to the global config |
| `level` | str | `"info"` | Log level filter: `"debug"`, `"info"`, `"warning"`, `"error"` |
| `include_events` | bool | True | Include Kubernetes events (pod scheduling, image pulling, etc.) |
| `include_name` | bool | True | Prepend the pod/service name to each log line |
| `grace_period` | float | 2.0 | Seconds to continue streaming after the request completes |
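A longer `grace_period` helps when a function logs right up to the moment it returns, since it gives trailing lines time to arrive before the stream closes. A sketch under that assumption; `final_report` is a hypothetical function:

```python
import logging

import kubetorch as kt

logger = logging.getLogger(__name__)

def final_report():
    logger.info("writing summary...")  # emitted just before the call returns
    return "done"

compute = kt.Compute(
    cpus="0.5",
    logging_config=kt.LoggingConfig(
        stream_logs=True,
        grace_period=5.0,  # keep streaming for 5s after the call completes
    ),
)
remote_fn = kt.fn(final_report).to(compute)
remote_fn()
```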
Log Level Propagation
The log level you set is propagated to the remote compute, controlling which logs are emitted by the server.
You can set it via `LoggingConfig.level` or the `KT_LOG_LEVEL` environment variable:

```bash
# Set locally - automatically propagates to remote compute
export KT_LOG_LEVEL="DEBUG"
```
The precedence for log level is:
1. `LoggingConfig.level` (if set)
2. `KT_LOG_LEVEL` environment variable
3. Default: `"INFO"`
This means you can run your local script with `KT_LOG_LEVEL=DEBUG python my_script.py` and both your local client and the remote compute will use debug-level logging.
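In the hedged sketch below, the `logger.debug` line only shows up in the stream when the script is launched with `KT_LOG_LEVEL=DEBUG`; `noisy_step` is a hypothetical function:

```python
import logging

import kubetorch as kt

logger = logging.getLogger(__name__)

def noisy_step():
    logger.debug("per-item detail")  # emitted only when the level is debug
    logger.info("step complete")     # emitted at the default info level
    return "ok"

remote_fn = kt.fn(noisy_step).to(kt.Compute(cpus="0.5"))
remote_fn()  # run as: KT_LOG_LEVEL=DEBUG python my_script.py
```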
Metrics Streaming
Kubetorch can stream live resource metrics from your compute environment during function calls:
- CPU utilization
- Memory usage
- GPU utilization and memory (when applicable)
Enabling Metrics Streaming
Metrics streaming follows the same precedence as log streaming:
1. Per-Call Argument
```python
import kubetorch as kt

remote_fn = kt.fn(my_function).to(kt.Compute(cpus="0.5"))

# Enable with defaults (30s interval, resource scope)
result = remote_fn(*args, stream_metrics=True)

# Disable metrics streaming
result = remote_fn(*args, stream_metrics=False)
```
2. Custom Configuration with MetricsConfig
```python
import kubetorch as kt

# Custom interval
result = remote_fn(*args, stream_metrics=kt.MetricsConfig(interval=10))

# Pod-level metrics (instead of aggregated resource metrics)
result = remote_fn(*args, stream_metrics=kt.MetricsConfig(scope="pod"))

# Both options
result = remote_fn(*args, stream_metrics=kt.MetricsConfig(interval=10, scope="pod"))
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `interval` | int | 30 | Seconds between metrics outputs |
| `scope` | str | `"resource"` | Aggregation level: `"resource"` (service-wide) or `"pod"` (per-pod) |
3. Environment Variable
```bash
export KT_STREAM_METRICS="True"  # or "False"
```
4. Global Config
```bash
kt config set stream_metrics True
```
Example: Full Observability Setup
```python
import kubetorch as kt

# Configure comprehensive logging
logging_config = kt.LoggingConfig(
    stream_logs=True,
    level="debug",        # Show all logs including debug
    include_events=True,  # Show K8s events during startup
    include_name=True,    # Prefix logs with service name
)

compute = kt.Compute(
    cpus="2",
    gpus="1",
    logging_config=logging_config,
)

remote_fn = kt.fn(train_model).to(compute)

# Run with both log and metrics streaming
result = remote_fn(
    epochs=10,
    stream_metrics=kt.MetricsConfig(interval=15, scope="pod"),
)
```
This will show:
- All debug/info/warning/error logs from your training function
- Kubernetes events as the pod starts up
- CPU, memory, and GPU metrics every 15 seconds per pod