Debugging Remote Workflows

Kubetorch includes full support for interactive remote debugging via Python’s built-in pdb debugger. Breakpoints can be placed anywhere inside your Kubetorch functions or class methods, including within deeply nested or distributed code running in Kubernetes. When a breakpoint is hit, you can inspect variables, step through code, and evaluate expressions with the same fidelity as local debugging.

Remote debugging works by launching a lightweight debug server inside your running pod. You can connect to this server using the Kubetorch CLI or any tool capable of reaching the forwarded debug port.

Enabling Breakpoints in Remote Code

You can enable debugging in three ways:

Standard Python Breakpoint

Simply place a breakpoint() inside your remote function or class method:

def my_fn(*args, **kwargs):
    print("before")
    breakpoint()  # Execution will pause here
    print("after")

Kubetorch Deep Breakpoint

Like Python's built-in breakpoint(), Kubetorch's deep_breakpoint() pauses execution, and it can also be used inside distributed code. For SPMD-style distributed code such as PyTorch, call it from only one process (for example, the rank 0 process) so that the breakpoint does not block every process in the distributed group.

import kubetorch as kt

def my_fn(*args, **kwargs):
    print("before")
    kt.deep_breakpoint()  # Execution will pause here
    print("after")
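One way to follow the single-process advice is to gate the call on the process rank. The sketch below assumes the launcher exports a RANK environment variable (as torchrun does). The helper names is_rank_zero and break_on_rank_zero are hypothetical and not part of Kubetorch, and the breakpoint function is passed in so the helper can be exercised without a debugger.

```python
import os


def is_rank_zero(env=None):
    """Return True for global rank 0, or when RANK is unset (non-distributed run)."""
    env = os.environ if env is None else env
    # Assumption: the launcher sets RANK, as torchrun does.
    return int(env.get("RANK", "0")) == 0


def break_on_rank_zero(breakpoint_fn, env=None):
    """Call breakpoint_fn only on rank 0 and report whether it was called."""
    if is_rank_zero(env):
        breakpoint_fn()
        return True
    return False
```

Inside a remote function, this would be called as break_on_rank_zero(kt.deep_breakpoint), so that the other ranks continue past the call instead of pausing.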

Function-Level Debugging

You can enable debugging without modifying your code by passing pdb=True when executing any Kubetorch function or class method:

import kubetorch as kt

remote_fn = my_fn.to(compute)
res = remote_fn(*args, **kwargs, pdb=True)

This pauses execution at the very beginning of your remote function, before any of your code runs. It’s ideal for inspecting arguments, the execution environment, or debugging initialization logic.
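To make the "pause before any of your code runs" behavior concrete, here is a purely local analogue. It is not Kubetorch's implementation: with_entry_breakpoint is a hypothetical decorator that consumes a pdb keyword and triggers a breakpoint hook before the wrapped function's body executes. The hook is injectable so the example runs without opening a debugger.

```python
import functools


def with_entry_breakpoint(fn, breakpoint_fn=breakpoint):
    """Wrap fn so that calling it with pdb=True pauses before fn's body runs."""

    @functools.wraps(fn)
    def wrapper(*args, pdb=False, **kwargs):
        if pdb:
            breakpoint_fn()  # Pause here, before any of fn's code executes
        return fn(*args, **kwargs)

    return wrapper


def add(a, b):
    return a + b


# Record the pause instead of opening a real debugger.
events = []
debuggable_add = with_entry_breakpoint(add, breakpoint_fn=lambda: events.append("paused"))
result = debuggable_add(1, 2, pdb=True)
```

With the default hook, the debugger would stop inside the wrapper, where the incoming args and kwargs can be inspected before stepping into the function.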

Running Remote Code in Debug Mode

When the remote code hits a breakpoint, the pod will pause execution and print debugging instructions directly into the pod logs. These logs include the exact kt debug command you should run locally to attach to the remote debug server.

For example, your pod logs may include a message like:

(main-data-preproc-6f6bcb7fd9-4nrrv) Distributed breakpoint activated. To attach a debugger, run the following command:
(main-data-preproc-6f6bcb7fd9-4nrrv) kt debug main-data-preproc-v2-6f6bcb7fd9-4nrrv --port 5678 --namespace kubetorch

Copy that command and run it on your local machine to open the Web-PDB interface and begin debugging.
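If you would rather not copy the command by hand, it can be pulled out of a log line programmatically. The sketch below assumes the log format matches the sample message above. extract_debug_command is a hypothetical helper, not part of the Kubetorch CLI.

```python
import re
import shlex


def extract_debug_command(log_line):
    """Return the `kt debug ...` command in log_line as an argument list, or None."""
    match = re.search(r"\bkt debug\b.*", log_line)
    if match is None:
        return None
    return shlex.split(match.group(0))


# Sample line taken from the log message shown above.
log_line = (
    "(main-data-preproc-6f6bcb7fd9-4nrrv) kt debug "
    "main-data-preproc-v2-6f6bcb7fd9-4nrrv --port 5678 --namespace kubetorch"
)
command = extract_debug_command(log_line)
```

The resulting list can be passed to subprocess.run(command) to start the session from a script.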

Opening the Debugging Session (CLI)

kt debug <pod-name>

This command:

  • Port-forwards the debug server from inside the pod to your local machine
  • Opens a browser window pointing to the Web-PDB UI (http://localhost:<port>)
  • Keeps the session open until you press Ctrl+C
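If the browser window does not open, it can help to confirm that the forwarded port is actually reachable. The following sketch uses only the standard library. port_is_open is a hypothetical helper, and the port you check should be the one printed in your own pod logs (5678 in the sample message).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("localhost", 5678) should return True while kt debug is running with that port and False after you press Ctrl+C.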

Once connected, you get:

  • A full interactive PDB shell
  • Source code view
  • Stack inspection
  • Variable inspection (p, pp, locals, globals)
  • Stepping (n, s, c, etc.)
  • Expression evaluation
  • Breakpoint management

Everything works exactly like local PDB.

Debugging Session

Once a breakpoint is hit and the UI opens, you can:

Inspect variables

p my_var
pp long_json_struct
locals()
globals()

Step through code

n  # next line
s  # step into
c  # continue

Evaluate code dynamically

!print([x for x in range(10)])

Look at stack frames

where
up
down

Add additional breakpoints

b file.py:42

Everything behaves exactly like normal PDB, but inside a running Kubernetes pod.
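Because the session uses the standard PDB command set, you can rehearse these commands locally before attaching to a pod. The sketch below drives Python's own pdb module with a scripted list of commands and captures the output. It does not involve Kubetorch, and sample and run_scripted_session are illustrative names.

```python
import io
import pdb


def sample():
    value = 41
    value += 1
    return value


def run_scripted_session(func, commands):
    """Run func under pdb, feed it the given commands, and return the debugger output."""
    stdin = io.StringIO("\n".join(commands) + "\n")
    stdout = io.StringIO()
    # nosigint and readrc keep the session free of signal handlers and ~/.pdbrc.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, nosigint=True, readrc=False)
    debugger.runcall(func)  # Stops at the first line of func
    return stdout.getvalue()


# Step over two lines, print the variable, then continue to the end.
transcript = run_scripted_session(sample, ["n", "n", "p value", "c"])
```

The captured transcript contains the (Pdb) prompts and the printed value, mirroring what you would type interactively in the Web-PDB shell.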