Telemetry Capture

Telemetry in Flowcept refers to runtime resource measurements (CPU, memory, disk, network, GPU, process info, etc.) collected alongside provenance. These measurements are crucial for performance characterization and for making provenance more actionable in scientific workflows.

Flowcept captures telemetry at the beginning and at the end of each provenance task, so you can correlate resource usage with inputs/outputs, status, timing, and hierarchy (parent/child tasks, loops, model layers, etc.).

  • Telemetry objects are represented by flowcept.commons.flowcept_dataclasses.telemetry.Telemetry.

  • Decorated tasks use flowcept.instrumentation.flowcept_task.flowcept_task() and store telemetry in telemetry_at_start / telemetry_at_end fields of the flowcept.commons.flowcept_dataclasses.task_object.TaskObject.

  • PyTorch instrumentation via flowcept.instrumentation.flowcept_torch.flowcept_torch() also records telemetry for parent and child module forward passes, depending on configuration.

Configuration (per-type toggles)

Telemetry capture is configured in your settings.yaml. Each telemetry type can be independently turned on/off.

telemetry_capture:  # Toggle each telemetry type
  gpu: ~            # ~ means None (disabled). To enable, provide a list (see GPU section below).
  cpu: true
  per_cpu: true
  process_info: true
  mem: true
  disk: true
  network: true
  machine_info: true

instrumentation:
  enabled: true
  torch:
    what: parent_and_children
    children_mode: telemetry_and_tensor_inspection
    epoch_loop: lightweight
    batch_loop: lightweight
    capture_epochs_at_every: 1
    register_workflow: true

Notes

  • If a type is false or ~, Flowcept skips collecting it.

  • GPU is special: enable it by providing a list of metrics (AMD and NVIDIA differ; see below).
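The toggle rules above can be sketched as a small function over the parsed telemetry_capture mapping. This is an illustration under stated assumptions, not Flowcept's implementation: enabled_telemetry_types is a hypothetical helper, and the dict stands in for what a YAML loader returns for the settings shown earlier (YAML ~ loads as Python None).

```python
from typing import Any, Dict, List


def enabled_telemetry_types(telemetry_capture: Dict[str, Any]) -> List[str]:
    """Return the telemetry types that would be collected (hypothetical helper).

    Assumed rules, mirroring the notes above:
      - regular types are collected only when set to True;
      - false or ~ (None after YAML parsing) disables a type;
      - "gpu" is enabled only by a non-empty list of metric names.
    """
    enabled = []
    for name, value in telemetry_capture.items():
        if name == "gpu":
            if isinstance(value, list) and value:
                enabled.append(name)
        elif value is True:
            enabled.append(name)
    return enabled


# Equivalent of the YAML example: gpu is ~ (None), everything else is true.
settings = {
    "gpu": None,
    "cpu": True,
    "per_cpu": True,
    "process_info": True,
    "mem": True,
    "disk": True,
    "network": True,
    "machine_info": True,
}
print(enabled_telemetry_types(settings))
```

Treating only a non-empty list as "on" for gpu keeps the GPU rule separate from the boolean toggles, which matches the note that GPU is enabled by listing metrics.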

How telemetry attaches to provenance

Every provenance task includes telemetry fields when enabled:

  • telemetry_at_start: collected just before the task runs

  • telemetry_at_end: collected immediately after the task finishes

Example with the task decorator

from flowcept import Flowcept
from flowcept.instrumentation.flowcept_task import flowcept_task

@flowcept_task(output_names="y")
def mult_two(x: int) -> int:
    return 2 * x

with Flowcept(workflow_name="demo"):
    y = mult_two(21)

# The persisted task will include:
# - used/generated (inputs/outputs)
# - status, started_at/ended_at
# - telemetry_at_start / telemetry_at_end (if enabled)
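Because both snapshots live on the same task, per-task resource usage is a difference between telemetry_at_end and telemetry_at_start. The sketch below assumes the persisted task is available as a plain dict whose telemetry follows the nested key layout documented in the sections below (for example process.cpu_times.user) and that started_at/ended_at are epoch seconds; the record values and the helper names get_path and telemetry_delta are made up for illustration.

```python
from typing import Any, Dict


def get_path(d: Dict[str, Any], dotted: str) -> Any:
    """Walk a nested dict with a dotted key such as 'process.cpu_times.user'."""
    for part in dotted.split("."):
        d = d[part]
    return d


def telemetry_delta(task: Dict[str, Any], dotted: str) -> float:
    """End-minus-start difference for one numeric telemetry field (hypothetical helper)."""
    return get_path(task["telemetry_at_end"], dotted) - get_path(task["telemetry_at_start"], dotted)


# Hypothetical task record, trimmed to the fields used here.
task = {
    "started_at": 1000.0,
    "ended_at": 1004.0,
    "telemetry_at_start": {"process": {"cpu_times": {"user": 1.5, "system": 0.5}}},
    "telemetry_at_end": {"process": {"cpu_times": {"user": 4.5, "system": 1.5}}},
}

wall = task["ended_at"] - task["started_at"]
cpu = telemetry_delta(task, "process.cpu_times.user") + telemetry_delta(task, "process.cpu_times.system")
print(f"wall={wall:.1f}s cpu={cpu:.1f}s cpu/wall={cpu / wall:.2f}")
```

A cpu/wall ratio near 1.0 suggests one busy core for the task's duration; values well above 1.0 point to multi-threaded work, and values well below 1.0 to waiting (I/O, locks, sleeps).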

Supported telemetry types

Flowcept uses the following libraries:

  • psutil for CPU/memory/disk/network/process-info

  • py-cpuinfo (cpuinfo) for CPU details in machine info

  • pynvml for NVIDIA GPU metrics

  • amdsmi (ROCm SMI Python) for AMD GPU metrics

Note

Many telemetry fields are platform-dependent. Most keys mirror psutil outputs, so availability and naming can vary by OS and environment. Some fields may be missing depending on what psutil and vendor drivers can report.

CPU / per-CPU

Keys (when enabled):

  • cpu.times_avg — average CPU time breakdown across all CPUs (psutil cpu_times).

  • cpu.percent_all — total CPU utilization percent (psutil cpu_percent).

  • cpu.frequency — current CPU frequency in MHz (psutil cpu_freq().current).

  • cpu.times_per_cpu — per-CPU time breakdown list (only if per_cpu: true).

  • cpu.percent_per_cpu — per-CPU utilization percent list (only if per_cpu: true).

cpu.times_avg and cpu.times_per_cpu include psutil CPU time fields (platform dependent), such as:

  • user — time spent in user mode.

  • system — time spent in kernel mode.

  • idle — time spent idle.

  • nice — time spent on low-priority processes (Unix).

  • iowait — time waiting for I/O (Unix).

  • irq — time servicing hardware interrupts (Linux).

  • softirq — time servicing software interrupts (Linux).

  • steal — involuntary wait time in virtualized environments.

  • guest — time running a guest OS (Linux).

  • guest_nice — guest time with low priority (Linux).

See psutil CPU docs for full field availability: https://psutil.readthedocs.io/en/latest/#cpu
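The cpu_times fields are cumulative counters (psutil reports them as seconds since boot), so a start/end pair on the same task can be turned into a utilization figure for just that interval. A minimal sketch, assuming both snapshots are plain dicts with psutil's field names; it follows psutil's general approach (idle and iowait count as not busy, guest time excluded from the total on Linux) but is not psutil's or Flowcept's code, and cpu_percent_between is a hypothetical name.

```python
from typing import Dict

# Fields counted as "not busy"; iowait is treated as idle time.
IDLE_FIELDS = ("idle", "iowait")
# On Linux, guest time is already included in user/nice, so it is excluded
# from the total to avoid double counting.
EXCLUDED_FIELDS = ("guest", "guest_nice")


def cpu_percent_between(t1: Dict[str, float], t2: Dict[str, float]) -> float:
    """Utilization percent between two cpu_times snapshots with the same keys."""
    total = sum(t2[k] - t1[k] for k in t2 if k not in EXCLUDED_FIELDS)
    idle = sum(t2[k] - t1[k] for k in IDLE_FIELDS if k in t2)
    if total <= 0:
        return 0.0  # no time elapsed between snapshots
    return round(100.0 * (total - idle) / total, 1)


start = {"user": 10.0, "system": 5.0, "idle": 85.0, "iowait": 0.0}
end = {"user": 20.0, "system": 10.0, "idle": 170.0, "iowait": 0.0}
print(cpu_percent_between(start, end))  # 15.0
```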

Process info

Keys (subset, platform-dependent):

  • process.pid — OS process ID.

  • process.cpu_number — current CPU core index.

  • process.memory — process memory info (psutil memory_info).

  • process.memory_percent — percent of RAM used by the process.

  • process.cpu_times — process CPU time breakdown (psutil cpu_times).

  • process.cpu_percent — process CPU utilization percent.

  • process.io_counters — process I/O counters (if available).

  • process.num_connections — number of open network connections.

  • process.num_open_files — number of open file handles (where supported).

  • process.num_open_file_descriptors — number of open file descriptors (Unix).

  • process.num_threads — number of OS threads.

  • process.num_ctx_switches — voluntary and involuntary context switches.

  • process.executable — absolute path of the process executable.

  • process.cmd_line — command line arguments for the process.

process.memory includes psutil memory fields (platform dependent), such as:

  • rss — resident set size (non-swapped physical memory).

  • vms — virtual memory size.

  • shared — shared memory (Linux).

  • text — code segment size (Linux).

  • lib — shared library size (Linux).

  • data — data segment size (Linux).

  • dirty — dirty pages (Linux).

process.cpu_times includes psutil CPU time fields (platform dependent), such as:

  • user — time spent in user mode by this process.

  • system — time spent in kernel mode by this process.

  • children_user — user time for child processes.

  • children_system — system time for child processes.

  • iowait — I/O wait time (Linux).

process.io_counters includes psutil I/O fields (platform dependent), such as:

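  • read_count — number of read operations (syscalls).

  • write_count — number of write operations (syscalls).

  • read_bytes — bytes read.

  • write_bytes — bytes written.

  • read_chars — bytes passed to read() and pread() syscalls, whether or not physical disk I/O occurred (Linux).

  • write_chars — bytes passed to write() and pwrite() syscalls, whether or not physical disk I/O occurred (Linux).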

process.num_ctx_switches includes:

  • voluntary — voluntary context switches.

  • involuntary — involuntary context switches.

See psutil process docs for full field availability: https://psutil.readthedocs.io/en/latest/#process-class
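process.memory_percent comes from psutil, which by default compares the process's rss with total physical memory. When both process_info and mem are enabled, the same ratio can be recomputed from a single snapshot, or adapted to another memory field. A sketch assuming the nested layout described on this page; the snapshot values and the rss_percent helper are hypothetical.

```python
def rss_percent(rss_bytes: int, total_bytes: int) -> float:
    """Process resident memory as a percentage of total physical memory."""
    return 100.0 * rss_bytes / total_bytes


GiB = 1024 ** 3

# Hypothetical snapshot trimmed to the two fields needed.
snapshot = {
    "process": {"memory": {"rss": 2 * GiB}},
    "memory": {"virtual": {"total": 16 * GiB}},
}
pct = rss_percent(snapshot["process"]["memory"]["rss"], snapshot["memory"]["virtual"]["total"])
print(f"{pct:.1f}%")  # 12.5%
```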

Memory

Keys:

  • memory.virtual — host virtual memory snapshot (psutil virtual_memory).

  • memory.swap — host swap memory snapshot (psutil swap_memory).

memory.virtual includes psutil memory fields (platform dependent), such as:

  • total — total physical memory.

  • available — available memory for new processes.

  • percent — percent used.

  • used — memory in use.

  • free — memory not used.

  • active — memory in active use.

  • inactive — memory not recently used.

  • buffers — buffers used by the OS (Linux).

  • cached — cached files/pages (Linux).

  • shared — memory shared across processes (Linux).

  • slab — kernel slab memory (Linux).

memory.swap includes psutil swap fields (platform dependent), such as:

  • total — total swap space.

  • used — used swap space.

  • free — free swap space.

  • percent — percent swap used.

  • sin — bytes swapped in.

  • sout — bytes swapped out.

See psutil memory docs for full field availability: https://psutil.readthedocs.io/en/latest/#memory
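psutil computes memory.virtual.percent as (total - available) / total * 100, and swap sin/sout are cumulative byte counters. Comparing the start and end snapshots of a task therefore shows whether the host swapped while the task ran. A sketch assuming the nested dict layout above; memory_pressure and the sample numbers are hypothetical.

```python
from typing import Any, Dict


def memory_pressure(start: Dict[str, Any], end: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize host memory use at task end and swap traffic during the task."""
    vm = end["memory"]["virtual"]
    used_pct = 100.0 * (vm["total"] - vm["available"]) / vm["total"]
    swapped_in = end["memory"]["swap"]["sin"] - start["memory"]["swap"]["sin"]
    swapped_out = end["memory"]["swap"]["sout"] - start["memory"]["swap"]["sout"]
    return {
        "used_pct_at_end": round(used_pct, 1),
        "swapped_in_bytes": swapped_in,
        "swapped_out_bytes": swapped_out,
        "swapping": swapped_in > 0 or swapped_out > 0,
    }


start = {"memory": {"virtual": {"total": 8000, "available": 3000}, "swap": {"sin": 0, "sout": 100}}}
end = {"memory": {"virtual": {"total": 8000, "available": 2000}, "swap": {"sin": 0, "sout": 4196}}}
summary = memory_pressure(start, end)
print(summary)
```

Any non-zero swap traffic during a task is usually worth investigating, since it can dominate the task's runtime.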

Disk

Keys:

  • disk.disk_usage — filesystem usage for / (psutil disk_usage).

  • disk.io_sum — aggregated disk I/O counters (psutil disk_io_counters(perdisk=False)).

  • disk.io_per_disk — per-device disk I/O counters (psutil disk_io_counters(perdisk=True)).

disk.disk_usage includes psutil disk usage fields:

  • total — total space in bytes.

  • used — used space in bytes.

  • free — free space in bytes.

  • percent — percent used.

disk.io_sum and disk.io_per_disk include psutil disk I/O fields (platform dependent), such as:

  • read_count — reads completed.

  • write_count — writes completed.

  • read_bytes — bytes read.

  • write_bytes — bytes written.

  • read_time — time spent reading (ms).

  • write_time — time spent writing (ms).

  • read_merged — merged reads (Linux).

  • write_merged — merged writes (Linux).

  • busy_time — time spent doing I/O (ms, Linux).

See psutil disk docs for full field availability: https://psutil.readthedocs.io/en/latest/#disks
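The disk I/O counters are cumulative, so throughput over a task is the start/end difference divided by elapsed time, and busy_time (milliseconds, only on some platforms) gives the share of that interval spent doing I/O. A sketch assuming disk.io_sum snapshots are plain dicts with psutil's field names; disk_rates is a hypothetical helper.

```python
from typing import Dict

MiB = 1024 ** 2


def disk_rates(start: Dict[str, float], end: Dict[str, float], elapsed_s: float) -> Dict[str, float]:
    """Average read/write throughput (MiB/s) and busy percent over an interval.

    busy_time is in milliseconds and platform-dependent, so it is optional here.
    """
    rates = {
        "read_mib_s": (end["read_bytes"] - start["read_bytes"]) / MiB / elapsed_s,
        "write_mib_s": (end["write_bytes"] - start["write_bytes"]) / MiB / elapsed_s,
    }
    if "busy_time" in start and "busy_time" in end:
        rates["busy_pct"] = 100.0 * (end["busy_time"] - start["busy_time"]) / (elapsed_s * 1000.0)
    return rates


start_io = {"read_bytes": 0, "write_bytes": 0, "busy_time": 1000}
end_io = {"read_bytes": 100 * MiB, "write_bytes": 50 * MiB, "busy_time": 1500}
rates = disk_rates(start_io, end_io, elapsed_s=2.0)
print(rates)
```

disk.io_sum adds counters across all devices, so on a multi-disk host the summed busy_time can exceed 100% of the interval; disk.io_per_disk gives a per-device figure instead.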

Network

Keys:

  • network.netio_sum — aggregated network I/O counters (psutil net_io_counters(pernic=False)).

  • network.netio_per_interface — per-interface I/O counters (psutil net_io_counters(pernic=True)).

network.netio_sum and network.netio_per_interface include psutil network fields (platform dependent), such as:

  • bytes_sent — bytes sent.

  • bytes_recv — bytes received.

  • packets_sent — packets sent.

  • packets_recv — packets received.

  • errin — inbound errors.

  • errout — outbound errors.

  • dropin — inbound drops.

  • dropout — outbound drops.

See psutil network docs for full field availability: https://psutil.readthedocs.io/en/latest/#network
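psutil's per-NIC counters are keyed by interface name, which makes it easy to see which interface carried a task's traffic. A sketch assuming network.netio_per_interface keeps that dict-of-dicts layout; the interface names, byte counts, and the busiest_interfaces helper are hypothetical, and "lo" is the usual Linux loopback name (macOS uses "lo0").

```python
from typing import Dict, List, Tuple


def busiest_interfaces(
    start: Dict[str, Dict[str, int]],
    end: Dict[str, Dict[str, int]],
    skip: Tuple[str, ...] = ("lo",),
) -> List[Tuple[str, int]]:
    """Rank interfaces by bytes sent + received between two snapshots."""
    traffic = []
    for nic, counters in end.items():
        if nic in skip or nic not in start:
            continue  # ignore loopback and interfaces that appeared mid-task
        before = start[nic]
        moved = (counters["bytes_sent"] - before["bytes_sent"]) + (
            counters["bytes_recv"] - before["bytes_recv"]
        )
        traffic.append((nic, moved))
    return sorted(traffic, key=lambda item: item[1], reverse=True)


start_net = {
    "lo": {"bytes_sent": 0, "bytes_recv": 0},
    "eth0": {"bytes_sent": 1000, "bytes_recv": 2000},
    "ib0": {"bytes_sent": 0, "bytes_recv": 0},
}
end_net = {
    "lo": {"bytes_sent": 500, "bytes_recv": 500},
    "eth0": {"bytes_sent": 1500, "bytes_recv": 2500},
    "ib0": {"bytes_sent": 10000, "bytes_recv": 20000},
}
print(busiest_interfaces(start_net, end_net))
```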

Machine info (snapshot)

If machine_info: true, flowcept.instrumentation.telemetry.TelemetryCapture.capture_machine_info() returns a snapshot with:

  • platform info (platform.uname), CPU info (cpuinfo), environment variables

  • memory (virtual/swap), disk usage, NIC addresses

  • hostname (HOSTNAME), login name (LOGIN_NAME)

  • process info (same structure as above)

  • optional GPU block (if GPU telemetry is on)

platform includes:

  • system — OS name (e.g., Linux, Darwin, Windows).

  • node — network name (hostname).

  • release — OS release version.

  • version — OS version string.

  • machine — machine type (e.g., x86_64).

  • processor — CPU identifier string.

network is derived from psutil.net_if_addrs and includes:

  • family — address family (AF_INET, AF_INET6, etc.).

  • address — IP or MAC address.

  • netmask — netmask.

  • broadcast — broadcast address (if any).

  • ptp — point-to-point address (if any).

See psutil network address docs for full field availability: https://psutil.readthedocs.io/en/latest/#psutil.net_if_addrs

cpu is the raw dict returned by cpuinfo.get_cpu_info (py-cpuinfo). See: https://py-cpuinfo.readthedocs.io/en/latest/
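The platform block mirrors Python's platform.uname(), so the same fields can be inspected locally, for example to check what a stored snapshot should contain on a given machine. This stdlib-only sketch is not Flowcept's code; platform_snapshot is a hypothetical helper.

```python
import platform


def platform_snapshot() -> dict:
    """Collect the platform fields listed above using only the standard library."""
    uname = platform.uname()
    fields = ("system", "node", "release", "version", "machine", "processor")
    # processor may be an empty string when the OS does not report it.
    return {name: getattr(uname, name) for name in fields}


print(platform_snapshot())
```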

GPU telemetry

Enable GPU by setting telemetry_capture.gpu to a list of metrics. Flowcept will try AMD first, then NVIDIA:

  • AMD visibility via ROCR_VISIBLE_DEVICES

  • NVIDIA visibility via CUDA_VISIBLE_DEVICES or NVML detection

Common behavior:

  • Flowcept enumerates visible GPUs and collects metrics per device: gpu.gpu_0, gpu.gpu_1, …

  • Which fields are collected depends on vendor and your configured metric list.
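Since each visible device appears under its own gpu_<index> key, cross-device summaries reduce to a loop over that block. A sketch, assuming the gpu block is a dict keyed gpu_0, gpu_1, … whose entries hold the configured metrics; total_gpu_memory_used and the sample values are hypothetical, and devices that did not report used are skipped.

```python
from typing import Any, Dict


def total_gpu_memory_used(gpu_block: Dict[str, Dict[str, Any]]) -> int:
    """Sum the 'used' metric over every gpu_<index> entry in a telemetry gpu block."""
    total = 0
    for key, device in gpu_block.items():
        if key.startswith("gpu_") and "used" in device:
            total += device["used"]
    return total


block = {
    "gpu_0": {"used": 1024},
    "gpu_1": {"used": 2048},
    "gpu_2": {"name": "example"},  # 'used' not configured or not reported
}
print(total_gpu_memory_used(block))
```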

AMD (ROCm SMI)

Supported metric names (choose any subset in the list):

  • used — VRAM usage for the device (amdsmi_get_gpu_memory_usage).

  • activity — current GPU activity percent (amdsmi_get_gpu_activity).

  • power.average_socket_power — average socket power draw.

  • power.energy_accumulator — cumulative energy use.

  • temperature.edge — edge temperature.

  • temperature.hotspot — hotspot temperature.

  • temperature.mem — memory temperature.

  • temperature.vrgfx — graphics voltage regulator (VR) temperature.

  • temperature.vrmem — memory voltage regulator (VR) temperature.

  • temperature.hbm — HBM temperature.

  • temperature.fan_speed — current fan speed.

  • others.current_gfxclk — current graphics clock.

  • others.current_socclk — current SoC clock.

  • others.current_uclk — current memory clock.

  • others.current_vclk0 — current video clock.

  • others.current_dclk0 — current display clock.

  • id — device UUID.

  • gpu_ix — device index (added by Flowcept for AMD).

Example (enable AMD GPU capture):

telemetry_capture:
  gpu: ["used", "activity", "power", "temperature", "id"]

NVIDIA (NVML)

Supported metric names (choose any subset in the list):

  • used — device memory used in bytes (nvmlDeviceGetMemoryInfo).

  • temperature — GPU temperature in Celsius (nvmlDeviceGetTemperature).

  • power — power usage in milliwatts (nvmlDeviceGetPowerUsage).

  • name — device name (nvmlDeviceGetName).

  • id — device UUID (nvmlDeviceGetUUID).

Example (enable NVIDIA GPU capture):

telemetry_capture:
  gpu: ["used", "temperature", "power", "name", "id"]
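NVML reports power in milliwatts, and task telemetry samples it only at start and end. Under those constraints, a rough energy figure for a task is the average of the two samples times the elapsed time (a two-point trapezoid). The helper name is hypothetical, and because spikes between the two samples are invisible, treat the result as an approximation.

```python
def estimate_energy_joules(power_start_mw: float, power_end_mw: float, elapsed_s: float) -> float:
    """Two-point trapezoidal energy estimate from NVML power samples in milliwatts."""
    avg_watts = (power_start_mw + power_end_mw) / 2.0 / 1000.0  # mW -> W
    return avg_watts * elapsed_s  # W * s = J


print(estimate_energy_joules(100_000, 200_000, 10.0))  # 1500.0
```

In practice the two power values would come from the same device entry in telemetry_at_start and telemetry_at_end (for example gpu.gpu_0.power, assuming the key layout described above), and elapsed_s from ended_at minus started_at.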

PyTorch model telemetry

Use flowcept.instrumentation.flowcept_torch.flowcept_torch() to instrument a torch.nn.Module:

  • Parent module forward can record telemetry and tensor inspections depending on config.

  • Child modules (layers) can also record telemetry/tensors when what: parent_and_children and an appropriate children_mode are set.

  • Flowcept can create epoch and batch loop tasks (lightweight or default), maintaining parent/child IDs so all forward calls are linked.

Configuration

instrumentation:
  enabled: true
  torch:
    what: parent_and_children                # or "parent_only"
    children_mode: telemetry_and_tensor_inspection  # or "telemetry" / "tensor_inspection"
    epoch_loop: lightweight                  # or default / ~ (disable)
    batch_loop: lightweight                  # or default / ~ (disable)
    capture_epochs_at_every: 1               # capture every N epochs
    register_workflow: true                  # save model as a workflow

Minimal example

import torch
import torch.nn as nn
from flowcept import Flowcept
from flowcept.instrumentation.flowcept_torch import flowcept_torch

@flowcept_torch
class MyNet(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

x = torch.randn(8, 10)
model = MyNet(get_profile=True)   # optional: profile model (params, widths, modules)

with Flowcept(workflow_name="torch_demo"):
    y = model(x)                   # parent forward + (optionally) child forwards recorded
                                   # telemetry recorded per config

What gets stored

  • Parent/child forward tasks include:
      - subtype (e.g., parent_forward or child_forward)
      - parent_task_id linkage
      - optional tensor inspections (shape, device, nbytes, density)
      - telemetry_at_end (if telemetry is enabled)

  • Optional workflow registration for the model with profile (params, max width, module tree).
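The parent_task_id linkage lets you rebuild the forward-call hierarchy from stored tasks. A minimal sketch over task records already loaded as dicts; the task_id field name, the sample IDs (including "batch_0"), and the children_by_parent helper are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def children_by_parent(tasks: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group child_forward task IDs under the ID of their parent task."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for task in tasks:
        if task.get("subtype") == "child_forward" and task.get("parent_task_id"):
            tree[task["parent_task_id"]].append(task["task_id"])
    return dict(tree)


tasks = [
    {"task_id": "p1", "subtype": "parent_forward", "parent_task_id": "batch_0"},
    {"task_id": "c1", "subtype": "child_forward", "parent_task_id": "p1"},
    {"task_id": "c2", "subtype": "child_forward", "parent_task_id": "p1"},
]
print(children_by_parent(tasks))  # {'p1': ['c1', 'c2']}
```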

Direct access to Telemetry objects

If you need to call the capture API yourself:

from flowcept.instrumentation.telemetry import TelemetryCapture
tel = TelemetryCapture().capture()
if tel:
    print(tel.to_dict())  # same structure stored in tasks
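Pairing two capture calls around a block reproduces the start/end pattern used for tasks. The sketch keeps the capture function pluggable and defaults to a stdlib stand-in (time.process_time and time.monotonic) so it runs without Flowcept; telemetry_span and default_capture are hypothetical names, not Flowcept APIs.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional


def default_capture() -> Dict[str, float]:
    """Stand-in capture function using only the standard library."""
    return {"process_cpu_s": time.process_time(), "wall_s": time.monotonic()}


@contextmanager
def telemetry_span(
    capture: Callable[[], Optional[Dict[str, Any]]] = default_capture,
) -> Iterator[Dict[str, Any]]:
    """Record a snapshot before and after the with-block, like telemetry_at_start/_end."""
    record: Dict[str, Any] = {"telemetry_at_start": capture()}
    try:
        yield record
    finally:
        # Runs even if the block raises, so the end snapshot is always taken.
        record["telemetry_at_end"] = capture()


with telemetry_span() as rec:
    total = sum(i * i for i in range(100_000))

print(rec["telemetry_at_end"]["wall_s"] - rec["telemetry_at_start"]["wall_s"])
```

To record Flowcept telemetry instead, pass a function that calls TelemetryCapture().capture() and returns the result's to_dict(), or None when capture() returns nothing.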

Practical tips

  • Turn off types you don’t need; each capture adds overhead, which becomes noticeable in very tight loops.

  • GPU capture requires vendor libraries:
      - AMD: amdsmi (ROCm SMI Python)
      - NVIDIA: pynvml

  • Use environment variables to control visible devices:
      - ROCR_VISIBLE_DEVICES (AMD)
      - CUDA_VISIBLE_DEVICES (NVIDIA)

  • For large PyTorch models, prefer children_mode: telemetry if tensor inspection is too expensive; also keep epoch_loop: lightweight and batch_loop: lightweight to minimize loop overhead.

Reference

  • Telemetry container: flowcept.commons.flowcept_dataclasses.telemetry.Telemetry

  • Task decorator: flowcept.instrumentation.flowcept_task.flowcept_task()

  • PyTorch decorator: flowcept.instrumentation.flowcept_torch.flowcept_torch()

  • Telemetry capture impl: flowcept.instrumentation.telemetry.TelemetryCapture