Workflow Data Schema
The Workflow schema captures high-level metadata and structure around a Flowcept-enabled run, including user-defined information, system context, and artifact lineage.
Workflow Fields
workflow_id (str): Unique identifier for this workflow execution.
used (dict): Dictionary of data or resources used by the workflow. Typically includes model and dataset file paths and global script configuration.
generated (dict): Dictionary of outputs generated by the workflow (e.g., model paths, key summarized results, artifacts).
parent_workflow_id (str): Identifier for the parent workflow, if applicable (e.g., for nested workflows or pipelines).
machine_info (dict): Information about the machine or compute resource where the workflow was executed.
conf (dict): Configuration for the run; typically contains the path to the Flowcept settings file that was used.
flowcept_settings (dict): Snapshot of Flowcept-specific resolved configuration from the settings.yaml file.
flowcept_version (str): Version of Flowcept used during execution.
utc_timestamp (float): Timestamp (UTC, seconds since epoch) indicating when this workflow metadata was recorded.
user (str): Username of the person or agent who initiated the workflow. Derived from sys_metadata.login_name if available; otherwise falls back to getpass.getuser(), os.getlogin(), or remains None.
campaign_id (str): Optional campaign identifier grouping related workflows.
adapter_id (str): The adapter or source component that launched or instrumented this workflow.
interceptor_ids (list of str): List of Flowcept interceptor instance identifiers used during instrumentation.
name (str): Human-readable name for the workflow (e.g., “training-run-001”).
subtype (str): Optional workflow subtype/category (e.g., ml_workflow).
custom_metadata (dict): User-defined metadata for extended tagging or traceability.
environment_id (str): Identifier for the execution environment (e.g., a cluster name such as Frontier or Summit).
sys_name (str): Name of the operating system. Derived from os.uname()[0].
node_name (str): Hostname of the compute node. Derived from os.uname()[1].
hostname (str): Fully qualified domain name of the host, resolved via socket.getfqdn() with multiple fallbacks.
public_ip (str): Public IP address of the machine. Derived from sys_metadata.public_ip, if available.
private_ip (str): Private (intranet) IP address of the machine. Derived from sys_metadata.private_ip, if available.
extra_metadata (str): Serialized extra metadata captured from external config sources.
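To make the field list concrete, the sketch below builds a workflow record as a plain Python dict and checks its values against the documented types. It does not use the Flowcept API: the FIELD_TYPES table is transcribed from the list above, and the type_mismatches helper and every value in the record are hypothetical, chosen only for illustration.

```python
import json
import time
import uuid

# Documented field types, transcribed from the field list above.
FIELD_TYPES = {
    "workflow_id": str, "used": dict, "generated": dict,
    "parent_workflow_id": str, "machine_info": dict, "conf": dict,
    "flowcept_settings": dict, "flowcept_version": str,
    "utc_timestamp": float, "user": str, "campaign_id": str,
    "adapter_id": str, "interceptor_ids": list, "name": str,
    "subtype": str, "custom_metadata": dict, "environment_id": str,
    "sys_name": str, "node_name": str, "hostname": str,
    "public_ip": str, "private_ip": str, "extra_metadata": str,
}


def type_mismatches(record):
    """Return the names of set (non-None) fields whose type differs from the documented one.

    Hypothetical helper for illustration; not part of Flowcept.
    """
    return sorted(
        key
        for key, value in record.items()
        if value is not None
        and key in FIELD_TYPES
        and not isinstance(value, FIELD_TYPES[key])
    )


# Hypothetical record: all identifiers, paths, and numbers are invented.
record = {
    "workflow_id": str(uuid.uuid4()),
    "name": "training-run-001",
    "subtype": "ml_workflow",
    "campaign_id": "campaign-a",
    "used": {"dataset_path": "/data/train.csv", "epochs": 10},
    "generated": {"model_path": "/models/model.pt", "val_accuracy": 0.91},
    "utc_timestamp": time.time(),  # seconds since epoch, as a float
    "user": None,                  # may remain None if no source resolves it
    "interceptor_ids": ["interceptor-0"],
}

print(json.dumps(record, indent=2))
print(type_mismatches(record))
```

Fields left unset are simply omitted here; whether a real record omits them or stores None is not specified above, so the helper tolerates both.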
Notes
The schema prioritizes system-derived metadata from a sys_metadata block inside the Flowcept configuration.
System identification relies on environment variables, standard-library calls, and fallback mechanisms, so the same fields can be resolved across different platforms.
used and generated fields support artifact lineage and can store references to any structured or semi-structured resource.
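The fallback behavior described for user and hostname can be sketched as follows. This is a minimal illustration under stated assumptions, not Flowcept's implementation: the function names resolve_user and resolve_hostname are hypothetical, sys_metadata is assumed to be a plain dict, and the hostname fallbacks after socket.getfqdn() (socket.gethostname() and platform.node()) are guesses, since the fallbacks are not enumerated above.

```python
import getpass
import os
import platform
import socket


def resolve_user(sys_metadata=None):
    """Resolve the user in the documented order.

    sys_metadata["login_name"] first, then getpass.getuser(),
    then os.getlogin(); None if nothing succeeds.
    """
    login_name = (sys_metadata or {}).get("login_name")
    if login_name:
        return login_name
    for getter in (getpass.getuser, os.getlogin):
        try:
            name = getter()
        except Exception:
            # Both calls can fail in containers or batch jobs
            # (no passwd entry, no controlling terminal).
            continue
        if name:
            return name
    return None


def resolve_hostname():
    """Try socket.getfqdn() first, then assumed fallbacks; None if all fail."""
    for getter in (socket.getfqdn, socket.gethostname, platform.node):
        try:
            name = getter()
        except OSError:
            continue
        if name:
            return name
    return None


print(resolve_user({"login_name": "alice"}))
print(resolve_user())
print(resolve_hostname())
```

Returning None instead of raising keeps metadata capture from interrupting the workflow when a source is unavailable, which matches the "remains None" behavior described for the user field.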