Workflow Data Schema

The Workflow schema captures high-level metadata and structure around a Flowcept-enabled run, including user-defined information, system context, and artifact lineage.

Workflow Fields

  • workflow_id (str): Unique identifier for this workflow execution.

  • used (dict): Dictionary of data or resources consumed by the workflow. Typically includes model and dataset file paths and script-level global configuration.

  • generated (dict): Dictionary of outputs generated by the workflow (e.g., model paths, key summarized results, artifacts).

  • parent_workflow_id (str): Identifier for the parent workflow, if applicable (e.g., for nested workflows or pipelines).

  • machine_info (dict): Information about the machine or compute resource where the workflow was executed.

  • conf (dict): Configuration for the run; typically contains the path to the Flowcept settings file in use.

  • flowcept_settings (dict): Snapshot of Flowcept-specific resolved configuration from the settings.yaml file.

  • flowcept_version (str): Version of Flowcept used during execution.

  • utc_timestamp (float): Timestamp (UTC, seconds since epoch) indicating when this workflow metadata was recorded.

  • user (str): Username of the person or agent who initiated the workflow. Derived from sys_metadata.login_name if available; otherwise falls back to getpass.getuser() or os.getlogin(), and remains None if neither succeeds.

  • campaign_id (str): Optional campaign identifier grouping related workflows.

  • adapter_id (str): The adapter or source component that launched or instrumented this workflow.

  • interceptor_ids (list of str): List of Flowcept interceptor instance identifiers used during instrumentation.

  • name (str): Human-readable name for the workflow (e.g., “training-run-001”).

  • subtype (str): Optional workflow subtype/category (e.g., ml_workflow).

  • custom_metadata (dict): User-defined metadata for extended tagging or traceability.

  • environment_id (str): Identifier for the execution environment (e.g., a cluster name such as Frontier or Summit).

  • sys_name (str): Name of the operating system. Derived from os.uname()[0].

  • node_name (str): Hostname of the compute node. Derived from os.uname()[1].

  • hostname (str): Fully qualified domain name of the host, resolved via socket.getfqdn() with multiple fallbacks.

  • public_ip (str): Public IP address of the machine. Derived from sys_metadata.public_ip, if available.

  • private_ip (str): Private (intranet) IP address of the machine. Derived from sys_metadata.private_ip, if available.

  • extra_metadata (str): Serialized extra metadata captured from external config sources.
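
To make the field list concrete, here is a minimal sketch of what a record following this schema might look like. It is a plain Python dict that uses a subset of the field names above; every value is a made-up placeholder, and the children_of helper is hypothetical (it is not part of Flowcept). The sketch only illustrates how parent_workflow_id links nested workflows and how used and generated can carry lineage references.

```python
import json
import time
import uuid

# Illustrative only: every value is a made-up placeholder, and these are
# plain dicts, not objects produced by Flowcept itself.
parent_record = {
    "workflow_id": str(uuid.uuid4()),
    "parent_workflow_id": None,
    "name": "pipeline-root",
}

workflow_record = {
    "workflow_id": str(uuid.uuid4()),
    "parent_workflow_id": parent_record["workflow_id"],  # nested under the parent
    "campaign_id": "campaign-demo",
    "name": "training-run-001",
    "subtype": "ml_workflow",
    # Inputs consumed by the run (paths and config are hypothetical).
    "used": {"dataset_path": "/data/train.csv", "config": {"epochs": 10}},
    # Outputs produced by the run (hypothetical).
    "generated": {"model_path": "/models/model.pt", "accuracy": 0.93},
    "utc_timestamp": time.time(),  # UTC seconds since epoch, as a float
    "user": "alice",
    "environment_id": "frontier",
    "custom_metadata": {"experiment": "baseline"},
}


def children_of(parent_id, records):
    """Hypothetical helper: return records whose parent_workflow_id matches."""
    return [r for r in records if r.get("parent_workflow_id") == parent_id]


# The record contains only JSON-compatible types, so it round-trips cleanly.
serialized = json.dumps(workflow_record, indent=2)
restored = json.loads(serialized)

children = children_of(
    parent_record["workflow_id"], [parent_record, workflow_record]
)
```

Keeping used and generated as nested dicts of JSON-compatible values means the whole record can be serialized as is; how Flowcept itself persists these records is not covered here.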

Notes

  • The schema prioritizes system-derived metadata from a sys_metadata block inside the Flowcept configuration.

  • System identification relies on environment variables, standard-library calls (such as os.uname() and socket.getfqdn()), and fallback mechanisms for when a source is unavailable, which keeps it portable.

  • The used and generated fields support artifact lineage and can store references to any structured or semi-structured resource.
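
The fallback behavior described for user and hostname can be sketched roughly as follows. This is not Flowcept's actual implementation: the function names resolve_user and resolve_hostname are hypothetical, and the hostname fallbacks after socket.getfqdn() (socket.gethostname() and platform.node()) are assumptions, since the schema only says "multiple fallbacks". The order of the two user fallbacks is likewise assumed from the order in which they are listed.

```python
import getpass
import os
import platform
import socket


def resolve_user(sys_metadata=None):
    """Prefer sys_metadata.login_name, then standard-library lookups."""
    login_name = (sys_metadata or {}).get("login_name")
    if login_name:
        return login_name
    # Each lookup can fail (for example, os.getlogin() raises OSError when
    # there is no controlling terminal), so try them in turn.
    for getter in (getpass.getuser, os.getlogin):
        try:
            name = getter()
        except Exception:
            continue
        if name:
            return name
    return None  # nothing worked; the field stays None


# Assumed fallback order; only socket.getfqdn() is named by the schema.
DEFAULT_HOSTNAME_RESOLVERS = (socket.getfqdn, socket.gethostname, platform.node)


def resolve_hostname(resolvers=DEFAULT_HOSTNAME_RESOLVERS):
    """Return the first non-empty hostname produced by the resolvers."""
    for resolver in resolvers:
        try:
            host = resolver()
        except Exception:
            continue
        if host:
            return host
    return None
```

For example, resolve_user({"login_name": "alice"}) returns "alice" without consulting the operating system, while resolve_user() falls through to the standard-library lookups. Passing the resolvers as a parameter keeps the chain easy to test without touching the network.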