Reporting

Flowcept can generate summarized reports from provenance records.

Current report implementations:

  • report_type="provenance_card" with format="markdown" (default)

  • report_type="provenance_report" with format="pdf" (executive PDF with plots)

API

Use:

from flowcept import Flowcept

# Default path: markdown provenance card
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    output_path="PROVENANCE_CARD.md",
    records=my_records,  # or input_jsonl_path=..., or workflow_id/campaign_id
)

Markdown Provenance Cards (Default)

Markdown provenance cards are the default reporting mode.

from flowcept import Flowcept

# 1) Generate from workflow_id (DB-backed mode)
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    workflow_id="20c5939f-f3ee-4031-9303-a9e68a5a8092",
    output_path="PROVENANCE_CARD.md",
)

# 2) Generate from in-memory records
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD_FROM_RECORDS.md",
)

# 3) Generate from Flowcept JSONL buffer
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_CARD_FROM_JSONL.md",
)

Render Markdown Directly in Terminal (Rich)

You can optionally print the generated markdown report in a rich terminal:

from flowcept import Flowcept

Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD.md",
    print_markdown=True,
)

If Rich is not installed and print_markdown=True, Flowcept raises an error. Install Rich via:

pip install flowcept["extras"]
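Since a missing Rich installation only surfaces as an error at report time, a caller may prefer to check for it up front. The sketch below is a minimal illustration and not part of Flowcept: `rich_available` and `report_kwargs` are hypothetical helpers, and the check relies only on the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec


def rich_available() -> bool:
    """Return True if the `rich` package can be imported in this environment."""
    return find_spec("rich") is not None


def report_kwargs(records, output_path="PROVENANCE_CARD.md"):
    """Build keyword arguments for Flowcept.generate_report (hypothetical helper).

    print_markdown is enabled only when Rich is importable, so the call
    does not fail on machines without the optional dependency.
    """
    return {
        "report_type": "provenance_card",
        "format": "markdown",
        "records": records,
        "output_path": output_path,
        "print_markdown": rich_available(),
    }
```

The result can then be passed along as `Flowcept.generate_report(**report_kwargs(my_records))`.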

Input Modes

Exactly one input mode must be provided:

  • input_jsonl_path: read from a Flowcept JSONL buffer file.

  • records: list of dictionaries already loaded in memory.

  • workflow_id or campaign_id: query workflow, task, and object documents from the database.
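The "exactly one input mode" rule can also be checked on the caller's side before a report is requested. The following is a sketch under stated assumptions, not Flowcept's own validation: `select_input_mode` is a hypothetical helper, and treating workflow_id and campaign_id as a single DB-backed mode is an assumption drawn from the list above.

```python
def select_input_mode(input_jsonl_path=None, records=None,
                      workflow_id=None, campaign_id=None):
    """Return the name of the single input mode in use (hypothetical helper).

    Raises ValueError unless exactly one mode is supplied. workflow_id and
    campaign_id are treated as one DB-backed mode, which is an assumption.
    """
    modes = {
        "jsonl": input_jsonl_path is not None,
        "records": records is not None,
        "db": workflow_id is not None or campaign_id is not None,
    }
    chosen = [name for name, present in modes.items() if present]
    if len(chosen) != 1:
        raise ValueError(
            f"Expected exactly one input mode, got {len(chosen)}: {chosen}"
        )
    return chosen[0]
```

Calling `select_input_mode(records=my_records)` returns "records", while passing both `records` and `input_jsonl_path`, or passing nothing at all, raises a ValueError.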

Aggregation

The provenance card summarizes task records rather than dumping them raw.

  • Grouping key: activity_id.

  • Per-group summary includes:

    • number of task records aggregated (n_tasks)

    • status counts

    • timing aggregates (median/summary fields)

Each generated report documents this method in its Aggregation Method section.
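The grouping described above can be sketched in a few lines. This is an illustration only, not Flowcept's implementation: the record fields `status`, `started_at`, and `ended_at` (as numeric timestamps) are assumptions, and the sample records are made up.

```python
from collections import Counter, defaultdict
from statistics import median


def summarize_by_activity(task_records):
    """Group task records by activity_id and compute per-group summaries."""
    groups = defaultdict(list)
    for rec in task_records:
        groups[rec["activity_id"]].append(rec)

    summary = {}
    for activity_id, recs in groups.items():
        # Elapsed time per task, assuming numeric start/end timestamps.
        elapsed = [r["ended_at"] - r["started_at"] for r in recs]
        summary[activity_id] = {
            "n_tasks": len(recs),
            "status_counts": dict(Counter(r["status"] for r in recs)),
            "first_started_at": min(r["started_at"] for r in recs),
            "last_ended_at": max(r["ended_at"] for r in recs),
            "median_elapsed": median(elapsed),
        }
    return summary


# Illustrative records only; the values are made up for this sketch.
tasks = [
    {"activity_id": "train", "status": "FINISHED", "started_at": 0.0, "ended_at": 2.0},
    {"activity_id": "train", "status": "FINISHED", "started_at": 1.0, "ended_at": 5.0},
    {"activity_id": "train", "status": "ERROR", "started_at": 2.0, "ended_at": 3.0},
    {"activity_id": "load", "status": "FINISHED", "started_at": 0.0, "ended_at": 0.5},
]
result = summarize_by_activity(tasks)
```

Here the three "train" records collapse into one row with n_tasks of 3, and the median is used for elapsed time so that a single slow task does not dominate the row.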

Object Metadata Summary

When objects are present, reports include metadata-only summaries:

  • counts by type

  • counts by storage mode (in_object vs gridfs)

  • linkage counts (task/workflow-linked)

  • object version and size summaries

Blob payload bytes are excluded from report rendering.
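A metadata-only summary of this kind might look like the sketch below. It is not Flowcept's code: the field names `type`, `storage`, `size_bytes`, `version`, and `data` are hypothetical, chosen to mirror the bullets above, and the payload field is never read.

```python
from collections import Counter


def summarize_objects(object_docs):
    """Summarize object metadata without touching payload bytes."""
    sizes = [doc["size_bytes"] for doc in object_docs]
    return {
        "total_objects": len(object_docs),
        "by_type": dict(Counter(doc["type"] for doc in object_docs)),
        "by_storage": dict(Counter(doc["storage"] for doc in object_docs)),
        "task_linked": sum(1 for doc in object_docs if doc.get("task_id")),
        "workflow_linked": sum(1 for doc in object_docs if doc.get("workflow_id")),
        "max_version": max((doc.get("version", 0) for doc in object_docs), default=0),
        "total_size": sum(sizes),
        "average_size": sum(sizes) / len(sizes) if sizes else 0.0,
        "max_size": max(sizes, default=0),
    }


# Made-up documents; "data" stands in for a blob payload and is ignored.
docs = [
    {"type": "dataset", "storage": "in_object", "size_bytes": 4096,
     "version": 0, "task_id": "t1", "workflow_id": "w1", "data": b"\x00\x01"},
    {"type": "ml_model", "storage": "gridfs", "size_bytes": 2048,
     "version": 3, "task_id": "t2", "workflow_id": "w1"},
    {"type": "ml_model", "storage": "gridfs", "size_bytes": 1024,
     "version": 1, "workflow_id": "w1"},
]
object_summary = summarize_objects(docs)
```

Only counts and sizes reach the summary, so the result stays small regardless of how large the stored objects are.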

Real Example (Rendered in RST)

The following real example reproduces the content of a generated markdown card titled "Workflow Provenance Card: Perceptron GridSearch".

Summary

  • Workflow Name: Perceptron GridSearch

  • Workflow ID: 20c5939f-f3ee-4031-9303-a9e68a5a8092

  • Campaign ID: 661344de-ddf4-497d-a5ba-0d01c67cfb79

  • Execution Start (UTC): 2026-02-19 05:05:10

  • Execution End (UTC): 2026-02-19 05:05:12

  • Total Elapsed (s): 1.501

  • User: rsr

  • System Name: Darwin

  • Environment ID: laptop

  • Workflow Subtype: ml_workflow

  • Code Repository: branch=skills, short_sha=f3df676, dirty=dirty

  • Git Remote: git@github.com:ORNL/flowcept.git

  • Workflow args:

    • python_random_seeded: True

    • seed: 42

    • torch_cuda_manual_seeded: False

    • torch_cudnn_benchmark: False

    • torch_cudnn_deterministic: True

    • torch_deterministic_algorithms: True

    • torch_manual_seeded: True

Workflow-level Summary

  • Total Activities: 3

  • Status Counts: {'FINISHED': 7}

  • Total Elapsed Workflow Time (s): 1.501

    • train_and_validate: 0.088 s

    • get_dataset: 0.056 s

    • select_best_model: 0.041 s

  • Resource Totals:

    • Memory Used: 7.78 MB

    • Average CPU (%): 54.1%

    • IO:

      • Read: 38.49 MB

      • Write: 55.11 MB

      • Read Ops: 1,454

      • Write Ops: 155

  • Key Observations:

    • Slowest Activity: train_and_validate at 0.088 s

    • Largest IO Activity: train_and_validate with Read 31.74 MB and Write 52.10 MB

Workflow Structure

input data
        │
        ▼
 get_dataset
        │
 train_and_validate
        │
 select_best_model
        ▼
 output data

Timing Report

Rows are sorted by First Started At (ascending).

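==================  ===============  ===================  ===================  ==================
Activity            Status Counts    First Started At     Last Ended At        Median Elapsed (s)
==================  ===============  ===================  ===================  ==================
get_dataset         {'FINISHED': 1}  2026-02-19 05:05:10  2026-02-19 05:05:10  0.056
train_and_validate  {'FINISHED': 5}  2026-02-19 05:05:10  2026-02-19 05:05:12  0.088
select_best_model   {'FINISHED': 1}  2026-02-19 05:05:12  2026-02-19 05:05:12  0.041
==================  ===============  ===================  ===================  ==================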

Per Activity Details

  • get_dataset (subtype=dataprep)

    • Used:

      • n_samples: 120

      • split_ratio: 0.8

    • Generated:

      • dataset_id: f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1

      • x_train_shape: [96, 2]

      • x_val_shape: [24, 2]

      • y_train_shape: [96, 1]

      • y_val_shape: [24, 1]

  • train_and_validate (n=5, subtype=learning)

    • Used (aggregated): includes epochs, learning_rate, n_input_neurons, config_id, and other fields.

    • Generated (aggregated): includes best_val_loss, val_loss, val_accuracy, and model object ids.

  • select_best_model (subtype=model_selection)

    • Generated:

      • selected_config_id: cfg_5

      • selected_loss: 0.0490574836730957

      • selected_model_object_id: ae18a739-1ffe-45a8-ae64-827a079579a6

Workflow-level Resource Usage

========================================  =======
Metric                                    Value
========================================  =======
Telemetry Samples (task start/end pairs)  7
CPU User Time Delta                       7.380
CPU System Time Delta                     1.940
Average CPU (%) Delta                     54.1%
Average CPU Frequency                     3,228
Memory Used Delta                         7.78 MB
Average Memory (%)                        73.7%
Average Swap (%)                          90.0%
Disk Read Time Delta (ms)                 224.000
Disk Write Time Delta (ms)                14.000
Disk Busy Time Delta (ms)                 0.000
========================================  =======

Object Artifacts Summary

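=======================  =============================
Metric                   Value
=======================  =============================
Total Objects            6
By Type                  {'dataset': 1, 'ml_model': 5}
By Storage               {'in_object': 1, 'gridfs': 5}
Task-linked Objects      6
Workflow-linked Objects  6
Max Version              7
Total Size               13.66 KB
Average Size             2.28 KB
Max Size                 4.10 KB
=======================  =============================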

Object Details by Type

  • Datasets

    • f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1

      • version: 0

      • storage: in_object

      • size: 4.10 KB

      • task_id: 1771477510.9383209

      • workflow_id: 20c5939f-f3ee-4031-9303-a9e68a5a8092

      • timestamp: 2026-02-19 05:05:10

      • sha256: 7d7b4be35ea11f66e9a785d1b39cfb8fc31f8fd23020bc74918872ab5855253c

  • Models

    • ae18a739-1ffe-45a8-ae64-827a079579a6

      • version: 7

      • storage: gridfs

      • size: 1.91 KB

      • tags: best

      • custom_metadata includes checkpoint_epoch, class, config_id, learning_rate, loss, and model_profile.

Aggregation Method

  • Grouping key: activity_id.

  • Each grouped row may aggregate multiple task records (n_tasks).

  • Aggregated metrics currently include count/status/timing.

Generator footer example:

  • Provenance card generated by Flowcept | GitHub | Version: 0.9.14 on Feb 19, 2026 at 12:05 AM EST

PDF Reports (Optional)

PDF reports are intended for executive-friendly rendering and include plots.

pip install flowcept[report_pdf]

from flowcept import Flowcept

# 1) Generate PDF from workflow_id (DB-backed mode)
stats = Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    workflow_id="5def1173-d417-420b-a7ed-61ada01772cd",
    output_path="PROVENANCE_REPORT.pdf",
)
print(stats["output"])

# 2) Generate PDF from in-memory records
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    records=my_records,
    output_path="PROVENANCE_REPORT_FROM_RECORDS.pdf",
)

# 3) Generate PDF from a Flowcept JSONL file
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_REPORT_FROM_JSONL.pdf",
)

PDF report plots include:

  • Top slowest activities

  • Top fastest activities

  • Most resource-demanding activities (IO)

  • Telemetry-aware charts when telemetry fields are available
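How Flowcept selects activities for these plots is not described here. As a rough illustration of what a "top slowest" or "top fastest" ranking involves, the sketch below orders activities by median elapsed time, using the values from the Timing Report example above. `rank_activities` and its `top_n` parameter are hypothetical names.

```python
def rank_activities(median_elapsed, top_n=3, slowest=True):
    """Return up to top_n (activity, seconds) pairs ordered by elapsed time."""
    ordered = sorted(median_elapsed.items(), key=lambda kv: kv[1], reverse=slowest)
    return ordered[:top_n]


# Median elapsed times (s) taken from the Timing Report example above.
timings = {
    "get_dataset": 0.056,
    "train_and_validate": 0.088,
    "select_best_model": 0.041,
}
slowest = rank_activities(timings, top_n=2)
fastest = rank_activities(timings, top_n=2, slowest=False)
```

With these inputs, `slowest` starts with train_and_validate and `fastest` starts with select_best_model, which matches the Slowest Activity entry under Key Observations.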