Reporting
Flowcept can generate summarized reports from provenance records.
Current report implementations:
- report_type="provenance_card" with format="markdown" (default)
- report_type="provenance_report" with format="pdf" (executive PDF with plots)
API
Use:
from flowcept import Flowcept
# Default path: markdown provenance card
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    output_path="PROVENANCE_CARD.md",
    records=my_records,  # or input_jsonl_path=..., or workflow_id/campaign_id
)
Markdown Provenance Cards (Default)
Markdown provenance cards are the default reporting mode.
from flowcept import Flowcept
# 1) Generate from workflow_id (DB-backed mode)
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    workflow_id="20c5939f-f3ee-4031-9303-a9e68a5a8092",
    output_path="PROVENANCE_CARD.md",
)
# 2) Generate from in-memory records
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD_FROM_RECORDS.md",
)
# 3) Generate from Flowcept JSONL buffer
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_CARD_FROM_JSONL.md",
)
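If records stored in a JSONL file need filtering before a report is generated, they can be loaded into memory first and passed through the records mode. The following is a minimal sketch that assumes only the general JSONL convention of one JSON object per line; load_jsonl_records is a hypothetical helper, not part of Flowcept, and the status field and demo records are made-up placeholders.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl_records(path):
    """Hypothetical helper: read one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo with a throwaway file; the two records are made-up placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_buffer.jsonl"
    demo_path.write_text(
        '{"activity_id": "a", "status": "FINISHED"}\n'
        "\n"
        '{"activity_id": "b", "status": "ERROR"}\n',
        encoding="utf-8",
    )
    my_records = load_jsonl_records(demo_path)

# Keep only the records of interest (assumed field name: status).
finished = [r for r in my_records if r.get("status") == "FINISHED"]
print(len(my_records), len(finished))
```

The filtered list could then be passed as records=finished to Flowcept.generate_report, in place of input_jsonl_path.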
Render Markdown Directly in Terminal (Rich)
You can optionally render the generated markdown report in the terminal with Rich:
from flowcept import Flowcept
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD.md",
    print_markdown=True,
)
If print_markdown=True and Rich is not installed, Flowcept raises an error.
Install Rich via:
pip install "flowcept[extras]"
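To avoid that error in scripts that may run without Rich, one option is to check whether the package is importable before requesting terminal rendering. This is a sketch using only the standard library; rich_available is a hypothetical helper name, not a Flowcept function.

```python
import importlib.util


def rich_available():
    """Return True when the rich package can be imported in this environment."""
    return importlib.util.find_spec("rich") is not None


# Only request terminal rendering when Rich is importable.
print_markdown = rich_available()
print(f"print_markdown={print_markdown}")
```

The resulting flag could then be passed as print_markdown=print_markdown to Flowcept.generate_report, so the report is still written to output_path when Rich is missing.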
Input Modes
Exactly one input mode must be provided:
- input_jsonl_path: read from a Flowcept JSONL buffer file.
- records: list of dictionaries already loaded in memory.
- workflow_id or campaign_id: query workflow, task, and object documents from DB.
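The "exactly one input mode" rule can be checked on the caller's side before a report is requested. The sketch below is illustrative only: pick_input_mode is a hypothetical function, and treating workflow_id and campaign_id as a single DB-backed mode (so that passing both counts as one mode) is an assumption, not documented Flowcept behavior.

```python
def pick_input_mode(input_jsonl_path=None, records=None, workflow_id=None, campaign_id=None):
    """Hypothetical check: return the name of the single input mode in use."""
    modes = {
        "input_jsonl_path": input_jsonl_path is not None,
        "records": records is not None,
        # Assumption: workflow_id and campaign_id both belong to the DB-backed mode.
        "db": workflow_id is not None or campaign_id is not None,
    }
    chosen = [name for name, given in modes.items() if given]
    if len(chosen) != 1:
        raise ValueError(f"Expected exactly one input mode, got {chosen or 'none'}")
    return chosen[0]


print(pick_input_mode(records=[{"activity_id": "get_dataset"}]))
```

Calling the function with no arguments, or with both records and input_jsonl_path, raises ValueError in this sketch.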
Aggregation
The provenance card presents summaries rather than a raw dump of records.
- Grouping key: activity_id.
- Per-group summary includes:
  - number of task records aggregated (n_tasks)
  - status counts
  - timing aggregates (median/summary fields)
The aggregation method is recorded in the generated output under Aggregation Method.
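The grouping described above can be illustrated with a small standalone sketch. It is not Flowcept's implementation: summarize_by_activity is a hypothetical function, the field names status, started_at, and ended_at are assumptions about the record layout, and the sample tasks are invented for the example.

```python
from collections import Counter, defaultdict
from statistics import median


def summarize_by_activity(task_records):
    """Group task records by activity_id and summarize count, status, and timing."""
    groups = defaultdict(list)
    for record in task_records:
        groups[record["activity_id"]].append(record)

    summary = {}
    for activity_id, group in groups.items():
        # Assumed fields: started_at and ended_at are numeric timestamps in seconds.
        elapsed = [t["ended_at"] - t["started_at"] for t in group]
        summary[activity_id] = {
            "n_tasks": len(group),
            "status_counts": dict(Counter(t["status"] for t in group)),
            "median_elapsed_s": median(elapsed),
        }
    return summary


# Invented sample records, for illustration only.
tasks = [
    {"activity_id": "get_dataset", "status": "FINISHED", "started_at": 0.0, "ended_at": 0.5},
    {"activity_id": "train_and_validate", "status": "FINISHED", "started_at": 1.0, "ended_at": 2.0},
    {"activity_id": "train_and_validate", "status": "FINISHED", "started_at": 2.0, "ended_at": 5.0},
    {"activity_id": "train_and_validate", "status": "ERROR", "started_at": 5.0, "ended_at": 7.0},
]
for activity_id, row in summarize_by_activity(tasks).items():
    print(activity_id, row)
```

Each activity_id yields one summary row regardless of how many task records it contains, which is why a single row in the card can stand for several tasks.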
Object Metadata Summary
When objects are present, reports include metadata-only summaries:
- counts by type
- counts by storage mode (in_object vs gridfs)
- linkage counts (task/workflow-linked)
- object version and size summaries
Blob payload bytes are excluded from report rendering.
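A metadata-only summary of this kind can be sketched as follows. The function name summarize_objects and the field names type, storage, size_bytes, version, task_id, and workflow_id are assumptions chosen for illustration; the actual object document schema may differ. The sample objects are invented.

```python
from collections import Counter


def summarize_objects(objects):
    """Summarize object metadata only; payload bytes are never read."""
    sizes = [obj["size_bytes"] for obj in objects]
    return {
        "total_objects": len(objects),
        "by_type": dict(Counter(obj["type"] for obj in objects)),
        "by_storage": dict(Counter(obj["storage"] for obj in objects)),
        # An object counts as linked when the corresponding id is present and non-empty.
        "task_linked": sum(1 for obj in objects if obj.get("task_id")),
        "workflow_linked": sum(1 for obj in objects if obj.get("workflow_id")),
        "max_version": max((obj["version"] for obj in objects), default=None),
        "total_size_bytes": sum(sizes),
        "max_size_bytes": max(sizes, default=None),
    }


# Invented sample metadata, for illustration only.
objects = [
    {"type": "dataset", "storage": "in_object", "size_bytes": 4096, "version": 0,
     "task_id": "t1", "workflow_id": "w1"},
    {"type": "ml_model", "storage": "gridfs", "size_bytes": 2048, "version": 3,
     "task_id": "t2", "workflow_id": "w1"},
    {"type": "ml_model", "storage": "gridfs", "size_bytes": 1024, "version": 1,
     "task_id": None, "workflow_id": "w1"},
]
print(summarize_objects(objects))
```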
Real Example (Rendered in RST)
Below is a real example equivalent to generated markdown content for:
Workflow Provenance Card: Perceptron GridSearch.
Summary
- Workflow Name: Perceptron GridSearch
- Workflow ID: 20c5939f-f3ee-4031-9303-a9e68a5a8092
- Campaign ID: 661344de-ddf4-497d-a5ba-0d01c67cfb79
- Execution Start (UTC): 2026-02-19 05:05:10
- Execution End (UTC): 2026-02-19 05:05:12
- Total Elapsed (s): 1.501
- User: rsr
- System Name: Darwin
- Environment ID: laptop
- Workflow Subtype: ml_workflow
- Code Repository: branch=skills, short_sha=f3df676, dirty=dirty
- Git Remote: git@github.com:ORNL/flowcept.git
- Workflow args:
  - python_random_seeded: True
  - seed: 42
  - torch_cuda_manual_seeded: False
  - torch_cudnn_benchmark: False
  - torch_cudnn_deterministic: True
  - torch_deterministic_algorithms: True
  - torch_manual_seeded: True
Workflow-level Summary
- Total Activities: 3
- Status Counts: {'FINISHED': 7}
- Total Elapsed Workflow Time (s): 1.501
  - train_and_validate: 0.088 s
  - get_dataset: 0.056 s
  - select_best_model: 0.041 s
- Resource Totals:
  - Memory Used: 7.78 MB
  - Average CPU (%): 54.1%
  - IO:
    - Read: 38.49 MB
    - Write: 55.11 MB
    - Read Ops: 1,454
    - Write Ops: 155
- Key Observations:
  - Slowest Activity: train_and_validate at 0.088 s
  - Largest IO Activity: train_and_validate with Read 31.74 MB and Write 52.10 MB
Workflow Structure
input data
│
▼
get_dataset
│
train_and_validate
│
select_best_model
▼
output data
Timing Report
Rows are sorted by First Started At (ascending).
| Activity | Status Counts | First Started At | Last Ended At | Median Elapsed (s) |
|---|---|---|---|---|
| get_dataset | {'FINISHED': 1} | 2026-02-19 05:05:10 | 2026-02-19 05:05:10 | 0.056 |
| train_and_validate | {'FINISHED': 5} | 2026-02-19 05:05:10 | 2026-02-19 05:05:12 | 0.088 |
| select_best_model | {'FINISHED': 1} | 2026-02-19 05:05:12 | 2026-02-19 05:05:12 | 0.041 |
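The row ordering used in the timing table can be reproduced with a short sketch. This is an illustration rather than Flowcept's renderer: render_timing_table and the row keys are hypothetical, and the sample rows use invented timestamps. Because the timestamps are fixed-width strings in year-first order, plain string sorting matches chronological order; rows with identical timestamps keep their input order.

```python
def render_timing_table(rows):
    """Render rows as a pipe table, sorted by first start time (ascending)."""
    header = ["Activity", "Status Counts", "First Started At", "Last Ended At", "Median Elapsed (s)"]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for row in sorted(rows, key=lambda r: r["first_started_at"]):
        cells = [
            row["activity"],
            str(row["status_counts"]),
            row["first_started_at"],
            row["last_ended_at"],
            f"{row['median_elapsed_s']:.3f}",
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Invented sample rows, deliberately out of order.
rows = [
    {"activity": "step_b", "status_counts": {"FINISHED": 2},
     "first_started_at": "2026-01-01 00:00:05", "last_ended_at": "2026-01-01 00:00:09",
     "median_elapsed_s": 1.5},
    {"activity": "step_a", "status_counts": {"FINISHED": 1},
     "first_started_at": "2026-01-01 00:00:01", "last_ended_at": "2026-01-01 00:00:02",
     "median_elapsed_s": 0.25},
]
print(render_timing_table(rows))
```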
Per Activity Details
get_dataset (subtype=dataprep)
- Used:
  - n_samples: 120
  - split_ratio: 0.8
- Generated:
  - dataset_id: f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1
  - x_train_shape: [96, 2]
  - x_val_shape: [24, 2]
  - y_train_shape: [96, 1]
  - y_val_shape: [24, 1]
train_and_validate (n=5, subtype=learning)
- Used (aggregated): includes epochs, learning_rate, n_input_neurons, config_id, and other fields.
- Generated (aggregated): includes best_val_loss, val_loss, val_accuracy, and model object ids.
select_best_model (subtype=model_selection)
- Generated:
  - selected_config_id: cfg_5
  - selected_loss: 0.0490574836730957
  - selected_model_object_id: ae18a739-1ffe-45a8-ae64-827a079579a6
Workflow-level Resource Usage
| Metric | Value |
|---|---|
| Telemetry Samples (task start/end pairs) | 7 |
| CPU User Time Delta | 7.380 |
| CPU System Time Delta | 1.940 |
| Average CPU (%) Delta | 54.1% |
| Average CPU Frequency | 3,228 |
| Memory Used Delta | 7.78 MB |
| Average Memory (%) | 73.7% |
| Average Swap (%) | 90.0% |
| Disk Read Time Delta (ms) | 224.000 |
| Disk Write Time Delta (ms) | 14.000 |
| Disk Busy Time Delta (ms) | 0.000 |
Object Artifacts Summary
| Metric | Value |
|---|---|
| Total Objects | 6 |
| By Type | {'dataset': 1, 'ml_model': 5} |
| By Storage | {'in_object': 1, 'gridfs': 5} |
| Task-linked Objects | 6 |
| Workflow-linked Objects | 6 |
| Max Version | 7 |
| Total Size | 13.66 KB |
| Average Size | 2.28 KB |
| Max Size | 4.10 KB |
Object Details by Type
Datasets
- f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1
  - version: 0
  - storage: in_object
  - size: 4.10 KB
  - task_id: 1771477510.9383209
  - workflow_id: 20c5939f-f3ee-4031-9303-a9e68a5a8092
  - timestamp: 2026-02-19 05:05:10
  - sha256: 7d7b4be35ea11f66e9a785d1b39cfb8fc31f8fd23020bc74918872ab5855253c
Models
- ae18a739-1ffe-45a8-ae64-827a079579a6
  - version: 7
  - storage: gridfs
  - size: 1.91 KB
  - tags: best
  - custom_metadata includes checkpoint_epoch, class, config_id, learning_rate, loss, and model_profile.
Aggregation Method
- Grouping key: activity_id.
- Each grouped row may aggregate multiple task records (n_tasks).
- Aggregated metrics currently include count/status/timing.
Generator footer example:
Provenance card generated by Flowcept | GitHub | Version: 0.9.14 on Feb 19, 2026 at 12:05 AM EST
PDF Reports (Optional)
PDF reports are intended for executive-friendly rendering and include plots.
Install the optional PDF dependencies via:
pip install "flowcept[report_pdf]"
from flowcept import Flowcept
# 1) Generate PDF from workflow_id (DB-backed mode)
stats = Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    workflow_id="5def1173-d417-420b-a7ed-61ada01772cd",
    output_path="PROVENANCE_REPORT.pdf",
)
print(stats["output"])
# 2) Generate PDF from in-memory records
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    records=my_records,
    output_path="PROVENANCE_REPORT_FROM_RECORDS.pdf",
)
# 3) Generate PDF from a Flowcept JSONL file
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_REPORT_FROM_JSONL.pdf",
)
PDF report plots include:
- Top slowest activities
- Top fastest activities
- Most resource-demanding activities (IO)
- Telemetry-aware charts when telemetry fields are available
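The "top slowest" and "top fastest" rankings can be pictured with a small sketch. How Flowcept ranks activities for its plots is not specified here, so ranking by median elapsed time is an assumption, and rank_activities is a hypothetical function. The timings are the per-activity values from the example card above.

```python
def rank_activities(median_elapsed_by_activity, top_n=3):
    """Return (slowest, fastest) lists of (activity, seconds), each up to top_n long."""
    ordered = sorted(median_elapsed_by_activity.items(), key=lambda item: item[1])
    fastest = ordered[:top_n]
    slowest = ordered[::-1][:top_n]
    return slowest, fastest


# Per-activity timings taken from the example provenance card.
timings = {"get_dataset": 0.056, "train_and_validate": 0.088, "select_best_model": 0.041}
slowest, fastest = rank_activities(timings, top_n=2)
print("slowest:", slowest)
print("fastest:", fastest)
```

With top_n=2, the slowest list starts with train_and_validate and the fastest list starts with select_best_model, matching the Slowest Activity observation in the example card.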