Reporting
=========

Flowcept can generate summarized reports from provenance records.

Current report implementations:

- ``report_type="provenance_card"`` with ``format="markdown"`` (**default**)
- ``report_type="provenance_report"`` with ``format="pdf"`` (executive PDF with plots)

API
---

Use:

.. code-block:: python

   from flowcept import Flowcept

   # Default path: markdown provenance card
   Flowcept.generate_report(
       report_type="provenance_card",
       format="markdown",
       output_path="PROVENANCE_CARD.md",
       records=my_records,  # or input_jsonl_path=..., or workflow_id/campaign_id
   )

Markdown Provenance Cards (Default)
-----------------------------------

Markdown provenance cards are the default reporting mode.

.. code-block:: python

   from flowcept import Flowcept

   # 1) Generate from workflow_id (DB-backed mode)
   Flowcept.generate_report(
       report_type="provenance_card",
       format="markdown",
       workflow_id="20c5939f-f3ee-4031-9303-a9e68a5a8092",
       output_path="PROVENANCE_CARD.md",
   )

   # 2) Generate from in-memory records
   Flowcept.generate_report(
       report_type="provenance_card",
       format="markdown",
       records=my_records,
       output_path="PROVENANCE_CARD_FROM_RECORDS.md",
   )

   # 3) Generate from Flowcept JSONL buffer
   Flowcept.generate_report(
       report_type="provenance_card",
       format="markdown",
       input_jsonl_path="/tmp/flowcept_buffer.jsonl",
       output_path="PROVENANCE_CARD_FROM_JSONL.md",
   )

Render Markdown Directly in Terminal (Rich)
-------------------------------------------

You can optionally print the generated markdown report to the terminal, rendered with Rich:

.. code-block:: python

   from flowcept import Flowcept

   Flowcept.generate_report(
       report_type="provenance_card",
       format="markdown",
       records=my_records,
       output_path="PROVENANCE_CARD.md",
       print_markdown=True,
   )

If ``print_markdown=True`` and Rich is not installed, Flowcept raises an error.
Install Rich via:

.. code-block:: bash

   pip install "flowcept[extras]"

Input Modes
-----------

Exactly one input mode must be provided:

- ``input_jsonl_path``: read from a Flowcept JSONL buffer file.
- ``records``: list of dictionaries already loaded in memory.
- ``workflow_id`` or ``campaign_id``: query workflow, task, and object documents from the DB.

Aggregation
-----------

The provenance card summarizes records instead of dumping them raw.

- Grouping key: ``activity_id``.
- Per-group summary includes:

  - number of task records aggregated (``n_tasks``)
  - status counts
  - timing aggregates (median/summary fields)

This aggregation method is written in the generated output under ``Aggregation Method``.

Object Metadata Summary
-----------------------

When objects are present, reports include metadata-only summaries:

- counts by type
- counts by storage mode (``in_object`` vs ``gridfs``)
- linkage counts (task/workflow-linked)
- object version and size summaries

Blob payload bytes are excluded from report rendering.

Real Example (Rendered in RST)
------------------------------

Below is a real example, equivalent to the generated markdown content for
``Workflow Provenance Card: Perceptron GridSearch``.


Summary
~~~~~~~

- **Workflow Name:** ``Perceptron GridSearch``
- **Workflow ID:** ``20c5939f-f3ee-4031-9303-a9e68a5a8092``
- **Campaign ID:** ``661344de-ddf4-497d-a5ba-0d01c67cfb79``
- **Execution Start (UTC):** ``2026-02-19 05:05:10``
- **Execution End (UTC):** ``2026-02-19 05:05:12``
- **Total Elapsed (s):** ``1.501``
- **User:** ``rsr``
- **System Name:** ``Darwin``
- **Environment ID:** ``laptop``
- **Workflow Subtype:** ``ml_workflow``
- **Code Repository:** ``branch=skills, short_sha=f3df676, dirty=dirty``
- **Git Remote:** ``git@github.com:ORNL/flowcept.git``
- **Workflow args:**

  - ``python_random_seeded``: ``True``
  - ``seed``: ``42``
  - ``torch_cuda_manual_seeded``: ``False``
  - ``torch_cudnn_benchmark``: ``False``
  - ``torch_cudnn_deterministic``: ``True``
  - ``torch_deterministic_algorithms``: ``True``
  - ``torch_manual_seeded``: ``True``

Workflow-level Summary
~~~~~~~~~~~~~~~~~~~~~~

- **Total Activities:** ``3``
- **Status Counts:** ``{'FINISHED': 7}``
- **Total Elapsed Workflow Time (s):** ``1.501``

  - ``train_and_validate``: ``0.088 s``
  - ``get_dataset``: ``0.056 s``
  - ``select_best_model``: ``0.041 s``

- **Resource Totals:**

  - ``Memory Used``: ``7.78 MB``
  - ``Average CPU (%)``: ``54.1%``

- **IO:**

  - ``Read``: ``38.49 MB``
  - ``Write``: ``55.11 MB``
  - ``Read Ops``: ``1,454``
  - ``Write Ops``: ``155``

- **Key Observations:**

  - Slowest Activity: ``train_and_validate`` at ``0.088 s``
  - Largest IO Activity: ``train_and_validate`` with Read ``31.74 MB`` and Write ``52.10 MB``

Workflow Structure
~~~~~~~~~~~~~~~~~~

.. code-block:: text

   input data
   │
   ▼
   get_dataset
   │
   train_and_validate
   │
   select_best_model
   ▼
   output data

Timing Report
~~~~~~~~~~~~~

Rows are sorted by **First Started At** (ascending).

.. list-table::
   :header-rows: 1

   * - Activity
     - Status Counts
     - First Started At
     - Last Ended At
     - Median Elapsed (s)
   * - get_dataset
     - {'FINISHED': 1}
     - 2026-02-19 05:05:10
     - 2026-02-19 05:05:10
     - 0.056
   * - train_and_validate
     - {'FINISHED': 5}
     - 2026-02-19 05:05:10
     - 2026-02-19 05:05:12
     - 0.088
   * - select_best_model
     - {'FINISHED': 1}
     - 2026-02-19 05:05:12
     - 2026-02-19 05:05:12
     - 0.041

Per Activity Details
~~~~~~~~~~~~~~~~~~~~

- **get_dataset** (subtype=``dataprep``)

  - Used:

    - ``n_samples``: ``120``
    - ``split_ratio``: ``0.8``

  - Generated:

    - ``dataset_id``: ``f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1``
    - ``x_train_shape``: ``[96, 2]``
    - ``x_val_shape``: ``[24, 2]``
    - ``y_train_shape``: ``[96, 1]``
    - ``y_val_shape``: ``[24, 1]``

- **train_and_validate** (``n=5``, subtype=``learning``)

  - Used (aggregated): includes ``epochs``, ``learning_rate``, ``n_input_neurons``, ``config_id``, and other fields.
  - Generated (aggregated): includes ``best_val_loss``, ``val_loss``, ``val_accuracy``, and model object ids.

- **select_best_model** (subtype=``model_selection``)

  - Generated:

    - ``selected_config_id``: ``cfg_5``
    - ``selected_loss``: ``0.0490574836730957``
    - ``selected_model_object_id``: ``ae18a739-1ffe-45a8-ae64-827a079579a6``

Workflow-level Resource Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Metric
     - Value
   * - Telemetry Samples (task start/end pairs)
     - 7
   * - CPU User Time Delta
     - 7.380
   * - CPU System Time Delta
     - 1.940
   * - Average CPU (%) Delta
     - 54.1%
   * - Average CPU Frequency
     - 3,228
   * - Memory Used Delta
     - 7.78 MB
   * - Average Memory (%)
     - 73.7%
   * - Average Swap (%)
     - 90.0%
   * - Disk Read Time Delta (ms)
     - 224.000
   * - Disk Write Time Delta (ms)
     - 14.000
   * - Disk Busy Time Delta (ms)
     - 0.000

Object Artifacts Summary
~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * - Metric
     - Value
   * - Total Objects
     - 6
   * - By Type
     - {'dataset': 1, 'ml_model': 5}
   * - By Storage
     - {'in_object': 1, 'gridfs': 5}
   * - Task-linked Objects
     - 6
   * - Workflow-linked Objects
     - 6
   * - Max Version
     - 7
   * - Total Size
     - 13.66 KB
   * - Average Size
     - 2.28 KB
   * - Max Size
     - 4.10 KB

Object Details by Type
~~~~~~~~~~~~~~~~~~~~~~

- **Datasets**

  - ``f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1``

    - version: ``0``
    - storage: ``in_object``
    - size: ``4.10 KB``
    - task_id: ``1771477510.9383209``
    - workflow_id: ``20c5939f-f3ee-4031-9303-a9e68a5a8092``
    - timestamp: ``2026-02-19 05:05:10``
    - sha256: ``7d7b4be35ea11f66e9a785d1b39cfb8fc31f8fd23020bc74918872ab5855253c``

- **Models**

  - ``ae18a739-1ffe-45a8-ae64-827a079579a6``

    - version: ``7``
    - storage: ``gridfs``
    - size: ``1.91 KB``
    - tags: ``best``
    - custom_metadata includes ``checkpoint_epoch``, ``class``, ``config_id``, ``learning_rate``, ``loss``, and ``model_profile``.

Aggregation Method
~~~~~~~~~~~~~~~~~~

- Grouping key: ``activity_id``.
- Each grouped row may aggregate multiple task records (``n_tasks``).
- Aggregated metrics currently include count/status/timing.

Generator footer example:

- Provenance card generated by Flowcept | GitHub | Version: 0.9.14 on Feb 19, 2026 at 12:05 AM EST

PDF Reports (Optional)
----------------------

PDF reports are intended for executive-friendly rendering and include plots.
PDF support is installed through the ``report_pdf`` extra:

.. code-block:: shell

   pip install "flowcept[report_pdf]"

.. code-block:: python

   from flowcept import Flowcept

   # 1) Generate PDF from workflow_id (DB-backed mode)
   stats = Flowcept.generate_report(
       report_type="provenance_report",
       format="pdf",
       workflow_id="5def1173-d417-420b-a7ed-61ada01772cd",
       output_path="PROVENANCE_REPORT.pdf",
   )
   print(stats["output"])

   # 2) Generate PDF from in-memory records
   Flowcept.generate_report(
       report_type="provenance_report",
       format="pdf",
       records=my_records,
       output_path="PROVENANCE_REPORT_FROM_RECORDS.pdf",
   )

   # 3) Generate PDF from a Flowcept JSONL file
   Flowcept.generate_report(
       report_type="provenance_report",
       format="pdf",
       input_jsonl_path="/tmp/flowcept_buffer.jsonl",
       output_path="PROVENANCE_REPORT_FROM_JSONL.pdf",
   )

PDF report plots include:

- Top slowest activities
- Top fastest activities
- Most resource-demanding activities (IO)
- Telemetry-aware charts when telemetry fields are available
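
To give a rough feel for the ranking behind a "top slowest activities" plot, the
sketch below groups task records by ``activity_id`` and computes a median elapsed
time per group. It is an illustration only, not Flowcept's implementation: the
helper name ``rank_activities_by_median_elapsed``, the ``started_at`` and
``ended_at`` fields (assumed to be epoch seconds), and the sample records are all
hypothetical.

.. code-block:: python

   from collections import defaultdict
   from statistics import median


   def rank_activities_by_median_elapsed(records):
       """Return (activity_id, n_tasks, median_elapsed_s) tuples, slowest first."""
       elapsed_by_activity = defaultdict(list)
       for rec in records:
           # Skip records missing any of the (assumed) fields.
           if not all(key in rec for key in ("activity_id", "started_at", "ended_at")):
               continue
           elapsed_by_activity[rec["activity_id"]].append(
               rec["ended_at"] - rec["started_at"]
           )
       ranked = [
           (activity, len(values), median(values))
           for activity, values in elapsed_by_activity.items()
       ]
       # Sort by median elapsed time, largest first.
       ranked.sort(key=lambda row: row[2], reverse=True)
       return ranked


   # Hypothetical records for illustration; not real Flowcept output.
   sample_records = [
       {"activity_id": "get_dataset", "started_at": 0.0, "ended_at": 0.5},
       {"activity_id": "train_and_validate", "started_at": 1.0, "ended_at": 3.0},
       {"activity_id": "train_and_validate", "started_at": 3.0, "ended_at": 4.0},
       {"activity_id": "select_best_model", "started_at": 4.0, "ended_at": 4.25},
   ]

   for activity, n_tasks, med in rank_activities_by_median_elapsed(sample_records):
       print(f"{activity}: n_tasks={n_tasks}, median_elapsed={med:.3f} s")

Reversing the sort order would give the "top fastest" view. The same grouping key
(``activity_id``) is the one the provenance card reports under ``Aggregation Method``.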