Blob Data Schema ================ Flowcept stores binary payload metadata in the ``objects`` collection. This document describes the logical schema represented by ``BlobObject``. Required Fields --------------- - **object_id** (str): Unique identifier for the blob record. - **version** (int): Object version number. starts at ``0`` and increments by ``1`` on each update of the same ``object_id``. Optional Fields --------------- - **task_id** (str): Task linkage field. Use this when the blob was produced or consumed by a specific task. - **workflow_id** (str): Workflow linkage field. Use this to associate the blob with a workflow execution. If ``workflow_id`` is not explicitly passed when saving, Flowcept uses ``Flowcept.current_workflow_id`` when available. - **type** (str): User-defined category label for the blob. Typical values: ``ml_model`` (trained model/checkpoint bytes), ``dataset_snapshot`` (frozen dataset payloads), ``artifact`` (generic serialized outputs), ``input_file`` (uploaded/source binary inputs), and ``embedding_index`` (vector index payloads). - **custom_metadata** (dict): Free-form dictionary for additional tags and attributes (for example, ``{"framework": "torch", "stage": "best"}``). - **created_at** (datetime, UTC): Logical object creation timestamp. - **created_by** (str): Logical object creator identifier. - **updated_at** (datetime, UTC): Latest update timestamp. - **updated_by** (str): Latest updater identifier. - **prev_version** (int or null): Previous latest version number (``None`` for first insert in controlled mode). - **object_size_bytes** (int): Payload size in bytes when available. - **data_sha256** (str): SHA-256 hash of payload bytes for fast equality checks and integrity verification. - **data_hash_algo** (str): Hash algorithm label for the payload fingerprint (currently ``sha256``). Notes ----- - Binary payload bytes are stored either **in-object** (``data`` field) or out-of-line in GridFS depending on ``save_data_in_collection``. - When storage mode is GridFS, the document keeps ``grid_fs_file_id`` as the pointer to payload bytes. - ``BlobObject`` captures metadata/linkage fields; payload storage location is implementation-specific. - In version-control mode, Flowcept keeps latest in ``objects`` and append-only older versions in ``object_history``.