BUD-05
======

Deterministic File Compression
---------------

`draft` `optional`

## Overview

This document outlines the implementation of a deterministic file compression method for Blossom, specifying the metadata that must be stored for each compressed blob and the endpoint for retrieving this metadata.

## Server Adaptation

File storage servers `MUST` store the following metadata for each compressed blob:

- File hash (sha256)
- Uploaded timestamp (uploaded)
- Original file size (size)
- Original file type (type)
- Compressed file hash (compressed.sha256)
- Compressed file size (compressed.size)
- Compression library (compressed.library)
- Compression library version (compressed.version)
- Compression library parameters (compressed.parameters)

Servers `MUST` normalize the blob metadata so that the same blob compressed with the same library, version, and parameters always produces the same result. This makes the compression deterministic and anonymizes any potential user data.

## Client adaptation

Clients `MUST` verify the metadata of the compressed blob to ensure that the compressed blob matches the original blob.

Clients `MUST` sign and publish a new authorization event with the new blob hash.

Clients `MUST` upload the compressed blob to the server using the new authorization event, with the `compress` field either set to false or omitted.

## GET /metadata - Retrieve blob metadata

The GET /metadata/ endpoint `MUST` return a [Blob Descriptor](https://github.com/hzrd149/blossom/blob/master/buds/02.md#blob-descriptor) containing the metadata fields for the requested blob, or an error object if the blob does not exist.

The endpoint `MUST` accept an optional file extension in the URL, e.g.
`.pdf`, `.png`, etc.

Example response:

```json
{
  "url": "cdn.nostrcheck.me/b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553",
  "sha256": "b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553",
  "uploaded": 1708773959,
  "size": 123456,
  "type": "application/pdf",
  "compressed": {
    "sha256": "e2e2e1d2b9c04e2a9914c245e3f789d230e3c2d5e3a0f5c6e4b4e3f9c7d4f3e2",
    "size": 56789,
    "library": "brotli",
    "version": "1.0.7",
    "parameters": {
      "quality": 11,
      "mode": "text"
    }
  }
}
```

## Example Flow

1. Client signs an authorization event and uploads the blob to Server A with the `compress` field set to true
1. Server A compresses the blob using its preferred library and parameters
1. Server A stores the blob metadata
1. Server A returns [Blob Descriptor](./02.md#blob-descriptor)
1. Client verifies that the metadata matches the original blob by compressing the blob locally with the returned parameters and comparing the result
1. Client signs and publishes a new authorization event with the new blob hash and the `compress` field set to false
1. Client uploads the compressed blob to Server B using the new authorization event, with the `compress` field either set to false or omitted
1. Server B verifies that the blob hash metadata matches the `x` tag in the new authorization event
1. Server B stores the blob metadata
1. Server B returns [Blob Descriptor](./02.md#blob-descriptor)

(optional, using the `/mirror` endpoint)

1. Client sends the `url` to Server C `/mirror` using the compressed authorization event
1. Server C downloads the blob from Server A or B using the `url` field
1. Server C verifies that the downloaded blob hash matches the `x` tag in the authorization event (using sha256 or compressed.sha256)
1. Server C returns [Blob Descriptor](./02.md#blob-descriptor)

## Accepted Libraries, Versions, and Parameters

The following libraries, versions, and parameters are accepted for deterministic compression:

(This is just a DEMO table; the actual table will be updated with the final list of libraries, versions, and parameters.)

| Library | Version | Quality Range | Modes               |
|---------|---------|---------------|---------------------|
| Brotli  | 1.0.7   | 0-11          | text, font, generic |
| Gzip    | 1.10    | 0-9           | text, font, generic |
| Zstd    | 1.4.5   | 1-22          | text, font, generic |
| Lz4     | 1.9.2   | 1-12          | text, font, generic |
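
The client-side check from the example flow (compress the original blob locally with the returned parameters, then compare hashes and sizes against the descriptor) could look roughly like the sketch below. It is illustrative only and rests on several assumptions that are not part of this document: it handles only a library named `gzip`, it maps the `quality` parameter onto Python's `compresslevel`, it pins the gzip header timestamp to 0 so that the output does not depend on the time of compression, and the function names are hypothetical. It also does not check `compressed.version`; a real client would presumably need the exact library version, since compressed output can differ between versions and implementations.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def compress_deterministic(data: bytes, parameters: dict) -> bytes:
    """Compress with gzip in a repeatable way.

    Assumption: "quality" maps to gzip's compresslevel. mtime=0 keeps the
    current time out of the gzip header, so the same input and level give
    the same bytes on the same gzip/zlib build.
    """
    return gzip.compress(data, compresslevel=parameters["quality"], mtime=0)


def verify_descriptor(original: bytes, descriptor: dict) -> bool:
    """Check a descriptor against the original blob by recompressing locally."""
    compressed = descriptor["compressed"]
    if compressed["library"] != "gzip":
        # This sketch only knows one library; others would need their own branch.
        raise ValueError(f"unsupported library: {compressed['library']}")

    # The descriptor must describe the blob we actually uploaded.
    if sha256_hex(original) != descriptor["sha256"]:
        return False
    if len(original) != descriptor["size"]:
        return False

    # Recompress locally with the returned parameters and compare the result.
    local = compress_deterministic(original, compressed["parameters"])
    return (
        sha256_hex(local) == compressed["sha256"]
        and len(local) == compressed["size"]
    )


if __name__ == "__main__":
    blob = b"hello blossom " * 100
    local = compress_deterministic(blob, {"quality": 9})
    # Stand-in for a server response, built locally for demonstration.
    descriptor = {
        "sha256": sha256_hex(blob),
        "size": len(blob),
        "compressed": {
            "sha256": sha256_hex(local),
            "size": len(local),
            "library": "gzip",
            "version": "1.10",
            "parameters": {"quality": 9},
        },
    }
    print(verify_descriptor(blob, descriptor))
```

If the check passes, the hash of the locally compressed bytes is the value the client would place in the `x` tag of the new authorization event before uploading the compressed blob to another server.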