blossom/buds/05.md
2024-07-21 17:34:46 +02:00


BUD-05

Deterministic File Compression

draft optional

Overview

This document outlines the implementation of a deterministic file compression method for Blossom, specifying the metadata that must be stored for each compressed blob and the endpoint for retrieving this metadata.

Server Adaptation

File storage servers MUST store the following metadata for each compressed blob:

  • File hash (sha256)
  • Uploaded timestamp (uploaded)
  • Original file size (size)
  • Original file type (type)
  • Compressed file hash (compressed.sha256)
  • Compressed file size (compressed.size)
  • Compression library (compressed.library)
  • Compression library version (compressed.version)
  • Compression library parameters (compressed.parameters)
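As a rough illustration, the fields above could be held in a pair of small record types. This is only a sketch: the class names `CompressedInfo` and `BlobMetadata` and the `to_descriptor` helper are hypothetical, and the nesting under `compressed` is an assumption taken from the example response later in this document.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CompressedInfo:
    """Metadata describing the compressed form of a blob (hypothetical type)."""
    sha256: str          # hash of the compressed bytes
    size: int            # size of the compressed bytes
    library: str         # compression library name
    version: str         # compression library version
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class BlobMetadata:
    """Metadata a server would keep per compressed blob (hypothetical type)."""
    sha256: str          # hash of the original bytes
    uploaded: int        # upload time as a Unix timestamp
    size: int            # size of the original bytes
    type: str            # MIME type of the original file
    compressed: CompressedInfo

    def to_descriptor(self) -> Dict[str, Any]:
        # asdict() recurses into the nested dataclass and its dict field.
        return asdict(self)
```

Calling `to_descriptor()` on a populated `BlobMetadata` yields a nested dictionary that can be serialized with `json.dumps`.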

Servers MUST normalize the blob metadata so that compressing the same original blob with the same library, version, and parameters always produces the same output. This makes compression deterministic and strips any potentially identifying user data.
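To make the idea of normalization concrete, the sketch below uses `gzip` from the Python standard library as a stand-in compressor (the libraries a real server would use are not specified here). A gzip stream carries a modification timestamp in its header, so pinning it with `mtime=0` removes one source of variation between runs. The helper names `deterministic_gzip` and `sha256_hex` are hypothetical.

```python
import gzip
import hashlib


def deterministic_gzip(data: bytes, level: int = 9) -> bytes:
    """Compress with a fixed level and a zeroed header timestamp."""
    # mtime=0 pins the timestamp field of the gzip header, so the
    # output does not depend on when the compression ran.
    return gzip.compress(data, compresslevel=level, mtime=0)


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()
```

Byte-identical output should only be expected when the library version and parameters also match, which is presumably why those are stored alongside the hashes.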

Client Adaptation

Clients MUST verify the metadata of the compressed blob to ensure that the compressed blob matches the original blob. Clients MUST sign and publish a new authorization event with the new blob hash. Clients MUST upload the compressed blob to the server using the new authorization event and the 'compress' field set to false or not set.
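The verification step might look roughly like the following. This is a sketch under assumptions: the descriptor is taken to have the shape of the example response in this document, `zlib` from the standard library stands in for whichever library the server names, and the function names `verify_descriptor` and `zlib_stand_in` are hypothetical.

```python
import hashlib
import zlib
from typing import Any, Callable, Dict

Compressor = Callable[[bytes, Dict[str, Any]], bytes]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def zlib_stand_in(data: bytes, parameters: Dict[str, Any]) -> bytes:
    # Stand-in compressor: maps the "quality" parameter to a zlib level.
    return zlib.compress(data, int(parameters.get("quality", 6)))


def verify_descriptor(original: bytes, descriptor: Dict[str, Any],
                      compress: Compressor) -> bool:
    """Check the descriptor against the original blob by recompressing locally."""
    # The top-level fields must describe the original blob.
    if sha256_hex(original) != descriptor["sha256"]:
        return False
    if len(original) != descriptor["size"]:
        return False
    # Recompress with the reported parameters and compare hash and size.
    info = descriptor["compressed"]
    local = compress(original, info["parameters"])
    return sha256_hex(local) == info["sha256"] and len(local) == info["size"]
```

A client would only sign the new authorization event for `descriptor["compressed"]["sha256"]` once this check returns `True`.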

GET /metadata - Retrieve blob metadata

The GET /metadata/ endpoint MUST return a Blob Descriptor containing the metadata fields for the requested blob or an error object if the blob does not exist.

The endpoint MUST accept an optional file extension in the URL, e.g. .pdf or .png.
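A server could split the blob hash from the optional extension along these lines. The path layout `/metadata/<sha256>[.ext]` is an assumption (the text above does not spell out the full path), and `parse_metadata_path` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed layout: /metadata/<64 lowercase hex chars> plus an optional ".ext".
_METADATA_PATH = re.compile(r"^/metadata/([0-9a-f]{64})(\.[A-Za-z0-9]+)?$")


def parse_metadata_path(path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (sha256, extension or None), or None if the path does not match."""
    match = _METADATA_PATH.match(path)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The extension is returned separately so that the lookup itself can rely on the hash alone.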

Example response:

{
  "url": "cdn.nostrcheck.me/b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553",
  "sha256": "b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553",
  "uploaded": 1708773959,
  "size": 123456,
  "type": "application/pdf",
  "compressed": {
    "sha256": "e2e2e1d2b9c04e2a9914c245e3f789d230e3c2d5e3a0f5c6e4b4e3f9c7d4f3e2",
    "size": 56789,
    "library": "brotli",
    "version": "1.0.7",
    "parameters": {
      "quality": 11,
      "mode": "text"
    }
  }
}

Example Flow

  1. Client signs authorization event and uploads blob to Server A with 'compress' field set to true
  2. Server A compresses the blob using its preferred library and parameters
  3. Server A stores the blob metadata
  4. Server A returns Blob Descriptor
  5. Client verifies that the metadata matches the original blob by compressing the blob locally with the returned parameters and comparing the result
  6. Client signs and publishes a new authorization event with the new blob hash and the 'compress' field set to false
  7. Client uploads the compressed blob to Server B using the new authorization event and the 'compress' field set to false or not set
  8. Server B verifies that the blob hash matches the x tag in the new authorization event
  9. Server B stores the blob metadata
  10. Server B returns Blob Descriptor
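The hash check that Server B performs against the x tag can be sketched as below. It assumes the authorization event is a Nostr-style JSON object whose `tags` field is a list of string lists; signature and expiration checks are deliberately left out, and the function names are hypothetical.

```python
import hashlib
from typing import Any, Dict, List


def x_tags(event: Dict[str, Any]) -> List[str]:
    """Collect the values of all "x" tags in the event."""
    return [tag[1] for tag in event.get("tags", [])
            if len(tag) >= 2 and tag[0] == "x"]


def blob_matches_authorization(blob: bytes, event: Dict[str, Any]) -> bool:
    """True if the SHA-256 of the uploaded bytes appears in an "x" tag."""
    return hashlib.sha256(blob).hexdigest() in x_tags(event)
```

Because the client uploads the already compressed blob in this step, the hash compared here is that of the compressed bytes.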

(Optional, using the /mirror endpoint)

  1. Client sends the url to Server C /mirror using the original authorization event
  2. Server C downloads blob from Server A or B using the url field
  3. Server C verifies downloaded blob hash matches x tag in authorization event
  4. Server C verifies that the metadata matches the original blob by compressing the blob locally with the same parameters
  5. Server C returns Blob Descriptor

Accepted Libraries, Versions, and Parameters

The following libraries, versions, and parameters are accepted for deterministic compression:

(This is only a DEMO table; the actual table will be updated with the final list of libraries, versions, and parameters.)

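| Library | Version | Quality Range | Modes               |
| ------- | ------- | ------------- | ------------------- |
| Brotli  | 1.0.7   | 0-11          | text, font, generic |
| Gzip    | 1.10    | 0-9           | text, font, generic |
| Zstd    | 1.4.5   | 1-22          | text, font, generic |
| Lz4     | 1.9.2   | 1-12          | text, font, generic |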
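A server or client could check reported compression settings against such a table roughly as follows. The values are copied from the demo table above and are not final; the `ACCEPTED` mapping, the `Accepted` tuple, and `is_accepted` are hypothetical names, and the `quality` and `mode` parameter keys are assumed from the example response.

```python
from typing import Any, Dict, NamedTuple, Tuple


class Accepted(NamedTuple):
    version: str
    quality: range            # inclusive range from the table, as a Python range
    modes: Tuple[str, ...]


MODES = ("text", "font", "generic")

# Values mirror the demo table; they are placeholders, not a final list.
ACCEPTED: Dict[str, Accepted] = {
    "brotli": Accepted("1.0.7", range(0, 12), MODES),
    "gzip": Accepted("1.10", range(0, 10), MODES),
    "zstd": Accepted("1.4.5", range(1, 23), MODES),
    "lz4": Accepted("1.9.2", range(1, 13), MODES),
}


def is_accepted(library: str, version: str, parameters: Dict[str, Any]) -> bool:
    """Check library, version, quality, and mode against the table."""
    entry = ACCEPTED.get(library.lower())
    if entry is None or version != entry.version:
        return False
    return (parameters.get("quality") in entry.quality
            and parameters.get("mode") in entry.modes)
```

Rejecting anything outside the table keeps every party able to reproduce the compressed output locally.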