c-relay/docs/final_schema_recommendation.md
2025-09-04 07:10:13 -04:00

Final Schema Recommendation: Hybrid Single Table Approach

Executive Summary

Analysis of subscription query complexity shows that the multi-table approach creates more problems than it solves. REQ filters do not align with storage semantics: clients filter by kind, author, and tags regardless of how an event is classified for storage.

Recommendation: Modified Single Table with Event Type Classification

The Multi-Table Problem

REQ Filter Reality Check

  • Clients send: {"kinds": [1, 0, 30023], "authors": ["pubkey"], "#p": ["target"]}
  • Multi-table requires: 3 separate queries + UNION + complex ordering
  • Single table requires: 1 query with simple WHERE conditions

Query Complexity Explosion

-- Multi-table nightmare for simple filter
WITH results AS (
    SELECT * FROM events_regular WHERE kind = 1 AND pubkey = ? 
    UNION ALL
    SELECT * FROM events_replaceable WHERE kind = 0 AND pubkey = ?
    UNION ALL  
    SELECT * FROM events_addressable WHERE kind = 30023 AND pubkey = ?
)
SELECT r.* FROM results r 
JOIN multiple_tag_tables t ON complex_conditions
ORDER BY created_at DESC, id ASC LIMIT ?;

-- vs Single table simplicity
SELECT e.* FROM events e, json_each(e.tags) t
WHERE e.kind IN (1, 0, 30023) 
  AND e.pubkey = ?
  AND json_extract(t.value, '$[0]') = 'p'
  AND json_extract(t.value, '$[1]') = ?
ORDER BY e.created_at DESC, e.id ASC LIMIT ?;
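
The cost difference can be made concrete by counting how many per-type tables a single "kinds" filter would touch. The following is a minimal sketch, assuming the three tables named in the multi-table example (events_regular, events_replaceable, events_addressable) and the kind ranges this document uses for classification; table_for_kind and subqueries_needed are hypothetical names, not part of the relay code.

```c
#include <stddef.h>

/* Per-type tables assumed by the multi-table example (hypothetical enum). */
typedef enum { TBL_REGULAR, TBL_REPLACEABLE, TBL_ADDRESSABLE, TBL_NONE } table_t;

/* Map a kind to its table using the same ranges as the document's classifier. */
static table_t table_for_kind(int kind) {
    if ((kind >= 1000 && kind < 10000) || (kind >= 4 && kind < 45) ||
        kind == 1 || kind == 2) {
        return TBL_REGULAR;
    }
    if ((kind >= 10000 && kind < 20000) || kind == 0 || kind == 3) {
        return TBL_REPLACEABLE;
    }
    if (kind >= 30000 && kind < 40000) {
        return TBL_ADDRESSABLE;
    }
    return TBL_NONE; /* ephemeral or unclassified: no table in the example */
}

/* Number of SELECTs that must be UNIONed for a filter's "kinds" list.
 * The single-table design always needs exactly one query instead. */
int subqueries_needed(const int* kinds, size_t count) {
    int seen[TBL_NONE] = {0};
    int needed = 0;
    for (size_t i = 0; i < count; i++) {
        table_t t = table_for_kind(kinds[i]);
        if (t != TBL_NONE && !seen[t]) {
            seen[t] = 1;
            needed++;
        }
    }
    return needed;
}
```

For the filter shown earlier (kinds 1, 0, and 30023) this yields three sub-queries, one per table, before any tag joins are added. A filter with no "kinds" key would presumably have to query every table.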

Core Design Philosophy

  • Single table for REQ query simplicity
  • Event type classification for protocol compliance
  • JSON tags for atomic storage and rich querying
  • Partial unique indexes for replacement logic

Schema Definition

CREATE TABLE events (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    event_type TEXT NOT NULL CHECK (event_type IN ('regular', 'replaceable', 'ephemeral', 'addressable')),
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON NOT NULL DEFAULT '[]',
    first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),

    -- Additional field for addressable events: value of the first "d" tag.
    -- Set by the application at insert time; a SQLite generated column
    -- cannot contain the subquery over json_each() needed to derive it.
    d_tag TEXT,

    -- Replacement tracking
    replaced_at INTEGER
);

-- Protocol compliance constraints.
-- SQLite does not accept a WHERE clause on a table-level UNIQUE constraint,
-- so these are expressed as partial unique indexes.
CREATE UNIQUE INDEX unique_replaceable
    ON events(pubkey, kind)
    WHERE event_type = 'replaceable';

CREATE UNIQUE INDEX unique_addressable
    ON events(pubkey, kind, d_tag)
    WHERE event_type = 'addressable' AND d_tag IS NOT NULL;

Event Type Classification Function

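-- Lookup view mapping every kind (0-65535) to its event type.
-- Kinds outside the ranges below map to 'unknown', which the
-- events.event_type CHECK constraint rejects.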
CREATE VIEW event_type_lookup AS
SELECT 
    CASE 
        WHEN (kind >= 1000 AND kind < 10000) OR 
             (kind >= 4 AND kind < 45) OR 
             kind = 1 OR kind = 2 THEN 'regular'
        WHEN (kind >= 10000 AND kind < 20000) OR 
             kind = 0 OR kind = 3 THEN 'replaceable'
        WHEN kind >= 20000 AND kind < 30000 THEN 'ephemeral'
        WHEN kind >= 30000 AND kind < 40000 THEN 'addressable'
        ELSE 'unknown'
    END as event_type,
    kind
FROM (
    -- Generate all possible kind values for lookup
    WITH RECURSIVE kinds(kind) AS (
        SELECT 0
        UNION ALL
        SELECT kind + 1 FROM kinds WHERE kind < 65535
    )
    SELECT kind FROM kinds
);

Performance Indexes

-- Core query patterns
CREATE INDEX idx_events_pubkey ON events(pubkey);
CREATE INDEX idx_events_kind ON events(kind);  
CREATE INDEX idx_events_created_at ON events(created_at DESC);
CREATE INDEX idx_events_event_type ON events(event_type);

-- Composite indexes for common filters
CREATE INDEX idx_events_pubkey_created_at ON events(pubkey, created_at DESC);
CREATE INDEX idx_events_kind_created_at ON events(kind, created_at DESC);
CREATE INDEX idx_events_type_created_at ON events(event_type, created_at DESC);

-- JSON tag lookups
-- SQLite JSON paths have no wildcard, so an expression such as
-- json_extract(tags, '$[*][1]') is rejected and cannot back an index on
-- "e", "p", or "t" tag values. Tag filters are evaluated with
-- json_each(tags), scanning only the rows already narrowed by the
-- kind, pubkey, and created_at indexes above.

-- Addressable events d_tag index
CREATE INDEX idx_events_d_tag ON events(d_tag) 
WHERE event_type = 'addressable' AND d_tag IS NOT NULL;

Replacement Logic Implementation

Replaceable Events Trigger

CREATE TRIGGER handle_replaceable_events
BEFORE INSERT ON events
FOR EACH ROW
WHEN NEW.event_type = 'replaceable'
BEGIN
    -- Delete older replaceable events with same pubkey+kind
    DELETE FROM events 
    WHERE event_type = 'replaceable'
      AND pubkey = NEW.pubkey 
      AND kind = NEW.kind
      AND (
          created_at < NEW.created_at OR 
          (created_at = NEW.created_at AND id > NEW.id)
      );
END;

Addressable Events Trigger

CREATE TRIGGER handle_addressable_events  
BEFORE INSERT ON events
FOR EACH ROW
WHEN NEW.event_type = 'addressable'
BEGIN
    -- Delete older addressable events with same pubkey+kind+d_tag
    DELETE FROM events
    WHERE event_type = 'addressable'
      AND pubkey = NEW.pubkey
      AND kind = NEW.kind
      AND d_tag = NEW.d_tag
      AND (
          created_at < NEW.created_at OR
          (created_at = NEW.created_at AND id > NEW.id)  
      );
END;
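
Both triggers share the same precedence rule: a stored event is deleted when the incoming one is newer, or when the timestamps are equal and the incoming id sorts lower. A minimal C sketch of that rule follows, useful for example to predict whether an INSERT will be accepted. The function name incoming_replaces_stored is hypothetical, and ids are assumed to be lowercase hex strings so that strcmp matches SQLite's default text comparison.

```c
#include <string.h>

/* Mirrors the triggers' DELETE condition.
 * Returns 1 if the incoming event should replace the stored one, else 0. */
int incoming_replaces_stored(long long stored_created_at, const char* stored_id,
                             long long incoming_created_at, const char* incoming_id) {
    if (stored_created_at != incoming_created_at) {
        /* The newer event wins. */
        return stored_created_at < incoming_created_at;
    }
    /* Same timestamp: the event with the lower id is retained. */
    return strcmp(stored_id, incoming_id) > 0;
}
```

When this returns 0 for a replaceable or addressable event, the trigger leaves the stored row in place and the INSERT of the incoming event then conflicts with the uniqueness rule on (pubkey, kind) or (pubkey, kind, d_tag).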

Implementation Strategy

C Code Integration

Event Type Classification

typedef enum {
    EVENT_TYPE_REGULAR,
    EVENT_TYPE_REPLACEABLE, 
    EVENT_TYPE_EPHEMERAL,
    EVENT_TYPE_ADDRESSABLE,
    EVENT_TYPE_UNKNOWN
} event_type_t;

event_type_t classify_event_kind(int kind) {
    if ((kind >= 1000 && kind < 10000) || 
        (kind >= 4 && kind < 45) || 
        kind == 1 || kind == 2) {
        return EVENT_TYPE_REGULAR;
    }
    if ((kind >= 10000 && kind < 20000) || 
        kind == 0 || kind == 3) {
        return EVENT_TYPE_REPLACEABLE;
    }
    if (kind >= 20000 && kind < 30000) {
        return EVENT_TYPE_EPHEMERAL;
    }
    if (kind >= 30000 && kind < 40000) {
        return EVENT_TYPE_ADDRESSABLE;
    }
    return EVENT_TYPE_UNKNOWN;
}

const char* event_type_to_string(event_type_t type) {
    switch (type) {
        case EVENT_TYPE_REGULAR: return "regular";
        case EVENT_TYPE_REPLACEABLE: return "replaceable";
        case EVENT_TYPE_EPHEMERAL: return "ephemeral";
        case EVENT_TYPE_ADDRESSABLE: return "addressable";
        default: return "unknown";
    }
}

Simplified Event Storage

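// Assumes <stdlib.h>, <string.h>, <sqlite3.h> and the project's cJSON header
// are included, and that g_db is an open sqlite3* handle.
int store_event(cJSON* event) {
    // Extract fields
    cJSON* id = cJSON_GetObjectItem(event, "id");
    cJSON* pubkey = cJSON_GetObjectItem(event, "pubkey");
    cJSON* created_at = cJSON_GetObjectItem(event, "created_at");
    cJSON* kind = cJSON_GetObjectItem(event, "kind");
    cJSON* content = cJSON_GetObjectItem(event, "content");
    cJSON* sig = cJSON_GetObjectItem(event, "sig");
    cJSON* tags = cJSON_GetObjectItem(event, "tags");

    // Reject events with missing or mistyped fields
    if (!cJSON_IsString(id) || !cJSON_IsString(pubkey) ||
        !cJSON_IsNumber(created_at) || !cJSON_IsNumber(kind) ||
        !cJSON_IsString(content) || !cJSON_IsString(sig)) {
        return -1;
    }

    // Classify event type; 'unknown' is not accepted by the event_type CHECK
    event_type_t type = classify_event_kind((int)cJSON_GetNumberValue(kind));
    if (type == EVENT_TYPE_UNKNOWN) {
        return -1;
    }

    // For addressable events, take the value of the first "d" tag.
    // d_tag is written explicitly because SQLite cannot derive it in a
    // generated column (that would need a subquery over json_each()).
    const char* d_tag = NULL;
    if (type == EVENT_TYPE_ADDRESSABLE) {
        cJSON* tag;
        cJSON_ArrayForEach(tag, tags) {
            cJSON* name = cJSON_GetArrayItem(tag, 0);
            cJSON* value = cJSON_GetArrayItem(tag, 1);
            if (cJSON_IsString(name) && cJSON_IsString(value) &&
                strcmp(name->valuestring, "d") == 0) {
                d_tag = value->valuestring;
                break;
            }
        }
    }

    // Serialize tags to compact JSON; a missing tags array is stored as "[]"
    char* tags_json = NULL;
    if (cJSON_IsArray(tags)) {
        tags_json = cJSON_PrintUnformatted(tags);
        if (!tags_json) {
            return -1;
        }
    }

    // Single INSERT statement - database handles replacement via triggers
    const char* sql =
        "INSERT INTO events (id, pubkey, created_at, kind, event_type, content, sig, tags, d_tag) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)";

    sqlite3_stmt* stmt;
    int rc = sqlite3_prepare_v2(g_db, sql, -1, &stmt, NULL);
    if (rc != SQLITE_OK) {
        free(tags_json);
        return -1;
    }

    // SQLITE_STATIC is safe here: the cJSON strings outlive sqlite3_step()
    sqlite3_bind_text(stmt, 1, cJSON_GetStringValue(id), -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 2, cJSON_GetStringValue(pubkey), -1, SQLITE_STATIC);
    sqlite3_bind_int64(stmt, 3, (sqlite3_int64)cJSON_GetNumberValue(created_at));
    sqlite3_bind_int(stmt, 4, (int)cJSON_GetNumberValue(kind));
    sqlite3_bind_text(stmt, 5, event_type_to_string(type), -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 6, cJSON_GetStringValue(content), -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 7, cJSON_GetStringValue(sig), -1, SQLITE_STATIC);
    sqlite3_bind_text(stmt, 8, tags_json ? tags_json : "[]", -1, SQLITE_TRANSIENT);
    // A NULL pointer binds SQL NULL, which is what non-addressable events store
    sqlite3_bind_text(stmt, 9, d_tag, -1, SQLITE_STATIC);

    rc = sqlite3_step(stmt);
    sqlite3_finalize(stmt);
    free(tags_json);

    return (rc == SQLITE_DONE) ? 0 : -1;
}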

Simple REQ Query Building

// Assumes <string.h>, <glib.h> and the project's cJSON header are included.

// Append "?, ?, ..." with one placeholder per element of a JSON array
static void append_placeholders(GString* query, cJSON* array) {
    int count = cJSON_GetArraySize(array);
    for (int i = 0; i < count; i++) {
        g_string_append(query, i == 0 ? "?" : ", ?");
    }
}

// Returns a newly allocated SQL string; the caller releases it with g_free().
// Values must be bound in the order the placeholders are emitted:
// ids, authors, kinds, tag values (in filter key order), since, until, limit.
char* build_filter_query(cJSON* filter) {
    // Build single query against events table
    // Much simpler than multi-table approach

    GString* query = g_string_new("SELECT * FROM events WHERE 1=1");

    // Handle ids filter
    cJSON* ids = cJSON_GetObjectItem(filter, "ids");
    if (ids && cJSON_IsArray(ids)) {
        g_string_append(query, " AND id IN (");
        append_placeholders(query, ids);
        g_string_append(query, ")");
    }

    // Handle authors filter
    cJSON* authors = cJSON_GetObjectItem(filter, "authors");
    if (authors && cJSON_IsArray(authors)) {
        g_string_append(query, " AND pubkey IN (");
        append_placeholders(query, authors);
        g_string_append(query, ")");
    }

    // Handle kinds filter
    cJSON* kinds = cJSON_GetObjectItem(filter, "kinds");
    if (kinds && cJSON_IsArray(kinds)) {
        g_string_append(query, " AND kind IN (");
        append_placeholders(query, kinds);
        g_string_append(query, ")");
    }

    // Handle tag filters (#e, #p, etc.)
    cJSON* item;
    cJSON_ArrayForEach(item, filter) {
        char* key = item->string;
        // The tag letter is interpolated into the SQL text, so accept only a
        // single ASCII letter to rule out injection through the key name
        if (key && key[0] == '#' && strlen(key) == 2 &&
            g_ascii_isalpha(key[1]) && cJSON_IsArray(item)) {
            char tag_name = key[1];
            g_string_append_printf(query,
                " AND EXISTS (SELECT 1 FROM json_each(tags) "
                "WHERE json_extract(value, '$[0]') = '%c' "
                "AND json_extract(value, '$[1]') IN (", tag_name);
            append_placeholders(query, item);
            g_string_append(query, "))");
        }
    }

    // Handle time range
    cJSON* since = cJSON_GetObjectItem(filter, "since");
    if (cJSON_IsNumber(since)) {
        g_string_append(query, " AND created_at >= ?");
    }

    cJSON* until = cJSON_GetObjectItem(filter, "until");
    if (cJSON_IsNumber(until)) {
        g_string_append(query, " AND created_at <= ?");
    }

    // Standard ordering and limit
    g_string_append(query, " ORDER BY created_at DESC, id ASC");

    cJSON* limit = cJSON_GetObjectItem(filter, "limit");
    if (cJSON_IsNumber(limit)) {
        g_string_append(query, " LIMIT ?");
    }

    return g_string_free(query, FALSE);
}

Benefits of This Approach

1. Query Simplicity

  • Single table = simple REQ queries
  • No UNION complexity
  • Familiar SQL patterns
  • Easy LIMIT and ORDER BY handling

2. Protocol Compliance

  • Event type classification enforced
  • Replacement logic via triggers
  • Unique constraints prevent duplicates
  • Proper handling of all event types

3. Performance

  • Unified indexes across all events
  • No join overhead for basic queries
  • JSON tag filters via json_each() on rows already narrowed by the core indexes
  • Single table scan for cross-kind queries

4. Implementation Simplicity

  • Minimal changes from current code
  • Database handles replacement logic
  • Simple event storage function
  • No complex routing logic needed

5. Future Flexibility

  • Can add columns for new event types
  • Can split tables later if needed
  • Easy to add new indexes
  • Extensible constraint system

Migration Path

Phase 1: Schema Update

  1. Add event_type column to existing events table
  2. Add JSON tags column
  3. Create classification triggers
  4. Add partial unique indexes (SQLite cannot add constraints to an existing table)

Phase 2: Data Migration

  1. Classify existing events by kind
  2. Convert existing tag table data to JSON
  3. Verify constraint compliance
  4. Update indexes
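
Step 2 of this phase can be sketched in C. The example below is illustrative only: it assumes each row of the legacy tag table carries a tag name and a single value (the tag_row_t struct and tags_to_json are hypothetical, since the existing tag table layout is not shown here), and it builds the JSON array text that would go into the new tags column.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical shape of one row read from the legacy tag table. */
typedef struct {
    const char* name;   /* e.g. "e", "p", "d" */
    const char* value;  /* first value of the tag */
} tag_row_t;

/* Append n bytes to out, keeping it NUL-terminated; -1 if it does not fit. */
static int put(char* out, size_t cap, size_t* len, const char* s, size_t n) {
    if (*len + n + 1 > cap) return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Append s as a JSON string literal, escaping quotes, backslashes, controls. */
static int put_json_string(char* out, size_t cap, size_t* len, const char* s) {
    if (put(out, cap, len, "\"", 1) != 0) return -1;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        char esc[8];
        int n;
        if (c == '"' || c == '\\') n = snprintf(esc, sizeof esc, "\\%c", c);
        else if (c < 0x20) n = snprintf(esc, sizeof esc, "\\u%04x", c);
        else n = snprintf(esc, sizeof esc, "%c", c);
        if (put(out, cap, len, esc, (size_t)n) != 0) return -1;
    }
    return put(out, cap, len, "\"", 1);
}

/* Serialize rows as [["name","value"],...]; returns 0, or -1 if out is too small. */
int tags_to_json(const tag_row_t* rows, size_t count, char* out, size_t cap) {
    size_t len = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    if (put(out, cap, &len, "[", 1) != 0) return -1;
    for (size_t i = 0; i < count; i++) {
        if (i > 0 && put(out, cap, &len, ",", 1) != 0) return -1;
        if (put(out, cap, &len, "[", 1) != 0 ||
            put_json_string(out, cap, &len, rows[i].name) != 0 ||
            put(out, cap, &len, ",", 1) != 0 ||
            put_json_string(out, cap, &len, rows[i].value) != 0 ||
            put(out, cap, &len, "]", 1) != 0) {
            return -1;
        }
    }
    return put(out, cap, &len, "]", 1);
}
```

Tags with more than one value would need extra columns or rows in the legacy table. If the original event JSON is still available, re-serializing its tags array directly is likely the safer route.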

Phase 3: Code Updates

  1. Update event storage to use new schema
  2. Simplify REQ query building
  3. Remove tag table JOIN logic
  4. Test subscription filtering

Phase 4: Optimization

  1. Monitor query performance
  2. Add specialized indexes as needed
  3. Tune replacement triggers
  4. Consider ephemeral event cleanup

Conclusion

This hybrid approach achieves the best of both worlds:

  • Protocol compliance through event type classification and constraints
  • Query simplicity through unified storage
  • Performance through targeted indexes
  • Implementation ease through minimal complexity

The multi-table approach, while theoretically cleaner, turns every multi-kind subscription into a UNION across tables, which would significantly burden the implementation. The hybrid single-table approach keeps protocol compliance while answering each REQ filter with a single query.