# Advanced Nostr Relay Schema Design

## Overview

This document outlines an advanced multi-table schema that enforces Nostr's event storage and replacement rules at the database level. Each event category gets its own table, chosen by how its events are stored and replaced.

## Event Type Classification

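Based on the kind ranges defined in NIP-01, events are classified into four categories. Kinds outside these ranges (for example `45 <= n < 1000` or `n >= 40000`) match no category and are treated as invalid by this design: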

### 1. Regular Events
- **Kinds**: `1000 <= n < 10000` || `4 <= n < 45` || `n == 1` || `n == 2`
- **Storage Policy**: All events stored permanently
- **Examples**: Text notes (1), Reposts (6), Reactions (7), Direct Messages (4)

### 2. Replaceable Events
- **Kinds**: `10000 <= n < 20000` || `n == 0` || `n == 3`
- **Storage Policy**: Only latest per `(pubkey, kind)` combination
- **Replacement Logic**: Latest `created_at`, then lowest `id` lexically
- **Examples**: Metadata (0), Contacts (3), Mute List (10000)
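
The replacement rule above (latest `created_at`, ties broken by the lexically lowest `id`) can be sketched as a small comparison helper. This is a minimal sketch, not the relay's actual code: the function name `should_replace` is hypothetical, and it assumes ids arrive as lowercase hex strings of equal length so that `strcmp()` matches the intended lexical order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true when the incoming event should replace
 * the stored one. Assumes both ids are lowercase hex strings of equal
 * length, so strcmp() yields the lexical order used for tie-breaking. */
static bool should_replace(int64_t stored_created_at, const char *stored_id,
                           int64_t new_created_at, const char *new_id)
{
    if (new_created_at != stored_created_at) {
        return new_created_at > stored_created_at;  /* newer timestamp wins */
    }
    return strcmp(new_id, stored_id) < 0;           /* tie: lowest id wins */
}
```

An identical event (same timestamp and id) does not replace the stored copy, which keeps re-submissions idempotent. The same comparison applies to addressable events, with the lookup keyed on `(pubkey, kind, d_tag)` instead.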

### 3. Ephemeral Events
- **Kinds**: `20000 <= n < 30000`
- **Storage Policy**: Not expected to be stored (optional temporary storage)
- **Examples**: Typing indicators, presence updates, ephemeral messages

### 4. Addressable Events
- **Kinds**: `30000 <= n < 40000`
- **Storage Policy**: Only latest per `(pubkey, kind, d_tag)` combination
- **Replacement Logic**: Same as replaceable events
- **Examples**: Long-form content (30023), Application-specific data

## SQLite JSON Capabilities Research

SQLite's JSON functions (built in by default since version 3.38.0) can query tags stored as a JSON array directly in SQL:

### Core JSON Functions
```sql
-- Extract specific values
json_extract(column, '$.path')

-- Iterate through arrays
json_each(json_array_column)

-- Flatten nested structures
json_tree(json_column)

-- Validate JSON structure
json_valid(column)

-- Array operations
json_array_length(column)
json_extract(column, '$[0]')  -- First element
```

### Tag Query Examples

#### Find all 'e' tag references:
```sql
-- json_each() exposes its own id column, so events.id must be qualified
SELECT
    events.id,
    json_extract(value, '$[1]') as referenced_event_id,
    json_extract(value, '$[2]') as relay_hint,
    json_extract(value, '$[3]') as marker
FROM events, json_each(events.tags)
WHERE json_extract(value, '$[0]') = 'e';
```

#### Find events with specific hashtags:
```sql
-- events.id is qualified because json_each() also has an id column
SELECT events.id, events.content
FROM events, json_each(events.tags)
WHERE json_extract(value, '$[0]') = 't'
  AND json_extract(value, '$[1]') = 'bitcoin';
```

#### Extract 'd' tag for addressable events:
```sql
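-- First 'd' tag value per event (NULL when the event has no 'd' tag).
-- A top-level LIMIT 1 would return a single row for the whole table.
SELECT
    e.id,
    (SELECT json_extract(t.value, '$[1]')
     FROM json_each(e.tags) AS t
     WHERE json_extract(t.value, '$[0]') = 'd'
     ORDER BY t.key
     LIMIT 1) AS d_tag_value
FROM events AS e;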
```

### JSON Functional Indexes
```sql
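-- SQLite JSON paths have no '[*]' wildcard, and an expression index stores
-- one key per row, so only a fixed tag position can be indexed this way.

-- Index on a hashtag stored as the first tag
CREATE INDEX idx_first_hashtag ON events(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 't';

-- Index on a 'd' tag stored as the first tag of an addressable event
CREATE INDEX idx_first_d_tag ON events_addressable(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 'd';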
```

## Proposed Schema Design

### Option 1: Separate Tables with JSON Tags

```sql
-- Regular Events (permanent storage)
CREATE TABLE events_regular (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags TEXT CHECK (json_valid(tags)),  -- JSON array text; SQLite has no JSON column type
    first_seen INTEGER DEFAULT (strftime('%s', 'now')),
    CONSTRAINT kind_regular CHECK (
        (kind >= 1000 AND kind < 10000) OR
        (kind >= 4 AND kind < 45) OR
        kind = 1 OR kind = 2
    )
);

-- Replaceable Events (latest per pubkey+kind)
CREATE TABLE events_replaceable (
    pubkey TEXT NOT NULL,
    kind INTEGER NOT NULL,
    id TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags TEXT CHECK (json_valid(tags)),  -- JSON array text; SQLite has no JSON column type
    replaced_at INTEGER DEFAULT (strftime('%s', 'now')),
    PRIMARY KEY (pubkey, kind),
    CONSTRAINT kind_replaceable CHECK (
        (kind >= 10000 AND kind < 20000) OR
        kind = 0 OR kind = 3
    )
);

-- Ephemeral Events (temporary/optional storage)
CREATE TABLE events_ephemeral (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags TEXT CHECK (json_valid(tags)),  -- JSON array text; SQLite has no JSON column type
    expires_at INTEGER DEFAULT (strftime('%s', 'now', '+1 hour')),
    CONSTRAINT kind_ephemeral CHECK (
        kind >= 20000 AND kind < 30000
    )
);

-- Addressable Events (latest per pubkey+kind+d_tag)
CREATE TABLE events_addressable (
    pubkey TEXT NOT NULL,
    kind INTEGER NOT NULL,
    d_tag TEXT NOT NULL,
    id TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags TEXT CHECK (json_valid(tags)),  -- JSON array text; SQLite has no JSON column type
    replaced_at INTEGER DEFAULT (strftime('%s', 'now')),
    PRIMARY KEY (pubkey, kind, d_tag),
    CONSTRAINT kind_addressable CHECK (
        kind >= 30000 AND kind < 40000
    )
);
```

### Indexes for Performance

```sql
-- Regular events indexes
CREATE INDEX idx_regular_pubkey ON events_regular(pubkey);
CREATE INDEX idx_regular_kind ON events_regular(kind);
CREATE INDEX idx_regular_created_at ON events_regular(created_at);
CREATE INDEX idx_regular_kind_created_at ON events_regular(kind, created_at);

-- Replaceable events indexes
CREATE INDEX idx_replaceable_created_at ON events_replaceable(created_at);
CREATE INDEX idx_replaceable_id ON events_replaceable(id);

-- Ephemeral events indexes
CREATE INDEX idx_ephemeral_expires_at ON events_ephemeral(expires_at);
CREATE INDEX idx_ephemeral_pubkey ON events_ephemeral(pubkey);

-- Addressable events indexes
CREATE INDEX idx_addressable_created_at ON events_addressable(created_at);
CREATE INDEX idx_addressable_id ON events_addressable(id);

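-- JSON tag indexes (examples)
-- SQLite JSON paths have no '[*]' wildcard, and an expression index stores
-- one key per row, so these cover only a tag in the first array position.
CREATE INDEX idx_regular_first_e_tag ON events_regular(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 'e';

CREATE INDEX idx_regular_first_p_tag ON events_regular(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 'p';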
```

### Option 2: Unified Tag Table Approach

```sql
-- Unified tag storage (alternative to JSON)
CREATE TABLE tags_unified (
    event_id TEXT NOT NULL,
    event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable'
    tag_index INTEGER NOT NULL, -- Position in tag array
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    param_2 TEXT, -- Third element if present
    param_3 TEXT, -- Fourth element if present
    param_json TEXT, -- JSON for additional parameters
    PRIMARY KEY (event_id, tag_index)
);

CREATE INDEX idx_tags_name_value ON tags_unified(name, value);
CREATE INDEX idx_tags_event_type ON tags_unified(event_type);
```
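
To make the column layout of `tags_unified` concrete, the sketch below maps one tag (an array of strings) onto the row's fields. It is an illustration under stated assumptions, not the relay's implementation: `tag_row_t`, `tag_to_row`, and `append_json_string` are hypothetical names, the 256-byte `param_json` buffer is an arbitrary limit, a tag with no second element is assumed to store `""` in `value` (the column is `NOT NULL`), and the JSON escaping handles only `"` and `\`.

```c
#include <stddef.h>

/* Hypothetical in-memory mirror of one tags_unified row. String pointers
 * borrow from the input tag; param_json is an owned, fixed-size buffer. */
typedef struct {
    int         tag_index;       /* position of the tag in the event */
    const char *name;            /* element 0 */
    const char *value;           /* element 1, "" if absent (assumption) */
    const char *param_2;         /* element 2 or NULL */
    const char *param_3;         /* element 3 or NULL */
    char        param_json[256]; /* JSON array of elements 4.., "" if none */
} tag_row_t;

/* Appends s as a JSON string, escaping only '"' and '\\' (assumes no
 * control characters). Returns 0 on success, -1 if the buffer is full. */
static int append_json_string(char *buf, size_t cap, size_t *len, const char *s)
{
    size_t n = *len;
    if (n + 1 >= cap) return -1;
    buf[n++] = '"';
    for (; *s; s++) {
        if (*s == '"' || *s == '\\') {
            if (n + 1 >= cap) return -1;
            buf[n++] = '\\';
        }
        if (n + 1 >= cap) return -1;
        buf[n++] = *s;
    }
    if (n + 1 >= cap) return -1;
    buf[n++] = '"';
    buf[n] = '\0';
    *len = n;
    return 0;
}

/* Splits a tag into row fields. Returns 0 on success, -1 on an empty tag
 * or when the extra parameters do not fit into param_json. */
static int tag_to_row(const char *const *elems, size_t count, int tag_index,
                      tag_row_t *row)
{
    if (count == 0 || elems[0] == NULL) return -1;  /* a tag needs a name */
    row->tag_index = tag_index;
    row->name    = elems[0];
    row->value   = count > 1 ? elems[1] : "";
    row->param_2 = count > 2 ? elems[2] : NULL;
    row->param_3 = count > 3 ? elems[3] : NULL;
    row->param_json[0] = '\0';
    if (count <= 4) return 0;

    size_t len = 0;
    row->param_json[len++] = '[';
    for (size_t i = 4; i < count; i++) {
        if (i > 4) {
            if (len + 1 >= sizeof row->param_json) return -1;
            row->param_json[len++] = ',';
        }
        if (append_json_string(row->param_json, sizeof row->param_json,
                               &len, elems[i]) != 0) return -1;
    }
    if (len + 1 >= sizeof row->param_json) return -1;
    row->param_json[len++] = ']';
    row->param_json[len] = '\0';
    return 0;
}
```

Because the first two elements land in the indexed `name` and `value` columns, a filter such as "all events tagged `#e` = X" becomes a lookup on `idx_tags_name_value` regardless of where the tag sits in the event's tag array.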

## Implementation Strategy

### 1. Kind Classification Function (C Code)
```c
typedef enum {
    EVENT_TYPE_REGULAR,
    EVENT_TYPE_REPLACEABLE,
    EVENT_TYPE_EPHEMERAL,
    EVENT_TYPE_ADDRESSABLE,
    EVENT_TYPE_INVALID
} event_type_t;

event_type_t classify_event_kind(int kind) {
    if ((kind >= 1000 && kind < 10000) ||
        (kind >= 4 && kind < 45) ||
        kind == 1 || kind == 2) {
        return EVENT_TYPE_REGULAR;
    }

    if ((kind >= 10000 && kind < 20000) ||
        kind == 0 || kind == 3) {
        return EVENT_TYPE_REPLACEABLE;
    }

    if (kind >= 20000 && kind < 30000) {
        return EVENT_TYPE_EPHEMERAL;
    }

    if (kind >= 30000 && kind < 40000) {
        return EVENT_TYPE_ADDRESSABLE;
    }

    /* Kinds outside the ranges above (negative, 45..999, >= 40000) */
    return EVENT_TYPE_INVALID;
}
```

### 2. Replacement Logic for Replaceable Events
```sql
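-- Trigger for replaceable events
CREATE TRIGGER replace_event_on_insert
BEFORE INSERT ON events_replaceable
FOR EACH ROW
BEGIN
    -- Remove the stored event only if the incoming one supersedes it
    DELETE FROM events_replaceable
    WHERE pubkey = NEW.pubkey
      AND kind = NEW.kind
      AND (
          created_at < NEW.created_at OR
          (created_at = NEW.created_at AND id > NEW.id)
      );

    -- If a row survived the DELETE, the stored event wins: skip this insert
    -- silently instead of failing on the (pubkey, kind) primary key
    SELECT RAISE(IGNORE)
    WHERE EXISTS (
        SELECT 1 FROM events_replaceable
        WHERE pubkey = NEW.pubkey AND kind = NEW.kind
    );
END;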
```

### 3. D-Tag Extraction for Addressable Events
```c
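#include <string.h>
#include "cJSON.h"  /* adjust the include path to match the build */

/* Returns a heap-allocated copy of the first "d" tag value, or "" when no
 * "d" tag with a string value exists. Returns NULL if tags is not an array
 * or if strdup() fails. The caller must free() the result. */
char* extract_d_tag(cJSON* tags) {
    if (!tags || !cJSON_IsArray(tags)) {
        return NULL;
    }

    cJSON* tag;
    cJSON_ArrayForEach(tag, tags) {
        /* Each tag is itself an array: [name, value, ...] */
        if (cJSON_IsArray(tag) && cJSON_GetArraySize(tag) >= 2) {
            cJSON* tag_name = cJSON_GetArrayItem(tag, 0);
            cJSON* tag_value = cJSON_GetArrayItem(tag, 1);

            if (cJSON_IsString(tag_name) && cJSON_IsString(tag_value)) {
                if (strcmp(cJSON_GetStringValue(tag_name), "d") == 0) {
                    return strdup(cJSON_GetStringValue(tag_value));
                }
            }
        }
    }

    return strdup(""); // Default empty d-tag
}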
```

## Advantages of This Design

### 1. Protocol Compliance
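- **Enforced at DB level**: `CHECK` constraints reject any event whose kind does not belong in the target table
- **Automatic replacement**: A trigger handles replaceable event logic; addressable events need an equivalent trigger keyed on `(pubkey, kind, d_tag)`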
- **Type safety**: Separate tables ensure correct handling per event type

### 2. Performance Benefits
- **Targeted indexes**: Each table optimized for its access patterns
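- **Reduced storage**: Ephemeral events can be purged by a periodic `DELETE` on `expires_at`; replaceable and addressable tables hold one row per key
- **Query optimization**: Queries that filter on kind only touch the smaller tables and indexes whose kind range matches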

### 3. JSON Tag Benefits
- **Atomic storage**: Tags stored with their event
- **Rich querying**: SQLite JSON functions enable complex tag queries
- **Schema flexibility**: Can handle arbitrary tag structures
- **Functional indexes**: Expression indexes can cover a tag at a fixed array position; lookups on tags at arbitrary positions need the unified tag table (Option 2)

## Migration Strategy

1. **Phase 1**: Create new schema alongside existing
2. **Phase 2**: Implement kind classification and routing logic
3. **Phase 3**: Migrate existing data to appropriate tables
4. **Phase 4**: Update application logic to use new tables
5. **Phase 5**: Drop old schema after verification
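
The routing logic in Phase 2 could be sketched as a table-driven lookup from kind to destination table. This is only an illustration: `kind_route_t`, `KIND_ROUTES`, and `table_for_kind` are hypothetical names, while the ranges and table names are the ones defined earlier in this document.

```c
#include <stddef.h>

/* One contiguous kind range and the table that stores it. */
typedef struct {
    int         min_kind;   /* inclusive */
    int         max_kind;   /* exclusive */
    const char *table;
} kind_route_t;

/* Ranges taken from the event type classification in this document. */
static const kind_route_t KIND_ROUTES[] = {
    {     0,     1, "events_replaceable" },  /* kind 0: metadata */
    {     1,     3, "events_regular"     },  /* kinds 1 and 2 */
    {     3,     4, "events_replaceable" },  /* kind 3: contacts */
    {     4,    45, "events_regular"     },
    {  1000, 10000, "events_regular"     },
    { 10000, 20000, "events_replaceable" },
    { 20000, 30000, "events_ephemeral"   },
    { 30000, 40000, "events_addressable" },
};

/* Hypothetical helper: returns the destination table for a kind, or NULL
 * when the kind falls outside every range in the classification. */
static const char *table_for_kind(int kind)
{
    for (size_t i = 0; i < sizeof KIND_ROUTES / sizeof KIND_ROUTES[0]; i++) {
        if (kind >= KIND_ROUTES[i].min_kind && kind < KIND_ROUTES[i].max_kind) {
            return KIND_ROUTES[i].table;
        }
    }
    return NULL;
}
```

Keeping the ranges in one array means the Phase 3 data migration and the live insert path can share the same lookup, and a `NULL` result gives a single place to reject kinds that no table's `CHECK` constraint would accept.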

## Next Steps for Implementation

1. **Prototype JSON performance**: Create test database with sample data
2. **Benchmark query patterns**: Compare JSON vs normalized approaches
3. **Implement kind classification**: Add routing logic to C code
4. **Create migration scripts**: Handle existing data transformation
5. **Update test suite**: Verify compliance with new schema