c-relay/docs/advanced_schema_design.md
2025-09-04 07:10:13 -04:00


# Advanced Nostr Relay Schema Design
## Overview
This document outlines the design for an advanced multi-table schema that enforces Nostr protocol compliance at the database level, with separate tables for different event types based on their storage and replacement characteristics.
## Event Type Classification
Based on the Nostr specification (NIP-01), events are classified into four categories by kind number:
### 1. Regular Events
- **Kinds**: `1000 <= n < 10000 || 4 <= n < 45 || n == 1 || n == 2`
- **Storage Policy**: All events stored permanently
- **Examples**: Text notes (1), Reposts (6), Reactions (7), Direct Messages (4)
### 2. Replaceable Events
- **Kinds**: `10000 <= n < 20000 || n == 0 || n == 3`
- **Storage Policy**: Only latest per `(pubkey, kind)` combination
- **Replacement Logic**: Latest `created_at`, then lowest `id` lexically
- **Examples**: Metadata (0), Contacts (3), Mute List (10000)
### 3. Ephemeral Events
- **Kinds**: `20000 <= n < 30000`
- **Storage Policy**: Not expected to be stored (optional temporary storage)
- **Examples**: Typing indicators, presence updates, ephemeral messages
### 4. Addressable Events
- **Kinds**: `30000 <= n < 40000`
- **Storage Policy**: Only latest per `(pubkey, kind, d_tag)` combination
- **Replacement Logic**: Same as replaceable events
- **Examples**: Long-form content (30023), Application-specific data
## SQLite JSON Capabilities Research
SQLite provides powerful JSON functions that could be leveraged for tag storage:
### Core JSON Functions
```sql
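-- Extract a specific value by path
SELECT json_extract(tags, '$[0][0]') FROM events;    -- name of the first tag
-- Iterate through the elements of an array (table-valued function)
SELECT value FROM events, json_each(events.tags);
-- Flatten nested structures recursively (table-valued function)
SELECT fullkey, atom FROM events, json_tree(events.tags);
-- Validate JSON structure
SELECT json_valid(tags) FROM events;                 -- 1 if well-formed
-- Array operations
SELECT json_array_length(tags) FROM events;          -- number of tags
SELECT json_extract(tags, '$[0]') FROM events;       -- first tag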
```
### Tag Query Examples
#### Find all 'e' tag references:
```sql
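-- json_each() exposes its own id column, so events.id must be qualified
SELECT
    events.id,
    json_extract(json_each.value, '$[1]') AS referenced_event_id,
    json_extract(json_each.value, '$[2]') AS relay_hint,
    json_extract(json_each.value, '$[3]') AS marker
FROM events, json_each(events.tags)
WHERE json_extract(json_each.value, '$[0]') = 'e';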
```
#### Find events with specific hashtags:
```sql
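-- events.id is qualified because json_each() also has an id column
SELECT events.id, events.content
FROM events, json_each(events.tags)
WHERE json_extract(json_each.value, '$[0]') = 't'
  AND json_extract(json_each.value, '$[1]') = 'bitcoin';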
```
#### Extract 'd' tag for addressable events:
```sql
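-- Run per event: without the events.id filter, LIMIT 1 would return a single
-- row for the whole table rather than the first 'd' tag of each event.
SELECT
    events.id,
    json_extract(json_each.value, '$[1]') AS d_tag_value
FROM events, json_each(events.tags)
WHERE events.id = :event_id
  AND json_extract(json_each.value, '$[0]') = 'd'
LIMIT 1;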
```
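### JSON Functional Indexes
SQLite JSON paths do not support a `[*]` wildcard, so an expression index can only target a fixed array position. Indexing every tag of an event requires a normalized tag table (see Option 2).
```sql
-- Index the first tag's value when that tag is a hashtag
CREATE INDEX idx_first_hashtag ON events(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 't';
-- Addressable events need no JSON index for 'd' tags: the value is stored in
-- the d_tag column, which the (pubkey, kind, d_tag) primary key already indexes.
```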
## Proposed Schema Design
### Option 1: Separate Tables with JSON Tags
```sql
-- Regular Events (permanent storage)
CREATE TABLE events_regular (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,
    first_seen INTEGER DEFAULT (strftime('%s', 'now')),
    CONSTRAINT kind_regular CHECK (
        (kind >= 1000 AND kind < 10000) OR
        (kind >= 4 AND kind < 45) OR
        kind = 1 OR kind = 2
    )
);

-- Replaceable Events (latest per pubkey+kind)
CREATE TABLE events_replaceable (
    pubkey TEXT NOT NULL,
    kind INTEGER NOT NULL,
    id TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,
    replaced_at INTEGER DEFAULT (strftime('%s', 'now')),
    PRIMARY KEY (pubkey, kind),
    CONSTRAINT kind_replaceable CHECK (
        (kind >= 10000 AND kind < 20000) OR
        kind = 0 OR kind = 3
    )
);

-- Ephemeral Events (temporary/optional storage)
CREATE TABLE events_ephemeral (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,
    expires_at INTEGER DEFAULT (strftime('%s', 'now', '+1 hour')),
    CONSTRAINT kind_ephemeral CHECK (
        kind >= 20000 AND kind < 30000
    )
);

-- Addressable Events (latest per pubkey+kind+d_tag)
CREATE TABLE events_addressable (
    pubkey TEXT NOT NULL,
    kind INTEGER NOT NULL,
    d_tag TEXT NOT NULL,
    id TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,
    replaced_at INTEGER DEFAULT (strftime('%s', 'now')),
    PRIMARY KEY (pubkey, kind, d_tag),
    CONSTRAINT kind_addressable CHECK (
        kind >= 30000 AND kind < 40000
    )
);
```
### Indexes for Performance
```sql
-- Regular events indexes
CREATE INDEX idx_regular_pubkey ON events_regular(pubkey);
CREATE INDEX idx_regular_kind ON events_regular(kind);
CREATE INDEX idx_regular_created_at ON events_regular(created_at);
CREATE INDEX idx_regular_kind_created_at ON events_regular(kind, created_at);
-- Replaceable events indexes
CREATE INDEX idx_replaceable_created_at ON events_replaceable(created_at);
CREATE INDEX idx_replaceable_id ON events_replaceable(id);
-- Ephemeral events indexes
CREATE INDEX idx_ephemeral_expires_at ON events_ephemeral(expires_at);
CREATE INDEX idx_ephemeral_pubkey ON events_ephemeral(pubkey);
-- Addressable events indexes
CREATE INDEX idx_addressable_created_at ON events_addressable(created_at);
CREATE INDEX idx_addressable_id ON events_addressable(id);
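-- JSON tag indexes (examples)
-- SQLite JSON paths have no '[*]' wildcard, so these expression indexes cover
-- only the first tag of each event. Indexing every tag requires a normalized
-- tag table (see Option 2).
CREATE INDEX idx_regular_first_e_tag ON events_regular(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 'e';
CREATE INDEX idx_regular_first_p_tag ON events_regular(
    json_extract(tags, '$[0][1]')
) WHERE json_extract(tags, '$[0][0]') = 'p';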
```
### Option 2: Unified Tag Table Approach
```sql
-- Unified tag storage (alternative to JSON)
CREATE TABLE tags_unified (
    event_id TEXT NOT NULL,
    event_type TEXT NOT NULL,   -- 'regular', 'replaceable', 'ephemeral', 'addressable'
    tag_index INTEGER NOT NULL, -- Position in tag array
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    param_2 TEXT,               -- Third element if present
    param_3 TEXT,               -- Fourth element if present
    param_json TEXT,            -- JSON for additional parameters
    PRIMARY KEY (event_id, tag_index)
);

CREATE INDEX idx_tags_name_value ON tags_unified(name, value);
CREATE INDEX idx_tags_event_type ON tags_unified(event_type);
```
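The column layout above implies a fixed mapping from a tag's array elements to row fields. The following is a minimal sketch of that mapping, not the relay's actual code: `tag_row_t` and `tag_to_row` are hypothetical names, and storing `""` for a missing second element is an assumption made only because `value` is declared `NOT NULL`.

```c
#include <stddef.h>

/* Hypothetical in-memory mirror of one tags_unified row. */
typedef struct {
    const char *name;     /* tag[0] */
    const char *value;    /* tag[1], or "" when absent (assumption: column is NOT NULL) */
    const char *param_2;  /* tag[2], or NULL */
    const char *param_3;  /* tag[3], or NULL */
    size_t extra_count;   /* elements from tag[4] onward, destined for param_json */
} tag_row_t;

/* Splits one tag (an array of n strings) into row fields.
 * Returns 0 on success, -1 for an empty tag, which has no name to store. */
int tag_to_row(const char *const *elems, size_t n, tag_row_t *out) {
    if (elems == NULL || out == NULL || n == 0) {
        return -1;
    }
    out->name = elems[0];
    out->value = (n > 1) ? elems[1] : "";
    out->param_2 = (n > 2) ? elems[2] : NULL;
    out->param_3 = (n > 3) ? elems[3] : NULL;
    out->extra_count = (n > 4) ? n - 4 : 0;
    return 0;
}
```

The struct only borrows pointers from the input array, so the caller would bind the fields to an `INSERT INTO tags_unified` statement before freeing the parsed event.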
## Implementation Strategy
### 1. Kind Classification Function (C Code)
```c
typedef enum {
    EVENT_TYPE_REGULAR,
    EVENT_TYPE_REPLACEABLE,
    EVENT_TYPE_EPHEMERAL,
    EVENT_TYPE_ADDRESSABLE,
    EVENT_TYPE_INVALID
} event_type_t;

event_type_t classify_event_kind(int kind) {
    if ((kind >= 1000 && kind < 10000) ||
        (kind >= 4 && kind < 45) ||
        kind == 1 || kind == 2) {
        return EVENT_TYPE_REGULAR;
    }
    if ((kind >= 10000 && kind < 20000) ||
        kind == 0 || kind == 3) {
        return EVENT_TYPE_REPLACEABLE;
    }
    if (kind >= 20000 && kind < 30000) {
        return EVENT_TYPE_EPHEMERAL;
    }
    if (kind >= 30000 && kind < 40000) {
        return EVENT_TYPE_ADDRESSABLE;
    }
    return EVENT_TYPE_INVALID;
}
```
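Once a kind is classified, the insert path has to pick one of the four Option 1 tables. One way this routing might look is sketched below. `table_for_kind` is a hypothetical helper, and it repeats the range checks from `classify_event_kind` only so that the snippet compiles on its own; real code would more likely switch on the `event_type_t` result.

```c
#include <stddef.h>

/* Hypothetical helper: maps a kind to the Option 1 table that would store it.
 * Returns NULL when the kind falls outside every range listed in this document. */
const char *table_for_kind(int kind) {
    if ((kind >= 1000 && kind < 10000) ||
        (kind >= 4 && kind < 45) ||
        kind == 1 || kind == 2) {
        return "events_regular";
    }
    if ((kind >= 10000 && kind < 20000) ||
        kind == 0 || kind == 3) {
        return "events_replaceable";
    }
    if (kind >= 20000 && kind < 30000) {
        return "events_ephemeral";
    }
    if (kind >= 30000 && kind < 40000) {
        return "events_addressable";
    }
    return NULL;
}
```

Returning `NULL` for unclassified kinds lets the caller reject the event before any SQL runs, rather than relying on a `CHECK` constraint failure.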
### 2. Replacement Logic for Replaceable Events
```sql
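-- Trigger for replaceable events: removes the stored row when the incoming
-- event is newer, or has the same created_at and a lexically lower id.
-- If the stored row wins instead, nothing is deleted and the INSERT fails on
-- the (pubkey, kind) primary key; the application must handle that error.
CREATE TRIGGER replace_event_on_insert
BEFORE INSERT ON events_replaceable
FOR EACH ROW
WHEN EXISTS (
    SELECT 1 FROM events_replaceable
    WHERE pubkey = NEW.pubkey AND kind = NEW.kind
)
BEGIN
    DELETE FROM events_replaceable
    WHERE pubkey = NEW.pubkey
      AND kind = NEW.kind
      AND (
          created_at < NEW.created_at OR
          (created_at = NEW.created_at AND id > NEW.id)
      );
END;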
```
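The same ordering rule can be checked in application code before attempting the insert, which avoids relying on a constraint failure to detect a stale event. This is a sketch under stated assumptions: `event_supersedes` is a hypothetical name, and ids are assumed to be lowercase hex strings of equal length so that `strcmp` gives the lexical order.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when the incoming event should replace the stored one:
 * a later created_at wins; on a tie, the lexically lower id wins.
 * Returns 0 otherwise, including when both events are identical. */
int event_supersedes(int64_t new_created_at, const char *new_id,
                     int64_t old_created_at, const char *old_id) {
    if (new_created_at != old_created_at) {
        return new_created_at > old_created_at;
    }
    return strcmp(new_id, old_id) < 0;
}
```

The condition mirrors the trigger's `DELETE` predicate, so the two checks should agree on which event survives.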
### 3. D-Tag Extraction for Addressable Events
```c
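#include <string.h>   /* strcmp, strdup */
#include "cJSON.h"    /* adjust the include path to your cJSON install */

/* Returns a heap-allocated copy of the first 'd' tag value, or "" when the
 * event has no 'd' tag. Returns NULL if tags is not a JSON array or if
 * strdup fails. The caller must free() the result. */
char* extract_d_tag(cJSON* tags) {
    if (!tags || !cJSON_IsArray(tags)) {
        return NULL;
    }
    cJSON* tag;
    cJSON_ArrayForEach(tag, tags) {
        if (cJSON_IsArray(tag) && cJSON_GetArraySize(tag) >= 2) {
            cJSON* tag_name = cJSON_GetArrayItem(tag, 0);
            cJSON* tag_value = cJSON_GetArrayItem(tag, 1);
            if (cJSON_IsString(tag_name) && cJSON_IsString(tag_value)) {
                if (strcmp(cJSON_GetStringValue(tag_name), "d") == 0) {
                    return strdup(cJSON_GetStringValue(tag_value));
                }
            }
        }
    }
    return strdup(""); // Default empty d-tag
}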
```
## Advantages of This Design
### 1. Protocol Compliance
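- **Enforced at DB level**: `CHECK` constraints reject events whose kind does not belong in a given table
- **Automatic replacement**: A trigger handles replacement for replaceable events; addressable events need an equivalent trigger keyed on `(pubkey, kind, d_tag)`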
- **Type safety**: Separate tables ensure correct handling per event type
### 2. Performance Benefits
- **Targeted indexes**: Each table optimized for its access patterns
- **Reduced storage**: Ephemeral events can be auto-expired
- **Query optimization**: Queries that target one event class scan only that class's smaller table and indexes
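The auto-expiry mentioned above depends on comparing `expires_at` with the current time. A minimal sketch of that check follows; the helper names are hypothetical, the one-hour TTL is taken from the `'+1 hour'` default on `events_ephemeral.expires_at`, and treating `now == expires_at` as expired is an assumption.

```c
#include <stdint.h>

/* Assumed TTL, matching the '+1 hour' default on events_ephemeral.expires_at. */
#define EPHEMERAL_TTL_SECONDS 3600

/* Expiry timestamp for an ephemeral event first seen at first_seen (Unix seconds). */
int64_t ephemeral_expires_at(int64_t first_seen) {
    return first_seen + EPHEMERAL_TTL_SECONDS;
}

/* Returns 1 once the event has reached its expiry time, 0 before that. */
int ephemeral_is_expired(int64_t expires_at, int64_t now) {
    return now >= expires_at;
}
```

A periodic cleanup job could apply the same comparison in bulk with a statement such as `DELETE FROM events_ephemeral WHERE expires_at <= ?`, which the `idx_ephemeral_expires_at` index supports.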
### 3. JSON Tag Benefits
- **Atomic storage**: Tags stored with their event
- **Rich querying**: SQLite JSON functions enable complex tag queries
- **Schema flexibility**: Can handle arbitrary tag structures
- **Functional indexes**: Can index a tag at a fixed array position; SQLite JSON paths have no wildcard, so indexing every tag needs a normalized table (Option 2)
## Migration Strategy
1. **Phase 1**: Create new schema alongside existing
2. **Phase 2**: Implement kind classification and routing logic
3. **Phase 3**: Migrate existing data to appropriate tables
4. **Phase 4**: Update application logic to use new tables
5. **Phase 5**: Drop old schema after verification
## Next Steps for Implementation
1. **Prototype JSON performance**: Create test database with sample data
2. **Benchmark query patterns**: Compare JSON vs normalized approaches
3. **Implement kind classification**: Add routing logic to C code
4. **Create migration scripts**: Handle existing data transformation
5. **Update test suite**: Verify compliance with new schema