Basic relay functionality completed

2025-09-04 07:10:13 -04:00
parent 227c579147
commit 662feab881
15 changed files with 2207 additions and 210 deletions
--- a/docs/subscription_query_analysis.md
+++ b/docs/subscription_query_analysis.md
@@ -0,0 +1,331 @@
+# Subscription Query Complexity Analysis
+
+## Overview
+
+This document analyzes how Nostr REQ subscription filters would be implemented across different schema designs, focusing on query complexity, performance implications, and implementation burden.
+
+## Nostr REQ Filter Specification Recap
+
+Clients send REQ messages with filters containing:
+- **`ids`**: List of specific event IDs  
+- **`authors`**: List of pubkeys
+- **`kinds`**: List of event kinds
+- **`#<letter>`**: Tag filters (e.g., `#e` for event refs, `#p` for pubkey mentions)
+- **`since`/`until`**: Time range filters
+- **`limit`**: Maximum events to return for the initial query (ignored for events streamed afterwards)
+
+### Key Filter Behaviors:
+- **Multiple filters = OR logic**: Match any filter
+- **Within filter = AND logic**: Match all specified conditions  
+- **Lists = IN logic**: Match any value in the list
+- **Tag filters**: The event must have at least one tag with that name whose value is in the list
+
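The filter semantics above can be sketched in C. This is a minimal illustration, not the relay's actual code: `event_t`, `filter_t`, `filter_matches`, and `req_matches` are hypothetical names, only `authors`, `kinds`, `since`, and `until` are modeled, and a zero count or zero timestamp is assumed to mean "condition omitted".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event: only the fields used for matching. */
typedef struct {
    const char *pubkey;
    int kind;
    long created_at;
} event_t;

/* Hypothetical filter: a count of 0 (or a timestamp of 0) means "not specified". */
typedef struct {
    const char **authors; size_t n_authors;
    const int *kinds;     size_t n_kinds;
    long since;
    long until;
} filter_t;

static bool in_strings(const char *v, const char **list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(v, list[i]) == 0) return true;
    return false;
}

static bool in_ints(int v, const int *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (list[i] == v) return true;
    return false;
}

/* AND across the conditions of one filter; each list is an IN test. */
bool filter_matches(const filter_t *f, const event_t *e) {
    assert(f != NULL && e != NULL);
    if (f->n_authors && !in_strings(e->pubkey, f->authors, f->n_authors)) return false;
    if (f->n_kinds && !in_ints(e->kind, f->kinds, f->n_kinds)) return false;
    if (f->since && e->created_at < f->since) return false;
    if (f->until && e->created_at > f->until) return false;
    return true;
}

/* OR across the filters of one REQ. */
bool req_matches(const filter_t *filters, size_t n, const event_t *e) {
    for (size_t i = 0; i < n; i++)
        if (filter_matches(&filters[i], e)) return true;
    return false;
}
```

The same predicate could also be reused to check newly stored events against open subscriptions, where `limit` no longer applies.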
+## Schema Comparison for REQ Handling
+
+### Current Simple Schema (Single Event Table Plus Tag Table)
+```sql
+CREATE TABLE event (
+    id TEXT PRIMARY KEY,
+    pubkey TEXT NOT NULL,
+    created_at INTEGER NOT NULL,
+    kind INTEGER NOT NULL,
+    content TEXT NOT NULL,
+    sig TEXT NOT NULL
+);
+
+CREATE TABLE tag (
+    id TEXT NOT NULL, -- event ID
+    name TEXT NOT NULL,
+    value TEXT NOT NULL,
+    parameters TEXT
+);
+```
+
+#### Sample REQ Query Implementation:
+```sql
+-- Filter: {"authors": ["pubkey1", "pubkey2"], "kinds": [1, 6], "#p": ["target_pubkey"]}
+SELECT e.*  -- EXISTS never duplicates event rows, so DISTINCT is not needed
+FROM event e
+WHERE e.pubkey IN ('pubkey1', 'pubkey2')
+  AND e.kind IN (1, 6)
+  AND EXISTS (
+      SELECT 1 FROM tag t 
+      WHERE t.id = e.id AND t.name = 'p' AND t.value = 'target_pubkey'
+  )
+ORDER BY e.created_at DESC, e.id ASC
+LIMIT ?;
+```
+
+### Multi-Table Schema Challenge
+
+With separate tables (`events_regular`, `events_replaceable`, `events_ephemeral`, `events_addressable`), a single REQ filter can match events in ALL of them.
+
+#### Problem Example:
+Filter: `{"kinds": [1, 0, 20001, 30023]}`
+- Kind 1 → `events_regular`
+- Kind 0 → `events_replaceable`  
+- Kind 20001 → `events_ephemeral`
+- Kind 30023 → `events_addressable`
+
+This filter touches all four tables, so it requires **4 separate queries combined with UNION ALL**, which significantly complicates the implementation.
+
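The kind-to-table mapping in the example follows the NIP-01 kind ranges. A minimal sketch in C, assuming hypothetical helper names (`classify_kind`, `table_for_kind`) and treating every kind outside the replaceable, ephemeral, and addressable ranges as regular:

```c
#include <assert.h>

typedef enum {
    EVENT_REGULAR,
    EVENT_REPLACEABLE,
    EVENT_EPHEMERAL,
    EVENT_ADDRESSABLE
} event_type_t;

/* Classify a kind using the NIP-01 ranges. Kinds that fall outside the
 * replaceable/ephemeral/addressable ranges are treated as regular here,
 * which is an assumption of this sketch. */
event_type_t classify_kind(int kind) {
    assert(kind >= 0 && kind <= 65535);   /* NIP-01: kind is 0..65535 */
    if (kind == 0 || kind == 3 || (kind >= 10000 && kind < 20000))
        return EVENT_REPLACEABLE;
    if (kind >= 20000 && kind < 30000)
        return EVENT_EPHEMERAL;
    if (kind >= 30000 && kind < 40000)
        return EVENT_ADDRESSABLE;
    return EVENT_REGULAR;
}

/* Map a kind to the table name used in this document. */
const char *table_for_kind(int kind) {
    switch (classify_kind(kind)) {
    case EVENT_REPLACEABLE: return "events_replaceable";
    case EVENT_EPHEMERAL:   return "events_ephemeral";
    case EVENT_ADDRESSABLE: return "events_addressable";
    default:                return "events_regular";
    }
}
```

A filter with no `kinds` field cannot be narrowed this way, so it has to be routed to every table.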
+## Multi-Table Query Complexity
+
+### Scenario 1: Cross-Table Kind Filter
+```sql
+-- Filter: {"kinds": [1, 0, 30023]}
+-- Requires querying 3 different tables
+
+SELECT id, pubkey, created_at, kind, content, sig FROM events_regular 
+WHERE kind = 1
+UNION ALL
+SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
+WHERE kind = 0  
+UNION ALL
+SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
+WHERE kind = 30023
+ORDER BY created_at DESC, id ASC
+LIMIT ?;
+```
+
+### Scenario 2: Cross-Table Author Filter  
+```sql
+-- Filter: {"authors": ["pubkey1"]}
+-- Must check ALL tables for this author
+
+SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
+WHERE pubkey = 'pubkey1'
+UNION ALL
+SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable  
+WHERE pubkey = 'pubkey1'
+UNION ALL
+SELECT id, pubkey, created_at, kind, content, sig FROM events_ephemeral
+WHERE pubkey = 'pubkey1'
+UNION ALL  
+SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
+WHERE pubkey = 'pubkey1'
+ORDER BY created_at DESC, id ASC
+LIMIT ?;
+```
+
+### Scenario 3: Complex Multi-Condition Filter
+```sql
+-- Filter: {"authors": ["pubkey1"], "kinds": [1, 0], "#p": ["target"], "since": 1234567890}
+-- Needs one CTE per table, each with its own tag JOIN and key shape, before the UNION
+
+WITH regular_results AS (
+    SELECT DISTINCT r.*
+    FROM events_regular r
+    JOIN tags_regular tr ON r.id = tr.event_id
+    WHERE r.pubkey = 'pubkey1'
+      AND r.kind = 1
+      AND r.created_at >= 1234567890
+      AND tr.name = 'p' AND tr.value = 'target'
+),
+replaceable_results AS (
+    SELECT DISTINCT rp.*
+    FROM events_replaceable rp  
+    JOIN tags_replaceable trp ON (rp.pubkey, rp.kind) = (trp.event_pubkey, trp.event_kind)
+    WHERE rp.pubkey = 'pubkey1'
+      AND rp.kind = 0
+      AND rp.created_at >= 1234567890
+      AND trp.name = 'p' AND trp.value = 'target'
+)
+SELECT * FROM regular_results
+UNION ALL
+SELECT * FROM replaceable_results
+ORDER BY created_at DESC, id ASC
+LIMIT ?;
+```
+
+## Implementation Burden Analysis
+
+### Single Table Approach
+```c
+// Simple - one query builder function
+char* build_filter_query(cJSON* filters) {
+    // Build single SELECT with WHERE conditions
+    // Single ORDER BY and LIMIT
+    // One execution path
+}
+```
+
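To make the single-table path concrete, here is a hedged sketch of how the SQL text for one filter might be assembled. It is not the relay's implementation: `build_filter_sql` and its helpers are hypothetical, the filter is reduced to three value counts instead of a `cJSON` object, and the values themselves are assumed to be bound to the `?` placeholders separately.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to the NUL-terminated string in buf, truncating at cap. */
static void append(char *buf, size_t cap, const char *s) {
    size_t len = strlen(buf);
    if (len + 1 < cap) snprintf(buf + len, cap - len, "%s", s);
}

/* Append n comma-separated "?" placeholders. */
static void append_placeholders(char *buf, size_t cap, size_t n) {
    for (size_t i = 0; i < n; i++)
        append(buf, cap, i ? ",?" : "?");
}

/* Append "<column> IN (?,...)" preceded by WHERE or AND; skip if n is 0. */
static void append_in_clause(char *buf, size_t cap, int *n_conds,
                             const char *column, size_t n) {
    if (n == 0) return;                      /* condition absent from filter */
    append(buf, cap, (*n_conds)++ ? " AND " : " WHERE ");
    append(buf, cap, column);
    append(buf, cap, " IN (");
    append_placeholders(buf, cap, n);
    append(buf, cap, ")");
}

/* Write the SQL for one filter into buf (single SELECT, ORDER BY, LIMIT). */
void build_filter_sql(char *buf, size_t cap,
                      size_t n_authors, size_t n_kinds, size_t n_p_tags) {
    int n_conds = 0;
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    append(buf, cap, "SELECT e.* FROM event e");
    append_in_clause(buf, cap, &n_conds, "e.pubkey", n_authors);
    append_in_clause(buf, cap, &n_conds, "e.kind", n_kinds);
    if (n_p_tags) {
        append(buf, cap, n_conds++ ? " AND " : " WHERE ");
        append(buf, cap, "EXISTS (SELECT 1 FROM tag t WHERE t.id = e.id"
                         " AND t.name = 'p' AND t.value IN (");
        append_placeholders(buf, cap, n_p_tags);
        append(buf, cap, "))");
    }
    append(buf, cap, " ORDER BY e.created_at DESC, e.id ASC LIMIT ?");
}
```

Using placeholders rather than pasting values into the string keeps client-supplied pubkeys and tag values out of the SQL text.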
+### Multi-Table Approach
+```c
+// Complex - requires routing and union logic  
+char* build_multi_table_query(cJSON* filters) {
+    // 1. Analyze kinds to determine which tables to query
+    // 2. Split filters per table type
+    // 3. Build separate queries for each table
+    // 4. Union results with complex ORDER BY
+    // 5. Handle LIMIT across UNION (tricky!)
+}
+
+typedef struct {
+    bool needs_regular;
+    bool needs_replaceable; 
+    bool needs_ephemeral;
+    bool needs_addressable;
+    cJSON* regular_filter;
+    cJSON* replaceable_filter;
+    cJSON* ephemeral_filter;
+    cJSON* addressable_filter;
+} filter_routing_t;
+```
+
+### Query Routing Complexity
+
+For each REQ filter, we must:
+
+1. **Analyze kinds** → Determine which tables to query
+2. **Split filters** → Create per-table filter conditions  
+3. **Handle tag filters** → Different tag table references per event type
+4. **Union results** → Merge with proper ordering
+5. **Apply LIMIT** → Complex with UNION queries
+
+## Performance Implications
+
+### Single Table Advantages:
+- ✅ **Single query execution**
+- ✅ **One index strategy** 
+- ✅ **Simple LIMIT handling**
+- ✅ **Unified ORDER BY**
+- ✅ **No UNION overhead**
+
+### Multi-Table Disadvantages:
+- ❌ **Multiple query executions**
+- ❌ **UNION sorting overhead**
+- ❌ **Complex LIMIT application**
+- ❌ **Indexes duplicated and maintained per table**
+- ❌ **Result set merging complexity**
+
+## Specific REQ Filter Challenges
+
+### 1. LIMIT Handling with UNION
+```sql
+-- WRONG: LIMIT is applied to each subquery, not to the combined result
+(SELECT * FROM events_regular WHERE ... LIMIT 100)
+UNION ALL  
+(SELECT * FROM events_replaceable WHERE ... LIMIT 100)
+-- Could return 200 events!
+
+-- CORRECT: Limit applies to final result
+SELECT * FROM (
+    SELECT * FROM events_regular WHERE ...
+    UNION ALL
+    SELECT * FROM events_replaceable WHERE ...
+    ORDER BY created_at DESC, id ASC
+) LIMIT 100;
+-- But this sorts ALL results before limiting!
+```
+
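One way around the full sort is to merge in application code. The sketch below assumes each table is queried separately with the same `ORDER BY created_at DESC, id ASC` and its own `LIMIT`, so every per-table list arrives already sorted. The names `row_t`, `row_cmp`, and `merge_limited` are hypothetical, and at most 8 lists are supported to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result row: only the two sort keys are modeled. */
typedef struct {
    const char *id;
    long created_at;
} row_t;

/* Negative if a sorts before b under ORDER BY created_at DESC, id ASC. */
static int row_cmp(const row_t *a, const row_t *b) {
    if (a->created_at != b->created_at)
        return a->created_at > b->created_at ? -1 : 1;
    return strcmp(a->id, b->id);
}

/* Merge n_lists pre-sorted lists into out, stopping after limit rows.
 * Returns the number of rows written. */
size_t merge_limited(const row_t **lists, const size_t *counts,
                     size_t n_lists, row_t *out, size_t limit) {
    size_t pos[8] = {0};                 /* read cursor per list */
    size_t written = 0;
    assert(n_lists <= 8);
    while (written < limit) {
        size_t best = n_lists;           /* n_lists means "none found yet" */
        for (size_t i = 0; i < n_lists; i++) {
            if (pos[i] >= counts[i]) continue;          /* list exhausted */
            if (best == n_lists ||
                row_cmp(&lists[i][pos[i]], &lists[best][pos[best]]) < 0)
                best = i;
        }
        if (best == n_lists) break;      /* every list is exhausted */
        out[written++] = lists[best][pos[best]++];
    }
    return written;
}
```

The merge reads at most `limit` rows in total, but it moves ordering logic out of the database, which is part of the implementation burden described above.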
+### 2. Tag Filter Complexity
+Each event type needs different tag table joins:
+- `events_regular` → `tags_regular`
+- `events_replaceable` → `tags_replaceable` (with composite key)
+- `events_addressable` → `tags_addressable` (with composite key)
+- `events_ephemeral` → `tags_ephemeral`
+
+### 3. Subscription State Management
+With multiple tables, subscription state becomes complex:
+- Which tables does this subscription monitor?
+- How to efficiently check new events across tables?
+- Different trigger/notification patterns per table
+
+## Alternative: Unified Event View
+
+### Hybrid Approach: Views Over Multi-Tables
+```sql
+-- Create unified view for queries
+CREATE VIEW all_events AS
+SELECT 
+    'regular' as event_type,
+    id, pubkey, created_at, kind, content, sig
+FROM events_regular
+UNION ALL
+SELECT 
+    'replaceable' as event_type,
+    id, pubkey, created_at, kind, content, sig  
+FROM events_replaceable
+UNION ALL
+SELECT 
+    'ephemeral' as event_type,
+    id, pubkey, created_at, kind, content, sig
+FROM events_ephemeral  
+UNION ALL
+SELECT
+    'addressable' as event_type,
+    id, pubkey, created_at, kind, content, sig
+FROM events_addressable;
+
+-- Unified tag view
+CREATE VIEW all_tags AS  
+SELECT event_id, name, value, parameters FROM tags_regular
+UNION ALL
+SELECT CONCAT(event_pubkey, ':', event_kind), name, value, parameters FROM tags_replaceable
+UNION ALL  
+SELECT event_id, name, value, parameters FROM tags_ephemeral
+UNION ALL
+SELECT CONCAT(event_pubkey, ':', event_kind, ':', d_tag), name, value, parameters FROM tags_addressable;
+```
+
+### REQ Query Against Views:
+```sql
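+-- Much simpler: back to single-table complexity.
+-- Caveat: all_tags exposes a synthetic 'pubkey:kind[:d_tag]' key for replaceable
+-- and addressable rows, so joining on e.id only matches regular and ephemeral
+-- events unless all_events exposes the same key.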
+SELECT DISTINCT e.*
+FROM all_events e
+JOIN all_tags t ON e.id = t.event_id
+WHERE e.pubkey IN (?)
+  AND e.kind IN (?)
+  AND t.name = 'p' AND t.value = ?
+ORDER BY e.created_at DESC, e.id ASC
+LIMIT ?;
+```
+
+## Recommendation
+
+**The multi-table approach creates significant subscription query complexity that may outweigh its benefits.**
+
+### Key Issues:
+1. **REQ filters don't map to event types** - clients filter by kind, author, tags, not storage semantics
+2. **UNION query complexity** - much harder to optimize and implement  
+3. **Subscription management burden** - must monitor multiple tables
+4. **Performance uncertainty** - UNION queries may be slower than single table
+
+### Alternative Recommendation:
+
+**Modified Single Table with Event Type Column:**
+
+```sql
+CREATE TABLE events (
+    id TEXT PRIMARY KEY,
+    pubkey TEXT NOT NULL,
+    created_at INTEGER NOT NULL,
+    kind INTEGER NOT NULL,
+    event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable'
+    content TEXT NOT NULL,
+    sig TEXT NOT NULL,
+    tags JSON,
+    
+    -- Replaceable event fields
+    replaced_at INTEGER,
+    
+    -- Addressable event fields  
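+    d_tag TEXT
+);
+
+-- Uniqueness per event type. A UNIQUE table constraint cannot take a WHERE
+-- clause in SQLite or PostgreSQL, so partial unique indexes are used instead.
+CREATE UNIQUE INDEX unique_replaceable
+    ON events (pubkey, kind) WHERE event_type = 'replaceable';
+CREATE UNIQUE INDEX unique_addressable
+    ON events (pubkey, kind, d_tag) WHERE event_type = 'addressable';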
+```
+
+### Benefits:
+- ✅ **Simple REQ queries** - single table, familiar patterns
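+- ✅ **Type enforcement** - partial uniqueness on `(pubkey, kind)` and `(pubkey, kind, d_tag)` rejects duplicate replaceable and addressable rows; the insert path still decides which event wins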
+- ✅ **Performance** - unified indexes, no UNIONs
+- ✅ **Implementation simplicity** - minimal changes from current code
+- ✅ **Future flexibility** - can split tables later if needed
+
+This approach gets the best of both worlds: protocol compliance through constraints and query simplicity through unified storage.