# Subscription Query Complexity Analysis ## Overview This document analyzes how Nostr REQ subscription filters would be implemented across different schema designs, focusing on query complexity, performance implications, and implementation burden. ## Nostr REQ Filter Specification Recap Clients send REQ messages with filters containing: - **`ids`**: List of specific event IDs - **`authors`**: List of pubkeys - **`kinds`**: List of event kinds - **`#`**: Tag filters (e.g., `#e` for event refs, `#p` for pubkey mentions) - **`since`/`until`**: Time range filters - **`limit`**: Maximum events to return ### Key Filter Behaviors: - **Multiple filters = OR logic**: Match any filter - **Within filter = AND logic**: Match all specified conditions - **Lists = IN logic**: Match any value in the list - **Tag filters**: Must have at least one matching tag ## Schema Comparison for REQ Handling ### Current Simple Schema (Single Table) ```sql CREATE TABLE event ( id TEXT PRIMARY KEY, pubkey TEXT NOT NULL, created_at INTEGER NOT NULL, kind INTEGER NOT NULL, content TEXT NOT NULL, sig TEXT NOT NULL ); CREATE TABLE tag ( id TEXT NOT NULL, -- event ID name TEXT NOT NULL, value TEXT NOT NULL, parameters TEXT ); ``` #### Sample REQ Query Implementation: ```sql -- Filter: {"authors": ["pubkey1", "pubkey2"], "kinds": [1, 6], "#p": ["target_pubkey"]} SELECT DISTINCT e.* FROM event e WHERE e.pubkey IN ('pubkey1', 'pubkey2') AND e.kind IN (1, 6) AND EXISTS ( SELECT 1 FROM tag t WHERE t.id = e.id AND t.name = 'p' AND t.value = 'target_pubkey' ) ORDER BY e.created_at DESC, e.id ASC LIMIT ?; ``` ### Multi-Table Schema Challenge With separate tables (`events_regular`, `events_replaceable`, `events_ephemeral`, `events_addressable`), a REQ filter could potentially match events across ALL tables. #### Problem Example: Filter: `{"kinds": [1, 0, 20001, 30023]}` - Kind 1 → `events_regular` - Kind 0 → `events_replaceable` - Kind 20001 → `events_ephemeral` - Kind 30023 → `events_addressable` This requires **4 separate queries + UNION**, significantly complicating the implementation. ## Multi-Table Query Complexity ### Scenario 1: Cross-Table Kind Filter ```sql -- Filter: {"kinds": [1, 0, 30023]} -- Requires querying 3 different tables SELECT id, pubkey, created_at, kind, content, sig FROM events_regular WHERE kind = 1 UNION ALL SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable WHERE kind = 0 UNION ALL SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable WHERE kind = 30023 ORDER BY created_at DESC, id ASC LIMIT ?; ``` ### Scenario 2: Cross-Table Author Filter ```sql -- Filter: {"authors": ["pubkey1"]} -- Must check ALL tables for this author SELECT id, pubkey, created_at, kind, content, sig FROM events_regular WHERE pubkey = 'pubkey1' UNION ALL SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable WHERE pubkey = 'pubkey1' UNION ALL SELECT id, pubkey, created_at, kind, content, sig FROM events_ephemeral WHERE pubkey = 'pubkey1' UNION ALL SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable WHERE pubkey = 'pubkey1' ORDER BY created_at DESC, id ASC LIMIT ?; ``` ### Scenario 3: Complex Multi-Condition Filter ```sql -- Filter: {"authors": ["pubkey1"], "kinds": [1, 0], "#p": ["target"], "since": 1234567890} -- Extremely complex with multiple UNIONs and tag JOINs WITH regular_results AS ( SELECT DISTINCT r.* FROM events_regular r JOIN tags_regular tr ON r.id = tr.event_id WHERE r.pubkey = 'pubkey1' AND r.kind = 1 AND r.created_at >= 1234567890 AND tr.name = 'p' AND tr.value = 'target' ), replaceable_results AS ( SELECT DISTINCT rp.* FROM events_replaceable rp JOIN tags_replaceable trp ON (rp.pubkey, rp.kind) = (trp.event_pubkey, trp.event_kind) WHERE rp.pubkey = 'pubkey1' AND rp.kind = 0 AND rp.created_at >= 1234567890 AND trp.name = 'p' AND trp.value = 'target' ) SELECT * FROM regular_results UNION ALL SELECT * FROM replaceable_results ORDER BY created_at DESC, id ASC LIMIT ?; ``` ## Implementation Burden Analysis ### Single Table Approach ```c // Simple - one query builder function char* build_filter_query(cJSON* filters) { // Build single SELECT with WHERE conditions // Single ORDER BY and LIMIT // One execution path } ``` ### Multi-Table Approach ```c // Complex - requires routing and union logic char* build_multi_table_query(cJSON* filters) { // 1. Analyze kinds to determine which tables to query // 2. Split filters per table type // 3. Build separate queries for each table // 4. Union results with complex ORDER BY // 5. Handle LIMIT across UNION (tricky!) } typedef struct { bool needs_regular; bool needs_replaceable; bool needs_ephemeral; bool needs_addressable; cJSON* regular_filter; cJSON* replaceable_filter; cJSON* ephemeral_filter; cJSON* addressable_filter; } filter_routing_t; ``` ### Query Routing Complexity For each REQ filter, we must: 1. **Analyze kinds** → Determine which tables to query 2. **Split filters** → Create per-table filter conditions 3. **Handle tag filters** → Different tag table references per event type 4. **Union results** → Merge with proper ordering 5. **Apply LIMIT** → Complex with UNION queries ## Performance Implications ### Single Table Advantages: - ✅ **Single query execution** - ✅ **One index strategy** - ✅ **Simple LIMIT handling** - ✅ **Unified ORDER BY** - ✅ **No UNION overhead** ### Multi-Table Disadvantages: - ❌ **Multiple query executions** - ❌ **UNION sorting overhead** - ❌ **Complex LIMIT application** - ❌ **Index fragmentation across tables** - ❌ **Result set merging complexity** ## Specific REQ Filter Challenges ### 1. LIMIT Handling with UNION ```sql -- WRONG: Limit applies to each subquery (SELECT * FROM events_regular WHERE ... LIMIT 100) UNION ALL (SELECT * FROM events_replaceable WHERE ... LIMIT 100) -- Could return 200 events! -- CORRECT: Limit applies to final result SELECT * FROM ( SELECT * FROM events_regular WHERE ... UNION ALL SELECT * FROM events_replaceable WHERE ... ORDER BY created_at DESC, id ASC ) LIMIT 100; -- But this sorts ALL results before limiting! ``` ### 2. Tag Filter Complexity Each event type needs different tag table joins: - `events_regular` → `tags_regular` - `events_replaceable` → `tags_replaceable` (with composite key) - `events_addressable` → `tags_addressable` (with composite key) - `events_ephemeral` → `tags_ephemeral` ### 3. Subscription State Management With multiple tables, subscription state becomes complex: - Which tables does this subscription monitor? - How to efficiently check new events across tables? - Different trigger/notification patterns per table ## Alternative: Unified Event View ### Hybrid Approach: Views Over Multi-Tables ```sql -- Create unified view for queries CREATE VIEW all_events AS SELECT 'regular' as event_type, id, pubkey, created_at, kind, content, sig FROM events_regular UNION ALL SELECT 'replaceable' as event_type, id, pubkey, created_at, kind, content, sig FROM events_replaceable UNION ALL SELECT 'ephemeral' as event_type, id, pubkey, created_at, kind, content, sig FROM events_ephemeral UNION ALL SELECT 'addressable' as event_type, id, pubkey, created_at, kind, content, sig FROM events_addressable; -- Unified tag view CREATE VIEW all_tags AS SELECT event_id, name, value, parameters FROM tags_regular UNION ALL SELECT CONCAT(event_pubkey, ':', event_kind), name, value, parameters FROM tags_replaceable UNION ALL SELECT event_id, name, value, parameters FROM tags_ephemeral UNION ALL SELECT CONCAT(event_pubkey, ':', event_kind, ':', d_tag), name, value, parameters FROM tags_addressable; ``` ### REQ Query Against Views: ```sql -- Much simpler - back to single-table complexity SELECT DISTINCT e.* FROM all_events e JOIN all_tags t ON e.id = t.event_id WHERE e.pubkey IN (?) AND e.kind IN (?) AND t.name = 'p' AND t.value = ? ORDER BY e.created_at DESC, e.id ASC LIMIT ?; ``` ## Recommendation **The multi-table approach creates significant subscription query complexity that may outweigh its benefits.** ### Key Issues: 1. **REQ filters don't map to event types** - clients filter by kind, author, tags, not storage semantics 2. **UNION query complexity** - much harder to optimize and implement 3. **Subscription management burden** - must monitor multiple tables 4. **Performance uncertainty** - UNION queries may be slower than single table ### Alternative Recommendation: **Modified Single Table with Event Type Column:** ```sql CREATE TABLE events ( id TEXT PRIMARY KEY, pubkey TEXT NOT NULL, created_at INTEGER NOT NULL, kind INTEGER NOT NULL, event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable' content TEXT NOT NULL, sig TEXT NOT NULL, tags JSON, -- Replaceable event fields replaced_at INTEGER, -- Addressable event fields d_tag TEXT, -- Unique constraints per event type CONSTRAINT unique_replaceable UNIQUE (pubkey, kind) WHERE event_type = 'replaceable', CONSTRAINT unique_addressable UNIQUE (pubkey, kind, d_tag) WHERE event_type = 'addressable' ); ``` ### Benefits: - ✅ **Simple REQ queries** - single table, familiar patterns - ✅ **Type enforcement** - partial unique constraints handle replacement logic - ✅ **Performance** - unified indexes, no UNIONs - ✅ **Implementation simplicity** - minimal changes from current code - ✅ **Future flexibility** - can split tables later if needed This approach gets the best of both worlds: protocol compliance through constraints, but query simplicity through unified storage.