Files
c-relay/docs/subscription_query_analysis.md
2025-09-04 07:10:13 -04:00

9.7 KiB

Subscription Query Complexity Analysis

Overview

This document analyzes how Nostr REQ subscription filters would be implemented across different schema designs, focusing on query complexity, performance implications, and implementation burden.

Nostr REQ Filter Specification Recap

Clients send REQ messages with filters containing:

  • ids: List of specific event IDs
  • authors: List of pubkeys
  • kinds: List of event kinds
  • #<letter>: Tag filters (e.g., #e for event refs, #p for pubkey mentions)
  • since/until: Time range filters
  • limit: Maximum events to return

Key Filter Behaviors:

  • Multiple filters = OR logic: Match any filter
  • Within filter = AND logic: Match all specified conditions
  • Lists = IN logic: Match any value in the list
  • Tag filters: Must have at least one matching tag

Schema Comparison for REQ Handling

Current Simple Schema (Single Table)

CREATE TABLE event (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL
);

CREATE TABLE tag (
    id TEXT NOT NULL, -- event ID
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    parameters TEXT
);

Sample REQ Query Implementation:

-- Filter: {"authors": ["pubkey1", "pubkey2"], "kinds": [1, 6], "#p": ["target_pubkey"]}
SELECT DISTINCT e.*
FROM event e
WHERE e.pubkey IN ('pubkey1', 'pubkey2')
  AND e.kind IN (1, 6)
  AND EXISTS (
      SELECT 1 FROM tag t 
      WHERE t.id = e.id AND t.name = 'p' AND t.value = 'target_pubkey'
  )
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;

Multi-Table Schema Challenge

With separate tables (events_regular, events_replaceable, events_ephemeral, events_addressable), a REQ filter could potentially match events across ALL tables.

Problem Example:

Filter: {"kinds": [1, 0, 20001, 30023]}

  • Kind 1 → events_regular
  • Kind 0 → events_replaceable
  • Kind 20001 → events_ephemeral
  • Kind 30023 → events_addressable

This requires 4 separate queries + UNION, significantly complicating the implementation.

Multi-Table Query Complexity

Scenario 1: Cross-Table Kind Filter

-- Filter: {"kinds": [1, 0, 30023]}
-- Requires querying 3 different tables

SELECT id, pubkey, created_at, kind, content, sig FROM events_regular 
WHERE kind = 1
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
WHERE kind = 0  
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE kind = 30023
ORDER BY created_at DESC, id ASC
LIMIT ?;

Scenario 2: Cross-Table Author Filter

-- Filter: {"authors": ["pubkey1"]}
-- Must check ALL tables for this author

SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable  
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_ephemeral
WHERE pubkey = 'pubkey1'
UNION ALL  
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE pubkey = 'pubkey1'
ORDER BY created_at DESC, id ASC
LIMIT ?;

Scenario 3: Complex Multi-Condition Filter

-- Filter: {"authors": ["pubkey1"], "kinds": [1, 0], "#p": ["target"], "since": 1234567890}
-- Extremely complex with multiple UNIONs and tag JOINs

WITH regular_results AS (
    SELECT DISTINCT r.*
    FROM events_regular r
    JOIN tags_regular tr ON r.id = tr.event_id
    WHERE r.pubkey = 'pubkey1'
      AND r.kind = 1
      AND r.created_at >= 1234567890
      AND tr.name = 'p' AND tr.value = 'target'
),
replaceable_results AS (
    SELECT DISTINCT rp.*
    FROM events_replaceable rp  
    JOIN tags_replaceable trp ON (rp.pubkey, rp.kind) = (trp.event_pubkey, trp.event_kind)
    WHERE rp.pubkey = 'pubkey1'
      AND rp.kind = 0
      AND rp.created_at >= 1234567890
      AND trp.name = 'p' AND trp.value = 'target'
)
SELECT * FROM regular_results
UNION ALL
SELECT * FROM replaceable_results
ORDER BY created_at DESC, id ASC
LIMIT ?;

Implementation Burden Analysis

Single Table Approach

// Simple - one query builder function
char* build_filter_query(cJSON* filters) {
    // Build single SELECT with WHERE conditions
    // Single ORDER BY and LIMIT
    // One execution path
}

Multi-Table Approach

// Complex - requires routing and union logic  
char* build_multi_table_query(cJSON* filters) {
    // 1. Analyze kinds to determine which tables to query
    // 2. Split filters per table type
    // 3. Build separate queries for each table
    // 4. Union results with complex ORDER BY
    // 5. Handle LIMIT across UNION (tricky!)
}

typedef struct {
    bool needs_regular;
    bool needs_replaceable; 
    bool needs_ephemeral;
    bool needs_addressable;
    cJSON* regular_filter;
    cJSON* replaceable_filter;
    cJSON* ephemeral_filter;
    cJSON* addressable_filter;
} filter_routing_t;

Query Routing Complexity

For each REQ filter, we must:

  1. Analyze kinds → Determine which tables to query
  2. Split filters → Create per-table filter conditions
  3. Handle tag filters → Different tag table references per event type
  4. Union results → Merge with proper ordering
  5. Apply LIMIT → Complex with UNION queries

Performance Implications

Single Table Advantages:

  • Single query execution
  • One index strategy
  • Simple LIMIT handling
  • Unified ORDER BY
  • No UNION overhead

Multi-Table Disadvantages:

  • Multiple query executions
  • UNION sorting overhead
  • Complex LIMIT application
  • Index fragmentation across tables
  • Result set merging complexity

Specific REQ Filter Challenges

1. LIMIT Handling with UNION

-- WRONG: Limit applies to each subquery
(SELECT * FROM events_regular WHERE ... LIMIT 100)
UNION ALL  
(SELECT * FROM events_replaceable WHERE ... LIMIT 100)
-- Could return 200 events!

-- CORRECT: Limit applies to final result
SELECT * FROM (
    SELECT * FROM events_regular WHERE ...
    UNION ALL
    SELECT * FROM events_replaceable WHERE ...
    ORDER BY created_at DESC, id ASC
) LIMIT 100;
-- But this sorts ALL results before limiting!

2. Tag Filter Complexity

Each event type needs different tag table joins:

  • events_regulartags_regular
  • events_replaceabletags_replaceable (with composite key)
  • events_addressabletags_addressable (with composite key)
  • events_ephemeraltags_ephemeral

3. Subscription State Management

With multiple tables, subscription state becomes complex:

  • Which tables does this subscription monitor?
  • How to efficiently check new events across tables?
  • Different trigger/notification patterns per table

Alternative: Unified Event View

Hybrid Approach: Views Over Multi-Tables

-- Create unified view for queries
CREATE VIEW all_events AS
SELECT 
    'regular' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_regular
UNION ALL
SELECT 
    'replaceable' as event_type,
    id, pubkey, created_at, kind, content, sig  
FROM events_replaceable
UNION ALL
SELECT 
    'ephemeral' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_ephemeral  
UNION ALL
SELECT
    'addressable' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_addressable;

-- Unified tag view
CREATE VIEW all_tags AS  
SELECT event_id, name, value, parameters FROM tags_regular
UNION ALL
SELECT CONCAT(event_pubkey, ':', event_kind), name, value, parameters FROM tags_replaceable
UNION ALL  
SELECT event_id, name, value, parameters FROM tags_ephemeral
UNION ALL
SELECT CONCAT(event_pubkey, ':', event_kind, ':', d_tag), name, value, parameters FROM tags_addressable;

REQ Query Against Views:

-- Much simpler - back to single-table complexity
SELECT DISTINCT e.*
FROM all_events e
JOIN all_tags t ON e.id = t.event_id
WHERE e.pubkey IN (?)
  AND e.kind IN (?)
  AND t.name = 'p' AND t.value = ?
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;

Recommendation

The multi-table approach creates significant subscription query complexity that may outweigh its benefits.

Key Issues:

  1. REQ filters don't map to event types - clients filter by kind, author, tags, not storage semantics
  2. UNION query complexity - much harder to optimize and implement
  3. Subscription management burden - must monitor multiple tables
  4. Performance uncertainty - UNION queries may be slower than single table

Alternative Recommendation:

Modified Single Table with Event Type Column:

CREATE TABLE events (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable'
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,
    
    -- Replaceable event fields
    replaced_at INTEGER,
    
    -- Addressable event fields  
    d_tag TEXT,
    
    -- Unique constraints per event type
    CONSTRAINT unique_replaceable 
        UNIQUE (pubkey, kind) WHERE event_type = 'replaceable',
    CONSTRAINT unique_addressable
        UNIQUE (pubkey, kind, d_tag) WHERE event_type = 'addressable'
);

Benefits:

  • Simple REQ queries - single table, familiar patterns
  • Type enforcement - partial unique constraints handle replacement logic
  • Performance - unified indexes, no UNIONs
  • Implementation simplicity - minimal changes from current code
  • Future flexibility - can split tables later if needed

This approach gets the best of both worlds: protocol compliance through constraints, but query simplicity through unified storage.