c-relay/docs/subscription_query_analysis.md

# Subscription Query Complexity Analysis

## Overview

This document analyzes how Nostr REQ subscription filters would be implemented across different schema designs, focusing on query complexity, performance implications, and implementation burden.

## Nostr REQ Filter Specification Recap

Clients send REQ messages with filters containing:
- **`ids`**: List of specific event IDs
- **`authors`**: List of pubkeys
- **`kinds`**: List of event kinds
- **`#<letter>`**: Tag filters (e.g., `#e` for event refs, `#p` for pubkey mentions)
- **`since`/`until`**: Inclusive `created_at` range (`since <= created_at <= until`)
- **`limit`**: Maximum events to return in the initial query, newest first

### Key Filter Behaviors:
- **Multiple filters = OR logic**: Match any filter
- **Within filter = AND logic**: Match all specified conditions
- **Lists = IN logic**: Match any value in the list
- **Tag filters**: The event must have at least one tag with that name whose value is in the filter's list
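
The matching rules above can be sketched in C as follows. This is a minimal illustration, not the relay's actual code: `event_t`, `filter_t`, and the function names are hypothetical, tag filters and `ids` are left out for brevity, and a value of `0` for `since`/`until` is assumed to mean "not specified".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types: only the fields needed to show the logic. */
typedef struct {
    const char *pubkey;
    int kind;
    long created_at;
} event_t;

typedef struct {
    const char **authors; size_t n_authors;  /* n == 0 means "not specified" */
    const int *kinds;     size_t n_kinds;
    long since;                               /* 0 means "not specified" */
    long until;                               /* 0 means "not specified" */
} filter_t;

/* Lists use IN logic: the event matches if any listed value equals it. */
static bool author_in(const filter_t *f, const char *pubkey) {
    for (size_t i = 0; i < f->n_authors; i++)
        if (strcmp(f->authors[i], pubkey) == 0) return true;
    return false;
}

static bool kind_in(const filter_t *f, int kind) {
    for (size_t i = 0; i < f->n_kinds; i++)
        if (f->kinds[i] == kind) return true;
    return false;
}

/* Within one filter, every specified condition must hold (AND). */
bool filter_matches(const filter_t *f, const event_t *e) {
    assert(f != NULL && e != NULL);
    if (f->n_authors > 0 && !author_in(f, e->pubkey)) return false;
    if (f->n_kinds > 0 && !kind_in(f, e->kind)) return false;
    if (f->since != 0 && e->created_at < f->since) return false;
    if (f->until != 0 && e->created_at > f->until) return false;
    return true;
}

/* Across the filters of one REQ, a single matching filter is enough (OR). */
bool req_matches(const filter_t *filters, size_t n_filters, const event_t *e) {
    for (size_t i = 0; i < n_filters; i++)
        if (filter_matches(&filters[i], e)) return true;
    return false;
}
```

The same structure is what the SQL in the following sections encodes: `AND` between conditions, `IN (...)` for lists, and one query (or `UNION` branch) per filter.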

## Schema Comparison for REQ Handling

### Current Simple Schema (Single Table)
```sql
CREATE TABLE event (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    content TEXT NOT NULL,
    sig TEXT NOT NULL
);

CREATE TABLE tag (
    id TEXT NOT NULL, -- event ID
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    parameters TEXT
);
```

#### Sample REQ Query Implementation:
```sql
-- Filter: {"authors": ["pubkey1", "pubkey2"], "kinds": [1, 6], "#p": ["target_pubkey"]}
SELECT e.*  -- EXISTS never duplicates event rows, so DISTINCT is not needed
FROM event e
WHERE e.pubkey IN ('pubkey1', 'pubkey2')
  AND e.kind IN (1, 6)
  AND EXISTS (
      SELECT 1 FROM tag t
      WHERE t.id = e.id AND t.name = 'p' AND t.value = 'target_pubkey'
  )
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;
```

### Multi-Table Schema Challenge

With separate tables (`events_regular`, `events_replaceable`, `events_ephemeral`, `events_addressable`), a single REQ filter can match events in every table.

#### Problem Example:
Filter: `{"kinds": [1, 0, 20001, 30023]}`
- Kind 1 → `events_regular`
- Kind 0 → `events_replaceable`
- Kind 20001 → `events_ephemeral`
- Kind 30023 → `events_addressable`

This requires **four `SELECT` branches combined with `UNION ALL`** (one per table), which significantly complicates the implementation.
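
The kind-to-table mapping in the example follows the NIP-01 kind ranges. A rough sketch of how a relay might compute which tables a `kinds` list touches is shown below. The `TBL_*` flags and function names are hypothetical, and treating every kind outside the replaceable, ephemeral, and addressable ranges as regular is an assumption made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit flags, one per storage table named in this document. */
enum {
    TBL_REGULAR     = 1 << 0,
    TBL_REPLACEABLE = 1 << 1,
    TBL_EPHEMERAL   = 1 << 2,
    TBL_ADDRESSABLE = 1 << 3,
    TBL_ALL         = 0x0F
};

/* Map a kind to its table using the NIP-01 kind ranges. */
unsigned table_for_kind(int kind) {
    if (kind == 0 || kind == 3 || (kind >= 10000 && kind < 20000))
        return TBL_REPLACEABLE;
    if (kind >= 20000 && kind < 30000)
        return TBL_EPHEMERAL;
    if (kind >= 30000 && kind < 40000)
        return TBL_ADDRESSABLE;
    return TBL_REGULAR;  /* assumption: anything else is stored as regular */
}

/* A filter without a "kinds" list can match any kind, so every table is needed. */
unsigned tables_for_kinds(const int *kinds, size_t n_kinds) {
    assert(kinds != NULL || n_kinds == 0);
    if (n_kinds == 0) return TBL_ALL;
    unsigned mask = 0;
    for (size_t i = 0; i < n_kinds; i++)
        mask |= table_for_kind(kinds[i]);
    return mask;
}
```

For the filter above, `tables_for_kinds` would set all four bits, which is exactly the four-branch case. A filter such as `{"kinds": [1, 6]}` would only touch `events_regular`.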

## Multi-Table Query Complexity

### Scenario 1: Cross-Table Kind Filter
```sql
-- Filter: {"kinds": [1, 0, 30023]}
-- Requires querying 3 different tables

SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
WHERE kind = 1
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
WHERE kind = 0
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE kind = 30023
ORDER BY created_at DESC, id ASC
LIMIT ?;
```

### Scenario 2: Cross-Table Author Filter
```sql
-- Filter: {"authors": ["pubkey1"]}
-- Must check ALL tables for this author

SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_ephemeral
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE pubkey = 'pubkey1'
ORDER BY created_at DESC, id ASC
LIMIT ?;
```

### Scenario 3: Complex Multi-Condition Filter
```sql
-- Filter: {"authors": ["pubkey1"], "kinds": [1, 0], "#p": ["target"], "since": 1234567890}
-- Extremely complex with multiple UNIONs and tag JOINs

WITH regular_results AS (
    SELECT DISTINCT r.*
    FROM events_regular r
    JOIN tags_regular tr ON r.id = tr.event_id
    WHERE r.pubkey = 'pubkey1'
      AND r.kind = 1
      AND r.created_at >= 1234567890
      AND tr.name = 'p' AND tr.value = 'target'
),
replaceable_results AS (
    SELECT DISTINCT rp.*
    FROM events_replaceable rp
    JOIN tags_replaceable trp ON (rp.pubkey, rp.kind) = (trp.event_pubkey, trp.event_kind)
    WHERE rp.pubkey = 'pubkey1'
      AND rp.kind = 0
      AND rp.created_at >= 1234567890
      AND trp.name = 'p' AND trp.value = 'target'
)
SELECT * FROM regular_results
UNION ALL
SELECT * FROM replaceable_results
ORDER BY created_at DESC, id ASC
LIMIT ?;
```

## Implementation Burden Analysis

### Single Table Approach
```c
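// Simple - one query builder function (outline only)
char* build_filter_query(cJSON* filters) {
    // Build single SELECT with WHERE conditions
    // Single ORDER BY and LIMIT
    // One execution path
    (void)filters;
    return NULL; // placeholder: the finished SQL string would be returned here
}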
```
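
To make the single-path claim concrete, here is a minimal sketch of such a builder. It is not the relay's implementation: it takes list lengths instead of a `cJSON` object to stay self-contained, the names `sqlbuf_t`, `sql_append`, and `build_single_table_query` are hypothetical, and tag filters, `ids`, and `until` are omitted. It emits one `?` placeholder per list element so values can be bound rather than interpolated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct { char *buf; size_t cap; size_t len; bool overflow; } sqlbuf_t;

/* printf-style append that records overflow instead of writing past the end. */
static void sql_append(sqlbuf_t *s, const char *fmt, ...) {
    if (s->overflow) return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(s->buf + s->len, s->cap - s->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= s->cap - s->len) { s->overflow = true; return; }
    s->len += (size_t)n;
}

/* Emit "<col> IN (?,?,...)" with one placeholder per list element. */
static void sql_append_in(sqlbuf_t *s, const char *col, size_t n, bool *first) {
    if (n == 0) return;  /* condition not specified in the filter */
    sql_append(s, "%s%s IN (", *first ? " WHERE " : " AND ", col);
    for (size_t i = 0; i < n; i++)
        sql_append(s, "%s", i ? ",?" : "?");
    sql_append(s, ")");
    *first = false;
}

/* Returns true on success, false if the buffer was too small. */
bool build_single_table_query(char *out, size_t cap,
                              size_t n_authors, size_t n_kinds, bool has_since) {
    assert(out != NULL && cap > 0);
    sqlbuf_t s = { out, cap, 0, false };
    bool first = true;
    out[0] = '\0';
    sql_append(&s, "SELECT id, pubkey, created_at, kind, content, sig FROM event");
    sql_append_in(&s, "pubkey", n_authors, &first);
    sql_append_in(&s, "kind", n_kinds, &first);
    if (has_since)
        sql_append(&s, "%screated_at >= ?", first ? " WHERE " : " AND ");
    sql_append(&s, " ORDER BY created_at DESC, id ASC LIMIT ?");
    return !s.overflow;
}
```

Whatever combination of conditions the filter contains, the output is always one `SELECT` with one `ORDER BY` and one `LIMIT`, so there is a single statement to prepare, bind, and step.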

### Multi-Table Approach
```c
#include <stdbool.h>  // bool

// Which tables a filter touches, plus the per-table slice of that filter
typedef struct {
    bool needs_regular;
    bool needs_replaceable;
    bool needs_ephemeral;
    bool needs_addressable;
    cJSON* regular_filter;
    cJSON* replaceable_filter;
    cJSON* ephemeral_filter;
    cJSON* addressable_filter;
} filter_routing_t;

// Complex - requires routing and union logic (outline only)
char* build_multi_table_query(cJSON* filters) {
    // 1. Analyze kinds to determine which tables to query
    // 2. Split filters per table type
    // 3. Build separate queries for each table
    // 4. Union results with complex ORDER BY
    // 5. Handle LIMIT across UNION (tricky!)
    (void)filters;
    return NULL; // placeholder: the finished SQL string would be returned here
}
```

### Query Routing Complexity

For each REQ filter, we must:

1. **Analyze kinds** → Determine which tables to query
2. **Split filters** → Create per-table filter conditions
3. **Handle tag filters** → Different tag table references per event type
4. **Union results** → Merge with proper ordering
5. **Apply LIMIT** → Complex with UNION queries
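
Steps 4 and 5 can be sketched as below, assuming the table set has already been reduced to a bitmask (bit 0 = regular, bit 1 = replaceable, bit 2 = ephemeral, bit 3 = addressable; that bit order is an assumption of this example). The function name `build_union_query` is hypothetical, and the sketch deliberately ignores step 3: it reuses one `WHERE` fragment for every table, which only works for conditions that do not involve the per-table tag tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Table names as used in this document; the bit order is an assumption. */
static const char *const TABLES[] = {
    "events_regular", "events_replaceable", "events_ephemeral", "events_addressable"
};

/* Build one SELECT per table flagged in `mask`, joined by UNION ALL, with a
 * single ORDER BY / LIMIT applied to the combined result. `where` is the
 * shared per-table condition, already written with ? placeholders. */
bool build_union_query(char *out, size_t cap, unsigned mask, const char *where) {
    assert(out != NULL && cap > 0 && where != NULL);
    size_t len = 0;
    bool first = true;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof TABLES / sizeof TABLES[0]; i++) {
        if (!(mask & (1u << i))) continue;
        int n = snprintf(out + len, cap - len,
                         "%sSELECT id, pubkey, created_at, kind, content, sig FROM %s WHERE %s",
                         first ? "" : " UNION ALL ", TABLES[i], where);
        if (n < 0 || (size_t)n >= cap - len) return false;
        len += (size_t)n;
        first = false;
    }
    if (first) return false;  /* no table selected */
    int n = snprintf(out + len, cap - len, " ORDER BY created_at DESC, id ASC LIMIT ?");
    return n >= 0 && (size_t)n < cap - len;
}
```

Even in this reduced form, the caller has to bind the filter values once per selected branch, so the number of bound parameters depends on the mask. That bookkeeping does not exist in the single-table builder.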

## Performance Implications

### Single Table Advantages:
- ✅ **Single query execution**
- ✅ **One index strategy**
- ✅ **Simple LIMIT handling**
- ✅ **Unified ORDER BY**
- ✅ **No UNION overhead**

### Multi-Table Disadvantages:
- ❌ **Multiple query executions**
- ❌ **UNION sorting overhead**
- ❌ **Complex LIMIT application**
- ❌ **Index fragmentation across tables**
- ❌ **Result set merging complexity**

## Specific REQ Filter Challenges

### 1. LIMIT Handling with UNION
```sql
-- WRONG: LIMIT applies to each branch separately, with no overall limit
(SELECT * FROM events_regular WHERE ... LIMIT 100)
UNION ALL
(SELECT * FROM events_replaceable WHERE ... LIMIT 100)
-- Could return up to 200 events, in no guaranteed order!

-- CORRECT: ORDER BY and LIMIT apply to the combined result
SELECT * FROM events_regular WHERE ...
UNION ALL
SELECT * FROM events_replaceable WHERE ...
ORDER BY created_at DESC, id ASC
LIMIT 100;
-- But this may sort ALL matching rows from every branch before limiting!
```
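
One way around the full sort, sketched here under assumptions, is to run each table's query with its own `ORDER BY created_at DESC, id ASC LIMIT n` and merge the already-sorted results in the relay. Because the global top `n` can contain at most `n` rows from any one table, the merged prefix equals what the combined query would return. `row_t` and `merge_top_n` are hypothetical names, and the example merges two result sets only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    long created_at;
    const char *id;
} row_t;

/* Ordering used throughout this document: newest first, then lowest id. */
static int row_before(const row_t *a, const row_t *b) {
    if (a->created_at != b->created_at) return a->created_at > b->created_at;
    return strcmp(a->id, b->id) < 0;
}

/* Merge two already-sorted result sets, keeping at most `limit` rows.
 * Returns the number of rows written to `out`. */
size_t merge_top_n(const row_t *a, size_t na, const row_t *b, size_t nb,
                   row_t *out, size_t limit) {
    assert(out != NULL || limit == 0);
    size_t i = 0, j = 0, n = 0;
    while (n < limit && (i < na || j < nb)) {
        if (j >= nb || (i < na && row_before(&a[i], &b[j])))
            out[n++] = a[i++];
        else
            out[n++] = b[j++];
    }
    return n;
}
```

This trades SQL complexity for application code: the relay now owns the ordering logic that the database handles for free in the single-table design.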

### 2. Tag Filter Complexity
Each event type needs different tag table joins:
- `events_regular` → `tags_regular`
- `events_replaceable` → `tags_replaceable` (with composite key)
- `events_addressable` → `tags_addressable` (with composite key)
- `events_ephemeral` → `tags_ephemeral`

### 3. Subscription State Management
With multiple tables, subscription state becomes complex:
- Which tables does this subscription monitor?
- How to efficiently check new events across tables?
- Different trigger/notification patterns per table

## Alternative: Unified Event View

### Hybrid Approach: Views Over Multi-Tables
```sql
-- Create unified view for queries
CREATE VIEW all_events AS
SELECT
    'regular' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_regular
UNION ALL
SELECT
    'replaceable' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_replaceable
UNION ALL
SELECT
    'ephemeral' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_ephemeral
UNION ALL
SELECT
    'addressable' as event_type,
    id, pubkey, created_at, kind, content, sig
FROM events_addressable;

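-- Unified tag view: every branch exposes the owning event's id so that
-- all_tags.event_id joins against all_events.id
-- (assumes events_addressable stores d_tag alongside pubkey and kind)
CREATE VIEW all_tags AS
SELECT event_id, name, value, parameters FROM tags_regular
UNION ALL
SELECT rp.id, t.name, t.value, t.parameters
FROM tags_replaceable t
JOIN events_replaceable rp ON rp.pubkey = t.event_pubkey AND rp.kind = t.event_kind
UNION ALL
SELECT event_id, name, value, parameters FROM tags_ephemeral
UNION ALL
SELECT a.id, t.name, t.value, t.parameters
FROM tags_addressable t
JOIN events_addressable a
  ON a.pubkey = t.event_pubkey AND a.kind = t.event_kind AND a.d_tag = t.d_tag;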
```

### REQ Query Against Views:
```sql
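-- Much simpler to write - back to single-table query text.
-- The database still expands both views into their UNION ALL branches,
-- and each IN (?) stands for one placeholder per list value.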
SELECT DISTINCT e.*
FROM all_events e
JOIN all_tags t ON e.id = t.event_id
WHERE e.pubkey IN (?)
  AND e.kind IN (?)
  AND t.name = 'p' AND t.value = ?
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;
```

## Recommendation

**The multi-table approach creates significant subscription query complexity that may outweigh its benefits.**

### Key Issues:
1. **REQ filters don't map to event types** - clients filter by kind, author, tags, not storage semantics
2. **UNION query complexity** - much harder to optimize and implement
3. **Subscription management burden** - must monitor multiple tables
4. **Performance uncertainty** - UNION queries may be slower than single table

### Alternative Recommendation:

**Modified Single Table with Event Type Column:**

```sql
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    pubkey TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    kind INTEGER NOT NULL,
    event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable'
    content TEXT NOT NULL,
    sig TEXT NOT NULL,
    tags JSON,

    -- Replaceable event fields
    replaced_at INTEGER,

    -- Addressable event fields
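    d_tag TEXT
);

-- Uniqueness per event type: SQLite and PostgreSQL do not accept a WHERE
-- clause on a table-level UNIQUE constraint, so use partial unique indexes
CREATE UNIQUE INDEX unique_replaceable
    ON events (pubkey, kind) WHERE event_type = 'replaceable';
CREATE UNIQUE INDEX unique_addressable
    ON events (pubkey, kind, d_tag) WHERE event_type = 'addressable';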
```

### Benefits:
- ✅ **Simple REQ queries** - single table, familiar patterns
- ✅ **Type enforcement** - partial unique indexes guarantee one row per replaceable or addressable key; the insert path still has to replace the older row
- ✅ **Performance** - unified indexes, no UNIONs
- ✅ **Implementation simplicity** - minimal changes from current code
- ✅ **Future flexibility** - can split tables later if needed

This approach gets the best of both worlds: protocol compliance through uniqueness constraints, and query simplicity through unified storage.