Files
c-relay/docs/subscription_query_analysis.md
2025-09-04 07:10:13 -04:00

331 lines
9.7 KiB
Markdown

# Subscription Query Complexity Analysis
## Overview
This document analyzes how Nostr REQ subscription filters would be implemented across different schema designs, focusing on query complexity, performance implications, and implementation burden.
## Nostr REQ Filter Specification Recap
Clients send REQ messages with filters containing:
- **`ids`**: List of specific event IDs
- **`authors`**: List of pubkeys
- **`kinds`**: List of event kinds
- **`#<letter>`**: Tag filters (e.g., `#e` for event refs, `#p` for pubkey mentions)
- **`since`/`until`**: Time range filters
- **`limit`**: Maximum events to return
### Key Filter Behaviors:
- **Multiple filters = OR logic**: Match any filter
- **Within filter = AND logic**: Match all specified conditions
- **Lists = IN logic**: Match any value in the list
- **Tag filters**: Must have at least one matching tag
## Schema Comparison for REQ Handling
### Current Simple Schema (Single Table)
```sql
CREATE TABLE event (
id TEXT PRIMARY KEY,
pubkey TEXT NOT NULL,
created_at INTEGER NOT NULL,
kind INTEGER NOT NULL,
content TEXT NOT NULL,
sig TEXT NOT NULL
);
CREATE TABLE tag (
id TEXT NOT NULL, -- event ID
name TEXT NOT NULL,
value TEXT NOT NULL,
parameters TEXT
);
```
#### Sample REQ Query Implementation:
```sql
-- Filter: {"authors": ["pubkey1", "pubkey2"], "kinds": [1, 6], "#p": ["target_pubkey"]}
SELECT DISTINCT e.*
FROM event e
WHERE e.pubkey IN ('pubkey1', 'pubkey2')
AND e.kind IN (1, 6)
AND EXISTS (
SELECT 1 FROM tag t
WHERE t.id = e.id AND t.name = 'p' AND t.value = 'target_pubkey'
)
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;
```
### Multi-Table Schema Challenge
With separate tables (`events_regular`, `events_replaceable`, `events_ephemeral`, `events_addressable`), a REQ filter could potentially match events across ALL tables.
#### Problem Example:
Filter: `{"kinds": [1, 0, 20001, 30023]}`
- Kind 1 → `events_regular`
- Kind 0 → `events_replaceable`
- Kind 20001 → `events_ephemeral`
- Kind 30023 → `events_addressable`
This requires **4 separate queries + UNION**, significantly complicating the implementation.
## Multi-Table Query Complexity
### Scenario 1: Cross-Table Kind Filter
```sql
-- Filter: {"kinds": [1, 0, 30023]}
-- Requires querying 3 different tables
SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
WHERE kind = 1
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
WHERE kind = 0
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE kind = 30023
ORDER BY created_at DESC, id ASC
LIMIT ?;
```
### Scenario 2: Cross-Table Author Filter
```sql
-- Filter: {"authors": ["pubkey1"]}
-- Must check ALL tables for this author
SELECT id, pubkey, created_at, kind, content, sig FROM events_regular
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_replaceable
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_ephemeral
WHERE pubkey = 'pubkey1'
UNION ALL
SELECT id, pubkey, created_at, kind, content, sig FROM events_addressable
WHERE pubkey = 'pubkey1'
ORDER BY created_at DESC, id ASC
LIMIT ?;
```
### Scenario 3: Complex Multi-Condition Filter
```sql
-- Filter: {"authors": ["pubkey1"], "kinds": [1, 0], "#p": ["target"], "since": 1234567890}
-- Extremely complex with multiple UNIONs and tag JOINs
WITH regular_results AS (
SELECT DISTINCT r.*
FROM events_regular r
JOIN tags_regular tr ON r.id = tr.event_id
WHERE r.pubkey = 'pubkey1'
AND r.kind = 1
AND r.created_at >= 1234567890
AND tr.name = 'p' AND tr.value = 'target'
),
replaceable_results AS (
SELECT DISTINCT rp.*
FROM events_replaceable rp
JOIN tags_replaceable trp ON (rp.pubkey, rp.kind) = (trp.event_pubkey, trp.event_kind)
WHERE rp.pubkey = 'pubkey1'
AND rp.kind = 0
AND rp.created_at >= 1234567890
AND trp.name = 'p' AND trp.value = 'target'
)
SELECT * FROM regular_results
UNION ALL
SELECT * FROM replaceable_results
ORDER BY created_at DESC, id ASC
LIMIT ?;
```
## Implementation Burden Analysis
### Single Table Approach
```c
// Simple - one query builder function
char* build_filter_query(cJSON* filters) {
// Build single SELECT with WHERE conditions
// Single ORDER BY and LIMIT
// One execution path
}
```
### Multi-Table Approach
```c
// Complex - requires routing and union logic
char* build_multi_table_query(cJSON* filters) {
// 1. Analyze kinds to determine which tables to query
// 2. Split filters per table type
// 3. Build separate queries for each table
// 4. Union results with complex ORDER BY
// 5. Handle LIMIT across UNION (tricky!)
}
typedef struct {
bool needs_regular;
bool needs_replaceable;
bool needs_ephemeral;
bool needs_addressable;
cJSON* regular_filter;
cJSON* replaceable_filter;
cJSON* ephemeral_filter;
cJSON* addressable_filter;
} filter_routing_t;
```
### Query Routing Complexity
For each REQ filter, we must:
1. **Analyze kinds** → Determine which tables to query
2. **Split filters** → Create per-table filter conditions
3. **Handle tag filters** → Different tag table references per event type
4. **Union results** → Merge with proper ordering
5. **Apply LIMIT** → Complex with UNION queries
## Performance Implications
### Single Table Advantages:
-**Single query execution**
-**One index strategy**
-**Simple LIMIT handling**
-**Unified ORDER BY**
-**No UNION overhead**
### Multi-Table Disadvantages:
-**Multiple query executions**
-**UNION sorting overhead**
-**Complex LIMIT application**
-**Index fragmentation across tables**
-**Result set merging complexity**
## Specific REQ Filter Challenges
### 1. LIMIT Handling with UNION
```sql
-- WRONG: Limit applies to each subquery
(SELECT * FROM events_regular WHERE ... LIMIT 100)
UNION ALL
(SELECT * FROM events_replaceable WHERE ... LIMIT 100)
-- Could return 200 events!
-- CORRECT: Limit applies to final result
SELECT * FROM (
SELECT * FROM events_regular WHERE ...
UNION ALL
SELECT * FROM events_replaceable WHERE ...
ORDER BY created_at DESC, id ASC
) LIMIT 100;
-- But this sorts ALL results before limiting!
```
### 2. Tag Filter Complexity
Each event type needs different tag table joins:
- `events_regular``tags_regular`
- `events_replaceable``tags_replaceable` (with composite key)
- `events_addressable``tags_addressable` (with composite key)
- `events_ephemeral``tags_ephemeral`
### 3. Subscription State Management
With multiple tables, subscription state becomes complex:
- Which tables does this subscription monitor?
- How to efficiently check new events across tables?
- Different trigger/notification patterns per table
## Alternative: Unified Event View
### Hybrid Approach: Views Over Multi-Tables
```sql
-- Create unified view for queries
CREATE VIEW all_events AS
SELECT
'regular' as event_type,
id, pubkey, created_at, kind, content, sig
FROM events_regular
UNION ALL
SELECT
'replaceable' as event_type,
id, pubkey, created_at, kind, content, sig
FROM events_replaceable
UNION ALL
SELECT
'ephemeral' as event_type,
id, pubkey, created_at, kind, content, sig
FROM events_ephemeral
UNION ALL
SELECT
'addressable' as event_type,
id, pubkey, created_at, kind, content, sig
FROM events_addressable;
-- Unified tag view
CREATE VIEW all_tags AS
SELECT event_id, name, value, parameters FROM tags_regular
UNION ALL
SELECT CONCAT(event_pubkey, ':', event_kind), name, value, parameters FROM tags_replaceable
UNION ALL
SELECT event_id, name, value, parameters FROM tags_ephemeral
UNION ALL
SELECT CONCAT(event_pubkey, ':', event_kind, ':', d_tag), name, value, parameters FROM tags_addressable;
```
### REQ Query Against Views:
```sql
-- Much simpler - back to single-table complexity
SELECT DISTINCT e.*
FROM all_events e
JOIN all_tags t ON e.id = t.event_id
WHERE e.pubkey IN (?)
AND e.kind IN (?)
AND t.name = 'p' AND t.value = ?
ORDER BY e.created_at DESC, e.id ASC
LIMIT ?;
```
## Recommendation
**The multi-table approach creates significant subscription query complexity that may outweigh its benefits.**
### Key Issues:
1. **REQ filters don't map to event types** - clients filter by kind, author, tags, not storage semantics
2. **UNION query complexity** - much harder to optimize and implement
3. **Subscription management burden** - must monitor multiple tables
4. **Performance uncertainty** - UNION queries may be slower than single table
### Alternative Recommendation:
**Modified Single Table with Event Type Column:**
```sql
CREATE TABLE events (
id TEXT PRIMARY KEY,
pubkey TEXT NOT NULL,
created_at INTEGER NOT NULL,
kind INTEGER NOT NULL,
event_type TEXT NOT NULL, -- 'regular', 'replaceable', 'ephemeral', 'addressable'
content TEXT NOT NULL,
sig TEXT NOT NULL,
tags JSON,
-- Replaceable event fields
replaced_at INTEGER,
-- Addressable event fields
d_tag TEXT,
-- Unique constraints per event type
CONSTRAINT unique_replaceable
UNIQUE (pubkey, kind) WHERE event_type = 'replaceable',
CONSTRAINT unique_addressable
UNIQUE (pubkey, kind, d_tag) WHERE event_type = 'addressable'
);
```
### Benefits:
-**Simple REQ queries** - single table, familiar patterns
-**Type enforcement** - partial unique constraints handle replacement logic
-**Performance** - unified indexes, no UNIONs
-**Implementation simplicity** - minimal changes from current code
-**Future flexibility** - can split tables later if needed
This approach gets the best of both worlds: protocol compliance through constraints, but query simplicity through unified storage.