# Real-Time Admin Monitoring System Design
## Kind 34567 Addressable Events
**Version:** 1.0

**Date:** 2025-10-16

**Status:** Design Phase

---
## Table of Contents
1. [Overview](#overview)
2. [Event Structure](#event-structure)
3. [Monitoring Data Types](#monitoring-data-types)
4. [Periodic Query System](#periodic-query-system)
5. [Trigger-Based Updates](#trigger-based-updates)
6. [Configuration System](#configuration-system)
7. [Load Management](#load-management)
8. [Frontend Integration](#frontend-integration)
9. [Database Schema](#database-schema)
10. [Implementation Plan](#implementation-plan)
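11. [Security Considerations](#security-considerations)
12. [Future Enhancements](#future-enhancements)
13. [Appendix A: Configuration Reference](#appendix-a-configuration-reference)
14. [Appendix B: API Reference](#appendix-b-api-reference)
15. [Appendix C: Example Usage](#appendix-c-example-usage)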
---
## Overview
### Purpose
Create a real-time monitoring system that allows administrators to subscribe to live relay statistics and metrics through kind 34567 addressable events. The system generates events periodically or based on triggers, enabling admin dashboards to display continuously updated data without polling.
### Key Features
- **Addressable Events**: Use kind 34567 with `d` tags to identify data types; a newer event replaces the previous one with the same `d` value
- **Periodic Updates**: Configurable intervals for different metric types
- **Trigger-Based Updates**: Immediate updates on significant events
- **Load-Aware**: Adjusts update frequency based on relay load
- **Subscription-Based**: Admins subscribe once and receive continuous updates
- **Configurable**: Enable/disable features and control update frequencies
### Architecture Principles
1. **Non-Blocking**: Monitoring must not impact relay performance
2. **Efficient**: Minimize database queries and CPU usage
3. **Scalable**: Handle multiple concurrent admin subscriptions
4. **Flexible**: Easy to add new monitoring data types
5. **Secure**: Only authorized admins can access monitoring data
---
## Event Structure
### Kind 34567 Event Format
```json
{
  "id": "<event_id>",
  "pubkey": "<relay_pubkey>",
  "created_at": 1697123456,
  "kind": 34567,
  "content": "<json_data>",
  "tags": [
    ["d", "<data_type>"],
    ["relay", "<relay_pubkey>"],
    ["interval", "<update_interval_seconds>"],
    ["version", "1"]
  ],
  "sig": "<signature>"
}
```
### Tag Specifications
#### Required Tags
- **`d` tag**: Identifies the data type (e.g., "time_stats", "event_kinds", "connections")
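- **`relay` tag**: Relay's public key. It duplicates the `pubkey` field; NIP-01 tag filters only cover single-letter tags, so clients should filter on `authors` rather than on this tag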
- **`version` tag**: Schema version for forward compatibility
#### Optional Tags
- **`interval` tag**: Update interval in seconds (for periodic updates)
- **`trigger` tag**: Trigger type for event-driven updates (e.g., "threshold", "event")
- **`priority` tag**: Update priority ("high", "normal", "low")
### Content Structure
The `content` field contains JSON-encoded monitoring data specific to each data type:
```json
{
  "data_type": "time_stats",
  "timestamp": 1697123456,
  "data": {
    // Data type-specific fields
  },
  "metadata": {
    "query_time_ms": 15,
    "cached": false
  }
}
```
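The envelope above can be assembled from an already-serialized `data` object. The following is a minimal sketch, not the relay's actual implementation: `build_monitoring_content` is a hypothetical helper, and it assumes `data_type` is a plain identifier that needs no JSON escaping and that `data_json` is valid JSON.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: wraps an already-serialized JSON object (data_json)
 * in the content envelope. Assumes data_type needs no JSON escaping.
 * Returns 0 on success, -1 on bad arguments or if the buffer is too small. */
int build_monitoring_content(char *out, size_t out_len,
                             const char *data_type, long timestamp,
                             const char *data_json,
                             int query_time_ms, int cached)
{
    if (out == NULL || out_len == 0 || data_type == NULL || data_json == NULL) {
        return -1;
    }
    int n = snprintf(out, out_len,
                     "{\"data_type\":\"%s\",\"timestamp\":%ld,\"data\":%s,"
                     "\"metadata\":{\"query_time_ms\":%d,\"cached\":%s}}",
                     data_type, timestamp, data_json, query_time_ms,
                     cached ? "true" : "false");
    /* snprintf reports the length it wanted; treat truncation as an error */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

A real implementation would more likely build the object with the JSON library already used by the relay; the fixed-buffer version is shown only because it has no dependencies.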
---
## Monitoring Data Types
### 1. Time-Based Statistics (`d=time_stats`)
**Update Frequency**: Every 60 seconds (configurable)
**Trigger**: None (periodic only)
```json
{
  "data_type": "time_stats",
  "timestamp": 1697123456,
  "data": {
    "total_events": 125000,
    "last_24h": 5420,
    "last_7d": 32100,
    "last_30d": 98500,
    "events_per_hour_24h": 225.8,
    "events_per_day_7d": 4585.7
  },
  "metadata": {
    "query_time_ms": 12,
    "cached": false
  }
}
```
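The two rate fields in this sample are consistent with dividing the window counts by the window length (5420 / 24 and 32100 / 7, rounded to one decimal). A small sketch of that arithmetic, with hypothetical helper names and the assumption that counts are non-negative:

```c
/* Round a non-negative value to one decimal place without needing libm. */
static double round1(double x)
{
    return (double)(long)(x * 10.0 + 0.5) / 10.0;
}

/* Hypothetical helpers: average rates over the 24-hour and 7-day windows. */
double events_per_hour_24h(long last_24h)
{
    return round1((double)last_24h / 24.0);
}

double events_per_day_7d(long last_7d)
{
    return round1((double)last_7d / 7.0);
}
```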
### 2. Event Kind Distribution (`d=event_kinds`)
**Update Frequency**: Every 120 seconds (configurable)
**Trigger**: Significant change (>10% shift in distribution)
```json
{
  "data_type": "event_kinds",
  "timestamp": 1697123456,
  "data": {
    "distribution": [
      {"kind": 1, "count": 45000, "percentage": 36.0},
      {"kind": 3, "count": 12500, "percentage": 10.0},
      {"kind": 7, "count": 8900, "percentage": 7.1}
    ],
    "total_kinds": 15,
    "most_active_kind": 1
  },
  "metadata": {
    "query_time_ms": 18,
    "cached": false
  }
}
```
### 3. Top Publishers (`d=top_pubkeys`)
**Update Frequency**: Every 300 seconds (configurable)
**Trigger**: New top-10 entry
```json
{
  "data_type": "top_pubkeys",
  "timestamp": 1697123456,
  "data": {
    "top_publishers": [
      {
        "pubkey": "abc123...",
        "event_count": 5420,
        "percentage": 4.3,
        "last_event_at": 1697123400
      }
    ],
    "total_unique_pubkeys": 8542
  },
  "metadata": {
    "query_time_ms": 25,
    "cached": false
  }
}
```
### 4. Active Connections (`d=connections`)
**Update Frequency**: Every 30 seconds (configurable)
**Trigger**: Connection count change >10%
```json
{
  "data_type": "connections",
  "timestamp": 1697123456,
  "data": {
    "active_connections": 142,
    "max_connections": 1000,
    "utilization_percentage": 14.2,
    "connections_by_type": {
      "websocket": 140,
      "http": 2
    },
    "avg_connection_duration_seconds": 3600
  },
  "metadata": {
    "query_time_ms": 2,
    "cached": false
  }
}
```
### 5. Subscription Statistics (`d=subscriptions`)
**Update Frequency**: Every 45 seconds (configurable)
**Trigger**: Subscription limit reached
```json
{
  "data_type": "subscriptions",
  "timestamp": 1697123456,
  "data": {
    "total_subscriptions": 856,
    "max_subscriptions": 10000,
    "utilization_percentage": 8.56,
    "subscriptions_per_client_avg": 6.0,
    "most_subscriptions_per_client": 25
  },
  "metadata": {
    "query_time_ms": 5,
    "cached": false
  }
}
```
### 6. Database Statistics (`d=database`)
**Update Frequency**: Every 600 seconds (configurable)
**Trigger**: Database size change >5%
```json
{
  "data_type": "database",
  "timestamp": 1697123456,
  "data": {
    "size_bytes": 524288000,
    "size_mb": 500.0,
    "table_sizes": {
      "events": 450.5,
      "config": 0.1,
      "auth_rules": 0.2
    },
    "oldest_event_timestamp": 1690000000,
    "newest_event_timestamp": 1697123400
  },
  "metadata": {
    "query_time_ms": 8,
    "cached": false
  }
}
```
### 7. System Performance (`d=performance`)
**Update Frequency**: Every 15 seconds (configurable)
**Trigger**: CPU/Memory threshold exceeded
```json
{
  "data_type": "performance",
  "timestamp": 1697123456,
  "data": {
    "cpu_usage_percentage": 12.5,
    "memory_usage_mb": 256.8,
    "memory_usage_percentage": 25.0,
    "query_avg_time_ms": 8.5,
    "events_processed_per_second": 45.2
  },
  "metadata": {
    "query_time_ms": 1,
    "cached": true
  }
}
```
### 8. Recent Events (`d=recent_events`)
**Update Frequency**: Every 10 seconds (configurable)
**Trigger**: New event stored
```json
{
  "data_type": "recent_events",
  "timestamp": 1697123456,
  "data": {
    "events": [
      {
        "id": "abc123...",
        "kind": 1,
        "pubkey": "def456...",
        "created_at": 1697123450,
        "content_preview": "Hello world..."
      }
    ],
    "count": 10
  },
  "metadata": {
    "query_time_ms": 3,
    "cached": false
  }
}
```
---
## Periodic Query System
### Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                      Monitoring Thread                      │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Timer 1    │    │   Timer 2    │    │   Timer N    │   │
│  │ (time_stats) │    │(event_kinds) │    │  (database)  │   │
│  │  60 seconds  │    │ 120 seconds  │    │ 600 seconds  │   │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘   │
│         │                   │                   │           │
│         └───────────────────┴───────────────────┘           │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │ Query Executor  │                      │
│                    └────────┬────────┘                      │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │ Event Generator │                      │
│                    └────────┬────────┘                      │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │Event Broadcaster│                      │
│                    └─────────────────┘                      │
└─────────────────────────────────────────────────────────────┘
```
### Implementation Components
#### 1. Monitoring Thread (`src/monitoring.c`)
```c
typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    int running;
    monitoring_config_t config;
    monitoring_timer_t* timers;
    int timer_count;
} monitoring_manager_t;

// Initialize monitoring system
int init_monitoring_system(void);

// Cleanup monitoring system
void cleanup_monitoring_system(void);

// Main monitoring thread function
void* monitoring_thread_func(void* arg);
```
#### 2. Timer Management
```c
typedef struct monitoring_timer {
    char data_type[32];        // e.g., "time_stats"
    int interval_seconds;      // Update interval
    time_t last_execution;     // Last execution timestamp
    int enabled;               // Enable/disable flag
    query_func_t query_func;   // Function to execute
    struct monitoring_timer* next;
} monitoring_timer_t;

// Create a new timer
monitoring_timer_t* create_monitoring_timer(
    const char* data_type,
    int interval_seconds,
    query_func_t query_func
);

// Check if timer should execute
int should_execute_timer(monitoring_timer_t* timer);

// Execute timer and generate event
int execute_monitoring_timer(monitoring_timer_t* timer);
```
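The due-check behind `should_execute_timer` could look roughly like the sketch below. It is an illustration under stated assumptions: the struct is a reduced stand-in for `monitoring_timer_t`, the current time is passed in as a parameter (so the logic can be tested), and a `last_execution` of 0 is assumed to mean "never run".

```c
#include <stddef.h>
#include <time.h>

/* Reduced stand-in for monitoring_timer_t (hypothetical, for illustration). */
typedef struct {
    int interval_seconds;
    time_t last_execution;   /* 0 is assumed to mean "never executed" */
    int enabled;
} sketch_timer_t;

/* Returns 1 when the timer is enabled and its interval has elapsed. */
int timer_is_due(const sketch_timer_t *t, time_t now)
{
    if (t == NULL || !t->enabled || t->interval_seconds <= 0) {
        return 0;
    }
    if (t->last_execution == 0) {
        return 1;   /* first run */
    }
    return difftime(now, t->last_execution) >= (double)t->interval_seconds;
}
```

Passing `now` explicitly also lets the monitoring thread read the clock once per loop iteration and evaluate every timer against the same instant.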
#### 3. Query Executor
```c
typedef char* (*query_func_t)(void);
// Query functions for each data type
char* query_time_stats(void);
char* query_event_kinds(void);
char* query_top_pubkeys(void);
char* query_connections(void);
char* query_subscriptions(void);
char* query_database_stats(void);
char* query_performance_stats(void);
char* query_recent_events(void);
```
#### 4. Event Generator
```c
// Generate kind 34567 event from query result
cJSON* generate_monitoring_event(
    const char* data_type,
    const char* json_data,
    int interval_seconds
);

// Sign and broadcast monitoring event
int broadcast_monitoring_event(cJSON* event);
```
### Timer Configuration
Default intervals (configurable via database):
```c
static const monitoring_timer_config_t default_timers[] = {
    {"time_stats",    60,  query_time_stats},
    {"event_kinds",   120, query_event_kinds},
    {"top_pubkeys",   300, query_top_pubkeys},
    {"connections",   30,  query_connections},
    {"subscriptions", 45,  query_subscriptions},
    {"database",      600, query_database_stats},
    {"performance",   15,  query_performance_stats},
    {"recent_events", 10,  query_recent_events},
    {NULL, 0, NULL}  // Sentinel
};
```
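A sentinel-terminated table like this is walked until the `NULL` name is reached. As one illustration, the monitoring thread might use the shortest configured interval as an upper bound on how long it sleeps between checks. The sketch below assumes a field layout for the table entry (the design does not define `monitoring_timer_config_t`), so the type and function names here are hypothetical.

```c
#include <stddef.h>

typedef char *(*query_func_t)(void);

/* Assumed layout of one table entry; not defined in the design itself. */
typedef struct {
    const char *data_type;
    int interval_seconds;
    query_func_t query_func;
} timer_config_sketch_t;

/* Returns the smallest positive interval in a sentinel-terminated table,
 * or 0 when the table is NULL or has no usable entries. */
int shortest_interval(const timer_config_sketch_t *table)
{
    int best = 0;
    for (const timer_config_sketch_t *e = table;
         e != NULL && e->data_type != NULL;
         e++) {
        if (e->interval_seconds > 0 &&
            (best == 0 || e->interval_seconds < best)) {
            best = e->interval_seconds;
        }
    }
    return best;
}
```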
---
## Trigger-Based Updates
### Trigger Types
#### 1. Threshold Triggers
Execute when a metric crosses a threshold:
```c
typedef struct {
    char metric_name[64];
    double threshold_value;
    comparison_type_t comparison;  // GT, LT, GTE, LTE
    char data_type[32];
    query_func_t query_func;
} threshold_trigger_t;

// Example: Trigger when connections > 90% of max
threshold_trigger_t connection_threshold = {
    .metric_name = "connection_utilization",
    .threshold_value = 90.0,
    .comparison = GT,
    .data_type = "connections",
    .query_func = query_connections
};
```
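Evaluating such a trigger reduces to one comparison per metric sample. A minimal sketch follows; the enum constants are prefixed to keep the example self-contained and are stand-ins for the design's `GT`, `LT`, `GTE`, `LTE`, and the function name is hypothetical.

```c
/* Stand-in for comparison_type_t (hypothetical names). */
typedef enum { CMP_GT, CMP_LT, CMP_GTE, CMP_LTE } cmp_sketch_t;

/* Returns 1 when `value` satisfies the comparison against `threshold`. */
int threshold_crossed(double value, double threshold, cmp_sketch_t cmp)
{
    switch (cmp) {
    case CMP_GT:  return value >  threshold;
    case CMP_LT:  return value <  threshold;
    case CMP_GTE: return value >= threshold;
    case CMP_LTE: return value <= threshold;
    }
    return 0;   /* unknown comparison type: never fire */
}
```

A caller that wants one update per crossing, rather than one per sample while the metric stays above the threshold, would also need to remember the previous result and fire only on a 0 to 1 transition.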
#### 2. Event Triggers
Execute when specific events occur:
```c
typedef enum {
    TRIGGER_EVENT_STORED,
    TRIGGER_CONNECTION_OPENED,
    TRIGGER_CONNECTION_CLOSED,
    TRIGGER_SUBSCRIPTION_CREATED,
    TRIGGER_SUBSCRIPTION_CLOSED,
    TRIGGER_CONFIG_CHANGED,
    TRIGGER_ERROR_OCCURRED
} trigger_event_type_t;

// Register trigger for event type
int register_event_trigger(
    trigger_event_type_t event_type,
    const char* data_type,
    query_func_t query_func
);

// Fire trigger when event occurs
void fire_event_trigger(trigger_event_type_t event_type);
```
#### 3. Change Detection Triggers
Execute when data changes significantly:
```c
typedef struct {
    char data_type[32];
    double change_threshold_percentage;  // e.g., 10.0 for 10%
    time_t last_check;
    char* last_value;
    query_func_t query_func;
} change_trigger_t;

// Check if data has changed significantly
int check_change_trigger(change_trigger_t* trigger);
```
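For a single numeric metric, "changed significantly" can be read as a relative change larger than `change_threshold_percentage`. The sketch below makes that reading explicit; it is an assumption about the intended semantics, including the choice to treat any move away from a zero baseline as significant. The function name is hypothetical.

```c
/* Returns 1 when `current` differs from `previous` by more than
 * `threshold_pct` percent of `previous`. A zero baseline is treated as
 * changed whenever the new value is non-zero (assumed policy). */
int change_exceeds_threshold(double previous, double current,
                             double threshold_pct)
{
    if (previous == 0.0) {
        return current != 0.0;
    }
    double delta = current - previous;
    if (delta < 0.0) {
        delta = -delta;
    }
    double base = previous < 0.0 ? -previous : previous;
    /* Compare delta/base*100 > threshold without dividing */
    return delta * 100.0 > threshold_pct * base;
}
```

With the default threshold of 10.0, a move from 100 to 110 does not fire (exactly 10%), while a move from 100 to 111 does.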
### Trigger Integration
```c
// In event storage function (main.c)
int store_event(cJSON* event) {
    // ... existing code ...

    // Fire monitoring trigger
    if (monitoring_enabled()) {
        fire_event_trigger(TRIGGER_EVENT_STORED);
    }
    return 0;
}

// In connection handler (websockets.c)
void handle_connection_opened(struct lws* wsi) {
    // ... existing code ...

    // Fire monitoring trigger
    if (monitoring_enabled()) {
        fire_event_trigger(TRIGGER_CONNECTION_OPENED);
    }
}
```
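Firing `TRIGGER_EVENT_STORED` on every stored event could produce one monitoring event per relay event under load. One possible mitigation, which the design does not specify, is to debounce triggers so that each data type is regenerated at most once per minimum interval. The struct, field names, and function below are hypothetical; a `last_fired` of 0 is assumed to mean "never fired".

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-trigger debounce state. */
typedef struct {
    time_t last_fired;          /* 0 is assumed to mean "never fired" */
    int min_interval_seconds;   /* minimum spacing between updates */
    int pending;                /* set when a trigger was suppressed */
} trigger_debounce_t;

/* Returns 1 if the caller should generate an update now. Otherwise it
 * records that an update is pending and returns 0. */
int trigger_should_fire(trigger_debounce_t *d, time_t now)
{
    if (d == NULL) {
        return 0;
    }
    if (d->last_fired != 0 &&
        difftime(now, d->last_fired) < (double)d->min_interval_seconds) {
        d->pending = 1;
        return 0;
    }
    d->last_fired = now;
    d->pending = 0;
    return 1;
}
```

The `pending` flag lets the periodic timer pick up a suppressed update on its next pass, so a burst of triggers still results in one fresh event after the burst.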
---
## Configuration System
### Database Configuration Table
Add monitoring configuration to existing `config` table:
```sql
-- Monitoring system configuration
INSERT INTO config (key, value, data_type) VALUES
    ('monitoring_enabled', 'true', 'boolean'),
    ('monitoring_admin_only', 'true', 'boolean'),
    ('monitoring_max_subscribers', '10', 'integer'),

    -- Timer intervals (seconds)
    ('monitoring_interval_time_stats', '60', 'integer'),
    ('monitoring_interval_event_kinds', '120', 'integer'),
    ('monitoring_interval_top_pubkeys', '300', 'integer'),
    ('monitoring_interval_connections', '30', 'integer'),
    ('monitoring_interval_subscriptions', '45', 'integer'),
    ('monitoring_interval_database', '600', 'integer'),
    ('monitoring_interval_performance', '15', 'integer'),
    ('monitoring_interval_recent_events', '10', 'integer'),

    -- Load management
    ('monitoring_load_threshold_cpu', '80.0', 'float'),
    ('monitoring_load_threshold_memory', '85.0', 'float'),
    ('monitoring_load_action', 'throttle', 'string'),  -- throttle, pause, disable
    ('monitoring_throttle_multiplier', '2.0', 'float'),  -- Multiply intervals by this

    -- Trigger configuration
    ('monitoring_triggers_enabled', 'true', 'boolean'),
    ('monitoring_trigger_connection_threshold', '90.0', 'float'),
    ('monitoring_trigger_subscription_threshold', '90.0', 'float'),
    ('monitoring_trigger_change_threshold', '10.0', 'float');
```
### Configuration API
```c
// Get monitoring configuration
typedef struct {
    int enabled;
    int admin_only;
    int max_subscribers;

    // Timer intervals
    int interval_time_stats;
    int interval_event_kinds;
    int interval_top_pubkeys;
    int interval_connections;
    int interval_subscriptions;
    int interval_database;
    int interval_performance;
    int interval_recent_events;

    // Load management
    double load_threshold_cpu;
    double load_threshold_memory;
    char load_action[32];
    double throttle_multiplier;

    // Triggers
    int triggers_enabled;
    double trigger_connection_threshold;
    double trigger_subscription_threshold;
    double trigger_change_threshold;
} monitoring_config_t;

// Load configuration from database
int load_monitoring_config(monitoring_config_t* config);

// Update configuration
int update_monitoring_config(const char* key, const char* value);

// Reload configuration (called when config changes)
void reload_monitoring_config(void);
```
---
## Load Management
### Load Detection
```c
typedef struct {
    double cpu_usage;
    double memory_usage;
    int active_connections;
    int active_subscriptions;
    double query_avg_time_ms;
} system_load_t;

// Get current system load
system_load_t get_system_load(void);

// Check if system is under high load
int is_high_load(system_load_t* load, monitoring_config_t* config);
```
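The design leaves the exact rule for `is_high_load` open. One plausible reading, sketched here with reduced stand-in structs, is that load is "high" when either CPU or memory usage exceeds its configured threshold (`monitoring_load_threshold_cpu`, `monitoring_load_threshold_memory`). The OR combination and the strict comparison are assumptions.

```c
#include <stddef.h>

/* Reduced stand-ins for system_load_t and monitoring_config_t. */
typedef struct {
    double cpu_usage;       /* percent */
    double memory_usage;    /* percent */
} load_sketch_t;

typedef struct {
    double load_threshold_cpu;
    double load_threshold_memory;
} load_config_sketch_t;

/* Returns 1 when either metric exceeds its threshold (assumed policy). */
int is_high_load_sketch(const load_sketch_t *load,
                        const load_config_sketch_t *cfg)
{
    if (load == NULL || cfg == NULL) {
        return 0;
    }
    return load->cpu_usage > cfg->load_threshold_cpu ||
           load->memory_usage > cfg->load_threshold_memory;
}
```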
### Load-Based Actions
#### 1. Throttle Mode
Multiply all timer intervals by throttle multiplier:
```c
void apply_throttle_mode(monitoring_manager_t* manager) {
    double multiplier = manager->config.throttle_multiplier;

    // Call once per transition into throttle mode: each call multiplies the
    // current interval again, so repeated calls compound.
    for (monitoring_timer_t* timer = manager->timers;
         timer != NULL;
         timer = timer->next) {
        timer->interval_seconds = (int)(timer->interval_seconds * multiplier);
    }
}
```
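Because `apply_throttle_mode` rewrites `interval_seconds` in place, calling it repeatedly keeps stretching the intervals, and there is no way to restore the configured values once load drops. One way to avoid both problems is to keep the configured interval separately and always derive the effective one from it. This is a sketch only: `base_interval_seconds`, the struct, and `set_throttle` are hypothetical and not part of the design above.

```c
#include <stddef.h>

/* Reduced timer with a hypothetical base_interval_seconds field. */
typedef struct throttle_timer {
    int base_interval_seconds;   /* configured interval, never modified */
    int interval_seconds;        /* effective interval used by the thread */
    struct throttle_timer *next;
} throttle_timer_t;

/* Sets every effective interval to base * multiplier. Idempotent:
 * calling it twice with the same multiplier gives the same result,
 * and a multiplier of 1.0 restores the configured intervals. */
void set_throttle(throttle_timer_t *timers, double multiplier)
{
    if (multiplier < 1.0) {
        multiplier = 1.0;   /* never speed up beyond the configured rate */
    }
    for (throttle_timer_t *t = timers; t != NULL; t = t->next) {
        t->interval_seconds = (int)(t->base_interval_seconds * multiplier);
    }
}
```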
#### 2. Pause Mode
Temporarily stop all monitoring:
```c
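void apply_pause_mode(monitoring_manager_t* manager) {
    // NOTE: if `running` also controls the thread's main loop, clearing it
    // stops the thread; a separate paused flag would be needed to resume.
    manager->running = 0;
    // Resume when load decreases
}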
```
#### 3. Disable Mode
Disable specific high-cost queries:
```c
void apply_disable_mode(monitoring_manager_t* manager) {
    // Disable expensive queries
    disable_timer(manager, "top_pubkeys");
    disable_timer(manager, "database");
}
```
### Adaptive Intervals
```c
// Adjust intervals based on subscriber count
void adjust_intervals_for_subscribers(
    monitoring_manager_t* manager,
    int subscriber_count
) {
    if (subscriber_count == 0) {
        // No subscribers - pause monitoring
        manager->running = 0;
    } else if (subscriber_count > 5) {
        // Many subscribers - reduce frequency
        apply_throttle_mode(manager);
    }
}
```
---
## Frontend Integration
### Subscription Pattern
Admin dashboard subscribes to monitoring events:
```javascript
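// Subscribe to all monitoring data types.
// `authors` already scopes the filter to this relay; NIP-01 tag filters
// only cover single-letter tags, so the multi-letter `relay` tag is not used.
const monitoringSubscription = {
  kinds: [34567],
  authors: [relayPubkey]
};

relay.subscribe([monitoringSubscription], {
  onevent: (event) => {
    handleMonitoringEvent(event);
  }
});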
```
### Event Handling
```javascript
function handleMonitoringEvent(event) {
  // Extract data type from d tag
  const dTag = event.tags.find(t => t[0] === 'd');
  if (!dTag) return;

  const dataType = dTag[1];
  const content = JSON.parse(event.content);

  // Route to appropriate handler
  switch (dataType) {
    case 'time_stats':
      updateTimeStatsChart(content.data);
      break;
    case 'event_kinds':
      updateEventKindsChart(content.data);
      break;
    case 'connections':
      updateConnectionsGauge(content.data);
      break;
    // ... other handlers
  }
}
```
### Selective Subscription
Subscribe to specific data types only:
```javascript
// Subscribe only to performance metrics
const performanceSubscription = {
  kinds: [34567],
  authors: [relayPubkey],
  "#d": ["performance", "connections", "subscriptions"]
};
```
### Historical Data
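Query past monitoring events. Because kind 34567 is addressable, a relay normally keeps only the newest event per `d` value, so this query returns more than one event per data type only if the relay retains superseded monitoring events: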
```javascript
// Get last hour of time_stats
const historicalQuery = {
  kinds: [34567],
  authors: [relayPubkey],
  "#d": ["time_stats"],
  since: Math.floor(Date.now() / 1000) - 3600
};
```
---
## Database Schema
### Monitoring State Table
Track monitoring execution state:
```sql
CREATE TABLE IF NOT EXISTS monitoring_state (
    data_type TEXT PRIMARY KEY,
    last_execution INTEGER NOT NULL,
    last_value TEXT,
    execution_count INTEGER DEFAULT 0,
    avg_query_time_ms REAL DEFAULT 0.0,
    last_error TEXT,
    enabled INTEGER DEFAULT 1,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

-- IF NOT EXISTS keeps the script re-runnable, matching the table definition
CREATE INDEX IF NOT EXISTS idx_monitoring_state_execution
    ON monitoring_state(last_execution);
```
### Monitoring Subscribers Table
Track active monitoring subscribers:
```sql
CREATE TABLE IF NOT EXISTS monitoring_subscribers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pubkey TEXT NOT NULL,
    data_types TEXT,  -- JSON array of subscribed data types
    subscribed_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    last_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    active INTEGER DEFAULT 1
);

CREATE INDEX IF NOT EXISTS idx_monitoring_subscribers_pubkey
    ON monitoring_subscribers(pubkey);
CREATE INDEX IF NOT EXISTS idx_monitoring_subscribers_active
    ON monitoring_subscribers(active);
```
### Monitoring Metrics History (Optional)
Store historical metrics for trending:
```sql
CREATE TABLE IF NOT EXISTS monitoring_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data_type TEXT NOT NULL,
    timestamp INTEGER NOT NULL,
    data TEXT NOT NULL,  -- JSON data
    query_time_ms REAL
);

CREATE INDEX IF NOT EXISTS idx_monitoring_history_type_time
    ON monitoring_history(data_type, timestamp);

-- Cleanup old history (keep last 7 days)
DELETE FROM monitoring_history
    WHERE timestamp < strftime('%s', 'now', '-7 days');
```
---
## Implementation Plan
### Phase 1: Core Infrastructure (Week 1)
**Files to Create:**
- `src/monitoring.h` - Header with types and function declarations
- `src/monitoring.c` - Core monitoring system implementation
**Tasks:**
1. ✅ Design event structure and data types
2. ⬜ Implement monitoring manager initialization
3. ⬜ Create monitoring thread with timer system
4. ⬜ Add configuration loading from database
5. ⬜ Implement basic event generation and broadcasting
**Deliverables:**
- Working monitoring thread that generates events periodically
- Configuration system integrated with existing config table
- Basic event broadcasting to subscriptions
### Phase 2: Query Functions (Week 2)
**Files to Modify:**
- `src/monitoring.c` - Add query implementations
**Tasks:**
1. ⬜ Implement `query_time_stats()`
2. ⬜ Implement `query_event_kinds()`
3. ⬜ Implement `query_top_pubkeys()`
4. ⬜ Implement `query_connections()`
5. ⬜ Implement `query_subscriptions()`
6. ⬜ Implement `query_database_stats()`
7. ⬜ Implement `query_performance_stats()`
8. ⬜ Implement `query_recent_events()`
**Deliverables:**
- All 8 query functions working and tested
- Efficient SQL queries with minimal performance impact
- JSON output matching design specifications
### Phase 3: Trigger System (Week 3)
**Files to Modify:**
- `src/monitoring.c` - Add trigger implementation
- `src/main.c` - Add trigger hooks
- `src/websockets.c` - Add trigger hooks
**Tasks:**
1. ⬜ Implement threshold trigger system
2. ⬜ Implement event trigger system
3. ⬜ Implement change detection triggers
4. ⬜ Add trigger hooks to event storage
5. ⬜ Add trigger hooks to connection management
6. ⬜ Add trigger hooks to subscription management
**Deliverables:**
- Working trigger system for immediate updates
- Integration with existing relay operations
- Configurable trigger thresholds
### Phase 4: Load Management (Week 4)
**Files to Modify:**
- `src/monitoring.c` - Add load management
**Tasks:**
1. ⬜ Implement system load detection
2. ⬜ Implement throttle mode
3. ⬜ Implement pause mode
4. ⬜ Implement adaptive intervals
5. ⬜ Add subscriber count tracking
6. ⬜ Test under various load conditions
**Deliverables:**
- Load-aware monitoring system
- Automatic throttling under high load
- Subscriber-based optimization
### Phase 5: Frontend Integration (Week 5)
**Files to Create/Modify:**
- `api/monitoring.html` - Monitoring dashboard
- `api/monitoring.js` - Dashboard JavaScript
- `api/monitoring.css` - Dashboard styles
**Tasks:**
1. ⬜ Create monitoring dashboard UI
2. ⬜ Implement WebSocket subscription handling
3. ⬜ Create real-time charts and gauges
4. ⬜ Add historical data visualization
5. ⬜ Implement selective subscription controls
6. ⬜ Add export/download functionality
**Deliverables:**
- Working admin monitoring dashboard
- Real-time data visualization
- Historical data queries
### Phase 6: Testing & Documentation (Week 6)
**Files to Create:**
- `tests/monitoring_tests.sh` - Test suite
- `docs/monitoring_user_guide.md` - User documentation
**Tasks:**
1. ⬜ Write unit tests for query functions
2. ⬜ Write integration tests for monitoring system
3. ⬜ Test load management under stress
4. ⬜ Test frontend with multiple subscribers
5. ⬜ Write user documentation
6. ⬜ Write API documentation
**Deliverables:**
- Comprehensive test suite
- User and developer documentation
- Performance benchmarks
---
## Security Considerations
### Access Control
1. **Admin-Only Access**
   - Monitoring events are sent only to authorized admin pubkeys
   - Check admin authorization before broadcasting
   - Configurable via `monitoring_admin_only` setting

   ```c
   int is_authorized_monitoring_subscriber(const char* pubkey) {
       // Check if monitoring is admin-only
       int admin_only = get_config_bool("monitoring_admin_only", 1);
       if (!admin_only) {
           return 1; // Open to all
       }

       // Reject when either key is missing instead of passing NULL to strcmp
       const char* admin_pubkey = get_config_value("admin_pubkey");
       if (pubkey == NULL || admin_pubkey == NULL) {
           return 0;
       }
       return (strcmp(pubkey, admin_pubkey) == 0);
   }
   ```
2. **Subscription Limits**
   - Maximum number of concurrent monitoring subscribers
   - Prevents resource exhaustion
   - Configurable via `monitoring_max_subscribers`
3. **Rate Limiting**
   - Prevent abuse of monitoring system
   - Limit subscription requests per IP/pubkey
   - Automatic throttling under high load
### Data Privacy
1. **Sensitive Data Filtering**
   - Don't expose full pubkeys in monitoring data
   - Truncate or hash sensitive information
   - Configurable data exposure levels
2. **Content Filtering**
   - Don't include event content in monitoring data
   - Only include metadata (kind, timestamp, etc.)
   - Prevent information leakage
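As an illustration of the truncation option, the sketch below shortens a hex pubkey to its first 8 characters followed by `...`, matching the `"abc123..."` style used in the sample payloads. The helper name and the prefix length of 8 are assumptions, not part of the design.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes at most the first 8 characters of `pubkey`
 * followed by "..." into `out`. Returns 0 on success, -1 on bad arguments
 * or if `out` cannot hold 8 characters, the ellipsis, and the terminator. */
int truncate_pubkey(const char *pubkey, char *out, size_t out_len)
{
    if (pubkey == NULL || out == NULL || out_len < 12) {
        return -1;
    }
    /* %.8s copies no more than 8 characters from pubkey */
    snprintf(out, out_len, "%.8s...", pubkey);
    return 0;
}
```

Truncation keeps the value recognizable to an admin who already knows the key; hashing would be the choice when even a prefix should not be shown.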
### Performance Protection
1. **Query Timeouts**
   - All monitoring queries have strict timeouts
   - Prevent long-running queries from blocking
   - Automatic fallback to cached data
2. **Resource Limits**
   - Maximum query result sizes
   - Memory limits for monitoring data
   - CPU usage monitoring and throttling
3. **Graceful Degradation**
   - System continues working if monitoring fails
   - Monitoring errors don't affect relay operations
   - Automatic recovery from failures
---
## Future Enhancements
### Phase 7+ (Future)
1. **Advanced Analytics**
   - Trend analysis and predictions
   - Anomaly detection
   - Automated alerting
2. **Custom Metrics**
   - User-defined monitoring queries
   - Custom data types
   - Pluggable query system
3. **Multi-Relay Monitoring**
   - Aggregate metrics from multiple relays
   - Distributed monitoring
   - Relay comparison tools
4. **Export & Integration**
   - Prometheus metrics export
   - Grafana integration
   - CSV/JSON data export
5. **Mobile Dashboard**
   - Mobile-optimized monitoring UI
   - Push notifications
   - Offline data caching
---
## Appendix A: Configuration Reference
### Complete Configuration Keys
```
monitoring_enabled - Enable/disable monitoring system
monitoring_admin_only - Restrict to admin pubkey only
monitoring_max_subscribers - Maximum concurrent subscribers
monitoring_interval_time_stats - Time stats update interval (seconds)
monitoring_interval_event_kinds - Event kinds update interval (seconds)
monitoring_interval_top_pubkeys - Top pubkeys update interval (seconds)
monitoring_interval_connections - Connections update interval (seconds)
monitoring_interval_subscriptions - Subscriptions update interval (seconds)
monitoring_interval_database - Database stats update interval (seconds)
monitoring_interval_performance - Performance update interval (seconds)
monitoring_interval_recent_events - Recent events update interval (seconds)
monitoring_load_threshold_cpu - CPU threshold for load management (%)
monitoring_load_threshold_memory - Memory threshold for load management (%)
monitoring_load_action - Action on high load (throttle/pause/disable)
monitoring_throttle_multiplier - Interval multiplier when throttling
monitoring_triggers_enabled - Enable/disable trigger system
monitoring_trigger_connection_threshold - Connection utilization trigger (%)
monitoring_trigger_subscription_threshold - Subscription utilization trigger (%)
monitoring_trigger_change_threshold - Change detection threshold (%)
```
---
## Appendix B: API Reference
### C API Functions
```c
// Initialization
int init_monitoring_system(void);
void cleanup_monitoring_system(void);

// Configuration
int load_monitoring_config(monitoring_config_t* config);
int update_monitoring_config(const char* key, const char* value);
void reload_monitoring_config(void);

// Timer Management
monitoring_timer_t* create_monitoring_timer(const char* data_type,
                                            int interval_seconds,
                                            query_func_t query_func);
int should_execute_timer(monitoring_timer_t* timer);
int execute_monitoring_timer(monitoring_timer_t* timer);

// Query Functions
char* query_time_stats(void);
char* query_event_kinds(void);
char* query_top_pubkeys(void);
char* query_connections(void);
char* query_subscriptions(void);
char* query_database_stats(void);
char* query_performance_stats(void);
char* query_recent_events(void);

// Event Generation
cJSON* generate_monitoring_event(const char* data_type,
                                 const char* json_data,
                                 int interval_seconds);
int broadcast_monitoring_event(cJSON* event);

// Trigger System
int register_event_trigger(trigger_event_type_t event_type,
                           const char* data_type,
                           query_func_t query_func);
void fire_event_trigger(trigger_event_type_t event_type);

// Load Management
system_load_t get_system_load(void);
int is_high_load(system_load_t* load, monitoring_config_t* config);
void apply_throttle_mode(monitoring_manager_t* manager);
void apply_pause_mode(monitoring_manager_t* manager);

// Subscriber Management
int is_authorized_monitoring_subscriber(const char* pubkey);
int add_monitoring_subscriber(const char* pubkey, const char* data_types);
int remove_monitoring_subscriber(const char* pubkey);
int get_subscriber_count(void);
```
---
## Appendix C: Example Usage
### Admin Dashboard Subscription
```javascript
// Connect to relay
const relay = new WebSocket('ws://localhost:8888');

// Subscribe to all monitoring events once the socket is open;
// calling send() while the socket is still connecting throws
relay.onopen = () => {
  relay.send(JSON.stringify([
    "REQ",
    "monitoring-sub",
    {
      kinds: [34567],
      authors: [relayPubkey]
    }
  ]));
};

// Handle incoming events
relay.onmessage = (msg) => {
  const [type, subId, event] = JSON.parse(msg.data);
  if (type === 'EVENT' && event.kind === 34567) {
    // Skip events without a d tag instead of throwing
    const dTag = event.tags.find(t => t[0] === 'd');
    if (!dTag) return;
    const content = JSON.parse(event.content);
    console.log(`Received ${dTag[1]} update:`, content.data);
    updateDashboard(dTag[1], content.data);
  }
};
```
### Configuration Update
```bash
# Enable monitoring
curl -X POST http://localhost:8888/api/config \
  -H "Content-Type: application/json" \
  -d '{"key": "monitoring_enabled", "value": "true"}'

# Set update interval
curl -X POST http://localhost:8888/api/config \
  -H "Content-Type: application/json" \
  -d '{"key": "monitoring_interval_time_stats", "value": "30"}'
```
---
**End of Design Document**