# Real-Time Admin Monitoring System Design

## Kind 34567 Addressable Events

**Version:** 1.0
**Date:** 2025-10-16
**Status:** Design Phase

---
## Table of Contents

1. [Overview](#overview)
2. [Event Structure](#event-structure)
3. [Monitoring Data Types](#monitoring-data-types)
4. [Periodic Query System](#periodic-query-system)
5. [Trigger-Based Updates](#trigger-based-updates)
6. [Configuration System](#configuration-system)
7. [Load Management](#load-management)
8. [Frontend Integration](#frontend-integration)
9. [Database Schema](#database-schema)
10. [Implementation Plan](#implementation-plan)
11. [Security Considerations](#security-considerations)
12. [Future Enhancements](#future-enhancements)
13. [Appendix A: Configuration Reference](#appendix-a-configuration-reference)
14. [Appendix B: API Reference](#appendix-b-api-reference)
15. [Appendix C: Example Usage](#appendix-c-example-usage)

---
## Overview

### Purpose
Create a real-time monitoring system that allows administrators to subscribe to live relay statistics and metrics through kind 34567 addressable events. The system generates events periodically or based on triggers, enabling admin dashboards to display continuously updated data without polling.

### Key Features
- **Addressable Events**: Use kind 34567 with "d" tags to identify different data types
- **Periodic Updates**: Configurable intervals for different metric types
- **Trigger-Based Updates**: Immediate updates on significant events
- **Load-Aware**: Adjusts update frequency based on relay load
- **Subscription-Based**: Admins subscribe once and receive continuous updates
- **Configurable**: Enable/disable features and control update frequencies

### Architecture Principles
1. **Non-Blocking**: Monitoring must not impact relay performance
2. **Efficient**: Minimize database queries and CPU usage
3. **Scalable**: Handle multiple concurrent admin subscriptions
4. **Flexible**: Easy to add new monitoring data types
5. **Secure**: Only authorized admins can access monitoring data

---
## Event Structure

### Kind 34567 Event Format

```json
{
  "id": "<event_id>",
  "pubkey": "<relay_pubkey>",
  "created_at": 1697123456,
  "kind": 34567,
  "content": "<json_data>",
  "tags": [
    ["d", "<data_type>"],
    ["relay", "<relay_pubkey>"],
    ["interval", "<update_interval_seconds>"],
    ["version", "1"]
  ],
  "sig": "<signature>"
}
```

### Tag Specifications

#### Required Tags
- **`d` tag**: Identifies the data type (e.g., "time_stats", "event_kinds", "connections")
- **`relay` tag**: Relay's public key. NIP-01 filters only match single-letter tags, so clients should not rely on a `#relay` filter; the `authors` filter carries the same pubkey
- **`version` tag**: Schema version for forward compatibility

#### Optional Tags
- **`interval` tag**: Update interval in seconds (for periodic updates)
- **`trigger` tag**: Trigger type for event-driven updates (e.g., "threshold", "event")
- **`priority` tag**: Update priority ("high", "normal", "low")

### Content Structure

The `content` field contains JSON-encoded monitoring data specific to each data type:

```json
{
  "data_type": "time_stats",
  "timestamp": 1697123456,
  "data": {
    // Data type-specific fields
  },
  "metadata": {
    "query_time_ms": 15,
    "cached": false
  }
}
```
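
The envelope above could be assembled with a small formatting helper. The following is a minimal sketch, not the relay's actual implementation: `build_monitoring_content` is a hypothetical name, the `data` object is assumed to be serialized already, and `data_type` is assumed to be a plain identifier that needs no JSON escaping. A fixed buffer with `snprintf` keeps the sketch free of external dependencies; a real implementation would more likely build the object with cJSON.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: wraps an already-serialized "data" object in the
 * content envelope (data_type, timestamp, data, metadata).
 * Returns 0 on success, -1 on a missing argument or a too-small buffer. */
int build_monitoring_content(char *out, size_t out_size,
                             const char *data_type, long timestamp,
                             const char *data_json,
                             int query_time_ms, int cached)
{
    if (!out || !data_type || !data_json) {
        return -1;
    }
    int n = snprintf(out, out_size,
        "{\"data_type\":\"%s\",\"timestamp\":%ld,\"data\":%s,"
        "\"metadata\":{\"query_time_ms\":%d,\"cached\":%s}}",
        data_type, timestamp, data_json,
        query_time_ms, cached ? "true" : "false");
    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```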

---

## Monitoring Data Types

### 1. Time-Based Statistics (`d=time_stats`)

**Update Frequency**: Every 60 seconds (configurable)
**Trigger**: None (periodic only)

```json
{
  "data_type": "time_stats",
  "timestamp": 1697123456,
  "data": {
    "total_events": 125000,
    "last_24h": 5420,
    "last_7d": 32100,
    "last_30d": 98500,
    "events_per_hour_24h": 225.8,
    "events_per_day_7d": 4585.7
  },
  "metadata": {
    "query_time_ms": 12,
    "cached": false
  }
}
```
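
The two rate fields in this example are consistent with simple window averages: 5420 / 24 ≈ 225.8 events per hour and 32100 / 7 ≈ 4585.7 events per day. A minimal sketch of that derivation follows; `average_rate` is a hypothetical helper name, and the assumption that the relay computes the rates this way is inferred from the sample numbers rather than stated by the design.

```c
/* Hypothetical helper: average events per unit over a window.
 * window_units is 24 for an hourly rate over 24h, 7 for a daily rate
 * over 7d. Returns 0.0 for an empty or invalid window. */
double average_rate(long event_count, int window_units)
{
    if (window_units <= 0) {
        return 0.0;
    }
    return (double)event_count / (double)window_units;
}
```

Rounding to one decimal place, as in the sample payload, would be left to the JSON serialization step.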

### 2. Event Kind Distribution (`d=event_kinds`)

**Update Frequency**: Every 120 seconds (configurable)
**Trigger**: Significant change (>10% shift in distribution)

```json
{
  "data_type": "event_kinds",
  "timestamp": 1697123456,
  "data": {
    "distribution": [
      {"kind": 1, "count": 45000, "percentage": 36.0},
      {"kind": 3, "count": 12500, "percentage": 10.0},
      {"kind": 7, "count": 8900, "percentage": 7.1}
    ],
    "total_kinds": 15,
    "most_active_kind": 1
  },
  "metadata": {
    "query_time_ms": 18,
    "cached": false
  }
}
```

### 3. Top Publishers (`d=top_pubkeys`)

**Update Frequency**: Every 300 seconds (configurable)
**Trigger**: New top-10 entry

```json
{
  "data_type": "top_pubkeys",
  "timestamp": 1697123456,
  "data": {
    "top_publishers": [
      {
        "pubkey": "abc123...",
        "event_count": 5420,
        "percentage": 4.3,
        "last_event_at": 1697123400
      }
    ],
    "total_unique_pubkeys": 8542
  },
  "metadata": {
    "query_time_ms": 25,
    "cached": false
  }
}
```

### 4. Active Connections (`d=connections`)

**Update Frequency**: Every 30 seconds (configurable)
**Trigger**: Connection count change >10%

```json
{
  "data_type": "connections",
  "timestamp": 1697123456,
  "data": {
    "active_connections": 142,
    "max_connections": 1000,
    "utilization_percentage": 14.2,
    "connections_by_type": {
      "websocket": 140,
      "http": 2
    },
    "avg_connection_duration_seconds": 3600
  },
  "metadata": {
    "query_time_ms": 2,
    "cached": false
  }
}
```

### 5. Subscription Statistics (`d=subscriptions`)

**Update Frequency**: Every 45 seconds (configurable)
**Trigger**: Subscription limit reached

```json
{
  "data_type": "subscriptions",
  "timestamp": 1697123456,
  "data": {
    "total_subscriptions": 856,
    "max_subscriptions": 10000,
    "utilization_percentage": 8.56,
    "subscriptions_per_client_avg": 6.0,
    "most_subscriptions_per_client": 25
  },
  "metadata": {
    "query_time_ms": 5,
    "cached": false
  }
}
```
### 6. Database Statistics (`d=database`)

**Update Frequency**: Every 600 seconds (configurable)
**Trigger**: Database size change >5%

```json
{
  "data_type": "database",
  "timestamp": 1697123456,
  "data": {
    "size_bytes": 524288000,
    "size_mb": 500.0,
    "table_sizes": {
      "events": 450.5,
      "config": 0.1,
      "auth_rules": 0.2
    },
    "oldest_event_timestamp": 1690000000,
    "newest_event_timestamp": 1697123400
  },
  "metadata": {
    "query_time_ms": 8,
    "cached": false
  }
}
```

### 7. System Performance (`d=performance`)

**Update Frequency**: Every 15 seconds (configurable)
**Trigger**: CPU/Memory threshold exceeded

```json
{
  "data_type": "performance",
  "timestamp": 1697123456,
  "data": {
    "cpu_usage_percentage": 12.5,
    "memory_usage_mb": 256.8,
    "memory_usage_percentage": 25.0,
    "query_avg_time_ms": 8.5,
    "events_processed_per_second": 45.2
  },
  "metadata": {
    "query_time_ms": 1,
    "cached": true
  }
}
```

### 8. Recent Events (`d=recent_events`)

**Update Frequency**: Every 10 seconds (configurable)
**Trigger**: New event stored

```json
{
  "data_type": "recent_events",
  "timestamp": 1697123456,
  "data": {
    "events": [
      {
        "id": "abc123...",
        "kind": 1,
        "pubkey": "def456...",
        "created_at": 1697123450,
        "content_preview": "Hello world..."
      }
    ],
    "count": 10
  },
  "metadata": {
    "query_time_ms": 3,
    "cached": false
  }
}
```

---
## Periodic Query System

### Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Monitoring Thread                      │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Timer 1    │    │   Timer 2    │    │   Timer N    │   │
│  │ (time_stats) │    │(event_kinds) │    │  (database)  │   │
│  │  60 seconds  │    │ 120 seconds  │    │ 600 seconds  │   │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘   │
│         │                   │                   │           │
│         └───────────────────┴───────────────────┘           │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │ Query Executor  │                      │
│                    └────────┬────────┘                      │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │ Event Generator │                      │
│                    └────────┬────────┘                      │
│                             │                               │
│                    ┌────────▼────────┐                      │
│                    │Event Broadcaster│                      │
│                    └─────────────────┘                      │
└─────────────────────────────────────────────────────────────┘
```
### Implementation Components

#### 1. Monitoring Thread (`src/monitoring.c`)

```c
typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    int running;
    monitoring_config_t config;
    monitoring_timer_t* timers;
    int timer_count;
} monitoring_manager_t;

// Initialize monitoring system
int init_monitoring_system(void);

// Cleanup monitoring system
void cleanup_monitoring_system(void);

// Main monitoring thread function
void* monitoring_thread_func(void* arg);
```

#### 2. Timer Management

```c
typedef struct monitoring_timer {
    char data_type[32];           // e.g., "time_stats"
    int interval_seconds;         // Update interval
    time_t last_execution;        // Last execution timestamp
    int enabled;                  // Enable/disable flag
    query_func_t query_func;      // Function to execute
    struct monitoring_timer* next;
} monitoring_timer_t;

// Create a new timer
monitoring_timer_t* create_monitoring_timer(
    const char* data_type,
    int interval_seconds,
    query_func_t query_func
);

// Check if timer should execute
int should_execute_timer(monitoring_timer_t* timer);

// Execute timer and generate event
int execute_monitoring_timer(monitoring_timer_t* timer);
```
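
The due check behind `should_execute_timer` could look roughly like the sketch below. It is an illustration under assumptions, not the planned implementation: `timer_sketch_t` is a hypothetical stand-in holding only the fields the check needs, and `timer_is_due` takes the current time as a parameter (rather than calling `time()` itself) so the logic can be tested deterministically.

```c
#include <time.h>

/* Hypothetical stand-in for the timer fields used by the due check. */
typedef struct {
    int interval_seconds;
    time_t last_execution;
    int enabled;
} timer_sketch_t;

/* Returns 1 when the timer is enabled and at least interval_seconds
 * have passed since last_execution; returns 0 otherwise. */
int timer_is_due(const timer_sketch_t *timer, time_t now)
{
    if (!timer || !timer->enabled || timer->interval_seconds <= 0) {
        return 0;
    }
    return (now - timer->last_execution) >= timer->interval_seconds;
}
```

The monitoring thread would walk the timer list on each wake-up, run every due timer, and set `last_execution` to the current time afterwards.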

#### 3. Query Executor

```c
typedef char* (*query_func_t)(void);

// Query functions for each data type
char* query_time_stats(void);
char* query_event_kinds(void);
char* query_top_pubkeys(void);
char* query_connections(void);
char* query_subscriptions(void);
char* query_database_stats(void);
char* query_performance_stats(void);
char* query_recent_events(void);
```

#### 4. Event Generator

```c
// Generate kind 34567 event from query result
cJSON* generate_monitoring_event(
    const char* data_type,
    const char* json_data,
    int interval_seconds
);

// Sign and broadcast monitoring event
int broadcast_monitoring_event(cJSON* event);
```

### Timer Configuration

Default intervals (configurable via database):

```c
static const monitoring_timer_config_t default_timers[] = {
    {"time_stats", 60, query_time_stats},
    {"event_kinds", 120, query_event_kinds},
    {"top_pubkeys", 300, query_top_pubkeys},
    {"connections", 30, query_connections},
    {"subscriptions", 45, query_subscriptions},
    {"database", 600, query_database_stats},
    {"performance", 15, query_performance_stats},
    {"recent_events", 10, query_recent_events},
    {NULL, 0, NULL}  // Sentinel
};
```

---
## Trigger-Based Updates

### Trigger Types

#### 1. Threshold Triggers

Execute when a metric crosses a threshold:

```c
typedef struct {
    char metric_name[64];
    double threshold_value;
    comparison_type_t comparison;  // GT, LT, GTE, LTE
    char data_type[32];
    query_func_t query_func;
} threshold_trigger_t;

// Example: Trigger when connections > 90% of max
threshold_trigger_t connection_threshold = {
    .metric_name = "connection_utilization",
    .threshold_value = 90.0,
    .comparison = GT,
    .data_type = "connections",
    .query_func = query_connections
};
```
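
Evaluating such a trigger comes down to a comparison plus a check that the condition has newly become true, so that a metric sitting above its threshold does not fire on every sample. The sketch below illustrates this under assumptions: `cmp_sketch_t`, `threshold_met`, and `threshold_should_fire` are hypothetical names, and firing only on the transition is one reading of "crosses a threshold", not something the design spells out.

```c
/* Hypothetical mirror of the GT/LT/GTE/LTE comparison kinds. */
typedef enum { CMP_GT, CMP_LT, CMP_GTE, CMP_LTE } cmp_sketch_t;

/* Returns 1 when value satisfies the comparison against threshold. */
int threshold_met(double value, double threshold, cmp_sketch_t cmp)
{
    switch (cmp) {
    case CMP_GT:  return value >  threshold;
    case CMP_LT:  return value <  threshold;
    case CMP_GTE: return value >= threshold;
    case CMP_LTE: return value <= threshold;
    }
    return 0;
}

/* Edge detection: fire only when the condition was false on the
 * previous sample and is true on the current one. */
int threshold_should_fire(int was_met, int is_met)
{
    return is_met && !was_met;
}
```

With the connection example above, a utilization sequence of 85, 92, 95 would fire once, at 92.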

#### 2. Event Triggers

Execute when specific events occur:

```c
typedef enum {
    TRIGGER_EVENT_STORED,
    TRIGGER_CONNECTION_OPENED,
    TRIGGER_CONNECTION_CLOSED,
    TRIGGER_SUBSCRIPTION_CREATED,
    TRIGGER_SUBSCRIPTION_CLOSED,
    TRIGGER_CONFIG_CHANGED,
    TRIGGER_ERROR_OCCURRED
} trigger_event_type_t;

// Register trigger for event type
int register_event_trigger(
    trigger_event_type_t event_type,
    const char* data_type,
    query_func_t query_func
);

// Fire trigger when event occurs
void fire_event_trigger(trigger_event_type_t event_type);
```

#### 3. Change Detection Triggers

Execute when data changes significantly:

```c
typedef struct {
    char data_type[32];
    double change_threshold_percentage;  // e.g., 10.0 for 10%
    time_t last_check;
    char* last_value;
    query_func_t query_func;
} change_trigger_t;

// Check if data has changed significantly
int check_change_trigger(change_trigger_t* trigger);
```
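
For a single numeric metric, the "changed significantly" test can be expressed as a relative change against the previous value. The sketch below is illustrative only: `percent_change` and `change_exceeds` are hypothetical helpers, and treating any move away from a zero baseline as a 100% change is an assumption made here to avoid dividing by zero.

```c
/* Absolute relative change from previous to current, in percent.
 * Assumption: a change away from a zero baseline counts as 100%. */
double percent_change(double previous, double current)
{
    if (previous == 0.0) {
        return (current == 0.0) ? 0.0 : 100.0;
    }
    double delta = current - previous;
    if (delta < 0.0) {
        delta = -delta;
    }
    double base = (previous < 0.0) ? -previous : previous;
    return delta * 100.0 / base;
}

/* Returns 1 when the change is strictly greater than threshold_pct. */
int change_exceeds(double previous, double current, double threshold_pct)
{
    return percent_change(previous, current) > threshold_pct;
}
```

For example, a connection count moving from 142 to 160 is a change of about 12.7%, which exceeds the default 10% threshold, while 142 to 150 (about 5.6%) does not.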

### Trigger Integration

```c
// In event storage function (main.c)
int store_event(cJSON* event) {
    // ... existing code ...

    // Fire monitoring trigger
    if (monitoring_enabled()) {
        fire_event_trigger(TRIGGER_EVENT_STORED);
    }

    return 0;
}

// In connection handler (websockets.c)
void handle_connection_opened(struct lws* wsi) {
    // ... existing code ...

    // Fire monitoring trigger
    if (monitoring_enabled()) {
        fire_event_trigger(TRIGGER_CONNECTION_OPENED);
    }
}
```
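
Because `store_event` can run many times per second, firing a query for every hook call would conflict with the non-blocking principle. One possible mitigation is to coalesce triggers behind a minimum interval. The sketch below is an assumption layered on top of the design, not part of it: `trigger_gate_t` and `trigger_gate_request` are hypothetical names, and a real implementation would also need locking, since hooks run on relay threads.

```c
#include <time.h>

/* Hypothetical per-data-type gate that coalesces bursts of triggers. */
typedef struct {
    time_t last_fired;         /* when an update was last generated */
    int pending;               /* 1 if a trigger arrived while gated */
    int min_interval_seconds;  /* minimum spacing between updates */
} trigger_gate_t;

/* Called from a hook. Returns 1 if an update should be generated now;
 * otherwise records that one is pending and returns 0. */
int trigger_gate_request(trigger_gate_t *gate, time_t now)
{
    if (now - gate->last_fired >= gate->min_interval_seconds) {
        gate->last_fired = now;
        gate->pending = 0;
        return 1;
    }
    gate->pending = 1;
    return 0;
}
```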

---

## Configuration System

### Database Configuration Table

Add monitoring configuration to the existing `config` table:

```sql
-- Monitoring system configuration
INSERT INTO config (key, value, data_type) VALUES
    ('monitoring_enabled', 'true', 'boolean'),
    ('monitoring_admin_only', 'true', 'boolean'),
    ('monitoring_max_subscribers', '10', 'integer'),

    -- Timer intervals (seconds)
    ('monitoring_interval_time_stats', '60', 'integer'),
    ('monitoring_interval_event_kinds', '120', 'integer'),
    ('monitoring_interval_top_pubkeys', '300', 'integer'),
    ('monitoring_interval_connections', '30', 'integer'),
    ('monitoring_interval_subscriptions', '45', 'integer'),
    ('monitoring_interval_database', '600', 'integer'),
    ('monitoring_interval_performance', '15', 'integer'),
    ('monitoring_interval_recent_events', '10', 'integer'),

    -- Load management
    ('monitoring_load_threshold_cpu', '80.0', 'float'),
    ('monitoring_load_threshold_memory', '85.0', 'float'),
    ('monitoring_load_action', 'throttle', 'string'),  -- throttle, pause, disable
    ('monitoring_throttle_multiplier', '2.0', 'float'),  -- Multiply intervals by this

    -- Trigger configuration
    ('monitoring_triggers_enabled', 'true', 'boolean'),
    ('monitoring_trigger_connection_threshold', '90.0', 'float'),
    ('monitoring_trigger_subscription_threshold', '90.0', 'float'),
    ('monitoring_trigger_change_threshold', '10.0', 'float');
```
### Configuration API

```c
// Get monitoring configuration
typedef struct {
    int enabled;
    int admin_only;
    int max_subscribers;

    // Timer intervals
    int interval_time_stats;
    int interval_event_kinds;
    int interval_top_pubkeys;
    int interval_connections;
    int interval_subscriptions;
    int interval_database;
    int interval_performance;
    int interval_recent_events;

    // Load management
    double load_threshold_cpu;
    double load_threshold_memory;
    char load_action[32];
    double throttle_multiplier;

    // Triggers
    int triggers_enabled;
    double trigger_connection_threshold;
    double trigger_subscription_threshold;
    double trigger_change_threshold;
} monitoring_config_t;

// Load configuration from database
int load_monitoring_config(monitoring_config_t* config);

// Update configuration
int update_monitoring_config(const char* key, const char* value);

// Reload configuration (called when config changes)
void reload_monitoring_config(void);
```
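
Since `load_action` is stored as a string ("throttle", "pause", or "disable"), the loader has to map it to a behavior and decide what to do with unrecognized values. A minimal sketch follows; `load_action_t` and `parse_load_action` are hypothetical names, and returning an explicit "unknown" value (so the caller can fall back to a default or log an error) is an assumption, not documented behavior.

```c
#include <string.h>

/* Hypothetical enum for the three documented load actions. */
typedef enum {
    LOAD_ACTION_THROTTLE,
    LOAD_ACTION_PAUSE,
    LOAD_ACTION_DISABLE,
    LOAD_ACTION_UNKNOWN
} load_action_t;

/* Maps the monitoring_load_action config string to an enum value.
 * Matching is exact and case-sensitive. */
load_action_t parse_load_action(const char *value)
{
    if (value == NULL) {
        return LOAD_ACTION_UNKNOWN;
    }
    if (strcmp(value, "throttle") == 0) return LOAD_ACTION_THROTTLE;
    if (strcmp(value, "pause") == 0)    return LOAD_ACTION_PAUSE;
    if (strcmp(value, "disable") == 0)  return LOAD_ACTION_DISABLE;
    return LOAD_ACTION_UNKNOWN;
}
```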

---

## Load Management

### Load Detection

```c
typedef struct {
    double cpu_usage;
    double memory_usage;
    int active_connections;
    int active_subscriptions;
    double query_avg_time_ms;
} system_load_t;

// Get current system load
system_load_t get_system_load(void);

// Check if system is under high load
int is_high_load(system_load_t* load, monitoring_config_t* config);
```
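
A straightforward reading of `is_high_load` is that the system counts as loaded when either CPU or memory usage exceeds its configured threshold. The sketch below illustrates that reading only. The `load_sample_t` and `load_limits_t` types are hypothetical, trimmed-down stand-ins for `system_load_t` and `monitoring_config_t`, and both usage values are assumed to be percentages on the same 0 to 100 scale as the thresholds.

```c
#include <stddef.h>

/* Hypothetical subsets of system_load_t and monitoring_config_t. */
typedef struct {
    double cpu_usage;      /* percent, 0-100 */
    double memory_usage;   /* percent, 0-100 */
} load_sample_t;

typedef struct {
    double load_threshold_cpu;
    double load_threshold_memory;
} load_limits_t;

/* Returns 1 when CPU or memory usage is above its threshold. */
int load_is_high(const load_sample_t *load, const load_limits_t *limits)
{
    if (load == NULL || limits == NULL) {
        return 0;
    }
    return load->cpu_usage > limits->load_threshold_cpu
        || load->memory_usage > limits->load_threshold_memory;
}
```

In practice a check like this would probably need some hysteresis (for example, a lower threshold for leaving high-load mode) so the system does not flip between modes on every sample.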

### Load-Based Actions

#### 1. Throttle Mode
Multiply all timer intervals by the throttle multiplier:

```c
void apply_throttle_mode(monitoring_manager_t* manager) {
    double multiplier = manager->config.throttle_multiplier;

    // NOTE: this rescales the current interval in place, so it should run
    // only once per transition into throttle mode; repeated calls compound.
    for (monitoring_timer_t* timer = manager->timers;
         timer != NULL;
         timer = timer->next) {
        timer->interval_seconds = (int)(timer->interval_seconds * multiplier);
    }
}
```
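
One way to make throttling safe to apply repeatedly, and easy to undo, is to keep the configured interval untouched and derive the effective interval whenever it is needed. The sketch below shows that alternative; `effective_interval` is a hypothetical helper, and storing a separate base interval per timer is an assumption that goes beyond the `monitoring_timer_t` fields shown earlier.

```c
/* Derives the interval to use right now from the configured base.
 * The base value is never modified, so calling this any number of
 * times yields the same result, and leaving throttle mode simply
 * means passing throttled = 0. */
int effective_interval(int base_interval_seconds, double multiplier,
                       int throttled)
{
    if (!throttled || multiplier <= 1.0) {
        return base_interval_seconds;
    }
    return (int)(base_interval_seconds * multiplier);
}
```

With the default multiplier of 2.0, the 60-second `time_stats` timer would run every 120 seconds under load and return to 60 seconds once the load drops.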

#### 2. Pause Mode
Temporarily stop all monitoring:

```c
void apply_pause_mode(monitoring_manager_t* manager) {
    manager->running = 0;
    // Resume when load decreases (something must keep checking the load
    // while paused, otherwise monitoring never restarts)
}
```

#### 3. Disable Mode
Disable specific high-cost queries:

```c
void apply_disable_mode(monitoring_manager_t* manager) {
    // Disable expensive queries
    disable_timer(manager, "top_pubkeys");
    disable_timer(manager, "database");
}
```

### Adaptive Intervals

```c
// Adjust intervals based on subscriber count
void adjust_intervals_for_subscribers(
    monitoring_manager_t* manager,
    int subscriber_count
) {
    if (subscriber_count == 0) {
        // No subscribers - pause monitoring
        manager->running = 0;
    } else if (subscriber_count > 5) {
        // Many subscribers - reduce frequency
        apply_throttle_mode(manager);
    }
}
```

---
## Frontend Integration

### Subscription Pattern

Admin dashboard subscribes to monitoring events:

```javascript
// Subscribe to all monitoring data types
const monitoringSubscription = {
    kinds: [34567],
    // Monitoring events are signed by the relay key, so `authors` already
    // scopes the subscription. NIP-01 filters only match single-letter
    // tags, so a "#relay" filter is not portable and is omitted here.
    authors: [relayPubkey]
};

relay.subscribe([monitoringSubscription], {
    onevent: (event) => {
        handleMonitoringEvent(event);
    }
});
```

### Event Handling

```javascript
function handleMonitoringEvent(event) {
    // Extract data type from d tag
    const dTag = event.tags.find(t => t[0] === 'd');
    if (!dTag) return;

    const dataType = dTag[1];
    const content = JSON.parse(event.content);

    // Route to appropriate handler
    switch (dataType) {
        case 'time_stats':
            updateTimeStatsChart(content.data);
            break;
        case 'event_kinds':
            updateEventKindsChart(content.data);
            break;
        case 'connections':
            updateConnectionsGauge(content.data);
            break;
        // ... other handlers
    }
}
```

### Selective Subscription

Subscribe to specific data types only:

```javascript
// Subscribe only to performance-related data types
const performanceSubscription = {
    kinds: [34567],
    authors: [relayPubkey],
    "#d": ["performance", "connections", "subscriptions"]
};
```

### Historical Data

Query past monitoring events. Because kind 34567 falls in the addressable range, NIP-01 only requires relays to keep the latest event per `d` tag, so this query returns a history only if the relay deliberately retains older versions; otherwise trend data has to come from the optional `monitoring_history` table:

```javascript
// Get last hour of time_stats
const historicalQuery = {
    kinds: [34567],
    authors: [relayPubkey],
    "#d": ["time_stats"],
    since: Math.floor(Date.now() / 1000) - 3600
};
```

---
## Database Schema

### Monitoring State Table

Track monitoring execution state:

```sql
CREATE TABLE IF NOT EXISTS monitoring_state (
    data_type TEXT PRIMARY KEY,
    last_execution INTEGER NOT NULL,
    last_value TEXT,
    execution_count INTEGER DEFAULT 0,
    avg_query_time_ms REAL DEFAULT 0.0,
    last_error TEXT,
    enabled INTEGER DEFAULT 1,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

CREATE INDEX IF NOT EXISTS idx_monitoring_state_execution
    ON monitoring_state(last_execution);
```

### Monitoring Subscribers Table

Track active monitoring subscribers:

```sql
CREATE TABLE IF NOT EXISTS monitoring_subscribers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    pubkey TEXT NOT NULL,
    data_types TEXT,  -- JSON array of subscribed data types
    subscribed_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    last_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    active INTEGER DEFAULT 1
);

CREATE INDEX IF NOT EXISTS idx_monitoring_subscribers_pubkey
    ON monitoring_subscribers(pubkey);
CREATE INDEX IF NOT EXISTS idx_monitoring_subscribers_active
    ON monitoring_subscribers(active);
```

### Monitoring Metrics History (Optional)

Store historical metrics for trending:

```sql
CREATE TABLE IF NOT EXISTS monitoring_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data_type TEXT NOT NULL,
    timestamp INTEGER NOT NULL,
    data TEXT NOT NULL,  -- JSON data
    query_time_ms REAL
);

CREATE INDEX IF NOT EXISTS idx_monitoring_history_type_time
    ON monitoring_history(data_type, timestamp);

-- Cleanup old history (keep last 7 days)
DELETE FROM monitoring_history
WHERE timestamp < strftime('%s', 'now', '-7 days');
```

---
## Implementation Plan

### Phase 1: Core Infrastructure (Week 1)

**Files to Create:**
- `src/monitoring.h` - Header with types and function declarations
- `src/monitoring.c` - Core monitoring system implementation

**Tasks:**
1. ✅ Design event structure and data types
2. ⬜ Implement monitoring manager initialization
3. ⬜ Create monitoring thread with timer system
4. ⬜ Add configuration loading from database
5. ⬜ Implement basic event generation and broadcasting

**Deliverables:**
- Working monitoring thread that generates events periodically
- Configuration system integrated with existing config table
- Basic event broadcasting to subscriptions

### Phase 2: Query Functions (Week 2)

**Files to Modify:**
- `src/monitoring.c` - Add query implementations

**Tasks:**
1. ⬜ Implement `query_time_stats()`
2. ⬜ Implement `query_event_kinds()`
3. ⬜ Implement `query_top_pubkeys()`
4. ⬜ Implement `query_connections()`
5. ⬜ Implement `query_subscriptions()`
6. ⬜ Implement `query_database_stats()`
7. ⬜ Implement `query_performance_stats()`
8. ⬜ Implement `query_recent_events()`

**Deliverables:**
- All 8 query functions working and tested
- Efficient SQL queries with minimal performance impact
- JSON output matching design specifications

### Phase 3: Trigger System (Week 3)

**Files to Modify:**
- `src/monitoring.c` - Add trigger implementation
- `src/main.c` - Add trigger hooks
- `src/websockets.c` - Add trigger hooks

**Tasks:**
1. ⬜ Implement threshold trigger system
2. ⬜ Implement event trigger system
3. ⬜ Implement change detection triggers
4. ⬜ Add trigger hooks to event storage
5. ⬜ Add trigger hooks to connection management
6. ⬜ Add trigger hooks to subscription management

**Deliverables:**
- Working trigger system for immediate updates
- Integration with existing relay operations
- Configurable trigger thresholds
### Phase 4: Load Management (Week 4)

**Files to Modify:**
- `src/monitoring.c` - Add load management

**Tasks:**
1. ⬜ Implement system load detection
2. ⬜ Implement throttle mode
3. ⬜ Implement pause mode
4. ⬜ Implement adaptive intervals
5. ⬜ Add subscriber count tracking
6. ⬜ Test under various load conditions

**Deliverables:**
- Load-aware monitoring system
- Automatic throttling under high load
- Subscriber-based optimization

### Phase 5: Frontend Integration (Week 5)

**Files to Create/Modify:**
- `api/monitoring.html` - Monitoring dashboard
- `api/monitoring.js` - Dashboard JavaScript
- `api/monitoring.css` - Dashboard styles

**Tasks:**
1. ⬜ Create monitoring dashboard UI
2. ⬜ Implement WebSocket subscription handling
3. ⬜ Create real-time charts and gauges
4. ⬜ Add historical data visualization
5. ⬜ Implement selective subscription controls
6. ⬜ Add export/download functionality

**Deliverables:**
- Working admin monitoring dashboard
- Real-time data visualization
- Historical data queries

### Phase 6: Testing & Documentation (Week 6)

**Files to Create:**
- `tests/monitoring_tests.sh` - Test suite
- `docs/monitoring_user_guide.md` - User documentation

**Tasks:**
1. ⬜ Write unit tests for query functions
2. ⬜ Write integration tests for monitoring system
3. ⬜ Test load management under stress
4. ⬜ Test frontend with multiple subscribers
5. ⬜ Write user documentation
6. ⬜ Write API documentation

**Deliverables:**
- Comprehensive test suite
- User and developer documentation
- Performance benchmarks

---
## Security Considerations

### Access Control

1. **Admin-Only Access**
   - Monitoring events only sent to authorized admin pubkeys
   - Check admin authorization before broadcasting
   - Configurable via `monitoring_admin_only` setting

```c
int is_authorized_monitoring_subscriber(const char* pubkey) {
    // Check if monitoring is admin-only
    int admin_only = get_config_bool("monitoring_admin_only", 1);
    if (!admin_only) {
        return 1;  // Open to all
    }

    // Reject when either key is missing; strcmp() on NULL is undefined
    const char* admin_pubkey = get_config_value("admin_pubkey");
    if (pubkey == NULL || admin_pubkey == NULL) {
        return 0;
    }

    // Check if pubkey matches admin pubkey
    return (strcmp(pubkey, admin_pubkey) == 0);
}
```

2. **Subscription Limits**
   - Maximum number of concurrent monitoring subscribers
   - Prevents resource exhaustion
   - Configurable via `monitoring_max_subscribers`

3. **Rate Limiting**
   - Prevent abuse of monitoring system
   - Limit subscription requests per IP/pubkey
   - Automatic throttling under high load

### Data Privacy

1. **Sensitive Data Filtering**
   - Don't expose full pubkeys in monitoring data
   - Truncate or hash sensitive information
   - Configurable data exposure levels

2. **Content Filtering**
   - Don't include event content in monitoring data
   - Only include metadata (kind, timestamp, etc.)
   - Prevent information leakage
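
Truncating pubkeys before they enter monitoring payloads could be done with a small helper like the one sketched below. This is an illustration under assumptions: `truncate_pubkey` is a hypothetical name, and keeping a short prefix followed by `...` simply mirrors the `"abc123..."` style used in the sample payloads; hashing would be an alternative if prefixes are considered too revealing.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most `keep` leading characters of pubkey into out and
 * appends "..." when the key was shortened. The output buffer must
 * hold keep + 4 bytes (prefix, three dots, terminator).
 * Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int truncate_pubkey(const char *pubkey, size_t keep,
                    char *out, size_t out_size)
{
    if (pubkey == NULL || out == NULL || out_size < keep + 4) {
        return -1;
    }
    size_t len = strlen(pubkey);
    if (len <= keep) {
        memcpy(out, pubkey, len + 1);  /* short key: copy unchanged */
        return 0;
    }
    memcpy(out, pubkey, keep);
    memcpy(out + keep, "...", 4);      /* includes the terminator */
    return 0;
}
```

A query function such as `query_top_pubkeys()` could pass each key through this helper before serializing it.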

### Performance Protection

1. **Query Timeouts**
   - All monitoring queries have strict timeouts
   - Prevent long-running queries from blocking
   - Automatic fallback to cached data

2. **Resource Limits**
   - Maximum query result sizes
   - Memory limits for monitoring data
   - CPU usage monitoring and throttling

3. **Graceful Degradation**
   - System continues working if monitoring fails
   - Monitoring errors don't affect relay operations
   - Automatic recovery from failures

---

## Future Enhancements

### Phase 7+ (Future)

1. **Advanced Analytics**
   - Trend analysis and predictions
   - Anomaly detection
   - Automated alerting

2. **Custom Metrics**
   - User-defined monitoring queries
   - Custom data types
   - Pluggable query system

3. **Multi-Relay Monitoring**
   - Aggregate metrics from multiple relays
   - Distributed monitoring
   - Relay comparison tools

4. **Export & Integration**
   - Prometheus metrics export
   - Grafana integration
   - CSV/JSON data export

5. **Mobile Dashboard**
   - Mobile-optimized monitoring UI
   - Push notifications
   - Offline data caching

---
## Appendix A: Configuration Reference

### Complete Configuration Keys

```
monitoring_enabled                         - Enable/disable monitoring system
monitoring_admin_only                      - Restrict to admin pubkey only
monitoring_max_subscribers                 - Maximum concurrent subscribers

monitoring_interval_time_stats             - Time stats update interval (seconds)
monitoring_interval_event_kinds            - Event kinds update interval (seconds)
monitoring_interval_top_pubkeys            - Top pubkeys update interval (seconds)
monitoring_interval_connections            - Connections update interval (seconds)
monitoring_interval_subscriptions          - Subscriptions update interval (seconds)
monitoring_interval_database               - Database stats update interval (seconds)
monitoring_interval_performance            - Performance update interval (seconds)
monitoring_interval_recent_events          - Recent events update interval (seconds)

monitoring_load_threshold_cpu              - CPU threshold for load management (%)
monitoring_load_threshold_memory           - Memory threshold for load management (%)
monitoring_load_action                     - Action on high load (throttle/pause/disable)
monitoring_throttle_multiplier             - Interval multiplier when throttling

monitoring_triggers_enabled                - Enable/disable trigger system
monitoring_trigger_connection_threshold    - Connection utilization trigger (%)
monitoring_trigger_subscription_threshold  - Subscription utilization trigger (%)
monitoring_trigger_change_threshold        - Change detection threshold (%)
```

---
## Appendix B: API Reference

### C API Functions

```c
// Initialization
int init_monitoring_system(void);
void cleanup_monitoring_system(void);

// Configuration
int load_monitoring_config(monitoring_config_t* config);
int update_monitoring_config(const char* key, const char* value);
void reload_monitoring_config(void);

// Timer Management
monitoring_timer_t* create_monitoring_timer(const char* data_type,
                                            int interval_seconds,
                                            query_func_t query_func);
int should_execute_timer(monitoring_timer_t* timer);
int execute_monitoring_timer(monitoring_timer_t* timer);

// Query Functions
char* query_time_stats(void);
char* query_event_kinds(void);
char* query_top_pubkeys(void);
char* query_connections(void);
char* query_subscriptions(void);
char* query_database_stats(void);
char* query_performance_stats(void);
char* query_recent_events(void);

// Event Generation
cJSON* generate_monitoring_event(const char* data_type,
                                 const char* json_data,
                                 int interval_seconds);
int broadcast_monitoring_event(cJSON* event);

// Trigger System
int register_event_trigger(trigger_event_type_t event_type,
                           const char* data_type,
                           query_func_t query_func);
void fire_event_trigger(trigger_event_type_t event_type);

// Load Management
system_load_t get_system_load(void);
int is_high_load(system_load_t* load, monitoring_config_t* config);
void apply_throttle_mode(monitoring_manager_t* manager);
void apply_pause_mode(monitoring_manager_t* manager);

// Subscriber Management
int is_authorized_monitoring_subscriber(const char* pubkey);
int add_monitoring_subscriber(const char* pubkey, const char* data_types);
int remove_monitoring_subscriber(const char* pubkey);
int get_subscriber_count(void);
```

---
## Appendix C: Example Usage

### Admin Dashboard Subscription

```javascript
// Connect to relay
const relay = new WebSocket('ws://localhost:8888');

// Subscribe to all monitoring events once the socket is open
// (calling send() while the socket is still connecting throws)
relay.onopen = () => {
    relay.send(JSON.stringify([
        "REQ",
        "monitoring-sub",
        {
            kinds: [34567],
            authors: [relayPubkey]
        }
    ]));
};

// Handle incoming events
relay.onmessage = (msg) => {
    const [type, subId, event] = JSON.parse(msg.data);

    if (type === 'EVENT' && event.kind === 34567) {
        // Skip events without a d tag instead of throwing
        const dTag = event.tags.find(t => t[0] === 'd');
        if (!dTag) return;
        const dataType = dTag[1];
        const content = JSON.parse(event.content);

        console.log(`Received ${dataType} update:`, content.data);
        updateDashboard(dataType, content.data);
    }
};
```

### Configuration Update

```bash
# Enable monitoring
curl -X POST http://localhost:8888/api/config \
  -H "Content-Type: application/json" \
  -d '{"key": "monitoring_enabled", "value": "true"}'

# Set update interval
curl -X POST http://localhost:8888/api/config \
  -H "Content-Type: application/json" \
  -d '{"key": "monitoring_interval_time_stats", "value": "30"}'
```

---

**End of Design Document**