# <img src="public/parquet.png" alt="SignalK Parquet Data Store" width="72" height="72" style="vertical-align: middle; margin-right: 20px;"> SignalK Parquet Data Store
A comprehensive SignalK plugin and webapp that saves SignalK data directly to Parquet files, with manual and regimen-based automated archiving, advanced querying through a REST API built on the SignalK History API, Claude AI analysis of historical data, and spatial geographic analysis capabilities.
## Features
### Core Data Management
- **Smart Data Types**: Intelligent Parquet schema detection preserves native data types (DOUBLE, BOOLEAN) instead of forcing everything to strings
- **Multiple File Formats**: Support for Parquet, JSON, and CSV output formats (querying is supported for Parquet only)
- **Daily Consolidation**: Automatic daily file consolidation with S3 upload capabilities
- **Near Real-time Buffering**: Efficient data buffering with configurable thresholds
### Data Validation & Schema Repair
- **NEW Schema Validation**: Comprehensive validation of Parquet file schemas against SignalK metadata standards
- **NEW Automated Repair**: One-click repair of schema violations with proper data type conversion
- **NEW Type Correction**: Automatic conversion of incorrectly stored data types (e.g., numeric strings → DOUBLE, boolean strings → BOOLEAN)
- **NEW Metadata Integration**: Uses SignalK metadata (units, types) to determine correct data types for marine measurements
- **NEW Safe Operations**: Creates backups before repair and quarantines corrupted files for safety
- **NEW Progress Tracking**: Real-time progress monitoring with cancellation support for large datasets
#### Benefits of Proper Data Types
Using correct data types in Parquet files provides significant advantages:
- **Storage Efficiency**: Numeric data stored as DOUBLE uses ~50% less space than string representations
- **Query Performance**: Native numeric operations are 5-10x faster than string parsing during analysis
- **Data Integrity**: Type validation prevents data corruption and ensures consistent analysis results
- **Analytics Compatibility**: Proper types enable advanced statistical analysis and machine learning applications
- **Compression**: Parquet's columnar compression works optimally with correctly typed data
#### Validation Process
The validation system checks each Parquet file for:
- **Field Type Consistency**: Ensures numeric marine data (position, speed, depth) is stored as DOUBLE
- **Boolean Representation**: Validates true/false values are stored as BOOLEAN, not strings
- **Metadata Alignment**: Compares file schemas against SignalK metadata for units like meters, volts, amperes
- **Schema Standards**: Enforces data best practices for long-term data integrity
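The metadata-driven type check above can be sketched in a few lines. This is an illustrative sketch only; the helper names, the unit list, and the `SignalKMeta` shape are assumptions, not the plugin's actual validator:

```typescript
// Illustrative sketch: infer the expected Parquet type for a field from
// SignalK metadata, then flag stored types that disagree. Hypothetical
// helpers, not the plugin's internal code.
type ParquetType = 'DOUBLE' | 'BOOLEAN' | 'UTF8';

interface SignalKMeta {
  units?: string; // e.g. 'm', 'V', 'A', 'rad', 'K'
}

// Numeric SI units commonly used for marine measurements in SignalK.
const NUMERIC_UNITS = new Set(['m', 'm/s', 'rad', 'V', 'A', 'K', 'Pa', 'Hz', 'ratio']);

function expectedType(meta: SignalKMeta | undefined, sampleValue: unknown): ParquetType {
  if (meta?.units && NUMERIC_UNITS.has(meta.units)) return 'DOUBLE';
  if (typeof sampleValue === 'boolean') return 'BOOLEAN';
  if (typeof sampleValue === 'number') return 'DOUBLE';
  return 'UTF8';
}

// A stored type that disagrees with the expected type is a violation:
function isViolation(stored: ParquetType, meta: SignalKMeta | undefined, sample: unknown): boolean {
  return stored !== expectedType(meta, sample);
}
```

Under this rule, a voltage field stored as UTF8 strings would be flagged for repair to DOUBLE, which is the kind of conversion the repair step performs.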
### Advanced Querying
- **SignalK History API Compliance**: Full compliance with SignalK History API specifications
- **Standard Time Parameters**: All 5 standard query patterns supported
- **Time-Filtered Discovery**: Paths and contexts filtered by time range
- **Optional Analytics**: Moving averages (EMA/SMA) available on demand
- **NEW: Automatic Unit Conversion**: Optional integration with `signalk-units-preference` plugin
- Server-side conversion to user's preferred units (knots, km/h, °F, °C, etc.)
- Add `?convertUnits=true` to any history query
- Respects all unit preferences configured in units-preference plugin
- Configurable cache (1-60 minutes) balances performance vs. responsiveness
- Conversion metadata included in response
- **NEW: Timezone Conversion**: Convert UTC timestamps to local or specified timezone
- Add `?convertTimesToLocal=true` to convert timestamps to local time
- Optional `&timezone=America/New_York` for custom IANA timezone
- Automatic daylight saving time handling
- Clean ISO 8601 format with offset (e.g., `2025-10-20T12:34:04-04:00`)
- **Flexible Time Querying**: Multiple ways to specify time ranges
- Query from now, from specific times, or between time ranges
- Duration-based windows (1h, 30m, 2d) for easy relative queries
- Forward and backward time querying support
- **Time Alignment**: Automatic alignment of data from different sensors using time bucketing
- **DuckDB Integration**: Direct SQL querying of Parquet files with type-safe operations
- **Spatial Analysis**: Advanced geographic analysis with DuckDB spatial extension
- **Track Analysis**: Calculate vessel tracks, distances, and movement patterns
- **Proximity Detection**: Multi-vessel distance calculations and collision risk analysis
- **Geographic Visualization**: Generate movement boundaries, centroids, and spatial statistics
- **Route Planning**: Historical track analysis for route optimization and performance analysis
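The duration syntax mentioned above (`1h`, `30m`, `2d`) can be parsed with a small helper. This is an illustrative sketch, not the plugin's actual parser; `parseDuration` and `windowFromNow` are hypothetical names:

```typescript
// Illustrative sketch of parsing History API duration strings such as
// "1h", "30m", "15s", "2d" into milliseconds.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function parseDuration(input: string): number {
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (!match) throw new Error(`Invalid duration: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A duration-only query resolves to the window [now - duration, now]:
function windowFromNow(duration: string, now = Date.now()): { from: Date; to: Date } {
  const ms = parseDuration(duration);
  return { from: new Date(now - ms), to: new Date(now) };
}
```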
### Management & Control
- **Command Management**: Register, execute, and manage SignalK commands with automatic path configuration
- **Regimen-Based Data Collection**: Control data collection with command-based regimens
- **Multi-Vessel Support**: Wildcard vessel contexts (`vessels.*`) with MMSI-based exclusion filtering
- **Source Filtering**: Filter data by SignalK source labels (bypasses server arbitration for raw data access)
- **Comprehensive REST API**: Full programmatic control of queries and configuration
### User Interface & Integration
- **Responsive Web Interface**: Complete web-based management interface
- **S3 Integration**: Upload files to Amazon S3 with configurable timing and conflict resolution
- **Context Support**: Support for multiple vessel contexts with exclusion controls
### Regimen System (Advanced)
- **Operational Context Tracking**: Define regimens for operational states (mooring, anchoring, racing, passage-making)
- **Command-Based Episodes**: Track state transitions using SignalK commands as regimen triggers
- **Keyword Mapping**: Associate keywords with commands for intelligent Claude AI context matching
- **Episode Boundary Detection**: Sophisticated SQL-based detection of operational periods using CTEs and window functions
- **Contextual Data Collection**: Link SignalK paths to regimens for targeted data analysis during specific operations
- **Web Interface Management**: Create, edit, and manage regimens and command keywords through the web UI
### NEW Threshold Automation
- **NEW Per-Command Conditions**: Each regimen/command can define one or more thresholds that watch a single SignalK path.
- **NEW True-Only Actions**: On every path update the condition is evaluated; when it is true the command is set to the threshold's `activateOnMatch` state (ON/OFF). False evaluations leave the command untouched, so use a second threshold if you want a different level to switch it back.
- **NEW Stable Triggers**: Optional hysteresis (seconds) suppresses re-firing while the condition remains true, preventing rapid toggling in noisy data.
- **NEW Multiple Thresholds Per Path**: Unique monitor keys allow several thresholds to observe the same SignalK path without cancelling each other.
- **NEW Unit Handling**: Threshold values must match the live SignalK units (e.g., fractional 0–1 SoC values). Angular thresholds are entered in degrees in the UI and stored as radians automatically.
- **NEW Automation State Machine**: When automation is enabled, the command is set to OFF and all thresholds are immediately evaluated. When automation is disabled, threshold monitoring stops and the command state remains unchanged. The default state is hardcoded to OFF on the server side.
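The true-only-with-hysteresis behaviour described above can be sketched as follows. The field and function names are hypothetical (only `activateOnMatch` appears in this documentation), and the sketch approximates hysteresis as a time-since-last-fire holdoff:

```typescript
// Illustrative sketch: a threshold fires only on true evaluations, sets
// the command to its activateOnMatch state, and suppresses re-firing
// within the hysteresis window. False evaluations never act.
interface Threshold {
  operator: '>' | '<';
  value: number;
  activateOnMatch: boolean;     // command state to set when the condition is true
  hysteresisSeconds?: number;   // suppress re-fires while the condition stays true
}

function makeMonitor(t: Threshold, setCommand: (on: boolean) => void) {
  let lastFired = -Infinity;
  return (pathValue: number, nowMs: number) => {
    const isTrue = t.operator === '>' ? pathValue > t.value : pathValue < t.value;
    if (!isTrue) return;                       // false evaluations leave the command untouched
    const holdoff = (t.hysteresisSeconds ?? 0) * 1000;
    if (nowMs - lastFired < holdoff) return;   // still inside the hysteresis window
    lastFired = nowMs;
    setCommand(t.activateOnMatch);
  };
}
```

As noted above, a second threshold on the same path with the opposite condition (and a unique monitor key) is what switches the command back.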
### Claude AI Integration
- **AI-Powered Analysis**: Advanced maritime data analysis using Claude AI models (Opus 4, Sonnet 4)
- **Regimen-Based Analysis**: Context-aware episode detection for operational states (mooring, anchoring, sailing)
- **Command Integration**: Keyword-based regimen matching with customizable command configurations
- **Episode Detection**: Sophisticated boundary detection for operational transitions
- **Multi-Vessel Support**: Real-time data access from self vessel and other vessels via SignalK
- **Conversation Continuity**: Follow-up questions with preserved context and specialized tools
- **Timezone Intelligence**: Automatic UTC-to-local time conversion based on system timezone
- **Custom Analysis**: Create custom analysis prompts for specific operational needs
## Requirements
### Core Requirements
- SignalK Server v1.x or v2.x
- Node.js 18+ (included with SignalK)
### Optional Plugin Integration
- **signalk-units-preference** (v0.7.0+): Required for automatic unit conversion feature
- Install from: https://github.com/motamman/signalk-units-preference
- Provides server-side unit conversion based on user preferences
- The history API will work without this plugin, but `convertUnits=true` will have no effect
## Installation
### Install from GitHub
```bash
# Navigate to folder
cd ~/.signalk/node_modules/
# Install from npm (after publishing)
npm install signalk-parquet
# Or install from GitHub
npm install motamman/signalk-parquet
cd ~/.signalk/node_modules/signalk-parquet
npm run build
# Restart SignalK
sudo systemctl restart signalk
```
## ⚠️ IMPORTANT IF UPGRADING FROM 0.5.0-beta.3: Consolidation Bug Fix
**THIS RELEASE FIXES A RECURSIVE BUG THAT WAS CREATING NESTED `processed` DIRECTORIES AND REPEATEDLY PROCESSING THE SAME FILES. ANY `processed` FOLDER NESTED INSIDE ANOTHER `processed` FOLDER SHOULD BE MANUALLY DELETED.**
### Cleaning Up Nested Processed Directories
No action is likely needed if upgrading from 0.5.0-beta.4 or later. If you're upgrading from an earlier version, you may have nested processed directories that need cleanup:
```bash
# Check for nested processed directories
find data -name "*processed*" -type d | head -20
# See the deepest nesting levels
find data -name "*processed*" -type d | awk -F'/' '{print NF-1, $0}' | sort -nr | head -5
# Count files in nested processed directories
find data -path "*/processed/processed/*" -type f | wc -l
# Remove ALL processed directories, including nested ones (RECOMMENDED)
find data -name "processed" -type d -exec rm -rf {} +
# Verify cleanup completed
find data -path "*/processed/processed/*" -type f | wc -l # Should show 0
```
**Note**: The processed directories only contain files that were moved during consolidation - removing them does not delete your original data.
### Development Setup
```bash
# Clone or copy the signalk-parquet directory
cd signalk-parquet
# Install dependencies
npm install
# Build the TypeScript code
npm run build
# Copy to SignalK plugins directory
cp -r . ~/.signalk/node_modules/signalk-parquet/
# Restart SignalK
sudo systemctl restart signalk
```
### Production Build
```bash
# Build for production
npm run build
# The compiled JavaScript will be in the dist/ directory
```
## Configuration
### Plugin Configuration
Navigate to **SignalK Admin → Server → Plugin Config → SignalK Parquet Data Store**
Configure basic plugin settings (path configuration is managed separately in the web interface):
| Setting | Description | Default |
|---------|-------------|---------|
| **Buffer Size** | Number of records to buffer before writing | 1000 |
| **Save Interval** | How often to save buffered data (seconds) | 30 |
| **Output Directory** | Directory to save data files | SignalK data directory |
| **Filename Prefix** | Prefix for generated filenames | `signalk_data` |
| **File Format** | Output format (parquet, json, csv) | `parquet` |
| **Retention Days** | Days to keep processed files | 7 |
| **Unit Conversion Cache Duration** | How long to cache unit conversions before reloading (minutes) | 5 |
> **Note**: The Unit Conversion Cache Duration setting controls how quickly changes to unit preferences (in the signalk-units-preference plugin) are reflected in the history API. Lower values (1-2 minutes) reflect changes faster but use more resources. Higher values (30-60 minutes) reduce overhead but take longer to reflect changes. The default of 5 minutes provides a good balance for most users.
### S3 Upload Configuration
Configure S3 upload settings in the plugin configuration:
| Setting | Description | Default |
|---------|-------------|---------|
| **Enable S3 Upload** | Enable uploading to Amazon S3 | `false` |
| **Upload Timing** | When to upload (realtime/consolidation) | `consolidation` |
| **S3 Bucket** | Name of S3 bucket | - |
| **AWS Region** | AWS region for S3 bucket | `us-east-1` |
| **Key Prefix** | S3 object key prefix | - |
| **Access Key ID** | AWS credentials (optional) | - |
| **Secret Access Key** | AWS credentials (optional) | - |
| **Delete After Upload** | Delete local files after upload | `false` |
### Claude AI Configuration
Configure Claude AI integration in the plugin configuration for advanced data analysis:
| Setting | Description | Default |
|---------|-------------|---------|
| **Enable Claude Integration** | Enable AI-powered data analysis | `false` |
| **API Key** | Anthropic Claude API key (required) | - |
| **Model** | Claude model to use for analysis | `claude-3-7-sonnet-20250219` |
| **Max Tokens** | Maximum tokens for AI responses | `4000` |
| **Temperature** | AI creativity level (0-1) | `0.3` |
#### Supported Claude Models
| Model | Description | Use Case |
|-------|-------------|----------|
| `claude-opus-4-1-20250805` | Latest Opus model - highest intelligence | Complex analysis, detailed insights |
| `claude-opus-4-20250514` | Opus model - very high intelligence | Advanced analysis |
| `claude-sonnet-4-20250514` | Sonnet model - balanced performance | **Recommended default** |
#### Getting a Claude API Key
1. Visit [Anthropic Console](https://console.anthropic.com/)
2. Create an account or sign in
3. Navigate to **API Keys** section
4. Generate a new API key
5. Copy the key and paste it in the plugin configuration
**Note**: Claude AI analysis requires an active Anthropic API subscription. Usage is billed based on tokens consumed during analysis.
## Path Configuration
**Important**: Path configuration is managed exclusively through the web interface, not in the SignalK admin interface. This provides a more intuitive interface for managing data collection paths.
### Accessing Path Configuration
1. Navigate to: `http://localhost:3000/plugins/signalk-parquet`
2. Click the **Path Configuration** tab
### Adding Data Paths
Use the web interface to configure which SignalK paths to collect:
1. Click **Add New Path**
2. Configure the path settings:
- **SignalK Path**: The SignalK data path (e.g., `navigation.position`)
- **Always Enabled**: Collect data regardless of regimen state
- **Regimen Control**: Command name that controls collection
- **Source Filter**: Only collect from specific sources
- **Context**: SignalK context (`vessels.self`, `vessels.*`, or specific vessel)
- **Exclude MMSI**: For `vessels.*` context, exclude specific MMSI numbers
3. Click **Add Path**
### Managing Existing Paths
- **Edit Path**: Click the **Edit** button to modify path settings
- **Delete Path**: Click the **Remove** button to delete a path
- **Refresh**: Click **Refresh Paths** to reload the configuration
- **Show/Hide Commands**: Toggle button to show/hide command paths in the table
### Command Management
The plugin streamlines command management with automatic path configuration:
1. **Register Command**: Commands are automatically registered with enabled path configurations
2. **Start Command**: Click **Start** button to activate a command regimen
3. **Stop Command**: Click **Stop** button to deactivate a command regimen
4. **Remove Command**: Click **Remove** button to delete a command and its path configuration
This eliminates the previous 3-step process of registering commands, adding paths, and enabling them separately.
### Path Configuration Storage
Path configurations are stored separately from plugin configuration in:
```
~/.signalk/signalk-parquet/webapp-config.json
```
This allows for:
- Independent management of path configurations
- Better separation of concerns
- Easier backup and migration of path settings
- More intuitive web-based configuration interface
### Regimen-Based Control
Regimens allow you to control data collection based on SignalK commands:
**Example**: Weather data collection with source filtering
```json
{
"path": "environment.wind.angleApparent",
"enabled": false,
"regimen": "captureWeather",
"source": "mqtt-weatherflow-udp",
"context": "vessels.self"
}
```
**Note**: Source filtering accesses raw data before SignalK server arbitration, allowing collection of data from specific sources that might otherwise be filtered out.
**Multi-Vessel Example**: Collect navigation data from all vessels except specific MMSI numbers
```json
{
"path": "navigation.position",
"enabled": true,
"context": "vessels.*",
"excludeMMSI": ["123456789", "987654321"]
}
```
**Command Path**: Command paths are automatically created when registering commands
```json
{
"path": "commands.captureWeather",
"enabled": true,
"context": "vessels.self"
}
```
This path will only collect data when the command `commands.captureWeather` is active.
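One plausible reading of the configuration examples above can be sketched as a small gating function: an "Always Enabled" path is always collected, while a path with a `regimen` is collected only while that command is active. The `GatedPath` type and `shouldCollect` helper are hypothetical, simplified from the plugin's real `PathConfig`:

```typescript
// Illustrative sketch of regimen-gated collection, assuming `enabled`
// means "always enabled" (matching the weather example above, where
// enabled is false and collection is driven by the regimen).
interface GatedPath {
  path: string;
  enabled: boolean;   // collect regardless of regimen state
  regimen?: string;   // e.g. 'captureWeather'
}

function shouldCollect(config: GatedPath, activeRegimens: Set<string>): boolean {
  if (config.enabled) return true;                              // always-enabled paths
  if (config.regimen) return activeRegimens.has(config.regimen); // gated by command state
  return false;
}
```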
## TypeScript Architecture
### Type Safety
The plugin uses comprehensive TypeScript interfaces:
```typescript
interface PluginConfig {
bufferSize: number;
saveIntervalSeconds: number;
outputDirectory: string;
filenamePrefix: string;
fileFormat: 'json' | 'csv' | 'parquet';
paths: PathConfig[];
s3Upload: S3UploadConfig;
}
interface PathConfig {
path: string;
enabled: boolean;
regimen?: string;
source?: string;
context: string;
excludeMMSI?: string[];
}
interface DataRecord {
received_timestamp: string;
signalk_timestamp: string;
context: string;
path: string;
value: any;
source_label?: string;
meta?: string;
}
```
### Plugin State Management
The plugin maintains typed state:
```typescript
interface PluginState {
unsubscribes: Array<() => void>;
dataBuffers: Map<string, DataRecord[]>;
activeRegimens: Set<string>;
subscribedPaths: Set<string>;
parquetWriter?: ParquetWriter;
s3Client?: any;
currentConfig?: PluginConfig;
}
```
### Express Router Types
API routes are fully typed:
```typescript
router.get('/api/paths',
(_: TypedRequest, res: TypedResponse<PathsApiResponse>) => {
// Typed request/response handling
}
);
```
## Data Output Structure
### File Organization
```
output_directory/
├── vessels/
│   └── self/
│       ├── navigation/
│       │   ├── position/
│       │   │   ├── signalk_data_20250716T120000.parquet
│       │   │   └── signalk_data_20250716_consolidated.parquet
│       │   └── speedOverGround/
│       └── environment/
│           └── wind/
│               └── angleApparent/
└── processed/
    └── [moved files after consolidation]
```
### Data Schema
Each record contains:
| Field | Type | Description |
|-------|------|-------------|
| `received_timestamp` | string | When the plugin received the data |
| `signalk_timestamp` | string | Original SignalK timestamp |
| `context` | string | SignalK context (e.g., `vessels.self`) |
| `path` | string | SignalK path |
| `value` | DOUBLE/BOOLEAN/INT64/UTF8 | **Smart typed values** - numbers stored as DOUBLE, booleans as BOOLEAN, etc. |
| `value_json` | string | JSON representation for complex values |
| `source` | string | Complete source information |
| `source_label` | string | Source label |
| `source_type` | string | Source type |
| `source_pgn` | number | PGN number (if applicable) |
| `meta` | string | Metadata information |
#### Smart Data Types
The plugin now intelligently detects and preserves native data types:
- **Numbers**: Stored as `DOUBLE` (floating point) or `INT64` (integers)
- **Booleans**: Stored as `BOOLEAN`
- **Strings**: Stored as `UTF8`
- **Objects**: Serialized to JSON and stored as `UTF8`
- **Mixed Types**: Falls back to `UTF8` when a path contains multiple data types
This provides better compression, faster queries, and proper type safety for data analysis.
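The detection rule described above can be sketched as follows. This is an illustrative sketch with hypothetical helper names, not the plugin's actual schema-inference code:

```typescript
// Illustrative sketch of smart type detection: numbers map to
// DOUBLE/INT64, booleans to BOOLEAN, everything else (including
// JSON-serialized objects) to UTF8, with mixed types falling back to UTF8.
type ParquetType = 'DOUBLE' | 'INT64' | 'BOOLEAN' | 'UTF8';

function typeOfValue(v: unknown): ParquetType {
  if (typeof v === 'boolean') return 'BOOLEAN';
  if (typeof v === 'number') return Number.isInteger(v) ? 'INT64' : 'DOUBLE';
  return 'UTF8'; // strings and JSON-serialized objects
}

function inferColumnType(values: unknown[]): ParquetType {
  const types = new Set(values.map(typeOfValue));
  if (types.size === 1) return Array.from(types)[0];
  // Integers mixed with floats still form a numeric column:
  if (types.size === 2 && types.has('INT64') && types.has('DOUBLE')) return 'DOUBLE';
  return 'UTF8'; // genuinely mixed types fall back to strings
}
```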
## Web Interface
### Features
- **Path Configuration**: Manage data collection paths with multi-vessel support
- **Command Management**: Streamlined command registration and control
- **Data Exploration**: Browse available data paths
- **SQL Queries**: Execute DuckDB queries against Parquet files
- **History API**: Query historical data using SignalK History API endpoints
- **S3 Status**: Test S3 connectivity and configuration
- **Responsive Design**: Works on desktop and mobile
- **MMSI Filtering**: Exclude specific vessels from wildcard contexts
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/paths` | GET | List available data paths |
| `/api/files/:path` | GET | List files for a path |
| `/api/sample/:path` | GET | Sample data from a path |
| `/api/query` | POST | Execute SQL query |
| `/api/config/paths` | GET/POST/PUT/DELETE | Manage path configurations |
| `/api/test-s3` | POST | Test S3 connection |
| `/api/health` | GET | Health check |
| **Claude AI Analysis API** | | |
| `/api/analyze` | POST | Perform AI analysis on data |
| `/api/analyze/templates` | GET | Get available analysis templates |
| `/api/analyze/followup` | POST | Follow-up analysis questions |
| `/api/analyze/history` | GET | Get analysis history |
| `/api/analyze/test-connection` | POST | Test Claude API connection |
| **SignalK History API** | | |
| `/signalk/v1/history/values` | GET | SignalK History API - Get historical values |
| `/signalk/v1/history/contexts` | GET | SignalK History API - Get available contexts |
| `/signalk/v1/history/paths` | GET | SignalK History API - Get available paths |
## DuckDB Integration
### Query Examples
#### Basic Queries
```sql
-- Get latest 10 records from navigation position
SELECT * FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true)
ORDER BY received_timestamp DESC LIMIT 10;
-- Count total records
SELECT COUNT(*) FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true);
-- Filter by source
SELECT * FROM read_parquet('/path/to/environment/wind/*.parquet', union_by_name=true)
WHERE source_label = 'mqtt-weatherflow-udp'
ORDER BY received_timestamp DESC LIMIT 100;
-- Aggregate by hour
SELECT
DATE_TRUNC('hour', received_timestamp::timestamp) as hour,
AVG(value::double) as avg_value,
COUNT(*) as record_count
FROM read_parquet('/path/to/data/*.parquet', union_by_name=true)
GROUP BY hour
ORDER BY hour;
```
#### Spatial Analysis Queries
```sql
-- Calculate distance traveled over time
WITH ordered_positions AS (
SELECT
signalk_timestamp,
ST_Point(value_longitude, value_latitude) as position,
LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position
FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
AND signalk_timestamp <= '2025-09-27T23:59:59Z'
AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
SELECT *,
CASE
WHEN prev_position IS NOT NULL
THEN ST_Distance_Sphere(position, prev_position)
ELSE 0
END as distance_meters
FROM ordered_positions
)
SELECT
strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket,
AVG(value_latitude) as avg_lat,
AVG(value_longitude) as avg_lon,
ST_AsText(ST_Centroid(ST_Collect(position))) as centroid,
SUM(distance_meters) as total_distance_meters,
COUNT(*) as position_records,
ST_AsText(ST_ConvexHull(ST_Collect(position))) as movement_area
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;
-- Multi-vessel proximity analysis
SELECT
v1.context as vessel1,
v2.context as vessel2,
ST_Distance_Sphere(
ST_Point(v1.value_longitude, v1.value_latitude),
ST_Point(v2.value_longitude, v2.value_latitude)
) as distance_meters,
v1.signalk_timestamp
FROM read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v1
JOIN read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v2
ON v1.signalk_timestamp = v2.signalk_timestamp AND v1.context != v2.context
WHERE v1.signalk_timestamp >= '2025-09-27T00:00:00Z'
AND ST_Distance_Sphere(
ST_Point(v1.value_longitude, v1.value_latitude),
ST_Point(v2.value_longitude, v2.value_latitude)
) < 1000 -- Within 1km
ORDER BY distance_meters;
-- Advanced movement analysis with bounding boxes
WITH ordered_positions AS (
SELECT
signalk_timestamp,
ST_Point(value_longitude, value_latitude) as position,
value_latitude,
value_longitude,
LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position,
strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket
FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
AND signalk_timestamp <= '2025-09-27T23:59:59Z'
AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
SELECT *,
CASE
WHEN prev_position IS NOT NULL
THEN ST_Distance_Sphere(position, prev_position)
ELSE 0
END as distance_meters
FROM ordered_positions
)
SELECT
time_bucket,
AVG(value_latitude) as avg_lat,
AVG(value_longitude) as avg_lon,
-- Calculate bounding box manually
MIN(value_latitude) as min_lat,
MAX(value_latitude) as max_lat,
MIN(value_longitude) as min_lon,
MAX(value_longitude) as max_lon,
-- Distance and movement metrics
SUM(distance_meters) as total_distance_meters,
ROUND(SUM(distance_meters) / 1000.0, 2) as total_distance_km,
COUNT(*) as position_records,
-- Movement area approximation using bounding box
(MAX(value_latitude) - MIN(value_latitude)) * 111320 *
(MAX(value_longitude) - MIN(value_longitude)) * 111320 *
COS(RADIANS(AVG(value_latitude))) as approx_area_m2
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;
```
#### Available Spatial Functions
- `ST_Point(longitude, latitude)` - Create point geometries
- `ST_Distance_Sphere(point1, point2)` - Calculate distances in meters
- `ST_AsText(geometry)` - Convert to Well-Known Text format
- `ST_Centroid(ST_Collect(points))` - Find center of multiple points
- `ST_ConvexHull(ST_Collect(points))` - Create movement boundary polygons
## History API Integration
The plugin provides full SignalK History API compliance, allowing you to query historical data using standard SignalK API endpoints with enhanced performance and filtering capabilities.
### Available Endpoints
| Endpoint | Description | Parameters |
|----------|-------------|------------|
| `/signalk/v1/history/values` | Get historical values for specified paths | **Standard patterns** (see below)<br>**Optional**: `resolution`, `refresh`, `includeMovingAverages`, `useUTC` |
| `/signalk/v1/history/contexts` | Get available vessel contexts for time range | **Time Range**: Any standard pattern (see below)<br>Returns only contexts with data in specified range |
| `/signalk/v1/history/paths` | Get available SignalK paths for time range | **Time Range**: Any standard pattern (see below)<br>Returns only paths with data in specified range |
### Standard Time Range Patterns
The History API supports 5 standard SignalK time query patterns:
| Pattern | Parameters | Description | Example |
|---------|-----------|-------------|---------|
| **1** | `duration` | Query back from now | `?duration=1h` |
| **2** | `from` + `duration` | Query forward from start | `?from=2025-01-01T00:00:00Z&duration=1h` |
| **3** | `to` + `duration` | Query backward to end | `?to=2025-01-01T12:00:00Z&duration=1h` |
| **4** | `from` | From start to now | `?from=2025-01-01T00:00:00Z` |
| **5** | `from` + `to` | Specific range | `?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z` |
**Legacy Support**: The `start` parameter (used with `duration`) is deprecated but still supported for backward compatibility. A console warning will be shown. Use standard patterns instead.
### Query Parameters
| Parameter | Description | Format | Examples |
|-----------|-------------|---------|----------|
| **Required for `/values`:** | | | |
| `paths` | SignalK paths with optional aggregation | `path:method,path:method` | `navigation.position:first,wind.speed:average` |
| **Time Range:** | Use one of the 5 standard patterns above | | |
| `duration` | Time period | `[number][unit]` | `1h`, `30m`, `15s`, `2d` |
| `from` | Start time (ISO 8601) | ISO datetime | `2025-01-01T00:00:00Z` |
| `to` | End time (ISO 8601) | ISO datetime | `2025-01-01T06:00:00Z` |
| **Optional:** | | | |
| `context` | Vessel context | `vessels.self` or `vessels.<id>` | `vessels.self` (default) |
| `resolution` | Time bucket size in milliseconds | Number | `60000` (1 minute buckets) |
| `refresh` | Enable auto-refresh (pattern 1 only) | `true` or `1` | `refresh=true` |
| `includeMovingAverages` | Include EMA/SMA calculations | `true` or `1` | `includeMovingAverages=true` |
| `useUTC` | Treat datetime inputs as UTC | `true` or `1` | `useUTC=true` |
| `convertUnits` | Convert to preferred units (requires signalk-units-preference plugin) | `true` or `1` | `convertUnits=true` |
| `convertTimesToLocal` | Convert timestamps to local/specified timezone | `true` or `1` | `convertTimesToLocal=true` |
| `timezone` | IANA timezone ID (used with `convertTimesToLocal`) | IANA timezone | `timezone=America/New_York` |
| **Deprecated:** | | | |
| `start` | ⚠️ Use standard patterns instead | `now` or ISO datetime | Deprecated, use `duration` or `from`/`to` |
### Query Examples
#### Pattern 1: Duration Only (Query back from now)
```bash
# Last hour of wind data
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent"
# Last 30 minutes with moving averages
curl "http://localhost:3000/signalk/v1/history/values?duration=30m&paths=environment.wind.speedApparent&includeMovingAverages=true"
# Real-time with auto-refresh
curl "http://localhost:3000/signalk/v1/history/values?duration=15m&paths=navigation.position&refresh=true"
```
#### Pattern 2: From + Duration (Query forward)
```bash
# 6 hours forward from specific time
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&duration=6h&paths=navigation.position"
```
#### Pattern 3: To + Duration (Query backward)
```bash
# 2 hours backward to specific time
curl "http://localhost:3000/signalk/v1/history/values?to=2025-01-01T12:00:00Z&duration=2h&paths=environment.wind.speedApparent"
```
#### Pattern 4: From Only (From start to now)
```bash
# From specific time until now
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&paths=navigation.speedOverGround"
```
#### Pattern 5: From + To (Specific range)
```bash
# Specific 24-hour period
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z&paths=navigation.position"
```
#### Advanced Query Examples
**Multiple paths with time alignment:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=6h&paths=environment.wind.angleApparent,environment.wind.speedApparent,navigation.position&resolution=60000"
```
**Multiple aggregations of same path:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-01T06:00:00Z&paths=environment.wind.speedApparent:average,environment.wind.speedApparent:min,environment.wind.speedApparent:max&resolution=60000"
```
**With moving averages for trend analysis:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=electrical.batteries.512.voltage&includeMovingAverages=true&resolution=300000"
```
**Different temporal samples:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.position:first,navigation.position:middle_index,navigation.position:last&resolution=60000"
```
#### Context and Path Discovery
**Get contexts with data in last hour:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts?duration=1h"
```
**Get contexts for specific time range:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts?from=2025-01-01T00:00:00Z&to=2025-01-07T00:00:00Z"
```
**Get available paths with recent data:**
```bash
curl "http://localhost:3000/signalk/v1/history/paths?duration=24h"
```
**Get all paths (no time filter):**
```bash
curl "http://localhost:3000/signalk/v1/history/paths"
```
#### Unit Conversion (NEW in v0.6.0)
**Convert to user's preferred units:**
```bash
# Speed in knots (if configured in signalk-units-preference)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround&convertUnits=true"
# Wind speed in preferred units (knots, km/h, or mph)
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertUnits=true"
# Temperature in preferred units (°C or °F)
curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=environment.outside.temperature&convertUnits=true"
```
**Response includes conversion metadata:**
```json
{
"values": [{"path": "navigation.speedOverGround", "method": "average"}],
"data": [["2025-10-20T16:12:14Z", 5.2]],
"units": {
"converted": true,
"conversions": [{
"path": "navigation.speedOverGround",
"baseUnit": "m/s",
"targetUnit": "knots",
"symbol": "kn"
}]
}
}
```
#### Timezone Conversion (NEW in v0.6.0)
**Convert to server's local time:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=environment.wind.speedApparent&convertTimesToLocal=true"
```
**Convert to specific timezone:**
```bash
# New York time (Eastern)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.position&convertTimesToLocal=true&timezone=America/New_York"
# London time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertTimesToLocal=true&timezone=Europe/London"
# Tokyo time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.speedOverGround&convertTimesToLocal=true&timezone=Asia/Tokyo"
```
**Response includes timezone metadata:**
```json
{
"range": {
"from": "2025-10-20T12:12:19-04:00",
"to": "2025-10-20T13:12:19-04:00"
},
"data": [
["2025-10-20T12:12:14-04:00", 5.84],
["2025-10-20T12:12:28-04:00", 5.26]
],
"timezone": {
"converted": true,
"targetTimezone": "America/New_York",
"offset": "-04:00",
"description": "Converted to user-specified timezone: America/New_York (-04:00)"
}
}
```
**Combine both conversions:**
```bash
# Convert values to knots AND timestamps to New York time
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround,environment.wind.speedApparent&convertUnits=true&convertTimesToLocal=true&timezone=America/New_York"
```
**Common IANA Timezone IDs:**
- `America/New_York` - Eastern Time (US)
- `America/Chicago` - Central Time (US)
- `America/Denver` - Mountain Time (US)
- `America/Los_Angeles` - Pacific Time (US)
- `Europe/London` - UK
- `Europe/Paris` - Central European Time
- `Asia/Tokyo` - Japan
- `Pacific/Auckland` - New Zealand
- `Australia/Sydney` - Australian Eastern Time
#### Duration Formats
- `30s` - 30 seconds
- `15m` - 15 minutes
- `2h` - 2 hours
- `1d` - 1 day
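The duration shorthand above can be turned into milliseconds with a few lines of code. This is an illustrative sketch only; `parseDuration` is a hypothetical helper, not the plugin's actual API.

```javascript
// Parse duration strings like "30s", "15m", "2h", "1d" into milliseconds.
// Hypothetical helper for illustration; the plugin's parser may differ.
function parseDuration(str) {
  const match = /^(\d+)(s|m|h|d)$/.exec(str);
  if (!match) throw new Error(`Invalid duration: ${str}`);
  const multipliers = { s: 1000, m: 60000, h: 3600000, d: 86400000 };
  return Number(match[1]) * multipliers[match[2]];
}

console.log(parseDuration('2h')); // 7200000
```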
### Timezone Handling (NEW)
**Local time conversion (default behavior):**
```bash
# 8:00 AM local time → automatically converted to UTC
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position"
```
**UTC time mode:**
```bash
# 8:00 AM UTC (not converted)
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position&useUTC=true"
```
**Explicit timezone (always respected):**
```bash
# Explicit UTC timezone
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00Z&duration=1h&paths=navigation.position"
# Explicit timezone offset
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00-04:00&duration=1h&paths=navigation.position"
```
**Timezone behavior:**
- **Default (`useUTC=false`)**: Datetime strings without timezone info are treated as local time and automatically converted to UTC
- **UTC mode (`useUTC=true`)**: Datetime strings without timezone info are treated as UTC time
- **Explicit timezone**: Strings with `Z`, `+HH:MM`, or `-HH:MM` are always parsed as-is regardless of `useUTC` setting
- **`start=now`**: Always uses current UTC time regardless of `useUTC` setting
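The interpretation rules above can be sketched as follows. This is a minimal illustration of the documented behavior, assuming a Node.js environment where zone-less date-time strings parse as local time; `interpretStart` and `hasExplicitZone` are hypothetical helpers, not the plugin's implementation.

```javascript
// Does the string carry an explicit timezone (Z or ±HH:MM offset)?
function hasExplicitZone(s) {
  return /(Z|[+-]\d{2}:\d{2})$/.test(s);
}

// Resolve a start parameter to a UTC ISO timestamp per the rules above.
function interpretStart(s, useUTC) {
  if (s === 'now') return new Date().toISOString();         // always current UTC
  if (hasExplicitZone(s)) return new Date(s).toISOString(); // parsed as-is
  // No zone info: UTC when useUTC=true, otherwise local time (Node default)
  return useUTC ? new Date(s + 'Z').toISOString() : new Date(s).toISOString();
}
```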
**Get available contexts:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts"
```
### Time Alignment and Bucketing
The History API automatically aligns data from different paths using time bucketing to solve the common problem of misaligned timestamps. This enables:
- **Plotting**: Data points align properly on charts
- **Correlation**: Compare values from different sensors at the same time
- **Export**: Clean, aligned datasets for analysis
**Key Features:**
- **Smart Type Handling**: Automatically handles numeric values (wind speed) and JSON objects (position)
- **Robust Aggregation**: Uses proper SQL type casting to prevent type errors
- **Configurable Resolution**: Time bucket size in milliseconds (default: auto-calculated based on time range)
- **Multiple Aggregation Methods**: `average` for numeric data, `first` for complex objects
**Parameters:**
- `resolution` - Time bucket size in milliseconds (default: auto-calculated)
- **Aggregation methods**: `average`, `min`, `max`, `first`, `last`, `mid`, `middle_index`
**Aggregation Methods:**
- **`average`** - Average value in time bucket (default for numeric data)
- **`min`** - Minimum value in time bucket
- **`max`** - Maximum value in time bucket
- **`first`** - First value in time bucket (default for objects)
- **`last`** - Last value in time bucket
- **`mid`** - Median value (average of middle values for even counts)
- **`middle_index`** - Middle value by index (first of two middle values for even counts)
**When to Use Each Method:**
- **Numeric data** (wind speed, voltage, etc.): Use `average`, `min`, `max` for statistics
- **Position data**: Use `first`, `last`, `middle_index` for specific readings
- **String/object data**: Avoid `mid` (unpredictable), prefer `first`, `last`, `middle_index`
- **Multiple stats**: Query same path with different methods (e.g., `wind:average,wind:max`)
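The bucketing idea can be sketched in a few lines. This is an illustration of the concept, assuming samples as `[epochMs, value]` pairs and the `average` method; `bucketAverage` is a hypothetical helper, not the plugin's DuckDB-backed implementation.

```javascript
// Assign each sample to a fixed-size time bucket and average within buckets.
function bucketAverage(samples, resolutionMs) {
  const buckets = new Map();
  for (const [t, v] of samples) {
    const key = Math.floor(t / resolutionMs) * resolutionMs; // bucket start time
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(v);
  }
  // One [bucketStart, average] pair per non-empty bucket
  return [...buckets.entries()].map(([t, vs]) => [
    t,
    vs.reduce((a, b) => a + b, 0) / vs.length,
  ]);
}
```

Because every path's samples collapse onto the same bucket boundaries, values from different sensors line up row by row, which is what makes plotting and correlation work.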
### Response Format
The History API returns time-aligned data in standard SignalK format.
#### Default Response (without moving averages)
```json
{
"context": "vessels.self",
"range": {
"from": "2025-01-01T00:00:00Z",
"to": "2025-01-01T06:00:00Z"
},
"values": [
{
"path": "environment.wind.speedApparent",
"method": "average"
},
{
"path": "navigation.position",
"method": "first"
}
],
"data": [
["2025-01-01T00:00:00Z", 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
["2025-01-01T00:01:00Z", 13.2, {"latitude": 37.7750, "longitude": -122.4195}],
["2025-01-01T00:02:00Z", 11.8, {"latitude": 37.7751, "longitude": -122.4196}]
]
}
```
#### With Moving Averages (includeMovingAverages=true)
```json
{
"context": "vessels.self",
"range": {
"from": "2025-01-01T00:00:00Z",
"to": "2025-01-01T06:00:00Z"
},
"values": [
{
"path": "environment.wind.speedApparent",
"method": "average"
},
{
"path": "environment.wind.speedApparent.ema",
"method": "ema"
},
{
"path": "environment.wind.speedApparent.sma",
"method": "sma"
},
{
"path": "navigation.position",
"method": "first"
}
],
"data": [
["2025-01-01T00:00:00Z", 12.5, 12.5, 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
["2025-01-01T00:01:00Z", 13.2, 12.64, 12.85, {"latitude": 37.7750, "longitude": -122.4195}],
["2025-01-01T00:02:00Z", 11.8, 12.45, 12.5, {"latitude": 37.7751, "longitude": -122.4196}]
]
}
```
**Notes**:
- Each data array element is `[timestamp, value1, value2, ...]` corresponding to the paths in the `values` array
- Moving averages (EMA/SMA) are **opt-in** - add `includeMovingAverages=true` to include them
- EMA/SMA are only calculated for numeric values; non-numeric values (objects, strings) show `null` for their EMA/SMA columns
- Without `includeMovingAverages`, response size is ~66% smaller
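A client consuming this column-oriented format can unpack it into per-path series. The sketch below follows the response shape shown above; the `toSeries` helper is illustrative, not part of the plugin.

```javascript
// Split a History API response into one [timestamp, value] series per
// path:method pair, using the column order declared in `values`.
function toSeries(response) {
  const series = {};
  response.values.forEach((v, i) => {
    // Column 0 is the timestamp; column i+1 holds this path's values
    series[`${v.path}:${v.method}`] = response.data.map(row => [row[0], row[i + 1]]);
  });
  return series;
}
```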
## Claude AI Analysis
The plugin integrates Claude AI to provide intelligent analysis of maritime data, offering insights that would be difficult to extract through traditional querying methods.
### Advanced Charting and Visualization
Claude AI can generate interactive charts and visualizations directly from your data using Plotly.js specifications. Charts are automatically embedded in analysis responses when analysis would benefit from visualization.
**Supported Chart Types:**
- **Line Charts**: Time series trends for navigation, environmental, and performance data
- **Bar Charts**: Categorical analysis and frequency distributions
- **Scatter Plots**: Correlation analysis between different parameters
- **Wind Roses/Radar Charts**: Professional wind direction and speed frequency analysis
- **Multiple Series Charts**: Compare multiple data streams on the same chart
- **Polar Charts**: Wind patterns, compass headings, and directional data
**Marine-Specific Chart Features:**
- **Wind Analysis**: Automated wind rose generation with Beaufort scale categories
- **Navigation Plots**: Course over ground, speed trends, and position tracking
- **Environmental Monitoring**: Temperature, pressure, and weather pattern visualization
- **Performance Analysis**: Fuel efficiency, battery usage, and system performance charts
- **Multi-Vessel Comparisons**: Side-by-side analysis of multiple vessels
**Chart Data Integrity:**
- All chart data is sourced directly from database queries - no fabricated or estimated data
- Charts display exact data points from query results with full traceability
- Automatic validation ensures chart data matches query output
- Time-aligned data from History API ensures accurate multi-parameter visualization
**Example Chart Generation:**
When you ask Claude to "analyze wind patterns over the last 48 hours", it will:
1. Query your wind direction and speed data
2. Generate a wind rose chart showing frequency by compass direction
3. Color-code by wind speed categories (calm, light breeze, strong breeze, etc.)
4. Display the chart as interactive Plotly.js visualization in the web interface
Charts are automatically included when analysis benefits from visualization, or you can explicitly request specific chart types like "create a line chart" or "show me a wind rose".
### Analysis Templates (PLANNED, not yet implemented)
Pre-built analysis templates are planned to provide ready-to-use analyses for common maritime operations. Examples of possible templates:
#### Navigation & Routing Templates
- **Navigation Summary**: Comprehensive analysis of navigation patterns and route efficiency
- **Route Optimization**: Identify opportunities to optimize routes for efficiency and safety
- **Anchoring Analysis**: Analyze anchoring patterns, duration, and safety considerations
#### Weather & Environment Templates
- **Weather Impact Analysis**: Analyze how weather conditions affect vessel performance
- **Wind Pattern Analysis**: Detailed wind analysis for sailing optimization
#### Electrical System Templates
- **Battery Health Assessment**: Comprehensive battery performance and charging pattern analysis
- **Power Consumption Analysis**: Analyze electrical power usage patterns and efficiency
#### Safety & Monitoring Templates
- **Safety Anomaly Detection**: Detect unusual patterns that might indicate safety concerns
- **Equipment Health Monitoring**: Monitor equipment performance and predict maintenance needs
#### Performance & Efficiency Templates
- **Fuel Efficiency Analysis**: Analyze fuel consumption patterns and identify efficiency opportunities
- **Overall Performance Trends**: Comprehensive vessel performance analysis over time
### Using Claude AI Analysis
#### Via Web Interface
1. Navigate to the plugin's web interface
2. Go to the **š§ AI Analysis** tab
3. Select a data path to analyze
4. Choose an analysis template or create custom analysis
5. Configure time range and analysis parameters
6. Click **Analyze Data** to generate insights
#### Via API
**Test Claude Connection:**
```bash
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection
```
**Get Available Templates:**
```bash
curl http://localhost:3000/plugins/signalk-parquet/api/analyze/templates
```
**Custom Analysis:**
```bash
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze \
-H "Content-Type: application/json" \
-d '{
"dataPath": "environment.wind.speedTrue,navigation.speedOverGround",
"analysisType": "custom",
"customPrompt": "Analyze the relationship between wind speed and vessel speed. Identify optimal wind conditions for best performance.",
"timeRange": {
"start": "2025-01-01T00:00:00Z",
"end": "2025-01-07T00:00:00Z"
},
"aggregationMethod": "average",
"resolution": "3600000"
}'
```
### Analysis Response Format
Claude AI analysis returns structured insights:
```json
{
"id": "analysis_1234567890_abcdef123",
"analysis": "Main analysis text with detailed insights",
"insights": [
"Key insight 1",
"Key insight 2",
"Key insight 3"
],
"recommendations": [
"Actionable recommendation 1",
"Actionable recommendation 2"
],
"anomalies": [
{
"timestamp": "2025-01-01T12:00:00Z",
"value": 25.5,
"expectedRange": {"min": 10.0, "max": 20.0},
"severity": "medium",
"description": "Wind speed higher than normal range",
"confidence": 0.87
}
],
"confidence": 0.92,
"dataQuality": "High quality data with 98% completeness",
"timestamp": "2025-01-01T15:30:00Z",
"metadata": {
"dataPath": "environment.wind.speedTrue",
"analysisType": "summary",
"recordCount": 1440,
"timeRange": {
"start": "2025-01-01T00:00:00Z",
"end": "2025-01-02T00:00:00Z"
}
}
}
```
### Analysis History
All Claude AI analyses are automatically saved and can be retrieved:
**Get Analysis History:**
```bash
curl http://localhost:3000/plugins/signalk-parquet/api/analyze/history?limit=10
```
History files are stored in: `data/analysis-history/analysis_*.json`
### Best Practices
1. **Data Quality**: Ensure good data coverage for more reliable analysis
2. **Time Ranges**: Use appropriate time ranges - longer for trends, shorter for anomalies
3. **Path Selection**: Combine related paths for correlation analysis
4. **Template Usage**: Start with templates then customize prompts as needed
5. **API Limits**: Be mindful of Anthropic API token limits and costs
6. **Model Selection**: Use Opus for complex analysis, Sonnet for general use, Haiku for quick insights
### Troubleshooting Claude AI
**Common Issues:**
- **"Claude not enabled"**: Check plugin configuration and enable Claude integration
- **"API key missing"**: Add valid Anthropic API key in plugin settings
- **"Analysis timeout"**: Reduce data size or use faster model (Haiku)
- **"Token limit exceeded"**: Reduce time range or use data sampling
**Debug Claude Integration:**
```bash
# Test API connection
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection
# Check plugin logs for Claude-specific messages
journalctl -u signalk -f | grep -i claude
```
## Moving Averages (EMA & SMA)
The plugin calculates **Exponential Moving Average (EMA)** and **Simple Moving Average (SMA)** for numeric values when explicitly requested via the `includeMovingAverages` parameter, providing enhanced trend analysis capabilities.
### How to Enable
**History API:**
```bash
# Add includeMovingAverages=true to any query
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&includeMovingAverages=true"
```
**Default Behavior (v0.5.6+):**
- Moving averages are **opt-in** - not included by default
- Reduces response size by ~66% when not needed
- Better API compliance with SignalK specification
**Legacy Behavior (pre-v0.5.6):**
- Moving averages were automatically included for all queries
- To maintain old behavior, add `includeMovingAverages=true` to all requests
### Calculation Details
#### Exponential Moving Average (EMA)
- **Period**: ~10 equivalent (α = 0.2)
- **Formula**: `EMA = α × currentValue + (1 - α) × previousEMA`
- **Characteristic**: Responds faster to recent changes, emphasizes recent data
- **Use Case**: Trend detection, rapid response to data changes
#### Simple Moving Average (SMA)
- **Period**: 10 data points
- **Formula**: Average of the last 10 values
- **Characteristic**: Smooths out fluctuations, equal weight to all values in window
- **Use Case**: Noise reduction, general trend analysis
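The two formulas can be sketched together. This is an illustration of the calculation described above (α = 0.2, 10-point SMA window); the plugin's internal implementation may differ.

```javascript
// Compute EMA and SMA for a series of numeric values.
// alpha and window match the documented defaults.
function movingAverages(values, alpha = 0.2, window = 10) {
  const out = [];
  let ema = null;
  const recent = []; // rolling SMA window
  for (const v of values) {
    // EMA: seed with the first value, then blend new values in
    ema = ema === null ? v : alpha * v + (1 - alpha) * ema;
    recent.push(v);
    if (recent.length > window) recent.shift(); // keep last `window` values
    const sma = recent.reduce((a, b) => a + b, 0) / recent.length;
    out.push({ value: v, ema, sma });
  }
  return out;
}
```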
### Data Flow & Continuity
```text
# Initial data load (isIncremental: false)
Point 1: Value=5.0, EMA=5.0, SMA=5.0
Point 2: Value=6.0, EMA=5.2, SMA=5.5
Point 3: Value=4.0, EMA=5.0, SMA=5.0

# Incremental updates (isIncremental: true)
Point 4: Value=7.0, EMA=5.4,  SMA=5.5   # continues from previous EMA
Point 5: Value=5.5, EMA=5.42, SMA=5.5   # rolling 10-point SMA window
```
### Key Features
- **Opt-In**: Add `includeMovingAverages=true` to enable (v0.5.6+)
- **Memory Efficient**: SMA maintains a rolling 10-point window
- **Non-Numeric Handling**: Non-numeric values (strings, objects) show `null` for EMA/S