# <img src="public/parquet.png" alt="SignalK Parquet Data Store" width="72" height="72" style="vertical-align: middle; margin-right: 20px;"> SignalK Parquet Data Store
A comprehensive SignalK plugin and webapp that saves SignalK data directly to Parquet files, with manual and regimen-based automated archiving, advanced querying through a REST API built on the SignalK History API, Claude AI analysis of historical data, and spatial geographic analysis capabilities.
## Features
### Core Data Management
- **Smart Data Types**: Intelligent Parquet schema detection preserves native data types (DOUBLE, BOOLEAN) instead of forcing everything to strings
- **Multiple File Formats**: Support for Parquet, JSON, and CSV output formats (querying is supported for Parquet only)
- **Daily Consolidation**: Automatic daily file consolidation with S3 upload capabilities
- **Near Real-time Buffering**: Efficient data buffering with configurable thresholds
### Data Validation & Schema Repair
- **NEW Schema Validation**: Comprehensive validation of Parquet file schemas against SignalK metadata standards
- **NEW Automated Repair**: One-click repair of schema violations with proper data type conversion
- **NEW Type Correction**: Automatic conversion of incorrectly stored data types (e.g., numeric strings → DOUBLE, boolean strings → BOOLEAN)
- **NEW Metadata Integration**: Uses SignalK metadata (units, types) to determine correct data types for marine measurements
- **NEW Safe Operations**: Creates backups before repair and quarantines corrupted files for safety
- **NEW Progress Tracking**: Real-time progress monitoring with cancellation support for large datasets
#### Benefits of Proper Data Types
Using correct data types in Parquet files provides significant advantages:
- **Storage Efficiency**: Numeric data stored as DOUBLE uses ~50% less space than string representations
- **Query Performance**: Native numeric operations are 5-10x faster than string parsing during analysis
- **Data Integrity**: Type validation prevents data corruption and ensures consistent analysis results
- **Analytics Compatibility**: Proper types enable advanced statistical analysis and machine learning applications
- **Compression**: Parquet's columnar compression works optimally with correctly typed data
#### Validation Process
The validation system checks each Parquet file for:
- **Field Type Consistency**: Ensures numeric marine data (position, speed, depth) is stored as DOUBLE
- **Boolean Representation**: Validates true/false values are stored as BOOLEAN, not strings
- **Metadata Alignment**: Compares file schemas against SignalK metadata for units like meters, volts, amperes
- **Schema Standards**: Enforces data best practices for long-term data integrity
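The metadata-driven type check above can be sketched in a few lines. This is an illustrative sketch only; the helper names, the unit list, and the `SignalKMeta` shape are assumptions, not the plugin's actual validator:

```typescript
// Illustrative sketch: infer the expected Parquet type for a field from
// SignalK metadata, then flag stored types that disagree. Hypothetical
// helpers, not the plugin's internal code.
type ParquetType = 'DOUBLE' | 'BOOLEAN' | 'UTF8';

interface SignalKMeta {
  units?: string; // e.g. 'm', 'V', 'A', 'rad', 'K'
}

// Numeric SI units commonly used for marine measurements in SignalK.
const NUMERIC_UNITS = new Set(['m', 'm/s', 'rad', 'V', 'A', 'K', 'Pa', 'Hz', 'ratio']);

function expectedType(meta: SignalKMeta | undefined, sampleValue: unknown): ParquetType {
  if (meta?.units && NUMERIC_UNITS.has(meta.units)) return 'DOUBLE';
  if (typeof sampleValue === 'boolean') return 'BOOLEAN';
  if (typeof sampleValue === 'number') return 'DOUBLE';
  return 'UTF8';
}

// A stored type that disagrees with the expected type is a violation:
function isViolation(stored: ParquetType, meta: SignalKMeta | undefined, sample: unknown): boolean {
  return stored !== expectedType(meta, sample);
}
```

Under this rule, a voltage field stored as UTF8 strings would be flagged for repair to DOUBLE, which is the kind of conversion the repair step performs.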
### Advanced Querying
- **SignalK History API Compliance**: Full compliance with SignalK History API specifications
- **Standard Time Parameters**: All 5 standard query patterns supported
- **Time-Filtered Discovery**: Paths and contexts filtered by time range
- **Optional Analytics**: Moving averages (EMA/SMA) available on demand
- **NEW: Automatic Unit Conversion**: Optional integration with `signalk-units-preference` plugin
- Server-side conversion to user's preferred units (knots, km/h, °F, °C, etc.)
- Add `?convertUnits=true` to any history query
- Respects all unit preferences configured in units-preference plugin
- Configurable cache (1-60 minutes) balances performance vs. responsiveness
- Conversion metadata included in response
- **NEW: Timezone Conversion**: Convert UTC timestamps to local or specified timezone
- Add `?convertTimesToLocal=true` to convert timestamps to local time
- Optional `&timezone=America/New_York` for custom IANA timezone
- Automatic daylight saving time handling
- Clean ISO 8601 format with offset (e.g., `2025-10-20T12:34:04-04:00`)
- **Flexible Time Querying**: Multiple ways to specify time ranges
- Query from now, from specific times, or between time ranges
- Duration-based windows (1h, 30m, 2d) for easy relative queries
- Forward and backward time querying support
- **Time Alignment**: Automatic alignment of data from different sensors using time bucketing
- **DuckDB Integration**: Direct SQL querying of Parquet files with type-safe operations
- **Spatial Analysis**: Advanced geographic analysis with DuckDB spatial extension
- **Track Analysis**: Calculate vessel tracks, distances, and movement patterns
- **Proximity Detection**: Multi-vessel distance calculations and collision risk analysis
- **Geographic Visualization**: Generate movement boundaries, centroids, and spatial statistics
- **Route Planning**: Historical track analysis for route optimization and performance analysis
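The duration syntax mentioned above (`1h`, `30m`, `2d`) can be parsed with a small helper. This is an illustrative sketch, not the plugin's actual parser; `parseDuration` and `windowFromNow` are hypothetical names:

```typescript
// Illustrative sketch of parsing History API duration strings such as
// "1h", "30m", "15s", "2d" into milliseconds.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

function parseDuration(input: string): number {
  const match = /^(\d+)([smhd])$/.exec(input.trim());
  if (!match) throw new Error(`Invalid duration: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A duration-only query resolves to the window [now - duration, now]:
function windowFromNow(duration: string, now = Date.now()): { from: Date; to: Date } {
  const ms = parseDuration(duration);
  return { from: new Date(now - ms), to: new Date(now) };
}
```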
### Management & Control
- **Command Management**: Register, execute, and manage SignalK commands with automatic path configuration
- **Regimen-Based Data Collection**: Control data collection with command-based regimens
- **Multi-Vessel Support**: Wildcard vessel contexts (`vessels.*`) with MMSI-based exclusion filtering
- **Source Filtering**: Filter data by SignalK source labels (bypasses server arbitration for raw data access)
- **Comprehensive REST API**: Full programmatic control of queries and configuration
### User Interface & Integration
- **Responsive Web Interface**: Complete web-based management interface
- **S3 Integration**: Upload files to Amazon S3 with configurable timing and conflict resolution
- **Context Support**: Support for multiple vessel contexts with exclusion controls
### Regimen System (Advanced)
- **Operational Context Tracking**: Define regimens for operational states (mooring, anchoring, racing, passage-making)
- **Command-Based Episodes**: Track state transitions using SignalK commands as regimen triggers
- **Keyword Mapping**: Associate keywords with commands for intelligent Claude AI context matching
- **Episode Boundary Detection**: Sophisticated SQL-based detection of operational periods using CTEs and window functions
- **Contextual Data Collection**: Link SignalK paths to regimens for targeted data analysis during specific operations
- **Web Interface Management**: Create, edit, and manage regimens and command keywords through the web UI
### NEW Threshold Automation
- **NEW Per-Command Conditions**: Each regimen/command can define one or more thresholds that watch a single SignalK path.
- **NEW True-Only Actions**: On every path update the condition is evaluated; when it is true the command is set to the threshold's `activateOnMatch` state (ON/OFF). False evaluations leave the command untouched, so use a second threshold if you want a different level to switch it back.
- **NEW Stable Triggers**: Optional hysteresis (seconds) suppresses re-firing while the condition remains true, preventing rapid toggling in noisy data.
- **NEW Multiple Thresholds Per Path**: Unique monitor keys allow several thresholds to observe the same SignalK path without cancelling each other.
- **NEW Unit Handling**: Threshold values must match the live SignalK units (e.g., fractional 0–1 SoC values). Angular thresholds are entered in degrees in the UI and stored as radians automatically.
- **NEW Automation State Machine**: When automation is enabled, the command is set to OFF and all thresholds are immediately evaluated. When automation is disabled, threshold monitoring stops and the command state remains unchanged. The default state is hardcoded to OFF on the server side.
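The true-only-with-hysteresis behaviour described above can be sketched as follows. The field and function names are hypothetical (only `activateOnMatch` appears in this documentation), and the sketch approximates hysteresis as a time-since-last-fire holdoff:

```typescript
// Illustrative sketch: a threshold fires only on true evaluations, sets
// the command to its activateOnMatch state, and suppresses re-firing
// within the hysteresis window. False evaluations never act.
interface Threshold {
  operator: '>' | '<';
  value: number;
  activateOnMatch: boolean;     // command state to set when the condition is true
  hysteresisSeconds?: number;   // suppress re-fires while the condition stays true
}

function makeMonitor(t: Threshold, setCommand: (on: boolean) => void) {
  let lastFired = -Infinity;
  return (pathValue: number, nowMs: number) => {
    const isTrue = t.operator === '>' ? pathValue > t.value : pathValue < t.value;
    if (!isTrue) return;                       // false evaluations leave the command untouched
    const holdoff = (t.hysteresisSeconds ?? 0) * 1000;
    if (nowMs - lastFired < holdoff) return;   // still inside the hysteresis window
    lastFired = nowMs;
    setCommand(t.activateOnMatch);
  };
}
```

As noted above, a second threshold on the same path with the opposite condition (and a unique monitor key) is what switches the command back.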
### Claude AI Integration
- **AI-Powered Analysis**: Advanced maritime data analysis using Claude AI models (Opus 4, Sonnet 4)
- **Regimen-Based Analysis**: Context-aware episode detection for operational states (mooring, anchoring, sailing)
- **Command Integration**: Keyword-based regimen matching with customizable command configurations
- **Episode Detection**: Sophisticated boundary detection for operational transitions
- **Multi-Vessel Support**: Real-time data access from self vessel and other vessels via SignalK
- **Conversation Continuity**: Follow-up questions with preserved context and specialized tools
- **Timezone Intelligence**: Automatic UTC-to-local time conversion based on system timezone
- **Custom Analysis**: Create custom analysis prompts for specific operational needs
## Requirements
### Core Requirements
- SignalK Server v1.x or v2.x
- Node.js 18+ (included with SignalK)
### Optional Plugin Integration
- **signalk-units-preference** (v0.7.0+): Required for automatic unit conversion feature
- Install from: https://github.com/motamman/signalk-units-preference
- Provides server-side unit conversion based on user preferences
- The history API will work without this plugin, but `convertUnits=true` will have no effect
## Installation
### Install from GitHub
```bash
# Navigate to folder
cd ~/.signalk/node_modules/
# Install from npm (after publishing)
npm install signalk-parquet
# Or install from GitHub
npm install motamman/signalk-parquet
cd ~/.signalk/node_modules/signalk-parquet
npm run build
# Restart SignalK
sudo systemctl restart signalk
```
## ⚠️ IMPORTANT IF UPGRADING FROM 0.5.0-beta.3: Consolidation Bug Fix
**THIS RELEASE FIXES A RECURSIVE BUG THAT WAS CREATING NESTED `processed` DIRECTORIES AND REPEATEDLY PROCESSING THE SAME FILES. ANY `processed` FOLDER NESTED INSIDE ANOTHER `processed` FOLDER SHOULD BE MANUALLY DELETED.**
### Cleaning Up Nested Processed Directories
No action is likely needed if upgrading from 0.5.0-beta.4 or later. If you're upgrading from an earlier version, you may have nested processed directories that need cleanup:
```bash
# Check for nested processed directories
find data -name "*processed*" -type d | head -20
# See the deepest nesting levels
find data -name "*processed*" -type d | awk -F'/' '{print NF-1, $0}' | sort -nr | head -5
# Count files in nested processed directories
find data -path "*/processed/processed/*" -type f | wc -l
# Remove ALL processed directories, including nested ones (RECOMMENDED)
find data -name "processed" -type d -exec rm -rf {} +
# Verify cleanup completed
find data -path "*/processed/processed/*" -type f | wc -l # Should show 0
```
**Note**: The processed directories only contain files that were moved during consolidation - removing them does not delete your original data.
### Development Setup
```bash
# Clone or copy the signalk-parquet directory
cd signalk-parquet
# Install dependencies
npm install
# Build the TypeScript code
npm run build
# Copy to SignalK plugins directory
cp -r . ~/.signalk/node_modules/signalk-parquet/
# Restart SignalK
sudo systemctl restart signalk
```
### Production Build
```bash
# Build for production
npm run build
# The compiled JavaScript will be in the dist/ directory
```
## Configuration
### Plugin Configuration
Navigate to **SignalK Admin → Server → Plugin Config → SignalK Parquet Data Store**
Configure basic plugin settings (path configuration is managed separately in the web interface):
| Setting | Description | Default |
|---------|-------------|---------|
| **Buffer Size** | Number of records to buffer before writing | 1000 |
| **Save Interval** | How often to save buffered data (seconds) | 30 |
| **Output Directory** | Directory to save data files | SignalK data directory |
| **Filename Prefix** | Prefix for generated filenames | `signalk_data` |
| **File Format** | Output format (parquet, json, csv) | `parquet` |
| **Retention Days** | Days to keep processed files | 7 |
| **Unit Conversion Cache Duration** | How long to cache unit conversions before reloading (minutes) | 5 |
> **Note**: The Unit Conversion Cache Duration setting controls how quickly changes to unit preferences (in the signalk-units-preference plugin) are reflected in the history API. Lower values (1-2 minutes) reflect changes faster but use more resources. Higher values (30-60 minutes) reduce overhead but take longer to reflect changes. The default of 5 minutes provides a good balance for most users.
### S3 Upload Configuration
Configure S3 upload settings in the plugin configuration:
| Setting | Description | Default |
|---------|-------------|---------|
| **Enable S3 Upload** | Enable uploading to Amazon S3 | `false` |
| **Upload Timing** | When to upload (realtime/consolidation) | `consolidation` |
| **S3 Bucket** | Name of S3 bucket | - |
| **AWS Region** | AWS region for S3 bucket | `us-east-1` |
| **Key Prefix** | S3 object key prefix | - |
| **Access Key ID** | AWS credentials (optional) | - |
| **Secret Access Key** | AWS credentials (optional) | - |
| **Delete After Upload** | Delete local files after upload | `false` |
### Claude AI Configuration
Configure Claude AI integration in the plugin configuration for advanced data analysis:
| Setting | Description | Default |
|---------|-------------|---------|
| **Enable Claude Integration** | Enable AI-powered data analysis | `false` |
| **API Key** | Anthropic Claude API key (required) | - |
| **Model** | Claude model to use for analysis | `claude-3-7-sonnet-20250219` |
| **Max Tokens** | Maximum tokens for AI responses | `4000` |
| **Temperature** | AI creativity level (0-1) | `0.3` |
#### Supported Claude Models
| Model | Description | Use Case |
|-------|-------------|----------|
| `claude-opus-4-1-20250805` | Latest Opus model - highest intelligence | Complex analysis, detailed insights |
| `claude-opus-4-20250514` | Opus model - very high intelligence | Advanced analysis |
| `claude-sonnet-4-20250514` | Sonnet model - balanced performance | **Recommended default** |
#### Getting a Claude API Key
1. Visit [Anthropic Console](https://console.anthropic.com/)
2. Create an account or sign in
3. Navigate to **API Keys** section
4. Generate a new API key
5. Copy the key and paste it in the plugin configuration
**Note**: Claude AI analysis requires an active Anthropic API subscription. Usage is billed based on tokens consumed during analysis.
## Path Configuration
**Important**: Path configuration is managed exclusively through the web interface, not in the SignalK admin interface. This provides a more intuitive interface for managing data collection paths.
### Accessing Path Configuration
1. Navigate to: `http://localhost:3000/plugins/signalk-parquet`
2. Click the **Path Configuration** tab
### Adding Data Paths
Use the web interface to configure which SignalK paths to collect:
1. Click **Add New Path**
2. Configure the path settings:
- **SignalK Path**: The SignalK data path (e.g., `navigation.position`)
- **Always Enabled**: Collect data regardless of regimen state
- **Regimen Control**: Command name that controls collection
- **Source Filter**: Only collect from specific sources
- **Context**: SignalK context (`vessels.self`, `vessels.*`, or specific vessel)
- **Exclude MMSI**: For `vessels.*` context, exclude specific MMSI numbers
3. Click **Add Path**
### Managing Existing Paths
- **Edit Path**: Click the **Edit** button to modify path settings
- **Delete Path**: Click the **Remove** button to delete a path
- **Refresh**: Click **Refresh Paths** to reload the configuration
- **Show/Hide Commands**: Toggle button to show/hide command paths in the table
### Command Management
The plugin streamlines command management with automatic path configuration:
1. **Register Command**: Commands are automatically registered with enabled path configurations
2. **Start Command**: Click **Start** button to activate a command regimen
3. **Stop Command**: Click **Stop** button to deactivate a command regimen
4. **Remove Command**: Click **Remove** button to delete a command and its path configuration
This eliminates the previous 3-step process of registering commands, adding paths, and enabling them separately.
### Path Configuration Storage
Path configurations are stored separately from plugin configuration in:
```
~/.signalk/signalk-parquet/webapp-config.json
```
This allows for:
- Independent management of path configurations
- Better separation of concerns
- Easier backup and migration of path settings
- More intuitive web-based configuration interface
### Regimen-Based Control
Regimens allow you to control data collection based on SignalK commands:
**Example**: Weather data collection with source filtering
```json
{
"path": "environment.wind.angleApparent",
"enabled": false,
"regimen": "captureWeather",
"source": "mqtt-weatherflow-udp",
"context": "vessels.self"
}
```
**Note**: Source filtering accesses raw data before SignalK server arbitration, allowing collection of data from specific sources that might otherwise be filtered out.
**Multi-Vessel Example**: Collect navigation data from all vessels except specific MMSI numbers
```json
{
"path": "navigation.position",
"enabled": true,
"context": "vessels.*",
"excludeMMSI": ["123456789", "987654321"]
}
```
**Command Path**: Command paths are automatically created when registering commands
```json
{
"path": "commands.captureWeather",
"enabled": true,
"context": "vessels.self"
}
```
This path will only collect data when the command `commands.captureWeather` is active.
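One plausible reading of the configuration examples above can be sketched as a small gating function: an "Always Enabled" path is always collected, while a path with a `regimen` is collected only while that command is active. The `GatedPath` type and `shouldCollect` helper are hypothetical, simplified from the plugin's real `PathConfig`:

```typescript
// Illustrative sketch of regimen-gated collection, assuming `enabled`
// means "always enabled" (matching the weather example above, where
// enabled is false and collection is driven by the regimen).
interface GatedPath {
  path: string;
  enabled: boolean;   // collect regardless of regimen state
  regimen?: string;   // e.g. 'captureWeather'
}

function shouldCollect(config: GatedPath, activeRegimens: Set<string>): boolean {
  if (config.enabled) return true;                              // always-enabled paths
  if (config.regimen) return activeRegimens.has(config.regimen); // gated by command state
  return false;
}
```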
## TypeScript Architecture
### Type Safety
The plugin uses comprehensive TypeScript interfaces:
```typescript
interface PluginConfig {
bufferSize: number;
saveIntervalSeconds: number;
outputDirectory: string;
filenamePrefix: string;
fileFormat: 'json' | 'csv' | 'parquet';
paths: PathConfig[];
s3Upload: S3UploadConfig;
}
interface PathConfig {
path: string;
enabled: boolean;
regimen?: string;
source?: string;
context: string;
excludeMMSI?: string[];
}
interface DataRecord {
received_timestamp: string;
signalk_timestamp: string;
context: string;
path: string;
value: any;
source_label?: string;
meta?: string;
}
```
### Plugin State Management
The plugin maintains typed state:
```typescript
interface PluginState {
unsubscribes: Array<() => void>;
dataBuffers: Map<string, DataRecord[]>;
activeRegimens: Set<string>;
subscribedPaths: Set<string>;
parquetWriter?: ParquetWriter;
s3Client?: any;
currentConfig?: PluginConfig;
}
```
### Express Router Types
API routes are fully typed:
```typescript
router.get('/api/paths',
(_: TypedRequest, res: TypedResponse<PathsApiResponse>) => {
// Typed request/response handling
}
);
```
## Data Output Structure
### File Organization
```
output_directory/
├── vessels/
│   └── self/
│       ├── navigation/
│       │   ├── position/
│       │   │   ├── signalk_data_20250716T120000.parquet
│       │   │   └── signalk_data_20250716_consolidated.parquet
│       │   └── speedOverGround/
│       └── environment/
│           └── wind/
│               └── angleApparent/
└── processed/
    └── [moved files after consolidation]
```
### Data Schema
Each record contains:
| Field | Type | Description |
|-------|------|-------------|
| `received_timestamp` | string | When the plugin received the data |
| `signalk_timestamp` | string | Original SignalK timestamp |
| `context` | string | SignalK context (e.g., `vessels.self`) |
| `path` | string | SignalK path |
| `value` | DOUBLE/BOOLEAN/INT64/UTF8 | **Smart typed values** - numbers stored as DOUBLE, booleans as BOOLEAN, etc. |
| `value_json` | string | JSON representation for complex values |
| `source` | string | Complete source information |
| `source_label` | string | Source label |
| `source_type` | string | Source type |
| `source_pgn` | number | PGN number (if applicable) |
| `meta` | string | Metadata information |
#### Smart Data Types
The plugin now intelligently detects and preserves native data types:
- **Numbers**: Stored as `DOUBLE` (floating point) or `INT64` (integers)
- **Booleans**: Stored as `BOOLEAN`
- **Strings**: Stored as `UTF8`
- **Objects**: Serialized to JSON and stored as `UTF8`
- **Mixed Types**: Falls back to `UTF8` when a path contains multiple data types
This provides better compression, faster queries, and proper type safety for data analysis.
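The detection rule described above can be sketched as follows. This is an illustrative sketch with hypothetical helper names, not the plugin's actual schema-inference code:

```typescript
// Illustrative sketch of smart type detection: numbers map to
// DOUBLE/INT64, booleans to BOOLEAN, everything else (including
// JSON-serialized objects) to UTF8, with mixed types falling back to UTF8.
type ParquetType = 'DOUBLE' | 'INT64' | 'BOOLEAN' | 'UTF8';

function typeOfValue(v: unknown): ParquetType {
  if (typeof v === 'boolean') return 'BOOLEAN';
  if (typeof v === 'number') return Number.isInteger(v) ? 'INT64' : 'DOUBLE';
  return 'UTF8'; // strings and JSON-serialized objects
}

function inferColumnType(values: unknown[]): ParquetType {
  const types = new Set(values.map(typeOfValue));
  if (types.size === 1) return Array.from(types)[0];
  // Integers mixed with floats still form a numeric column:
  if (types.size === 2 && types.has('INT64') && types.has('DOUBLE')) return 'DOUBLE';
  return 'UTF8'; // genuinely mixed types fall back to strings
}
```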
## Web Interface
### Features
- **Path Configuration**: Manage data collection paths with multi-vessel support
- **Command Management**: Streamlined command registration and control
- **Data Exploration**: Browse available data paths
- **SQL Queries**: Execute DuckDB queries against Parquet files
- **History API**: Query historical data using SignalK History API endpoints
- **S3 Status**: Test S3 connectivity and configuration
- **Responsive Design**: Works on desktop and mobile
- **MMSI Filtering**: Exclude specific vessels from wildcard contexts
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/paths` | GET | List available data paths |
| `/api/files/:path` | GET | List files for a path |
| `/api/sample/:path` | GET | Sample data from a path |
| `/api/query` | POST | Execute SQL query |
| `/api/config/paths` | GET/POST/PUT/DELETE | Manage path configurations |
| `/api/test-s3` | POST | Test S3 connection |
| `/api/health` | GET | Health check |
| **Claude AI Analysis API** | | |
| `/api/analyze` | POST | Perform AI analysis on data |
| `/api/analyze/templates` | GET | Get available analysis templates |
| `/api/analyze/followup` | POST | Follow-up analysis questions |
| `/api/analyze/history` | GET | Get analysis history |
| `/api/analyze/test-connection` | POST | Test Claude API connection |
| **SignalK History API** | | |
| `/signalk/v1/history/values` | GET | SignalK History API - Get historical values |
| `/signalk/v1/history/contexts` | GET | SignalK History API - Get available contexts |
| `/signalk/v1/history/paths` | GET | SignalK History API - Get available paths |
## DuckDB Integration
### Query Examples
#### Basic Queries
```sql
-- Get latest 10 records from navigation position
SELECT * FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true)
ORDER BY received_timestamp DESC LIMIT 10;
-- Count total records
SELECT COUNT(*) FROM read_parquet('/path/to/navigation/position/*.parquet', union_by_name=true);
-- Filter by source
SELECT * FROM read_parquet('/path/to/environment/wind/*.parquet', union_by_name=true)
WHERE source_label = 'mqtt-weatherflow-udp'
ORDER BY received_timestamp DESC LIMIT 100;
-- Aggregate by hour
SELECT
DATE_TRUNC('hour', received_timestamp::timestamp) as hour,
AVG(value::double) as avg_value,
COUNT(*) as record_count
FROM read_parquet('/path/to/data/*.parquet', union_by_name=true)
GROUP BY hour
ORDER BY hour;
```
#### Spatial Analysis Queries
```sql
-- Calculate distance traveled over time
WITH ordered_positions AS (
SELECT
signalk_timestamp,
ST_Point(value_longitude, value_latitude) as position,
LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position
FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
AND signalk_timestamp <= '2025-09-27T23:59:59Z'
AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
SELECT *,
CASE
WHEN prev_position IS NOT NULL
THEN ST_Distance_Sphere(position, prev_position)
ELSE 0
END as distance_meters
FROM ordered_positions
)
SELECT
strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket,
AVG(value_latitude) as avg_lat,
AVG(value_longitude) as avg_lon,
ST_AsText(ST_Centroid(ST_Collect(position))) as centroid,
SUM(distance_meters) as total_distance_meters,
COUNT(*) as position_records,
ST_AsText(ST_ConvexHull(ST_Collect(position))) as movement_area
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;
-- Multi-vessel proximity analysis
SELECT
v1.context as vessel1,
v2.context as vessel2,
ST_Distance_Sphere(
ST_Point(v1.value_longitude, v1.value_latitude),
ST_Point(v2.value_longitude, v2.value_latitude)
) as distance_meters,
v1.signalk_timestamp
FROM read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v1
JOIN read_parquet('data/vessels/*/navigation/position/*.parquet', union_by_name=true) v2
ON v1.signalk_timestamp = v2.signalk_timestamp AND v1.context != v2.context
WHERE v1.signalk_timestamp >= '2025-09-27T00:00:00Z'
AND ST_Distance_Sphere(
ST_Point(v1.value_longitude, v1.value_latitude),
ST_Point(v2.value_longitude, v2.value_latitude)
) < 1000 -- Within 1km
ORDER BY distance_meters;
-- Advanced movement analysis with bounding boxes
WITH ordered_positions AS (
SELECT
signalk_timestamp,
ST_Point(value_longitude, value_latitude) as position,
value_latitude,
value_longitude,
LAG(ST_Point(value_longitude, value_latitude)) OVER (ORDER BY signalk_timestamp) as prev_position,
strftime(date_trunc('hour', signalk_timestamp::TIMESTAMP), '%Y-%m-%dT%H:%M:%SZ') as time_bucket
FROM read_parquet('data/vessels/urn_mrn_imo_mmsi_368396230/navigation/position/*.parquet', union_by_name=true)
WHERE signalk_timestamp >= '2025-09-27T16:00:00Z'
AND signalk_timestamp <= '2025-09-27T23:59:59Z'
AND value_latitude IS NOT NULL AND value_longitude IS NOT NULL
),
distances AS (
SELECT *,
CASE
WHEN prev_position IS NOT NULL
THEN ST_Distance_Sphere(position, prev_position)
ELSE 0
END as distance_meters
FROM ordered_positions
)
SELECT
time_bucket,
AVG(value_latitude) as avg_lat,
AVG(value_longitude) as avg_lon,
-- Calculate bounding box manually
MIN(value_latitude) as min_lat,
MAX(value_latitude) as max_lat,
MIN(value_longitude) as min_lon,
MAX(value_longitude) as max_lon,
-- Distance and movement metrics
SUM(distance_meters) as total_distance_meters,
ROUND(SUM(distance_meters) / 1000.0, 2) as total_distance_km,
COUNT(*) as position_records,
-- Movement area approximation using bounding box
(MAX(value_latitude) - MIN(value_latitude)) * 111320 *
(MAX(value_longitude) - MIN(value_longitude)) * 111320 *
COS(RADIANS(AVG(value_latitude))) as approx_area_m2
FROM distances
GROUP BY time_bucket
ORDER BY time_bucket;
```
#### Available Spatial Functions
- `ST_Point(longitude, latitude)` - Create point geometries
- `ST_Distance_Sphere(point1, point2)` - Calculate distances in meters
- `ST_AsText(geometry)` - Convert to Well-Known Text format
- `ST_Centroid(ST_Collect(points))` - Find center of multiple points
- `ST_ConvexHull(ST_Collect(points))` - Create movement boundary polygons
## History API Integration
The plugin provides full SignalK History API compliance, allowing you to query historical data using standard SignalK API endpoints with enhanced performance and filtering capabilities.
### Available Endpoints
| Endpoint | Description | Parameters |
|----------|-------------|------------|
| `/signalk/v1/history/values` | Get historical values for specified paths | **Standard patterns** (see below)<br>**Optional**: `resolution`, `refresh`, `includeMovingAverages`, `useUTC` |
| `/signalk/v1/history/contexts` | Get available vessel contexts for time range | **Time Range**: Any standard pattern (see below)<br>Returns only contexts with data in specified range |
| `/signalk/v1/history/paths` | Get available SignalK paths for time range | **Time Range**: Any standard pattern (see below)<br>Returns only paths with data in specified range |
### Standard Time Range Patterns
The History API supports 5 standard SignalK time query patterns:
| Pattern | Parameters | Description | Example |
|---------|-----------|-------------|---------|
| **1** | `duration` | Query back from now | `?duration=1h` |
| **2** | `from` + `duration` | Query forward from start | `?from=2025-01-01T00:00:00Z&duration=1h` |
| **3** | `to` + `duration` | Query backward to end | `?to=2025-01-01T12:00:00Z&duration=1h` |
| **4** | `from` | From start to now | `?from=2025-01-01T00:00:00Z` |
| **5** | `from` + `to` | Specific range | `?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z` |
**Legacy Support**: The `start` parameter (used with `duration`) is deprecated but still supported for backward compatibility. A console warning will be shown. Use standard patterns instead.
### Query Parameters
| Parameter | Description | Format | Examples |
|-----------|-------------|---------|----------|
| **Required for `/values`:** | | | |
| `paths` | SignalK paths with optional aggregation | `path:method,path:method` | `navigation.position:first,wind.speed:average` |
| **Time Range:** | Use one of the 5 standard patterns above | | |
| `duration` | Time period | `[number][unit]` | `1h`, `30m`, `15s`, `2d` |
| `from` | Start time (ISO 8601) | ISO datetime | `2025-01-01T00:00:00Z` |
| `to` | End time (ISO 8601) | ISO datetime | `2025-01-01T06:00:00Z` |
| **Optional:** | | | |
| `context` | Vessel context | `vessels.self` or `vessels.<id>` | `vessels.self` (default) |
| `resolution` | Time bucket size in milliseconds | Number | `60000` (1 minute buckets) |
| `refresh` | Enable auto-refresh (pattern 1 only) | `true` or `1` | `refresh=true` |
| `includeMovingAverages` | Include EMA/SMA calculations | `true` or `1` | `includeMovingAverages=true` |
| `useUTC` | Treat datetime inputs as UTC | `true` or `1` | `useUTC=true` |
| `convertUnits` | Convert to preferred units (requires signalk-units-preference plugin) | `true` or `1` | `convertUnits=true` |
| `convertTimesToLocal` | Convert timestamps to local/specified timezone | `true` or `1` | `convertTimesToLocal=true` |
| `timezone` | IANA timezone ID (used with `convertTimesToLocal`) | IANA timezone | `timezone=America/New_York` |
| **Deprecated:** | | | |
| `start` | ⚠️ Use standard patterns instead | `now` or ISO datetime | Deprecated, use `duration` or `from`/`to` |
### Query Examples
#### Pattern 1: Duration Only (Query back from now)
```bash
# Last hour of wind data
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent"
# Last 30 minutes with moving averages
curl "http://localhost:3000/signalk/v1/history/values?duration=30m&paths=environment.wind.speedApparent&includeMovingAverages=true"
# Real-time with auto-refresh
curl "http://localhost:3000/signalk/v1/history/values?duration=15m&paths=navigation.position&refresh=true"
```
#### Pattern 2: From + Duration (Query forward)
```bash
# 6 hours forward from specific time
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&duration=6h&paths=navigation.position"
```
#### Pattern 3: To + Duration (Query backward)
```bash
# 2 hours backward to specific time
curl "http://localhost:3000/signalk/v1/history/values?to=2025-01-01T12:00:00Z&duration=2h&paths=environment.wind.speedApparent"
```
#### Pattern 4: From Only (From start to now)
```bash
# From specific time until now
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&paths=navigation.speedOverGround"
```
#### Pattern 5: From + To (Specific range)
```bash
# Specific 24-hour period
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-02T00:00:00Z&paths=navigation.position"
```
#### Advanced Query Examples
**Multiple paths with time alignment:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=6h&paths=environment.wind.angleApparent,environment.wind.speedApparent,navigation.position&resolution=60000"
```
**Multiple aggregations of same path:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?from=2025-01-01T00:00:00Z&to=2025-01-01T06:00:00Z&paths=environment.wind.speedApparent:average,environment.wind.speedApparent:min,environment.wind.speedApparent:max&resolution=60000"
```
**With moving averages for trend analysis:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=electrical.batteries.512.voltage&includeMovingAverages=true&resolution=300000"
```
**Different temporal samples:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.position:first,navigation.position:middle_index,navigation.position:last&resolution=60000"
```
#### Context and Path Discovery
**Get contexts with data in last hour:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts?duration=1h"
```
**Get contexts for specific time range:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts?from=2025-01-01T00:00:00Z&to=2025-01-07T00:00:00Z"
```
**Get available paths with recent data:**
```bash
curl "http://localhost:3000/signalk/v1/history/paths?duration=24h"
```
**Get all paths (no time filter):**
```bash
curl "http://localhost:3000/signalk/v1/history/paths"
```
#### Unit Conversion (NEW in v0.6.0)
**Convert to user's preferred units:**
```bash
# Speed in knots (if configured in signalk-units-preference)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround&convertUnits=true"
# Wind speed in preferred units (knots, km/h, or mph)
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertUnits=true"
# Temperature in preferred units (°C or °F)
curl "http://localhost:3000/signalk/v1/history/values?duration=24h&paths=environment.outside.temperature&convertUnits=true"
```
**Response includes conversion metadata:**
```json
{
"values": [{"path": "navigation.speedOverGround", "method": "average"}],
"data": [["2025-10-20T16:12:14Z", 5.2]],
"units": {
"converted": true,
"conversions": [{
"path": "navigation.speedOverGround",
"baseUnit": "m/s",
"targetUnit": "knots",
"symbol": "kn"
}]
}
}
```
#### Timezone Conversion (NEW in v0.6.0)
**Convert to server's local time:**
```bash
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=environment.wind.speedApparent&convertTimesToLocal=true"
```
**Convert to specific timezone:**
```bash
# New York time (Eastern)
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.position&convertTimesToLocal=true&timezone=America/New_York"
# London time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&convertTimesToLocal=true&timezone=Europe/London"
# Tokyo time
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=navigation.speedOverGround&convertTimesToLocal=true&timezone=Asia/Tokyo"
```
**Response includes timezone metadata:**
```json
{
"range": {
"from": "2025-10-20T12:12:19-04:00",
"to": "2025-10-20T13:12:19-04:00"
},
"data": [
["2025-10-20T12:12:14-04:00", 5.84],
["2025-10-20T12:12:28-04:00", 5.26]
],
"timezone": {
"converted": true,
"targetTimezone": "America/New_York",
"offset": "-04:00",
"description": "Converted to user-specified timezone: America/New_York (-04:00)"
}
}
```
**Combine both conversions:**
```bash
# Convert values to knots AND timestamps to New York time
curl "http://localhost:3000/signalk/v1/history/values?duration=2d&paths=navigation.speedOverGround,environment.wind.speedApparent&convertUnits=true&convertTimesToLocal=true&timezone=America/New_York"
```
**Common IANA Timezone IDs:**
- `America/New_York` - Eastern Time (US)
- `America/Chicago` - Central Time (US)
- `America/Denver` - Mountain Time (US)
- `America/Los_Angeles` - Pacific Time (US)
- `Europe/London` - UK
- `Europe/Paris` - Central European Time
- `Asia/Tokyo` - Japan
- `Pacific/Auckland` - New Zealand
- `Australia/Sydney` - Australian Eastern Time
#### Duration Formats
- `30s` - 30 seconds
- `15m` - 15 minutes
- `2h` - 2 hours
- `1d` - 1 day
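The duration shorthand above can be turned into milliseconds with a few lines of code. This is an illustrative sketch only; `parseDuration` is a hypothetical helper, not the plugin's actual API.

```javascript
// Parse duration strings like "30s", "15m", "2h", "1d" into milliseconds.
// Hypothetical helper for illustration; the plugin's parser may differ.
function parseDuration(str) {
  const match = /^(\d+)(s|m|h|d)$/.exec(str);
  if (!match) throw new Error(`Invalid duration: ${str}`);
  const multipliers = { s: 1000, m: 60000, h: 3600000, d: 86400000 };
  return Number(match[1]) * multipliers[match[2]];
}

console.log(parseDuration('2h')); // 7200000
```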
### Timezone Handling (NEW)
**Local time conversion (default behavior):**
```bash
# 8:00 AM local time → automatically converted to UTC
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position"
```
**UTC time mode:**
```bash
# 8:00 AM UTC (not converted)
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00&duration=1h&paths=navigation.position&useUTC=true"
```
**Explicit timezone (always respected):**
```bash
# Explicit UTC timezone
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00Z&duration=1h&paths=navigation.position"
# Explicit timezone offset
curl "http://localhost:3000/signalk/v1/history/values?context=vessels.self&start=2025-08-13T08:00:00-04:00&duration=1h&paths=navigation.position"
```
**Timezone behavior:**
- **Default (`useUTC=false`)**: Datetime strings without timezone info are treated as local time and automatically converted to UTC
- **UTC mode (`useUTC=true`)**: Datetime strings without timezone info are treated as UTC time
- **Explicit timezone**: Strings with `Z`, `+HH:MM`, or `-HH:MM` are always parsed as-is regardless of `useUTC` setting
- **`start=now`**: Always uses current UTC time regardless of `useUTC` setting
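The interpretation rules above can be sketched as follows. This is a minimal illustration of the documented behavior, assuming a Node.js environment where zone-less date-time strings parse as local time; `interpretStart` and `hasExplicitZone` are hypothetical helpers, not the plugin's implementation.

```javascript
// Does the string carry an explicit timezone (Z or ±HH:MM offset)?
function hasExplicitZone(s) {
  return /(Z|[+-]\d{2}:\d{2})$/.test(s);
}

// Resolve a start parameter to a UTC ISO timestamp per the rules above.
function interpretStart(s, useUTC) {
  if (s === 'now') return new Date().toISOString();         // always current UTC
  if (hasExplicitZone(s)) return new Date(s).toISOString(); // parsed as-is
  // No zone info: UTC when useUTC=true, otherwise local time (Node default)
  return useUTC ? new Date(s + 'Z').toISOString() : new Date(s).toISOString();
}
```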
**Get available contexts:**
```bash
curl "http://localhost:3000/signalk/v1/history/contexts"
```
### Time Alignment and Bucketing
The History API automatically aligns data from different paths using time bucketing to solve the common problem of misaligned timestamps. This enables:
- **Plotting**: Data points align properly on charts
- **Correlation**: Compare values from different sensors at the same time
- **Export**: Clean, aligned datasets for analysis
**Key Features:**
- **Smart Type Handling**: Automatically handles numeric values (wind speed) and JSON objects (position)
- **Robust Aggregation**: Uses proper SQL type casting to prevent type errors
- **Configurable Resolution**: Time bucket size in milliseconds (default: auto-calculated based on time range)
- **Multiple Aggregation Methods**: `average` for numeric data, `first` for complex objects
**Parameters:**
- `resolution` - Time bucket size in milliseconds (default: auto-calculated)
- **Aggregation methods**: `average`, `min`, `max`, `first`, `last`, `mid`, `middle_index`
**Aggregation Methods:**
- **`average`** - Average value in time bucket (default for numeric data)
- **`min`** - Minimum value in time bucket
- **`max`** - Maximum value in time bucket
- **`first`** - First value in time bucket (default for objects)
- **`last`** - Last value in time bucket
- **`mid`** - Median value (average of middle values for even counts)
- **`middle_index`** - Middle value by index (first of two middle values for even counts)
**When to Use Each Method:**
- **Numeric data** (wind speed, voltage, etc.): Use `average`, `min`, `max` for statistics
- **Position data**: Use `first`, `last`, `middle_index` for specific readings
- **String/object data**: Avoid `mid` (unpredictable), prefer `first`, `last`, `middle_index`
- **Multiple stats**: Query same path with different methods (e.g., `wind:average,wind:max`)
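The bucketing idea can be sketched in a few lines. This is an illustration of the concept, assuming samples as `[epochMs, value]` pairs and the `average` method; `bucketAverage` is a hypothetical helper, not the plugin's DuckDB-backed implementation.

```javascript
// Assign each sample to a fixed-size time bucket and average within buckets.
function bucketAverage(samples, resolutionMs) {
  const buckets = new Map();
  for (const [t, v] of samples) {
    const key = Math.floor(t / resolutionMs) * resolutionMs; // bucket start time
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(v);
  }
  // One [bucketStart, average] pair per non-empty bucket
  return [...buckets.entries()].map(([t, vs]) => [
    t,
    vs.reduce((a, b) => a + b, 0) / vs.length,
  ]);
}
```

Because every path's samples collapse onto the same bucket boundaries, values from different sensors line up row by row, which is what makes plotting and correlation work.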
### Response Format
The History API returns time-aligned data in standard SignalK format.
#### Default Response (without moving averages)
```json
{
"context": "vessels.self",
"range": {
"from": "2025-01-01T00:00:00Z",
"to": "2025-01-01T06:00:00Z"
},
"values": [
{
"path": "environment.wind.speedApparent",
"method": "average"
},
{
"path": "navigation.position",
"method": "first"
}
],
"data": [
["2025-01-01T00:00:00Z", 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
["2025-01-01T00:01:00Z", 13.2, {"latitude": 37.7750, "longitude": -122.4195}],
["2025-01-01T00:02:00Z", 11.8, {"latitude": 37.7751, "longitude": -122.4196}]
]
}
```
#### With Moving Averages (includeMovingAverages=true)
```json
{
"context": "vessels.self",
"range": {
"from": "2025-01-01T00:00:00Z",
"to": "2025-01-01T06:00:00Z"
},
"values": [
{
"path": "environment.wind.speedApparent",
"method": "average"
},
{
"path": "environment.wind.speedApparent.ema",
"method": "ema"
},
{
"path": "environment.wind.speedApparent.sma",
"method": "sma"
},
{
"path": "navigation.position",
"method": "first"
}
],
"data": [
["2025-01-01T00:00:00Z", 12.5, 12.5, 12.5, {"latitude": 37.7749, "longitude": -122.4194}],
["2025-01-01T00:01:00Z", 13.2, 12.64, 12.85, {"latitude": 37.7750, "longitude": -122.4195}],
["2025-01-01T00:02:00Z", 11.8, 12.45, 12.5, {"latitude": 37.7751, "longitude": -122.4196}]
]
}
```
**Notes**:
- Each data array element is `[timestamp, value1, value2, ...]` corresponding to the paths in the `values` array
- Moving averages (EMA/SMA) are **opt-in** - add `includeMovingAverages=true` to include them
- EMA/SMA are only calculated for numeric values; non-numeric values (objects, strings) show `null` for their EMA/SMA columns
- Without `includeMovingAverages`, response size is ~66% smaller
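A client consuming this column-oriented format can unpack it into per-path series. The sketch below follows the response shape shown above; the `toSeries` helper is illustrative, not part of the plugin.

```javascript
// Split a History API response into one [timestamp, value] series per
// path:method pair, using the column order declared in `values`.
function toSeries(response) {
  const series = {};
  response.values.forEach((v, i) => {
    // Column 0 is the timestamp; column i+1 holds this path's values
    series[`${v.path}:${v.method}`] = response.data.map(row => [row[0], row[i + 1]]);
  });
  return series;
}
```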
## Claude AI Analysis
The plugin integrates Claude AI to provide intelligent analysis of maritime data, offering insights that would be difficult to extract through traditional querying methods.
### Advanced Charting and Visualization
Claude AI can generate interactive charts and visualizations directly from your data using Plotly.js specifications. Charts are automatically embedded in analysis responses when analysis would benefit from visualization.
**Supported Chart Types:**
- **Line Charts**: Time series trends for navigation, environmental, and performance data
- **Bar Charts**: Categorical analysis and frequency distributions
- **Scatter Plots**: Correlation analysis between different parameters
- **Wind Roses/Radar Charts**: Professional wind direction and speed frequency analysis
- **Multiple Series Charts**: Compare multiple data streams on the same chart
- **Polar Charts**: Wind patterns, compass headings, and directional data
**Marine-Specific Chart Features:**
- **Wind Analysis**: Automated wind rose generation with Beaufort scale categories
- **Navigation Plots**: Course over ground, speed trends, and position tracking
- **Environmental Monitoring**: Temperature, pressure, and weather pattern visualization
- **Performance Analysis**: Fuel efficiency, battery usage, and system performance charts
- **Multi-Vessel Comparisons**: Side-by-side analysis of multiple vessels
**Chart Data Integrity:**
- All chart data is sourced directly from database queries - no fabricated or estimated data
- Charts display exact data points from query results with full traceability
- Automatic validation ensures chart data matches query output
- Time-aligned data from History API ensures accurate multi-parameter visualization
**Example Chart Generation:**
When you ask Claude to "analyze wind patterns over the last 48 hours", it will:
1. Query your wind direction and speed data
2. Generate a wind rose chart showing frequency by compass direction
3. Color-code by wind speed categories (calm, light breeze, strong breeze, etc.)
4. Display the chart as interactive Plotly.js visualization in the web interface
Charts are automatically included when analysis benefits from visualization, or you can explicitly request specific chart types like "create a line chart" or "show me a wind rose".
### Analysis Templates (PLANNED, not yet implemented)
Pre-built analysis templates are planned to provide ready-to-use analyses for common maritime operations. Examples of possible templates:
#### Navigation & Routing Templates
- **Navigation Summary**: Comprehensive analysis of navigation patterns and route efficiency
- **Route Optimization**: Identify opportunities to optimize routes for efficiency and safety
- **Anchoring Analysis**: Analyze anchoring patterns, duration, and safety considerations
#### Weather & Environment Templates
- **Weather Impact Analysis**: Analyze how weather conditions affect vessel performance
- **Wind Pattern Analysis**: Detailed wind analysis for sailing optimization
#### Electrical System Templates
- **Battery Health Assessment**: Comprehensive battery performance and charging pattern analysis
- **Power Consumption Analysis**: Analyze electrical power usage patterns and efficiency
#### Safety & Monitoring Templates
- **Safety Anomaly Detection**: Detect unusual patterns that might indicate safety concerns
- **Equipment Health Monitoring**: Monitor equipment performance and predict maintenance needs
#### Performance & Efficiency Templates
- **Fuel Efficiency Analysis**: Analyze fuel consumption patterns and identify efficiency opportunities
- **Overall Performance Trends**: Comprehensive vessel performance analysis over time
### Using Claude AI Analysis
#### Via Web Interface
1. Navigate to the plugin's web interface
2. Go to the **š§ AI Analysis** tab
3. Select a data path to analyze
4. Choose an analysis template or create custom analysis
5. Configure time range and analysis parameters
6. Click **Analyze Data** to generate insights
#### Via API
**Test Claude Connection:**
```bash
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection
```
**Get Available Templates:**
```bash
curl http://localhost:3000/plugins/signalk-parquet/api/analyze/templates
```
**Custom Analysis:**
```bash
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze \
-H "Content-Type: application/json" \
-d '{
"dataPath": "environment.wind.speedTrue,navigation.speedOverGround",
"analysisType": "custom",
"customPrompt": "Analyze the relationship between wind speed and vessel speed. Identify optimal wind conditions for best performance.",
"timeRange": {
"start": "2025-01-01T00:00:00Z",
"end": "2025-01-07T00:00:00Z"
},
"aggregationMethod": "average",
"resolution": "3600000"
}'
```
### Analysis Response Format
Claude AI analysis returns structured insights:
```json
{
"id": "analysis_1234567890_abcdef123",
"analysis": "Main analysis text with detailed insights",
"insights": [
"Key insight 1",
"Key insight 2",
"Key insight 3"
],
"recommendations": [
"Actionable recommendation 1",
"Actionable recommendation 2"
],
"anomalies": [
{
"timestamp": "2025-01-01T12:00:00Z",
"value": 25.5,
"expectedRange": {"min": 10.0, "max": 20.0},
"severity": "medium",
"description": "Wind speed higher than normal range",
"confidence": 0.87
}
],
"confidence": 0.92,
"dataQuality": "High quality data with 98% completeness",
"timestamp": "2025-01-01T15:30:00Z",
"metadata": {
"dataPath": "environment.wind.speedTrue",
"analysisType": "summary",
"recordCount": 1440,
"timeRange": {
"start": "2025-01-01T00:00:00Z",
"end": "2025-01-02T00:00:00Z"
}
}
}
```
### Analysis History
All Claude AI analyses are automatically saved and can be retrieved:
**Get Analysis History:**
```bash
curl http://localhost:3000/plugins/signalk-parquet/api/analyze/history?limit=10
```
History files are stored in: `data/analysis-history/analysis_*.json`
### Best Practices
1. **Data Quality**: Ensure good data coverage for more reliable analysis
2. **Time Ranges**: Use appropriate time ranges - longer for trends, shorter for anomalies
3. **Path Selection**: Combine related paths for correlation analysis
4. **Template Usage**: Start with templates then customize prompts as needed
5. **API Limits**: Be mindful of Anthropic API token limits and costs
6. **Model Selection**: Use Opus for complex analysis, Sonnet for general use, Haiku for quick insights
### Troubleshooting Claude AI
**Common Issues:**
- **"Claude not enabled"**: Check plugin configuration and enable Claude integration
- **"API key missing"**: Add valid Anthropic API key in plugin settings
- **"Analysis timeout"**: Reduce data size or use faster model (Haiku)
- **"Token limit exceeded"**: Reduce time range or use data sampling
**Debug Claude Integration:**
```bash
# Test API connection
curl -X POST http://localhost:3000/plugins/signalk-parquet/api/analyze/test-connection
# Check plugin logs for Claude-specific messages
journalctl -u signalk -f | grep -i claude
```
## Moving Averages (EMA & SMA)
The plugin calculates **Exponential Moving Average (EMA)** and **Simple Moving Average (SMA)** for numeric values when explicitly requested via the `includeMovingAverages` parameter, providing enhanced trend analysis capabilities.
### How to Enable
**History API:**
```bash
# Add includeMovingAverages=true to any query
curl "http://localhost:3000/signalk/v1/history/values?duration=1h&paths=environment.wind.speedApparent&includeMovingAverages=true"
```
**Default Behavior (v0.5.6+):**
- Moving averages are **opt-in** - not included by default
- Reduces response size by ~66% when not needed
- Better API compliance with SignalK specification
**Legacy Behavior (pre-v0.5.6):**
- Moving averages were automatically included for all queries
- To maintain old behavior, add `includeMovingAverages=true` to all requests
### Calculation Details
#### Exponential Moving Average (EMA)
- **Period**: ~10 equivalent (α = 0.2)
- **Formula**: `EMA = α × currentValue + (1 - α) × previousEMA`
- **Characteristic**: Responds faster to recent changes, emphasizes recent data
- **Use Case**: Trend detection, rapid response to data changes
#### Simple Moving Average (SMA)
- **Period**: 10 data points
- **Formula**: Average of the last 10 values
- **Characteristic**: Smooths out fluctuations, equal weight to all values in window
- **Use Case**: Noise reduction, general trend analysis
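The two formulas can be sketched together. This is an illustration of the calculation described above (α = 0.2, 10-point SMA window); the plugin's internal implementation may differ.

```javascript
// Compute EMA and SMA for a series of numeric values.
// alpha and window match the documented defaults.
function movingAverages(values, alpha = 0.2, window = 10) {
  const out = [];
  let ema = null;
  const recent = []; // rolling SMA window
  for (const v of values) {
    // EMA: seed with the first value, then blend new values in
    ema = ema === null ? v : alpha * v + (1 - alpha) * ema;
    recent.push(v);
    if (recent.length > window) recent.shift(); // keep last `window` values
    const sma = recent.reduce((a, b) => a + b, 0) / recent.length;
    out.push({ value: v, ema, sma });
  }
  return out;
}
```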
### Data Flow & Continuity
```text
# Initial data load (isIncremental: false)
Point 1: Value=5.0, EMA=5.0, SMA=5.0
Point 2: Value=6.0, EMA=5.2, SMA=5.5
Point 3: Value=4.0, EMA=5.0, SMA=5.0

# Incremental updates (isIncremental: true)
Point 4: Value=7.0, EMA=5.4,  SMA=5.5   # continues from previous EMA
Point 5: Value=5.5, EMA=5.42, SMA=5.5   # rolling 10-point SMA window
```
### Key Features
- **Opt-In**: Add `includeMovingAverages=true` to enable (v0.5.6+)
- **Memory Efficient**: SMA maintains a rolling 10-point window
- **Non-Numeric Handling**: Non-numeric values (strings, objects) show `null` for EMA/S