jinaga
Version:
Data management for web and mobile applications.
425 lines (331 loc) • 10.3 kB
Markdown
# Jinaga Graph Serialization Protocol
## Overview
The Jinaga Graph Serialization Protocol is a text-based format for serializing and deserializing directed acyclic graphs (DAGs) of facts. The protocol is designed to efficiently represent immutable facts with their relationships (predecessors) and cryptographic signatures in a streaming format.
## Core Concepts
### Facts
A **fact** is an immutable data structure with:
- **Type**: A string identifier (e.g., "MyApp.Root", "MyApp.Child")
- **Hash**: A cryptographic hash computed from the fact's content
- **Fields**: Key-value pairs containing the fact's data
- **Predecessors**: References to other facts that this fact depends on
### Fact Envelopes
A **fact envelope** wraps a fact with its cryptographic signatures:
- **Fact**: The core fact data
- **Signatures**: Array of digital signatures from different public keys
### Predecessors
Facts can reference other facts through **predecessor relationships**:
- Each predecessor has a **role** (string key)
- A role can reference either a single fact or an array of facts
- References are made by fact type and hash
## Protocol Structure
The protocol uses a line-based text format where each line contains either:
1. JSON-encoded data
2. Protocol control markers (e.g., `PK0`, `PK1`)
3. WebSocket control frames (e.g., `SUB`, `UNSUB`, `BOOK`, `ERR`)
4. Empty lines as separators
### Basic Format
```
[Fact Block 1]
[Public Key Declaration]
[Fact Block 2]
[Public Key Declaration]
[Fact Block 3]
...
```
Public key declarations can appear at any point in the stream, interspersed with fact blocks as needed.
## Public Key Management
Public keys are declared on-demand when first encountered and referenced by index to avoid repetition. Key declarations can appear anywhere in the stream, not just at the beginning.
### Public Key Declaration
```
PK{index}
"{publicKey}"
```
**Format:**
- Line 1: `PK` followed by zero-based index number
- Line 2: JSON-encoded public key string
- Line 3: Empty line separator
**Dynamic Declaration:**
Public keys are declared the first time they are needed, which may be:
- Before the first fact that uses them
- Interspersed between fact blocks
- At any point during the stream processing
**Example:**
```
PK0
"public-key-string-1"
PK1
"public-key-string-2"
```
## Fact Serialization
Each fact is serialized as a block with the following structure:
### Fact Block Format
```
"{type}"
{predecessorIndexes}
{fields}
[Signature Lines]
```
**Components:**
1. **Type Line**: JSON-encoded fact type string
2. **Predecessor Line**: JSON object mapping roles to fact indexes
3. **Fields Line**: JSON object containing fact data
4. **Signature Lines** (optional): Zero or more signature references
5. **Empty Line**: Block separator
### Predecessor Indexing
Predecessors are referenced by their position in the serialization stream (zero-based indexing).
**Single Predecessor:**
```json
{"role": 0}
```
**Multiple Predecessors:**
```json
{"role": [0, 1, 2]}
```
**No Predecessors:**
```json
{}
```
### Signature Format
Signatures reference previously declared public keys:
```
PK{publicKeyIndex}
"{signatureString}"
```
**Example:**
```
PK0
"signature-data-1"
PK1
"signature-data-2"
```
## Data Type Encoding
### Strings
All strings are JSON-encoded with proper escaping:
```json
"MyApp.Root"
"Hello \"World\""
```
### Objects
Field data and predecessor collections are JSON objects:
```json
{"identifier": "root", "value": 42}
{"parent": 0, "children": [1, 2]}
```
### Arrays
Arrays are used for multiple predecessors in the same role:
```json
{"tags": [0, 1, 2]}
```
### Special Values
- `null`: JSON null
- `undefined`: Not directly supported (omitted from objects)
- Numbers: JSON numbers
- Booleans: JSON booleans
## Complete Examples
### Example 1: Single Fact Without Signatures
**Input Data:**
```typescript
{
fact: {
type: "MyApp.Root",
hash: "roothash",
fields: { identifier: "root" },
predecessors: {}
},
signatures: []
}
```
**Serialized Output:**
```
"MyApp.Root"
{}
{"identifier":"root"}
```
### Example 2: Two Related Facts Without Signatures
**Input Data:**
```typescript
[
{
fact: {
type: "MyApp.Root",
hash: "roothash",
fields: {},
predecessors: {}
},
signatures: []
},
{
fact: {
type: "MyApp.Child",
hash: "childhash",
fields: {},
predecessors: {
root: { type: "MyApp.Root", hash: "roothash" }
}
},
signatures: []
}
]
```
**Serialized Output:**
```
"MyApp.Root"
{}
{}
"MyApp.Child"
{"root":0}
{}
```
### Example 3: Facts With Multiple Signatures
**Input Data:**
```typescript
[
{
fact: {
type: "MyApp.Root",
hash: "roothash",
fields: {},
predecessors: {}
},
signatures: [
{ publicKey: "public", signature: "signature" }
]
},
{
fact: {
type: "MyApp.Child",
hash: "childhash",
fields: {},
predecessors: {
root: { type: "MyApp.Root", hash: "roothash" }
}
},
signatures: [
{ publicKey: "public", signature: "signature1" },
{ publicKey: "public2", signature: "signature2" }
]
}
]
```
**Serialized Output:**
```
PK0
"public"
"MyApp.Root"
{}
{}
PK0
"signature"
PK1
"public2"
"MyApp.Child"
{"root":0}
{}
PK0
"signature1"
PK1
"signature2"
```
**Note:** In this example, `PK0` is declared before the first fact that uses it, and `PK1` is declared later when first needed by the second fact. This demonstrates the dynamic, on-demand nature of public key declarations.
## Deserialization Process
### Hash Computation
During deserialization, fact hashes are recomputed from the fields and predecessors using the [`computeHash`](../src/fact/hash.ts) function. The original hash from serialization is not preserved but recalculated.
### Reference Resolution
Predecessor references are resolved using the order of facts in the stream:
- Index 0 = first fact
- Index 1 = second fact
- etc.
### Public Key Resolution
Public key references are resolved using declaration order:
- Keys can be declared at any point in the stream
- Each key gets the next available index when first declared
- Subsequent references use the assigned index
### Batch Processing
The deserializer processes facts in batches (default: 20 facts) for efficiency.
## Protocol Characteristics
### Streaming Support
- Facts can be processed as they arrive
- No need to buffer entire graph in memory
- Supports incremental processing
- Public keys are processed dynamically as declarations are encountered
## WebSocket Transport
### Overview
When transported over WebSockets, the graph is a single, continuous stream of fact blocks and public key declarations. There is no special framing to begin or end a graph. The stream may also contain control frames that manage feed subscriptions and bookmarks. Facts are global and not scoped to any one feed; only bookmarks are per-feed.
### Control Frames
Control frames are uppercase keywords followed by one or more JSON lines, terminated by a single empty line. Control frames appear only between fact blocks (never inside a block).
- SUB: Subscribe the client to a feed at a given bookmark
```
SUB
"<feed-id>"
"<bookmark>"
```
- UNSUB: Unsubscribe the client from a feed
```
UNSUB
"<feed-id>"
```
- BOOK: Server informs the client that a feed has advanced to a bookmark. All facts necessary to reach this bookmark must have already been sent on the stream as normal graph blocks.
```
BOOK
"<feed-id>"
"<bookmark>"
```
- ERR: Server reports a per-feed error (optional)
```
ERR
"<feed-id>"
"<message>"
```
### Ordering and Semantics
- The stream of facts (graph) is continuous. Public keys (`PK{index}`) and fact blocks follow the format in this document.
- Control frames may appear between fact blocks. They are delimited by a single empty line and never interleave within a fact block.
- For a given feed, `BOOK` must be sent only after all relevant facts have been streamed.
- Clients persist facts as soon as they are deserialized, then update feed bookmarks upon receiving `BOOK`.
### Reconnection
After a reconnect, the client resends `SUB` for each active feed with the last persisted bookmark. The server resumes by streaming any missing facts followed by `BOOK` updates.
### Deduplication
- Facts with identical type and hash are automatically deduplicated
- Only the first occurrence is serialized
- Subsequent references use the same index
### Ordering Requirements
- Public keys must be declared before use, but can appear anywhere in the stream
- Facts must be serialized before being referenced as predecessors
- Topological ordering is enforced for fact dependencies
- Public key declarations are processed dynamically as encountered
### Error Handling
- Invalid JSON causes parsing errors
- Out-of-range predecessor references are rejected
- Missing public key declarations are detected
- Malformed signature blocks are caught
## Versioning
The current protocol has no explicit version markers. The format is identified by:
- Line-based structure
- JSON encoding for data
- `PK{index}` markers for public keys
- Empty line separators
## Performance Considerations
### Memory Efficiency
- Streaming design minimizes memory usage
- Public key deduplication reduces size
- Fact deduplication prevents redundancy
### Network Efficiency
- Text-based format is compressible
- Minimal overhead for small graphs
- Efficient for sparse signature coverage
### Processing Efficiency
- Line-by-line parsing
- Incremental hash computation
- Batch processing for I/O efficiency
## Implementation Notes
### Key Classes
- [`GraphSerializer`](../src/http/serializer.ts): Handles serialization
- [`GraphDeserializer`](../src/http/deserializer.ts): Handles deserialization
- [`FactEnvelope`](../src/storage.ts:36): Core data structure
### Dependencies
- JSON encoding/decoding for data representation
- Hash computation for fact integrity
- Streaming I/O for large graphs
### Thread Safety
- Serializers maintain internal state (not thread-safe)
- Each serialization operation should use a fresh instance
- Deserializers are stateful and not reentrant