Deterministic binary encoding for RGB protocol compliance - JavaScript implementation of StrictEncode
# StrictEncode Specification
## Deterministic Binary Encoding for RGB Protocol
*A layered guide to understanding StrictEncode, from basics to RGB20 contract encoding*
## Table of Contents
**Core Concepts**
- [1. Introduction](#1-introduction)
- [2. Basic Types](#2-basic-types)
- [3. Strings and Collections](#3-strings-and-collections)
- [4. Options and Variants](#4-options-and-variants)
**RGB20 Implementation**
- [5. RGB20 Data Structures](#5-rgb20-data-structures)
- [6. RGB20 Contract Encoding](#6-rgb20-contract-encoding)
**Advanced Topics**
- [Appendix A: LEB128 Encoding](#appendix-a-leb128-encoding)
- [Appendix B: HashMap Encoding](#appendix-b-hashmap-encoding)
- [Appendix C: Edge Cases and Error Handling](#appendix-c-edge-cases-and-error-handling)
- [Appendix D: Implementation Reference](#appendix-d-implementation-reference)
## 1. Introduction
**StrictEncode** is a deterministic binary serialization format designed for consensus-critical applications like the RGB protocol. Unlike JSON or other text formats, StrictEncode produces identical byte sequences for identical data, making it suitable for cryptographic commitments and contract validation.
### Key Principles
1. **Deterministic**: Same input always produces same output
2. **Compact**: Efficient binary representation
3. **Type-safe**: Each type has specific encoding rules
4. **Extensible**: Supports complex nested structures
### Why StrictEncode?
```
JSON:         {"name": "RGB20", "value": 1000000}   // Variable formatting
StrictEncode: 05524742323040420f0000000000          // Always identical
```
The RGB protocol requires deterministic encoding to ensure all participants can independently verify contract states and transitions.
## 2. Basic Types
### Unsigned Integers (Little-Endian)
All integers are encoded in **little-endian** byte order (least significant byte first).
```
Type | Value   | Encoding
-----|---------|-----------------
u8   | 255     | ff
u16  | 65535   | ffff
u32  | 1000000 | 40420f00
u64  | 1000000 | 40420f0000000000
```
**Example**: `u32(1000000)`
```
Decimal:             1,000,000
Hex (big-endian):    0x00_0F_42_40
Little-endian bytes: 40 42 0F 00
Encoding:            40420f00
```
### Boolean Values
```
Value | Encoding
------|----------
false | 00
true | 01
```
### Code Example
```javascript
// JavaScript implementation
function encodeU32(value) {
  const buffer = new ArrayBuffer(4);
  const view = new DataView(buffer);
  view.setUint32(0, value, true); // true = little-endian
  return new Uint8Array(buffer);
}
encodeU32(1000000); // [0x40, 0x42, 0x0F, 0x00]
```
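The same `DataView` approach extends to the other widths in the table above. The following is a minimal sketch, assuming hypothetical helper names `encodeU16`, `encodeU64`, `encodeBool`, and `toHex` (none of which appear in the original). The `u64` helper converts to `BigInt` because JavaScript numbers lose integer precision above 2^53 - 1.

```javascript
// Hypothetical helpers mirroring encodeU32; the names are illustrative only.
function encodeU16(value) {
  const view = new DataView(new ArrayBuffer(2));
  view.setUint16(0, value, true); // true = little-endian
  return new Uint8Array(view.buffer);
}

function encodeU64(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, BigInt(value), true); // BigInt keeps all 64 bits exact
  return new Uint8Array(view.buffer);
}

function encodeBool(value) {
  return new Uint8Array([value ? 0x01 : 0x00]);
}

// Render bytes as lowercase hex for comparison with the tables above.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(toHex(encodeU16(65535)));   // ffff
console.log(toHex(encodeU64(1000000))); // 40420f0000000000
console.log(toHex(encodeBool(true)));   // 01
```

Passing large amounts as `BigInt` literals (for example `encodeU64(18446744073709551615n)`) avoids rounding before the value reaches the encoder.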
## 3. Strings and Collections
### String Encoding
Strings use **LEB128 length prefix** + **UTF-8 bytes**:
```
Format: [LEB128 length][UTF-8 bytes]
```
**Examples**:
```
String    | Length (bytes) | LEB128 | UTF-8 Bytes    | Full Encoding
----------|----------------|--------|----------------|-----------------
"RGB"     | 3              | 03     | 524742         | 03524742
"NIATCKR" | 7              | 07     | 4e494154434b52 | 074e494154434b52
"A"×200   | 200            | c801   | 41 (×200)      | c801 4141...
```
### LEB128 Length Encoding (Simplified)
**LEB128** (Little Endian Base 128) encodes integers using variable-length bytes:
- Each byte uses 7 bits for data, 1 bit for continuation
- Continuation bit (0x80) indicates more bytes follow
- Final byte has continuation bit cleared
```
Value | Binary    | LEB128 (hex)
------|-----------|-------------
7     | 0000_0111 | 07
127   | 0111_1111 | 7f
128   | 1000_0000 | 8001
200   | 1100_1000 | c801
```
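The rules above translate into a short loop. This is a sketch of an unsigned LEB128 encoder in JavaScript, assuming the function name `encodeLeb128` and an input restricted to non-negative safe integers. It uses division instead of bit shifts because JavaScript bitwise operators truncate to 32 bits.

```javascript
// Unsigned LEB128 encoder; a sketch for non-negative safe integers only.
function encodeLeb128(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError(`LEB128 value out of range: ${value}`);
  }
  const bytes = [];
  do {
    let byte = value % 128;          // low 7 bits
    value = Math.floor(value / 128); // drop them (safe beyond 32 bits)
    if (value !== 0) byte |= 0x80;   // continuation bit: more bytes follow
    bytes.push(byte);
  } while (value !== 0);
  return Uint8Array.from(bytes);
}

console.log(Array.from(encodeLeb128(127))); // [ 127 ]
console.log(Array.from(encodeLeb128(200))); // [ 200, 1 ], i.e. c8 01
```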
### Vec<T> Encoding
Collections encode as **LEB128 length** + **encoded items**:
```javascript
// Vec<String> example: ["RGB", "20"]
[
  0x02,                   // LEB128(2): 2 items
  0x03, 0x52, 0x47, 0x42, // String "RGB"
  0x02, 0x32, 0x30,       // String "20"
]
// Full encoding: 0203524742023230
```
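The byte layout above can be produced by composing a length prefix with per-item encoders. The sketch below is illustrative: `concatBytes`, `encodeVec`, and `hex` are hypothetical helper names, and the compact `encodeLeb128` assumes a non-negative safe integer.

```javascript
// Compact unsigned LEB128 helper (assumes a non-negative safe integer).
function encodeLeb128(value) {
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (value !== 0);
  return Uint8Array.from(bytes);
}

// Join several byte arrays into one.
function concatBytes(parts) {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// String: LEB128 byte length, then the UTF-8 bytes.
function encodeString(str) {
  const utf8 = new TextEncoder().encode(str);
  return concatBytes([encodeLeb128(utf8.length), utf8]);
}

// Vec<T>: item count, then each item encoded by the supplied callback.
function encodeVec(items, encodeItem) {
  return concatBytes([encodeLeb128(items.length), ...items.map((item) => encodeItem(item))]);
}

const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(hex(encodeVec(["RGB", "20"], encodeString))); // 0203524742023230
```

Taking the item encoder as a parameter lets the same `encodeVec` serve any element type.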
## 4. Options and Variants
### Option<T> Encoding
Options represent nullable values:
```
Value        | Tag | Encoded Value | Full Encoding
-------------|-----|---------------|--------------
None         | 00  | -             | 00
Some("test") | 01  | 0474657374    | 010474657374
```
**Format**: `[tag][value if Some]`
- Tag `00` = None
- Tag `01` = Some, followed by encoded value
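
The tag rule can be sketched as a small function. This example assumes that `null` or `undefined` stands for `None` in JavaScript; `encodeOption` and `encodeShortString` are hypothetical names, and the string helper is deliberately limited to single-byte lengths to keep the demo short.

```javascript
// Option<T>: tag byte 0x00 for None, or 0x01 followed by the encoded value.
// `encodeValue` is a caller-supplied function returning a Uint8Array.
function encodeOption(value, encodeValue) {
  if (value === null || value === undefined) {
    return new Uint8Array([0x00]);
  }
  const payload = encodeValue(value);
  const out = new Uint8Array(1 + payload.length);
  out[0] = 0x01;
  out.set(payload, 1);
  return out;
}

// Demo-only string encoder: single-byte length prefix, so under 128 bytes.
function encodeShortString(str) {
  const utf8 = new TextEncoder().encode(str);
  if (utf8.length > 127) {
    throw new RangeError("demo helper handles strings under 128 bytes");
  }
  return Uint8Array.from([utf8.length, ...utf8]);
}

const asHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(asHex(encodeOption(null, encodeShortString)));   // 00
console.log(asHex(encodeOption("test", encodeShortString))); // 010474657374
```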
## 5. RGB20 Data Structures
RGB20 contracts use three core data structures:
### AssetSpec Structure
```rust
struct AssetSpec {
    ticker: String,          // Asset symbol (e.g., "BTC")
    name: String,            // Full name (e.g., "Bitcoin")
    precision: u8,           // Decimal places
    details: Option<String>, // Optional additional info
}
```
**Encoding Process**:
1. Encode `ticker` as String
2. Encode `name` as String
3. Encode `precision` as u8
4. Encode `details` as Option<String>
**Example**: `AssetSpec { ticker: "NIATCKR", name: "NIA asset name", precision: 8, details: None }`
```
Field     | Value               | Encoding
----------|---------------------|-------------------------------
ticker    | "NIATCKR" (7 chars) | 074e494154434b52
name      | "NIA asset name"    | 0e4e4941206173736574206e616d65
precision | 8                   | 08
details   | None                | 00

Final: 074e494154434b520e4e4941206173736574206e616d650800
```
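Reading the example back is a useful check that the field order is unambiguous. The following decoder is a sketch only: the `Reader` class and `decodeAssetSpec` are hypothetical names, and `None` is mapped to `null` as an assumption.

```javascript
// Minimal cursor-based reader over a hex string; a sketch, not a full decoder.
class Reader {
  constructor(hexString) {
    this.bytes = Uint8Array.from(hexString.match(/../g) ?? [], (h) => parseInt(h, 16));
    this.pos = 0;
  }
  u8() {
    if (this.pos >= this.bytes.length) throw new RangeError("unexpected end of input");
    return this.bytes[this.pos++];
  }
  leb128() {
    let result = 0;
    let factor = 1;
    for (;;) {
      const byte = this.u8();
      result += (byte & 0x7f) * factor; // add the next 7-bit group
      if ((byte & 0x80) === 0) return result;
      factor *= 128;
    }
  }
  string() {
    const len = this.leb128();
    if (this.pos + len > this.bytes.length) throw new RangeError("string exceeds input");
    const slice = this.bytes.subarray(this.pos, this.pos + len);
    this.pos += len;
    // fatal: true makes the decoder throw on invalid UTF-8
    return new TextDecoder("utf-8", { fatal: true }).decode(slice);
  }
  option(readValue) {
    const tag = this.u8();
    if (tag === 0x00) return null;
    if (tag === 0x01) return readValue();
    throw new Error(`invalid Option tag: ${tag}`);
  }
}

// Fields are read in declaration order: ticker, name, precision, details.
function decodeAssetSpec(hexString) {
  const r = new Reader(hexString);
  return {
    ticker: r.string(),
    name: r.string(),
    precision: r.u8(),
    details: r.option(() => r.string()),
  };
}

console.log(decodeAssetSpec("074e494154434b520e4e4941206173736574206e616d650800"));
// { ticker: 'NIATCKR', name: 'NIA asset name', precision: 8, details: null }
```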
### ContractTerms Structure
```rust
struct ContractTerms {
    text: String,          // Contract terms text
    media: Option<String>, // Optional media reference
}
```
**Example**: `ContractTerms { text: "NIA terms", media: None }`
```
Field | Value       | Encoding
------|-------------|---------------------
text  | "NIA terms" | 094e4941207465726d73
media | None        | 00

Final: 094e4941207465726d7300
```
### Amount (u64)
Token amounts are 64-bit unsigned integers:
```
Amount: 1,000,000 tokens
u64 little-endian: 40420f0000000000
```
## 6. RGB20 Contract Encoding
### Global State Structure
RGB20 contracts encode global state as a HashMap with type IDs:
```
Type ID | Data Structure | Description
--------|----------------|---------------
2000    | AssetSpec      | Asset metadata
2001    | ContractTerms  | Contract terms
2002    | Amount         | Token supply
```
### Complete RGB20 Genesis Example
**Input Data**:
```javascript
{
  ticker: "NIATCKR",
  name: "NIA asset name",
  precision: 8,
  terms: "NIA terms",
  supply: 1000000,
  utxo: "6a12c58f92d73cd8a685c55b3f0e7d5e2b4a1c23456789abcdef0123456789ab"
}
```
**Encoded Components**:
```
AssetSpec (2000): 074e494154434b520e4e4941206173736574206e616d650800
ContractTerms (2001): 094e4941207465726d7300
Amount (2002): 40420f0000000000
```
**Genesis Structure** (simplified for educational purposes):
```json
{
  "schema_id": "rgb20",
  "global_state": {
    "2000": "074e494154434b520e4e4941206173736574206e616d650800",
    "2001": "094e4941207465726d7300",
    "2002": "40420f0000000000"
  },
  "utxo": "6a12c58f92d73cd8a685c55b3f0e7d5e2b4a1c23456789abcdef0123456789ab"
}
```
**Contract ID Generation**:
```
1. Serialize the genesis structure deterministically (sorted keys)
2. Hash the serialized data with SHA-256
3. Encode the hash with Baid64 using the "contract:" HRI (human-readable identifier)
4. Result (example): contract:J6eX3eDp-YkywIQj-bSprBUK-Knq8h3p-glkLTcf-Kp9G~aM
```
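Steps 1 and 2 can be sketched with Node's built-in `crypto` module. This is an illustration under stated assumptions: the source does not specify the exact serialization, so JSON with recursively sorted keys is assumed here, and `canonicalJson` and `genesisDigest` are hypothetical names. The Baid64 step is omitted because its details are not given in this guide.

```javascript
const { createHash } = require("node:crypto");

// Serialize with object keys sorted recursively, so key order cannot change the bytes.
// Assumption: sorted-key JSON stands in for the real genesis serialization.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const fields = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(value[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical serialization, returned as lowercase hex.
function genesisDigest(genesis) {
  return createHash("sha256").update(canonicalJson(genesis), "utf8").digest("hex");
}

const genesis = {
  schema_id: "rgb20",
  global_state: {
    2000: "074e494154434b520e4e4941206173736574206e616d650800",
    2001: "094e4941207465726d7300",
    2002: "40420f0000000000",
  },
  utxo: "6a12c58f92d73cd8a685c55b3f0e7d5e2b4a1c23456789abcdef0123456789ab",
};

console.log(genesisDigest(genesis)); // 64 hex characters
```

Because the keys are sorted before hashing, two objects that differ only in property order produce the same digest.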
## Appendix A: LEB128 Encoding
### Detailed LEB128 Algorithm
LEB128 (Little Endian Base 128) encodes unsigned integers efficiently:
```python
def encode_leb128(value):
    if value < 0:
        raise ValueError("LEB128 here encodes unsigned integers only")
    result = []
    while True:
        byte = value & 0x7F   # Take the low 7 bits
        value >>= 7           # Shift right 7 bits
        if value != 0:
            byte |= 0x80      # Set continuation bit
        result.append(byte)
        if value == 0:
            break
    return bytes(result)
```
### LEB128 Examples
| Value | Binary (7-bit groups) | Bytes | Hex Encoding |
|-------|-----------------------|-------|--------------|
| 0 | `0000000` | `[00]` | `00` |
| 127 | `1111111` | `[7F]` | `7f` |
| 128 | `1_0000000` | `[80, 01]` | `8001` |
| 300 | `10_0101100` | `[AC, 02]` | `ac02` |
| 16384 | `1_0000000_0000000` | `[80, 80, 01]` | `808001` |
**Decoding Process**:
```python
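def decode_leb128(bytes_data):
    result = 0
    shift = 0
    for byte in bytes_data:
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result      # Final byte: continuation bit cleared
        shift += 7
    # Input ended while the continuation bit was still set (or was empty)
    raise ValueError("Truncated LEB128 sequence")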
```
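When decoding a length prefix from a larger buffer, the caller also needs to know how many bytes the prefix consumed. The JavaScript sketch below ports the decoding process and returns both; the name `decodeLeb128` and the `{ value, bytesRead }` result shape are assumptions made for illustration.

```javascript
// Decode an unsigned LEB128 integer starting at `offset`.
// Returns the value and the number of bytes consumed.
function decodeLeb128(bytes, offset = 0) {
  let result = 0;
  let factor = 1;
  for (let i = offset; i < bytes.length; i++) {
    const byte = bytes[i];
    result += (byte & 0x7f) * factor; // multiply instead of shifting past 32 bits
    if ((byte & 0x80) === 0) {
      if (!Number.isSafeInteger(result)) {
        throw new RangeError("LEB128 value exceeds the safe integer range");
      }
      return { value: result, bytesRead: i - offset + 1 };
    }
    factor *= 128;
  }
  throw new RangeError("truncated LEB128 sequence");
}

console.log(decodeLeb128([0xac, 0x02]));             // { value: 300, bytesRead: 2 }
console.log(decodeLeb128([0x80, 0x80, 0x01, 0xff])); // { value: 16384, bytesRead: 3 }
```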
## Appendix B: HashMap Encoding
### HashMap<usize, T> Encoding Rules
RGB uses HashMap<usize, T> for indexed collections (like global state):
1. **Sort by key** (usize values in ascending order)
2. **Extract values** in sorted key order
3. **Encode as Vec<T>** (length + items)
### Example: Global State Encoding
**Input HashMap**:
```rust
HashMap {
    2001: ContractTerms(...), // Key 2001
    2000: AssetSpec(...),     // Key 2000
    2002: Amount(...),        // Key 2002
}
```
**Encoding Process**:
```
1. Sort by key: [2000, 2001, 2002]
2. Extract values: [AssetSpec, ContractTerms, Amount]
3. Encode as Vec: LEB128(3) + encode(AssetSpec) + encode(ContractTerms) + encode(Amount)
```
### JavaScript Implementation
```javascript
function encodeHashMap(map, valueEncoder) {
  // Sort entries numerically by key (object keys arrive as strings)
  const entries = Object.entries(map)
    .map(([k, v]) => [Number.parseInt(k, 10), v])
    .sort((a, b) => a[0] - b[0]);
  // Extract values in sorted order
  const values = entries.map(([, v]) => v);
  // Encode as Vec<T>: LEB128 count followed by each encoded value
  const encoder = new StrictEncoder();
  encoder.encodeLeb128(values.length);
  values.forEach((value) => valueEncoder(encoder, value));
  return encoder.toBytes();
}
```
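To see the rules applied to concrete bytes, the sketch below assembles the global state from the already-encoded hex values used in Section 6. It is self-contained and does not use `StrictEncoder`; `encodeGlobalStateHex` is a hypothetical name, and the single-byte count limits it to fewer than 128 entries.

```javascript
// Values are assumed to be already-encoded hex strings keyed by type ID.
function encodeGlobalStateHex(state) {
  const values = Object.entries(state)
    .map(([key, value]) => [Number.parseInt(key, 10), value])
    .sort((a, b) => a[0] - b[0]) // ascending numeric key order
    .map(([, value]) => value);
  if (values.length > 127) {
    throw new RangeError("sketch supports fewer than 128 entries");
  }
  // Counts below 128 fit in a single LEB128 byte.
  const count = values.length.toString(16).padStart(2, "0");
  return count + values.join("");
}

const globalState = {
  2001: "094e4941207465726d7300",
  2000: "074e494154434b520e4e4941206173736574206e616d650800",
  2002: "40420f0000000000",
};

// Prints "03", then the AssetSpec, ContractTerms, and Amount bytes in that order.
console.log(encodeGlobalStateHex(globalState));
```

Under these rules the keys themselves are not written to the output, so a decoder has to know the expected type order in advance.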
## Appendix C: Edge Cases and Error Handling
### String Length Limits
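The length prefix counts UTF-8 bytes, not characters, so a string containing non-ASCII characters can cross the threshold with fewer than 128 characters.
- **Short strings** (≤ 127 bytes): Single-byte LEB128 length
- **Long strings** (> 127 bytes): Multi-byte LEB128 length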
```javascript
// Edge case: 128-character string
const longString = "A".repeat(128);
// Length 128 = 0x80 = needs 2 bytes in LEB128
// Encoding: [0x80, 0x01] + 128 'A' bytes
```
### Integer Overflow Protection
```javascript
function encodeU8(value) {
  if (value < 0 || value > 255 || !Number.isInteger(value)) {
    throw new Error(`Invalid u8 value: ${value}`);
  }
  return new Uint8Array([value]);
}
```
### UTF-8 Validation
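Strings must encode to valid UTF-8. JavaScript strings are UTF-16 and can contain lone surrogates, which `TextEncoder` silently replaces with U+FFFD instead of throwing, so the check has to be explicit:
```javascript
function encodeString(str) {
  // isWellFormed() (ES2024, Node 20+) returns false for lone surrogates
  if (!str.isWellFormed()) {
    throw new Error(`String contains lone surrogates: ${str}`);
  }
  const utf8Bytes = new TextEncoder().encode(str);
  // Length prefix counts bytes; encodeLeb128 is assumed to return a Uint8Array
  const prefix = encodeLeb128(utf8Bytes.length);
  // Concatenate with set(); the + operator would coerce both arrays to strings
  const out = new Uint8Array(prefix.length + utf8Bytes.length);
  out.set(prefix, 0);
  out.set(utf8Bytes, prefix.length);
  return out;
}
```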
### Empty Collections
```javascript
// Empty Vec<String>
encodeVec([]); // → [0x00] (LEB128 length = 0)
// Empty String
encodeString(""); // → [0x00] (LEB128 length = 0)
```
## Appendix D: Implementation Reference
### Complete RGB20 Encoder
```javascript
class RGB20Encoder {
  static encodeAssetSpec(spec) {
    const encoder = new StrictEncoder();
    encoder.encodeString(spec.ticker);
    encoder.encodeString(spec.name);
    encoder.encodeU8(spec.precision);
    encoder.encodeOption(spec.details, (details) =>
      encoder.encodeString(details)
    );
    return encoder.toHex();
  }

  static encodeContractTerms(terms) {
    const encoder = new StrictEncoder();
    encoder.encodeString(terms.text);
    encoder.encodeOption(terms.media, (media) =>
      encoder.encodeString(media)
    );
    return encoder.toHex();
  }

  static encodeAmount(amount) {
    const encoder = new StrictEncoder();
    encoder.encodeU64(BigInt(amount));
    return encoder.toHex();
  }
}
```
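The encoder above relies on a `StrictEncoder` class that this guide does not define. The following is a hypothetical minimal version, inferred only from the method calls used in this document (`encodeU8`, `encodeU64`, `encodeLeb128`, `encodeString`, `encodeOption`, `toBytes`, `toHex`); the package's actual class may differ.

```javascript
// Hypothetical minimal StrictEncoder; inferred from usage, not the real class.
class StrictEncoder {
  constructor() {
    this.bytes = []; // accumulated output, one number per byte
  }
  encodeU8(value) {
    if (!Number.isInteger(value) || value < 0 || value > 255) {
      throw new RangeError(`Invalid u8 value: ${value}`);
    }
    this.bytes.push(value);
  }
  encodeU64(value) {
    const view = new DataView(new ArrayBuffer(8));
    view.setBigUint64(0, BigInt(value), true); // little-endian
    this.bytes.push(...new Uint8Array(view.buffer));
  }
  encodeLeb128(value) {
    do {
      let byte = value % 128;
      value = Math.floor(value / 128);
      if (value !== 0) byte |= 0x80; // continuation bit
      this.bytes.push(byte);
    } while (value !== 0);
  }
  encodeString(str) {
    const utf8 = new TextEncoder().encode(str);
    this.encodeLeb128(utf8.length);
    for (const b of utf8) this.bytes.push(b);
  }
  // Writes the tag, then lets the callback append the value for Some.
  encodeOption(value, encodeSome) {
    if (value === null || value === undefined) {
      this.bytes.push(0x00);
    } else {
      this.bytes.push(0x01);
      encodeSome(value);
    }
  }
  toBytes() {
    return Uint8Array.from(this.bytes);
  }
  toHex() {
    return this.bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
  }
}

const demo = new StrictEncoder();
demo.encodeString("NIA terms");
demo.encodeOption(null, (media) => demo.encodeString(media));
console.log(demo.toHex()); // 094e4941207465726d7300
```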
### Test Vector Validation
```javascript
// Standard RGB20 test case
const testSpec = {
  ticker: "NIATCKR",
  name: "NIA asset name",
  precision: 8,
  details: null
};
const encoded = RGB20Encoder.encodeAssetSpec(testSpec);
console.assert(encoded === "074e494154434b520e4e4941206173736574206e616d650800");
```
### Performance Considerations
1. **Buffer Management**: Pre-allocate buffers for known data sizes
2. **String Caching**: Cache UTF-8 encodings for repeated strings
3. **LEB128 Optimization**: Use lookup tables for common values (0-127)
```javascript
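// Optimized LEB128 for common cases: values 0-127 encode to a single byte.
// Cached arrays are shared between calls, so callers must not mutate them.
const LEB128_CACHE = Array.from({ length: 128 }, (_, i) => Uint8Array.of(i));

function encodeLeb128Fast(value) {
  if (Number.isInteger(value) && value >= 0 && value < 128) {
    return LEB128_CACHE[value];
  }
  // Fall back to the general-purpose encoder (assumed to be defined elsewhere)
  return encodeLeb128Slow(value);
}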
```
## Summary
StrictEncode provides deterministic binary serialization essential for RGB contract validation:
**Core Concepts**:
- Little-endian integers, LEB128 variable-length encoding
- Length-prefixed strings and collections
- Tag-based Option encoding
**RGB20 Application**:
- AssetSpec, ContractTerms, Amount structures
- HashMap-based global state encoding
- SHA-256 contract ID generation
**Implementation Benefits**:
- Consensus-critical determinism
- Compact binary representation
- Type-safe encoding rules
- Extensible for future RGB schemas
This specification enables creation of RGB-compliant contract encoders that produce consistent, verifiable contract identifiers suitable for the RGB protocol's client-side validation architecture.