# StrictEncode Specification

## Deterministic Binary Encoding for RGB Protocol

*A layered guide to understanding StrictEncode, from basics to RGB20 contract encoding*

---

## Table of Contents

**Core Concepts**

- [1. Introduction](#1-introduction)
- [2. Basic Types](#2-basic-types)
- [3. Strings and Collections](#3-strings-and-collections)
- [4. Options and Variants](#4-options-and-variants)

**RGB20 Implementation**

- [5. RGB20 Data Structures](#5-rgb20-data-structures)
- [6. RGB20 Contract Encoding](#6-rgb20-contract-encoding)

**Advanced Topics**

- [Appendix A: LEB128 Encoding](#appendix-a-leb128-encoding)
- [Appendix B: HashMap Encoding](#appendix-b-hashmap-encoding)
- [Appendix C: Edge Cases and Error Handling](#appendix-c-edge-cases-and-error-handling)
- [Appendix D: Implementation Reference](#appendix-d-implementation-reference)

---

## 1. Introduction

**StrictEncode** is a deterministic binary serialization format designed for consensus-critical applications such as the RGB protocol. JSON allows the same data to be written with different key orders and whitespace; StrictEncode always produces the identical byte sequence for identical data, which makes it suitable for cryptographic commitments and contract validation.

### Key Principles

1. **Deterministic**: The same input always produces the same output
2. **Compact**: Efficient binary representation
3. **Type-safe**: Each type has specific encoding rules
4. **Extensible**: Supports complex nested structures

### Why StrictEncode?

```
JSON:         {"name": "RGB20", "value": 1000000}  // Formatting and key order may vary
StrictEncode: 05524742323040420f0000000000          // Always identical
```

The StrictEncode line is the length-prefixed string `"RGB20"` (`05` + `5247423230`) followed by `1000000` as a little-endian u64 (`40420f0000000000`); field names are not encoded, only the values.

The RGB protocol requires deterministic encoding so that all participants can independently verify contract states and transitions.

---

## 2. Basic Types

### Unsigned Integers (Little-Endian)

All integers are encoded in **little-endian** byte order (least significant byte first).
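Because the little-endian rule is the same for every width, one routine can cover u8 through u64. The following is a minimal sketch, assuming a hypothetical `encodeUintLE(value, bytes)` helper that is not part of the package's documented API. It uses `BigInt` because a JavaScript `number` cannot represent every u64 value exactly:

```javascript
// Hypothetical helper (illustration only): encode an unsigned integer
// of `bytes` width in little-endian order, rejecting out-of-range values.
function encodeUintLE(value, bytes) {
  let v = BigInt(value); // throws RangeError for non-integer numbers
  const max = (1n << BigInt(bytes * 8)) - 1n;
  if (v < 0n || v > max) {
    throw new RangeError(`Value ${value} does not fit in u${bytes * 8}`);
  }
  const out = new Uint8Array(bytes);
  for (let i = 0; i < bytes; i++) {
    out[i] = Number(v & 0xffn); // least significant byte first
    v >>= 8n;
  }
  return out;
}

// Render bytes as lowercase hex, matching the tables in this document
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

toHex(encodeUintLE(255, 1));      // "ff"
toHex(encodeUintLE(65535, 2));    // "ffff"
toHex(encodeUintLE(1000000, 4));  // "40420f00"
toHex(encodeUintLE(1000000n, 8)); // "40420f0000000000"
```

The explicit range check matters: `DataView` setters such as `setUint32` silently wrap out-of-range values instead of throwing, which would let an invalid value produce a valid-looking encoding.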
```
Type | Value   | Encoding
-----|---------|-----------------
u8   | 255     | ff
u16  | 65535   | ffff
u32  | 1000000 | 40420f00
u64  | 1000000 | 40420f0000000000
```

**Example**: `u32(1000000)`

```
Decimal:     1,000,000
Hex:         0x00_0F_42_40
Bytes (LE):  40 42 0F 00
Encoding:    40420f00
```

### Boolean Values

```
Value | Encoding
------|---------
false | 00
true  | 01
```

### Code Example

```javascript
// Encode a u32 as 4 little-endian bytes
function encodeU32(value) {
  // Reject values that DataView.setUint32 would otherwise silently wrap
  if (!Number.isInteger(value) || value < 0 || value > 0xffffffff) {
    throw new RangeError(`Invalid u32 value: ${value}`);
  }
  const buffer = new ArrayBuffer(4);
  const view = new DataView(buffer);
  view.setUint32(0, value, true); // true = little-endian
  return new Uint8Array(buffer);
}

encodeU32(1000000); // Uint8Array [0x40, 0x42, 0x0F, 0x00]
```

---

## 3. Strings and Collections

### String Encoding

Strings are encoded as a **LEB128 length prefix** followed by the **UTF-8 bytes**. The length is the number of UTF-8 bytes, not the number of characters:

```
Format: [LEB128 byte length][UTF-8 bytes]
```

**Examples**:

```
String    | Length | LEB128 | UTF-8 Bytes      | Full Encoding
----------|--------|--------|------------------|-----------------
"RGB"     | 3      | 03     | 524742           | 03524742
"NIATCKR" | 7      | 07     | 4e494154434b52   | 074e494154434b52
"A"×200   | 200    | c801   | 41 repeated 200× | c801414141...
```

### LEB128 Length Encoding (Simplified)

**LEB128** (Little Endian Base 128) encodes integers as a variable number of bytes:

- Each byte carries 7 bits of data and 1 continuation bit
- The continuation bit (0x80) is set when more bytes follow
- The final byte has the continuation bit cleared

```
Value | Binary    | LEB128 (hex)
------|-----------|-------------
7     | 0000_0111 | 07
127   | 0111_1111 | 7f
128   | 1000_0000 | 80 01
200   | 1100_1000 | c8 01
```

### Vec<T> Encoding

Collections are encoded as a **LEB128 item count** followed by the **encoded items**:

```
// Vec<String> example: ["RGB", "20"]
02          // LEB128(2): 2 items
03 524742   // String "RGB"
02 3230     // String "20"

Full encoding: 0203524742023230
```

---

## 4. Options and Variants

### Option<T> Encoding

Options represent nullable values:

```
Value        | Tag | Encoded Value | Full Encoding
-------------|-----|---------------|--------------
None         | 00  | -             | 00
Some("test") | 01  | 0474657374    | 010474657374
```

**Format**: `[tag][value if Some]`

- Tag `00` = None
- Tag `01` = Some, followed by the encoded value

---

## 5. RGB20 Data Structures

RGB20 contracts use three core data structures:

### AssetSpec Structure

```rust
struct AssetSpec {
    ticker: String,          // Asset symbol (e.g., "BTC")
    name: String,            // Full name (e.g., "Bitcoin")
    precision: u8,           // Decimal places
    details: Option<String>, // Optional additional info
}
```

**Encoding Process**: fields are encoded in declaration order, with no field names or separators:

1. Encode `ticker` as String
2. Encode `name` as String
3. Encode `precision` as u8
4. Encode `details` as Option<String>

**Example**: `AssetSpec { ticker: "NIATCKR", name: "NIA asset name", precision: 8, details: None }`

```
Field     | Value                       | Encoding
----------|-----------------------------|-------------------------------
ticker    | "NIATCKR" (7 bytes)         | 074e494154434b52
name      | "NIA asset name" (14 bytes) | 0e4e4941206173736574206e616d65
precision | 8                           | 08
details   | None                        | 00

Final: 074e494154434b520e4e4941206173736574206e616d650800
```

### ContractTerms Structure

```rust
struct ContractTerms {
    text: String,          // Contract terms text
    media: Option<String>, // Optional media reference
}
```

**Example**: `ContractTerms { text: "NIA terms", media: None }`

```
Field | Value       | Encoding
------|-------------|---------------------
text  | "NIA terms" | 094e4941207465726d73
media | None        | 00

Final: 094e4941207465726d7300
```

### Amount (u64)

Token amounts are 64-bit unsigned integers:

```
Amount:            1,000,000 tokens
u64 little-endian: 40420f0000000000
```

---

## 6. RGB20 Contract Encoding

### Global State Structure

RGB20 contracts encode global state as a HashMap keyed by type ID:

```
Type ID | Data Structure | Description
--------|----------------|---------------
2000    | AssetSpec      | Asset metadata
2001    | ContractTerms  | Contract terms
2002    | Amount         | Token supply
```

### Complete RGB20 Genesis Example

**Input Data**:

```javascript
const genesisInput = {
  ticker: "NIATCKR",
  name: "NIA asset name",
  precision: 8,
  terms: "NIA terms",
  supply: 1000000,
  utxo: "6a12c58f92d73cd8a685c55b3f0e7d5e2b4a1c23456789abcdef0123456789ab"
};
```

**Encoded Components**:

```
AssetSpec (2000):     074e494154434b520e4e4941206173736574206e616d650800
ContractTerms (2001): 094e4941207465726d7300
Amount (2002):        40420f0000000000
```

**Genesis Structure** (simplified for educational purposes):

```json
{
  "schema_id": "rgb20",
  "global_state": {
    "2000": "074e494154434b520e4e4941206173736574206e616d650800",
    "2001": "094e4941207465726d7300",
    "2002": "40420f0000000000"
  },
  "utxo": "6a12c58f92d73cd8a685c55b3f0e7d5e2b4a1c23456789abcdef0123456789ab"
}
```

**Contract ID Generation**:

```
1. Serialize the genesis structure deterministically (sorted keys)
2. Hash the serialized data with SHA-256
3. Encode the hash with baid64 using the "contract:" HRI
4. Result: contract:J6eX3eDp-YkywIQj-bSprBUK-Knq8h3p-glkLTcf-Kp9G~aM
```

---

## Appendix A: LEB128 Encoding

### Detailed LEB128 Algorithm

LEB128 (Little Endian Base 128) encodes unsigned integers 7 bits per byte, using only as many bytes as the value requires:

```python
def encode_leb128(value: int) -> bytes:
    if value < 0:
        # Unsigned LEB128 only; a negative value would never reach 0 below
        raise ValueError(f"LEB128 value must be non-negative: {value}")
    result = []
    while True:
        byte = value & 0x7F   # Take the low 7 bits
        value >>= 7           # Shift right by 7 bits
        if value != 0:
            byte |= 0x80      # Set continuation bit: more bytes follow
        result.append(byte)
        if value == 0:
            break
    return bytes(result)
```

### LEB128 Examples

| Value | Binary (7-bit groups) | Bytes | Hex Encoding |
|-------|-----------------------|-------|--------------|
| 0 | `0000000` | `[00]` | `00` |
| 127 | `1111111` | `[7F]` | `7f` |
| 128 | `1_0000000` | `[80, 01]` | `8001` |
| 300 | `10_0101100` | `[AC, 02]` | `ac02` |
| 16384 | `1_0000000_0000000` | `[80, 80, 01]` | `808001` |

**Decoding Process**:

```python
def decode_leb128(bytes_data: bytes) -> int:
    result = 0
    shift = 0
    for byte in bytes_data:
        result |= (byte & 0x7F) << shift  # Add the next 7-bit group
        if (byte & 0x80) == 0:            # Continuation bit clear: last byte
            break
        shift += 7
    else:
        # Input ended while the continuation bit was still set
        raise ValueError("Truncated LEB128 value")
    return result
```

Because StrictEncode must be deterministic, a strict decoder should also reject non-minimal encodings such as `8000` for 0, which this simplified decoder accepts.

---

## Appendix B: HashMap Encoding

### HashMap<usize, T> Encoding Rules

RGB uses HashMap<usize, T> for indexed collections such as global state:

1. **Sort by key** (usize values in ascending order)
2. **Extract values** in sorted key order
3. **Encode as Vec<T>** (length + items)

Only the values are written; each key is implied by its sorted position, so a decoder must know the expected set of keys in advance (for RGB20, the type IDs listed in Section 6).

### Example: Global State Encoding

**Input HashMap**:

```rust
HashMap {
    2001: ContractTerms(...), // Key 2001
    2000: AssetSpec(...),     // Key 2000
    2002: Amount(...),        // Key 2002
}
```

**Encoding Process**:

```
1. Sort by key:    [2000, 2001, 2002]
2. Extract values: [AssetSpec, ContractTerms, Amount]
3. Encode as Vec:  LEB128(3) + encode(AssetSpec) + encode(ContractTerms) + encode(Amount)
```

### JavaScript Implementation

```javascript
// Encode a HashMap<usize, T> as a Vec<T> of its values, ordered by key.
// Accepts either a Map or a plain object with integer-like keys.
function encodeHashMap(map, valueEncoder) {
  const source = map instanceof Map ? [...map.entries()] : Object.entries(map);

  // Sort entries by numeric key (ascending), rejecting non-usize keys
  const entries = source
    .map(([k, v]) => {
      const key = Number(k);
      if (!Number.isSafeInteger(key) || key < 0) {
        throw new RangeError(`Invalid usize key: ${k}`);
      }
      return [key, v];
    })
    .sort((a, b) => a[0] - b[0]);

  // Extract values in sorted order
  const values = entries.map(([, v]) => v);

  // Encode as Vec<T>: LEB128 count followed by each encoded value
  const encoder = new StrictEncoder();
  encoder.encodeLeb128(values.length);
  values.forEach((value) => valueEncoder(encoder, value));
  return encoder.toBytes();
}
```

---

## Appendix C: Edge Cases and Error Handling

### String Length Limits

The length prefix counts UTF-8 bytes, so the boundary depends on byte length rather than character count:

- **Short strings** (≤ 127 bytes): single-byte LEB128 length
- **Long strings** (> 127 bytes): multi-byte LEB128 length

```javascript
// Edge case: 128-character ASCII string (128 UTF-8 bytes)
const longString = "A".repeat(128);
// Length 128 = 0x80, which needs 2 bytes in LEB128
// Encoding: [0x80, 0x01] + 128 bytes of 0x41 ('A')
```

### Integer Overflow Protection

```javascript
function encodeU8(value) {
  // Reject non-integers and anything outside 0..255
  if (!Number.isInteger(value) || value < 0 || value > 255) {
    throw new Error(`Invalid u8 value: ${value}`);
  }
  return new Uint8Array([value]);
}
```

### UTF-8 Validation

Strings must be valid UTF-8. JavaScript strings are UTF-16, and `TextEncoder` does not throw on a lone surrogate (it substitutes U+FFFD), so the check has to happen before encoding:

```javascript
// Assumes encodeLeb128(n) returns a Uint8Array
function encodeString(str) {
  // An unpaired surrogate has no UTF-8 representation
  const loneSurrogate = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;
  if (loneSurrogate.test(str)) {
    throw new Error(`Invalid UTF-8 string: ${str}`);
  }
  const utf8Bytes = new TextEncoder().encode(str);
  const lengthBytes = encodeLeb128(utf8Bytes.length);

  // Concatenate prefix and payload (`+` would coerce both arrays to strings)
  const out = new Uint8Array(lengthBytes.length + utf8Bytes.length);
  out.set(lengthBytes, 0);
  out.set(utf8Bytes, lengthBytes.length);
  return out;
}
```

### Empty Collections

```javascript
// Empty Vec<String>
encodeVec([]);    // [0x00] (LEB128 length = 0)

// Empty String
encodeString(""); // [0x00] (LEB128 length = 0)
```

---

## Appendix D: Implementation Reference

### Complete RGB20 Encoder

```javascript
class RGB20Encoder {
  static encodeAssetSpec(spec) {
    const encoder = new StrictEncoder();
    encoder.encodeString(spec.ticker);
    encoder.encodeString(spec.name);
    encoder.encodeU8(spec.precision);
    encoder.encodeOption(spec.details, (details) =>
      encoder.encodeString(details)
    );
    return encoder.toHex();
  }

  static encodeContractTerms(terms) {
    const encoder = new StrictEncoder();
    encoder.encodeString(terms.text);
    encoder.encodeOption(terms.media, (media) =>
      encoder.encodeString(media)
    );
    return encoder.toHex();
  }

  static encodeAmount(amount) {
    const encoder = new StrictEncoder();
    encoder.encodeU64(BigInt(amount));
    return encoder.toHex();
  }
}
```

### Test Vector Validation

```javascript
// Standard RGB20 test case
const testSpec = {
  ticker: "NIATCKR",
  name: "NIA asset name",
  precision: 8,
  details: null // encodes as None
};

const encoded = RGB20Encoder.encodeAssetSpec(testSpec);
console.assert(encoded === "074e494154434b520e4e4941206173736574206e616d650800");
```

### Performance Considerations

1. **Buffer Management**: Pre-allocate buffers for known data sizes
2. **String Caching**: Cache UTF-8 encodings for repeated strings
3. **LEB128 Optimization**: Use lookup tables for common values (0-127)

```javascript
// Optimized LEB128 for common cases: values 0-127 encode as a single byte.
// The cached arrays are shared, so callers must not mutate them.
const LEB128_CACHE = Array.from({ length: 128 }, (_, i) => new Uint8Array([i]));

function encodeLeb128Fast(value) {
  if (Number.isInteger(value) && value >= 0 && value < 128) {
    return LEB128_CACHE[value];
  }
  return encodeLeb128Slow(value); // general-purpose encoder for everything else
}
```

---

## Summary

StrictEncode provides the deterministic binary serialization that RGB contract validation depends on:

**Core Concepts**:

- Little-endian integers and LEB128 variable-length encoding
- Length-prefixed strings and collections
- Tag-based Option encoding

**RGB20 Application**:

- AssetSpec, ContractTerms, and Amount structures
- HashMap-based global state encoding
- SHA-256 contract ID generation

**Implementation Benefits**:

- Consensus-critical determinism
- Compact binary representation
- Type-safe encoding rules
- Extensible to future RGB schemas

This specification enables RGB-compliant contract encoders that produce consistent, verifiable contract identifiers suited to the RGB protocol's client-side validation architecture.
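Appendices B and D call a `StrictEncoder` class (with `encodeU8`, `encodeU64`, `encodeLeb128`, `encodeString`, `encodeOption`, `toBytes`, and `toHex`) that this document does not define. The following is a minimal sketch of such a class, assuming only those method names and the encoding rules described above; it is a hypothetical illustration, not the strictencode package's actual implementation. The usage lines reproduce the AssetSpec test vector from Section 5:

```javascript
// Hypothetical StrictEncoder sketch, modeled on the method names used in
// Appendices B and D. Not the strictencode package's actual implementation.
class StrictEncoder {
  constructor() {
    this.bytes = []; // output accumulated as numbers 0..255
  }

  encodeU8(value) {
    if (!Number.isInteger(value) || value < 0 || value > 0xff) {
      throw new RangeError(`Invalid u8 value: ${value}`);
    }
    this.bytes.push(value);
  }

  encodeU64(value) {
    let v = BigInt(value);
    if (v < 0n || v > 0xffffffffffffffffn) {
      throw new RangeError(`Invalid u64 value: ${value}`);
    }
    for (let i = 0; i < 8; i++) {
      this.bytes.push(Number(v & 0xffn)); // least significant byte first
      v >>= 8n;
    }
  }

  encodeLeb128(value) {
    if (!Number.isSafeInteger(value) || value < 0) {
      throw new RangeError(`Invalid LEB128 value: ${value}`);
    }
    do {
      // Arithmetic instead of bitwise ops, which truncate to 32 bits in JS
      let byte = value % 128;
      value = Math.floor(value / 128);
      if (value !== 0) byte |= 0x80; // continuation bit: more bytes follow
      this.bytes.push(byte);
    } while (value !== 0);
  }

  encodeString(str) {
    const utf8 = new TextEncoder().encode(str);
    this.encodeLeb128(utf8.length); // prefix is the UTF-8 byte length
    for (const b of utf8) this.bytes.push(b);
  }

  encodeOption(value, writeValue) {
    if (value === null || value === undefined) {
      this.bytes.push(0x00); // None
    } else {
      this.bytes.push(0x01); // Some, followed by the encoded value
      writeValue(value);
    }
  }

  toBytes() {
    return Uint8Array.from(this.bytes);
  }

  toHex() {
    return this.bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
  }
}

// Usage: reproduce the AssetSpec test vector from Section 5
const enc = new StrictEncoder();
enc.encodeString("NIATCKR");
enc.encodeString("NIA asset name");
enc.encodeU8(8);
enc.encodeOption(null, (details) => enc.encodeString(details));
console.log(enc.toHex());
// 074e494154434b520e4e4941206173736574206e616d650800
```

For brevity the sketch omits the lone-surrogate check from Appendix C; a production encoder would run that check before `TextEncoder`.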