hybrid-id-generator

A powerful hybrid ID generator that combines timestamps, machine IDs, random bits, and sequence numbers to create globally unique identifiers. Features collision prevention, Base62 encoding, and optional ID expiry tracking, ideal for distributed systems a

## Goals:

1. **Uniqueness**: Global uniqueness to avoid collisions.
2. **Scalability**: Works efficiently in distributed systems and centralized databases.
3. **Performance**: Fast for lookups, sorting, and indexing.
4. **Readability**: Optionally human-readable (for URLs or external usage).
5. **Space efficiency**: Minimize storage requirements.
6. **Time-ordering**: Enable chronological sorting when required.
7. **Security**: Avoid revealing sensitive information through ID patterns.

## Proposal: Hybrid ID (Time-based + Random + Sharded)

To design a new type of ID, we can combine several key features from existing ID systems:

- Time component for uniqueness and chronological sorting.
- Random component to avoid sequential patterns and reduce collisions across distributed systems.
- Sharded/Machine ID to avoid collisions in a distributed system.
- Compact format to minimize storage, similar to integers.
- Optional human-readable encoding (e.g., base32 or base62) for user-facing applications.

**Here’s how this hybrid ID would look:**

```text
[Timestamp] + [Machine ID] + [Random Component] + [Sequence]
```

1. **Timestamp (42 bits)**
   - **Purpose:** Encodes the current timestamp (in milliseconds) for time-based sorting and uniqueness.
   - **Bits required:**
     - 42 bits for milliseconds since the Unix epoch (2^42 ≈ 4.4 trillion values, enough for about 139 years of timestamps).
     - Because the timestamp is the leading field, IDs sort chronologically by their numeric value, with no separate creation-time column needed.

2. **Machine ID (or Shard ID) (10 bits)**
   - **Purpose:** Ensures that IDs generated by different machines or database nodes don’t collide.
   - **Bits required:**
     - **10 bits** for the machine/shard ID allow up to 1024 machines in a distributed system to generate IDs simultaneously.

3. **Random Component (8-12 bits)**
   - **Purpose:** Adds randomness to avoid predictable patterns and collisions, especially in systems where multiple IDs may be generated within the same millisecond on the same machine.
   - **Bits required:**
     - **8-12 bits** of randomness (256 to 4096 values) lower the probability of collisions within the same millisecond for IDs generated on the same machine.

4. **Sequence Number (12 bits)**
   - **Purpose:** In cases where a machine generates more than one ID per millisecond, the sequence number ensures uniqueness by incrementing for each additional ID.
   - **Bits required:**
     - **12 bits** allow for 4096 unique IDs to be generated by the same machine within the same millisecond.

### Total Size:

- The four fields described above add up to 72 to 76 bits (42 bits timestamp + 10 bits machine ID + 8-12 bits random + 12 bits sequence).
- With the wider layout of Timestamp (48 bits) + Machine ID (12 bits) + Sequence (12 bits) + Random Bits (10 bits) + Entropy Bits (5 bits), the ID is `87 bits total`.
- Either way, it is much smaller than a UUID (128 bits), though larger than a 64-bit integer.
- Compact enough for efficient storage and fast lookups.

---

## Advantages of the Hybrid ID:

1. **Global Uniqueness**: By combining the timestamp, machine ID, and random/sequence components, collisions are highly unlikely, even in distributed systems.
2. **Efficient Lookups and Indexing**: IDs can be stored and indexed as compact numeric values, making lookups efficient.
3. **Chronological Ordering**: The timestamp component allows IDs to be naturally ordered by the time they were created.
4. **Scalability**: The machine/shard ID allows distributed systems with many servers or nodes to generate IDs independently.
5. **No Sequential Exposure**: Random components and machine-specific identifiers make it harder for users or competitors to infer system information from ID patterns.
6. **Optional Human-Readable Format**: If needed, the ID can be encoded into a base32 or base62 string format, allowing for human-friendly IDs in URLs or external systems (e.g., `B12K-5P1J`).

---

### Additional Considerations:

1. **Collision prevention**: Collisions are extremely unlikely, but if your system generates a very large number of IDs within the same millisecond, you can widen the random component or the sequence number.
2. **ID Expiry**: Because the ID embeds its creation timestamp, you can recover when a record was created directly from the ID. This is useful for tracking resource age or implementing an expiry system.
3. **Backwards compatibility**: If integrating with existing systems, consider whether you need to convert this ID to a more compact or user-friendly format (e.g., by encoding it in Base62).

### Pros of the Hybrid ID:

- **Globally unique**: Thanks to the combination of time, machine, and randomness.
- **Efficient indexing**: Can be stored compactly as a numeric value.
- **Scalable**: Works well in distributed systems and single-server setups.
- **Time-based**: Useful for time-sensitive queries and natural sorting.
- **Predictable size**: Fixed bit length for consistent storage and performance.
- **Randomization for security**: Makes it harder to infer data patterns from IDs.

### Cons:

- **Larger than plain integers**: It takes more bits than a 64-bit integer, though it's still much smaller than a UUID.
- **Requires synchronized clocks**: If clocks drift between nodes, IDs from different machines will not sort in true creation order, and a clock that jumps backwards on one node can break ordering there.
- **Complexity in generation**: Slightly more complex than auto-incrementing integers or UUIDs, but this can be abstracted away in an ID generator function.
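
### Illustrative Sketch:

The field layout described in the proposal can be sketched in JavaScript roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the names `createHybridIdGenerator`, `toBase62`, and `timestampOf` are hypothetical, the example uses the 76-bit variant (42 + 10 + 12 + 12 bits), and `Math.random` stands in for whatever random source a real generator would use.

```javascript
// Field widths from the layout described above (12 random bits is the upper end of "8-12").
const MACHINE_BITS = 10n;
const RANDOM_BITS = 12n;
const SEQUENCE_BITS = 12n;
const TIMESTAMP_SHIFT = MACHINE_BITS + RANDOM_BITS + SEQUENCE_BITS; // 34 bits sit below the timestamp
const TIMESTAMP_MASK = (1n << 42n) - 1n; // keep only the low 42 bits of the clock value
const MAX_MACHINE_ID = 1023; // 2^10 - 1
const SEQUENCE_LIMIT = 4096; // 2^12
const RANDOM_LIMIT = 4096; // 2^12

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Hypothetical factory (an assumption for this sketch). `now` and `random` are
// injectable so the generator can be driven deterministically.
function createHybridIdGenerator(machineId, now = Date.now, random = Math.random) {
  if (!Number.isInteger(machineId) || machineId < 0 || machineId > MAX_MACHINE_ID) {
    throw new RangeError("machineId must fit in 10 bits (0-1023)");
  }
  let lastMs = -1;
  let sequence = 0;

  return function nextId() {
    // Never go below the last timestamp used, so a clock stepping backwards
    // cannot produce an ID that repeats an earlier (timestamp, sequence) pair.
    let ms = Math.max(now(), lastMs);
    if (ms === lastMs) {
      sequence = (sequence + 1) % SEQUENCE_LIMIT;
      if (sequence === 0) ms += 1; // sequence exhausted: borrow the next millisecond
    } else {
      sequence = 0;
    }
    lastMs = ms;

    const randomBits = Math.floor(random() * RANDOM_LIMIT);

    // Pack as [Timestamp][Machine ID][Random][Sequence], most significant first.
    return (
      ((BigInt(ms) & TIMESTAMP_MASK) << TIMESTAMP_SHIFT) |
      (BigInt(machineId) << (RANDOM_BITS + SEQUENCE_BITS)) |
      (BigInt(randomBits) << SEQUENCE_BITS) |
      BigInt(sequence)
    );
  };
}

// Encode a non-negative BigInt as a Base62 string.
function toBase62(value) {
  if (value === 0n) return "0";
  let out = "";
  while (value > 0n) {
    out = BASE62[Number(value % 62n)] + out;
    value /= 62n; // BigInt division truncates
  }
  return out;
}

// Recover the millisecond timestamp from the top bits of an ID.
function timestampOf(id) {
  return Number(id >> TIMESTAMP_SHIFT);
}

// Usage: machine 7 generates one ID and prints it in three forms.
const nextId = createHybridIdGenerator(7);
const id = nextId();
console.log(id.toString(), toBase62(id), new Date(timestampOf(id)).toISOString());
```

A `BigInt` is used because the ID is wider than the 53 bits a JavaScript `Number` can represent exactly. In this layout a 76-bit ID encodes to at most 13 Base62 characters. Because the random field sits above the sequence field, IDs created by one machine within the same millisecond are unique but not ordered among themselves. Where unpredictability matters, a cryptographically secure random source would be a better choice than `Math.random`.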