hyperlog-index
Version:
forking indexes for hyperlog
196 lines (145 loc) • 5.41 kB
Markdown
Forking indexes for [hyperlog](https://npmjs.com/package/hyperlog)
Built on the map/reduce pattern; hyperlog-index will call a map function on every hyperlog insert, building the index incrementally.
Using hyperlog-index, we can easily build a key/value store backed to a hyperlog
that implements a [multi-value register conflict strategy](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#Others):
``` js
var level = require('level')
var indexer = require('hyperlog-index')
var hyperlog = require('hyperlog')
var sub = require('subleveldown')
var mkdirp = require('mkdirp')
var minimist = require('minimist')
var argv = minimist(process.argv.slice(2), {
default: { d: '/tmp/kv.db' }
})
mkdirp.sync(argv.d)
var hdb = level(argv.d + '/h')
var idb = level(argv.d + '/i')
var log = hyperlog(hdb, { valueEncoding: 'json' })
var db = sub(idb, 'x', { valueEncoding: 'json' })
var dex = indexer({
log: log,
db: sub(idb, 'i'),
map: function (row, next) {
// This method reduces our new state. In this example, db is used for the state.
db.get(row.value.k, function (err, doc) {
if (!doc) doc = {}
row.links.forEach(function (link) {
delete doc[link]
})
doc[row.key] = row.value.v
db.put(row.value.k, doc, next)
})
}
})
if (argv._[0] === 'get') {
dex.ready(function () {
db.get(argv._[1], function (err, values) {
if (err) console.error(err)
else console.log(values)
})
})
} else if (argv._[0] === 'put') {
// Structure `doc` as expected by `map` above
var doc = { k: argv._[1], v: argv._[2] }
dex.ready(function () {
db.get(doc.k, function (err, values) {
// Link the new entry to the "parents", from the current index, if any
log.add(Object.keys(values || {}), doc, function (err, node) {
if (err) console.error(err)
})
})
})
} else if (argv._[0] === 'sync') {
var r = log.replicate()
process.stdin.pipe(r).pipe(process.stdout)
r.on('end', function () { process.stdin.pause() })
}
```
Each key maps to an object of hashes to values:
```
$ node kv.js -d /tmp/db1 put A beep
$ node kv.js -d /tmp/db1 put A boop
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop' }
```
Meanwhile, a second database may have additional edits:
```
$ node kv.js -d /tmp/db2 put A whatever
$ node kv.js -d /tmp/db2 put B hey
```
When these two databases are merged together, the key at `A` has two values:
```
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop',
cba756b45e279ae5c3f3ebc8cfe0d50e1f2205e37a4443ce9e0e5a41491c234c: 'whatever' }
```
The `B` key has only a single element:
```
$ node kv.js -d /tmp/db1 get B
{ '53a374617fb8839b6f19646d6658188a4fc08d19f35c084dab835847532a3468': 'hey' }
```
This is because `put` does the linking of new nodes to old ones, which is not done in merge.
New updates that link at both existing keys will merge into a single key:
```
$ node kv.js -d /tmp/db1 put A whatboop
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```
and these merges can be communicated over replication:
```
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db2 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```
And the index can be destroyed (and recalculated) at any time:
```
$ rm -rf /tmp/db1/i
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
```
This is a useful strategy when you need to update the code in your indexes.
**Note:** If you run the included example, the value is assumed to be a json object.
The command line `put` format will be more like this:
```
$ node example/kv.js -d /tmp/db1 put A '{"baap":"boop"}'
```
**Note:** If you are primarily interested in a key/value index, like in this
example - check out [hyperkv](https://www.npmjs.com/package/hyperkv)
``` js
var indexer = require('hyperlog-index')
```
Create a new hyperlog index instance `dex` from:
* `opts.log` - a hyperlog instance (required)
* `opts.db` - a level instance (required)
* `opts.map` - an indexing function `function (row, next) {}`
You can have as many indexes as you like on the same log, just create more `dex`
instances on sublevels.
The indexing function `fn` runs for each `row`. The indexing function should
write its computed indexes to durable storage and call `next(err)` when it is
finished.
Registers the callback `fn()` to fire when the indexes have "caught up" to the
latest known change in the hyperlog. The `fn()` function fires exactly once. You
may call `dex.ready()` multiple times with different functions.
Pause calculating the indexes. `dex.ready()` will not fire until the indexes
have been resumed.
Resume calculation of the indexes after `dex.pause()`.
If the underlying system generates an error, you can catch it here.
With [npm](https://npmjs.org) do:
```
npm install hyperlog-index
```
MIT