@_all_docs/cache
Fetch, cache, & map/reduce `:origin/{_all_docs,:packument}` documents for a set of lexicographically sorted pivots by range or partition
`@_all_docs/*` packages
# `@_all_docs/cli`
**Installation**
``` sh
pnpm add -g @_all_docs/cli
```
**Build an initial set of `_all_docs` partitions**
``` sh
# Build the initial set of cache partitions
_all_docs cache build --pivots ./path/to/pivots.json
# Cache 420 partitions from ./path/to/pivots.json in ~/.cache/_all_docs/partitions/
```
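As a rough illustration of how sorted pivots could map onto `_all_docs` ranges, here is a minimal sketch. It assumes `pivots.json` is a JSON array of strings, which the text above does not confirm, and `pivotsToRanges`, `rangeToUrl`, and the `registry.example.com` origin are hypothetical names, not part of the CLI. `startkey` and `endkey` are the standard CouchDB `_all_docs` query parameters.

```js
// Hypothetical helper: pair up adjacent pivots into key ranges.
// Assumes pivots are plain strings; the real pivots.json format may differ.
function pivotsToRanges(pivots) {
  // De-duplicate, then sort by UTF-16 code unit order.
  // Note: CouchDB collates keys with ICU rules, so a real implementation
  // may need a different comparator.
  const sorted = [...new Set(pivots)].sort();
  const ranges = [];
  for (let i = 0; i < sorted.length - 1; i++) {
    ranges.push({ startKey: sorted[i], endKey: sorted[i + 1] });
  }
  return ranges;
}

// Build a CouchDB-style _all_docs URL for one range.
// Key values are JSON-encoded, as CouchDB expects.
// CouchDB includes `endkey` by default, so a boundary key can show up
// in two adjacent partitions unless the caller de-duplicates.
function rangeToUrl(origin, { startKey, endKey }) {
  const url = new URL('/_all_docs', origin);
  url.searchParams.set('startkey', JSON.stringify(startKey));
  url.searchParams.set('endkey', JSON.stringify(endKey));
  return url.toString();
}

const ranges = pivotsToRanges(['m', 'a', 'z']);
console.log(ranges.length); // 2
console.log(rangeToUrl('https://registry.example.com', ranges[0]));
```

Each range can be fetched and cached independently, which is presumably what makes a partition a convenient unit of work.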
**Create & update _all_docs frames for ecosystem analysis workflows**
``` sh
# Create an _all_docs frame for the npm-high-impact dataset
# See: https://github.com/wooorm/npm-high-impact
_all_docs frame create npm-high-impact --from "$(node -pe "JSON.stringify(require('npm-high-impact').npmHighImpact, null, 2)")"
# Create new default frame "npm-high-impact" in ~/.local/state/_all_docs/frames/npm-high-impact/
# Fetch all packuments for the packages in the npm-high-impact dataset
_all_docs packuments fetch
# Cache 9,234 packuments for frame "npm-high-impact" into ~/.cache/_all_docs/packuments
```
**YOU DON'T QUERY IT! YOU WRITE A DISTRIBUTED MAP-REDUCE FUNCTION IN ~ERLANG~ JAVASCRIPT!**
```sh
# Run a distributed map-reduce operation on the set (e.g. unified set of all `dependencies`, etc.)
_all_docs runs new --design-doc ./path/to/design/doc.js --exec viewName
# Run require('./path/to/design/doc.js').views.viewName on 9,234 packuments for frame "npm-high-impact"? (y/n)
# y
# Ran require('./path/to/design/doc.js').views.viewName on 9,234 packuments for frame "npm-high-impact"
#   9,234 okay
#   0 errors
# Written to ~/.local/share/_all_docs/map-reduce/{sid=ae45}/{map,reduce,groups}.json

# View _all_docs run history to confirm timing of runs within a given frame (or all frames)
_all_docs runs --frame npm-high-impact
# { sid=ae45, design-doc='./path/to/design/doc.js', exec='viewName', when='just now', okay=9234, errors=0 }

# Add all items from the previous run output (e.g. get all dependencies) to the current frame
_all_docs frame add --from "$(_all_docs get run/ae45/reduce.json)"

# Fetch packuments for all packages now in the frame
_all_docs packuments fetch
# Cache 12,627 packuments for frame "npm-high-impact" into ~/.cache/_all_docs/packuments
#   9,234 existing packuments
#   3,393 new packuments
```
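The commands above load `require('./path/to/design/doc.js').views.viewName`, but the shape of a view is not spelled out. The sketch below assumes a CouchDB-like `{ views: { name: { map, reduce } } }` layout, with `emit` passed to `map` as a second argument. Both the layout and the `runView` in-memory runner are assumptions for illustration, not the actual `_all_docs` API. The packument fields used (`dist-tags`, `versions`, `dependencies`) are the standard npm ones.

```js
// Hypothetical design doc; the real contract expected by `_all_docs runs new` may differ.
const designDoc = {
  views: {
    dependencyCounts: {
      // map: emit one (name, 1) pair per dependency of the latest version.
      map(packument, emit) {
        const latest = packument['dist-tags']?.latest;
        const deps = packument.versions?.[latest]?.dependencies ?? {};
        for (const name of Object.keys(deps)) emit(name, 1);
      },
      // reduce: sum the values emitted for a single key.
      reduce(keys, values) {
        return values.reduce((sum, n) => sum + n, 0);
      }
    }
  }
};

// Minimal in-memory runner (an assumption): group emitted values by key,
// then reduce each group.
function runView(view, packuments) {
  const groups = new Map();
  for (const doc of packuments) {
    view.map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const reduced = {};
  for (const [key, values] of groups) {
    reduced[key] = view.reduce([key], values);
  }
  return reduced;
}

const sample = [
  { name: 'a', 'dist-tags': { latest: '1.0.0' }, versions: { '1.0.0': { dependencies: { b: '^1', c: '^2' } } } },
  { name: 'd', 'dist-tags': { latest: '2.0.0' }, versions: { '2.0.0': { dependencies: { b: '^1' } } } },
  { name: 'e', 'dist-tags': {}, versions: {} } // no latest version: emits nothing
];

console.log(runView(designDoc.views.dependencyCounts, sample)); // { b: 2, c: 1 }
```

The keys of the reduced output are dependency names, which is the kind of list a follow-up `frame add` step could consume.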
# `@_all_docs/config`
```js
import Config from '@_all_docs/config'
// Assumed source of `XDG`; the original example omits this import
import { XDG } from '@vltpkg/xdg'

const cfg = new Config({
  xdg: new XDG('_all_docs')
})

const {
  partitions, // cfg.xdg.cache('partitions')
              // ~/.cache/_all_docs/partitions/
  packuments, // cfg.xdg.cache('packuments')
              // ~/.cache/_all_docs/packuments/
  mapReduce,  // cfg.xdg.data('map-reduce')
              // ~/.local/share/_all_docs/map-reduce/
  frames,     // cfg.xdg.state('frames')
              // ~/.local/state/_all_docs/frames/
} = cfg.xdg

const {
  concurrency, // Number of concurrent executions and/or HTTP requests
  start,       // Number of records to skip prior to beginning execution
  size,        // Number of records to operate on
  dryRun       // If true, skips all "hard" operations (e.g. HTTP request, execution of user code, etc.)
               // to ensure idempotency of potentially non-idempotent operations
} = cfg.limits
```
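To make the `limits` fields concrete, here is one possible reading of how they could combine. `runWithLimits` is a hypothetical helper, not an export of `@_all_docs/config`, and the exact semantics of `dryRun` in particular are an assumption based on the comments above.

```js
// Hypothetical helper: apply start/size/concurrency/dryRun to a list of records.
async function runWithLimits(records, worker, limits = {}) {
  const { concurrency = 1, start = 0, size = Infinity, dryRun = false } = limits;

  // Skip `start` records, then operate on at most `size` records.
  const selected = records.slice(start, start + size);
  const results = new Array(selected.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain,
  // so at most `laneCount` workers are in flight at once.
  async function lane() {
    while (next < selected.length) {
      const index = next++;
      results[index] = dryRun
        ? { skipped: selected[index] } // dry run: never call the worker
        : await worker(selected[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, selected.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

runWithLimits(
  ['a', 'b', 'c', 'd', 'e'],
  async (name) => name.toUpperCase(),
  { concurrency: 2, start: 1, size: 3 }
).then((out) => console.log(out)); // [ 'B', 'C', 'D' ]
```

Results are stored by index, so the output order matches the input order regardless of which lane finishes first.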
# `@_all_docs/frame`
All operations related to _all_docs data "frames":
* Set operations (add, remove, intersect, etc.)
* Run operations (i.e. map-reduce execution, reading results, etc.)
N.B. These are intentionally not called "dataframes", to avoid confusion with the operations available on a pandas `DataFrame`.
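As a rough mental model of the set operations listed above, a frame can be thought of as a set of package names. The helpers below are hypothetical and use plain JavaScript `Set`s. They are not the `@_all_docs/frame` API, which is not documented here.

```js
// Hypothetical helpers: each returns a new Set and leaves its inputs untouched.
const add = (frame, names) => new Set([...frame, ...names]);

const remove = (frame, names) => {
  const drop = new Set(names);
  return new Set([...frame].filter((name) => !drop.has(name)));
};

const intersect = (frame, names) => {
  const keep = new Set(names);
  return new Set([...frame].filter((name) => keep.has(name)));
};

const frame = new Set(['react', 'lodash']);
const dependencies = ['lodash', 'scheduler', 'loose-envify'];

// Adding dependencies grows the frame; names already present are not duplicated.
const grown = add(frame, dependencies);
console.log(grown.size); // 4

// Names that are new to the frame are the ones a subsequent fetch would need.
const fresh = remove(new Set(dependencies), frame);
console.log([...fresh]); // [ 'scheduler', 'loose-envify' ]

console.log([...intersect(frame, dependencies)]); // [ 'lodash' ]
```

The split between existing and new names mirrors the "existing packuments" and "new packuments" counts reported by the CLI after a `frame add`.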
# `@_all_docs/packument`, `@_all_docs/partition`, `@_all_docs/http`
Fetching, reading, and normalizing `Packument` and `Partition` instances over HTTP(S)/2
# `@_all_docs/cache`
A pair of `@vltpkg/cache` instances that expose the underlying shared disk caches used by default in `_all_docs`:
```js
import { PackumentCache, PartitionCache } from '@_all_docs/cache';
// `cfg` is a Config instance from '@_all_docs/config'
const defaults = {
  packuments: new PackumentCache({
    path: cfg.xdg.packuments
  }),
  partitions: new PartitionCache({
    path: cfg.xdg.partitions
  })
}
```
The `PartitionCache` is most useful when the goal of the program is a full fetch of a remote `_all_docs` origin. In most cases the `PackumentCache` should suffice, as in the CLI example above, where the `Frame` was created from a list of packument names (not all packuments).
A `Frame` can therefore be configured to use its own cache, but by default it uses the global `_all_docs` cache for packuments.