@_all_docs/cache

Fetch, cache, & map/reduce `:origin/{_all_docs,:packument}` documents for a set of lexicographically sorted pivots by range or partition

# `@_all_docs/*` packages

## `@_all_docs/cli`

**Installation**

``` sh
$ pnpm add -g @_all_docs/cli
```

**Build an initial set of `_all_docs` partitions**

``` sh
# Build the initial set of cache partitions
$ _all_docs cache build --pivots ./path/to/pivots.json
Cache 420 partitions from ./path/to/pivots.json in ~/.cache/_all_docs/partitions/
```

**Create & update _all_docs frames for ecosystem analysis workflows**

``` sh
# Create an _all_docs frame for the npm-high-impact dataset
# See: https://github.com/wooorm/npm-high-impact
$ _all_docs frame create npm-high-impact --from $(node -pe "JSON.stringify(require('npm-high-impact').npmHighImpact, null, 2)")
Create new default frame "npm-high-impact" in ~/.local/state/frames/npm-high-impact/

# Fetch all packuments for the packages in the npm-high-impact dataset
$ _all_docs packuments fetch
Cache 9,234 packuments for frame "npm-high-impact" into ~/.cache/_all_docs/packuments
```

**YOU DON'T QUERY IT! YOU WRITE A DISTRIBUTED MAP-REDUCE FUNCTION IN ~ERLANG~ JAVASCRIPT!**

```sh
# Run a distributed map-reduce operation on the set (e.g. unified set of all `dependencies`, etc.)
$ _all_docs runs new --design-doc ./path/to/design/doc.js --exec viewName
Run require('./path/to/design/doc.js').views.viewName on 9,234 packuments for frame "npm-high-impact"? (y/n) y

Ran require('./path/to/design/doc.js').views.viewName on 9,234 packuments for frame "npm-high-impact"
  9,234 okay
  0 errors
Written to ~/.local/share/_all_docs/map-reduce/{sid=ae45}/{map,reduce,groups}.json

# View _all_docs run history to confirm timing of runs within a given frame (or all frames)
$ _all_docs runs --frame npm-high-impact
{ sid=ae45, design-doc='./path/to/design/doc.js' exec='viewName' when='just now', okay=9234, errors=0 }

# Add all items from the previous run output (e.g. get all dependencies) to the current frame
$ _all_docs frame add --from $(_all_docs get run/ae45/reduce.json)

# Fetch packuments for the packages newly added to the npm-high-impact frame
$ _all_docs packuments fetch
Cache 12,627 packuments for frame "npm-high-impact" into ~/.cache/_all_docs/packuments
  9,234 Existing packuments
  3,393 New packuments
```

## `@_all_docs/config`

```js
import Config from '@_all_docs/config'
// Assumption: XDG is the class exported by `@vltpkg/xdg`
import { XDG } from '@vltpkg/xdg'

const cfg = new Config({
  xdg: new XDG('_all_docs')
})

// Well-known directories, resolved through the XDG instance
const {
  partitions, // cfg.xdg.cache('partitions')
              // ~/.cache/_all_docs/partitions/
  packuments, // cfg.xdg.cache('packuments')
              // ~/.cache/_all_docs/packuments/
  mapReduce,  // cfg.xdg.data('map-reduce')
              // ~/.local/share/_all_docs/map-reduce/
  frames,     // cfg.xdg.state('frames')
              // ~/.local/state/_all_docs/frames
} = cfg.xdg;

// Execution limits shared by all operations
const {
  concurrency, // Number of concurrent executions and/or HTTP requests
  start,       // Number of records to skip prior to beginning execution
  size,        // Number of records to operate on
  dryRun       // If true, skips all "hard" operations (e.g. HTTP request, execution of user code, etc.)
               // to ensure idempotency of potentially non-idempotent operations
} = cfg.limits
```

## `@_all_docs/frame`

All operations related to _all_docs data "frames":

* Set operations (add, remove, intersect, etc.)
* Run operations (i.e. map-reduce execution, reading results, etc.)

> n.b. intentionally not calling these "dataframes" to avoid confusion with operations available on a pandas "dataframe"

## `@_all_docs/packument`, `@_all_docs/partition`, `@_all_docs/http`

Fetching, reading, and normalizing `Packument` and `Partition` instances over HTTP(S)/2

## `@_all_docs/cache`

A pair of `@vltpkg/cache` instances that expose the underlying shared disk caches used by default in `_all_docs`:

```js
import { PackumentCache, PartitionCache } from '@_all_docs/cache';

// `cfg` is the Config instance from `@_all_docs/config`
const defaults = {
  packuments: new PackumentCache({ path: cfg.xdg.packuments }),
  partitions: new PartitionCache({ path: cfg.xdg.partitions })
}
```

The `PartitionCache` is most useful when the goal of the program is a full fetch of a remote `_all_docs` origin. In most cases the `PackumentCache` suffices, as in the example above, where the `Frame` was created from a list of `packument` names (not all packuments). A `Frame` can be configured to use its own cache, but by default it uses the global `_all_docs` cache for packuments.
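
The `--design-doc` option in the CLI walkthrough loads a module and runs `views.viewName` from it, but the shape of a view is not spelled out here. The sketch below shows what such a design doc might look like for the "unified set of all `dependencies`" case. It assumes CouchDB-style `map(doc, emit)` and `reduce(key, values)` signatures, which may not match what `_all_docs` actually expects. The `allDependencies` view, the `runView` helper, and the sample packuments are all hypothetical and exist only to make the example runnable in memory.

```js
// Hypothetical design doc. In a real file (e.g. ./path/to/design/doc.js) this
// object would be the module's export.
// ASSUMPTION: a view is a CouchDB-style { map, reduce } pair; the signatures
// that `_all_docs runs new` really uses may differ.
const designDoc = {
  views: {
    allDependencies: {
      // Emit one row per dependency of the packument's latest version
      map(packument, emit) {
        const latest = packument['dist-tags']?.latest;
        const manifest = packument.versions?.[latest];
        for (const name of Object.keys(manifest?.dependencies ?? {})) {
          emit(name, 1);
        }
      },
      // Count how many packuments depend on each name
      reduce(_key, values) {
        return values.reduce((sum, n) => sum + n, 0);
      }
    }
  }
};

// Hypothetical in-memory runner, for illustration only: group emitted rows
// by key, then reduce each group.
function runView(view, docs) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) view.map(doc, emit);

  const result = {};
  for (const [key, values] of groups) result[key] = view.reduce(key, values);
  return result;
}

// Tiny stand-ins for cached packuments
const packuments = [
  { name: 'a', 'dist-tags': { latest: '1.0.0' },
    versions: { '1.0.0': { dependencies: { debug: '^4.0.0', ms: '^2.1.0' } } } },
  { name: 'b', 'dist-tags': { latest: '2.0.0' },
    versions: { '2.0.0': { dependencies: { debug: '^4.3.0' } } } },
  { name: 'c', 'dist-tags': { latest: '0.1.0' },
    versions: { '0.1.0': {} } }
];

const counts = runView(designDoc.views.allDependencies, packuments);
console.log(counts); // { debug: 2, ms: 1 }
```

The keys of the reduced output are the package names that a follow-up `frame add` step would feed back into the frame.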