CSS selectors that allow regular expressions. Stew is a meatier soup.

# Hacking Stew

*([Follow this link to go back to the README file.](../README.html))*

[Stew](https://github.com/rodw/stew) is a JavaScript library, implemented in [CoffeeScript](http://coffeescript.org/). It is primarily intended to be used in a [Node.js](http://nodejs.org/) environment.^[Although it probably wouldn't be difficult to make Stew work in a browser context, we haven't had any need for that, and so we haven't (yet) attempted to do it. Drop us a [note](https://github.com/rodw/stew/issues) if this is something you'd like to see Stew support.] Both the (original) CoffeeScript and (generated) JavaScript files are included in the [binary distribution](https://npmjs.org/package/stew-select), so clients can use whichever they prefer.

Stew's source code is hosted at [github.com/rodw/stew](https://github.com/rodw/stew). Any [issues](https://github.com/rodw/stew/issues) or [pull-requests](https://github.com/rodw/stew/pulls) you'd like to submit are appreciated.

Stew is published under [an MIT license](../MIT-LICENSE.txt).

This document provides information that is primarily of interest to those who want to *make changes* to Stew. Most clients (users) of the Stew library will be more interested in the [README](../README.html) file.

## How it Works

Stew is partitioned into three classes: *DOMUtil*, *PredicateFactory* and *Stew*.

[***Stew***](#stew) is the real driver behind the API, parsing CSS selector expressions and collecting matching nodes from the DOM tree.

[***PredicateFactory***](#predicatefactory) defines methods that implement individual CSS selection rules.

[***DOMUtil***](#domutil) provides fairly generic utility methods for working with DOM structures.

We'll cover those bottom-up, from the most generic to the most specific.

### DOMUtil

[***DOMUtil***](./docco/dom-util.html) provides generic utilities for working with the DOM (Document Object Model) structure generated by [node-htmlparser](https://github.com/tautologistics/node-htmlparser).
For our purposes, the most important of these utilities is the `walk_dom` method, which implements a depth-first walk of a given DOM tree. `walk_dom` will invoke the given `visit` (callback) method for every node in the DOM.

For example, to convert a DOM structure into text, we might create a `visit` method like this:

```javascript
// Accumulates the raw text of every text node that is visited.
var buffer = "";
var visit = function(node, node_metadata, all_metadata) {
  if (node.type === 'text') {
    buffer = buffer + node.raw;
  }
  return true;
};
```

and invoke it like this:

```javascript
// Assumes `domutil` is a DOMUtil instance and `dom` is a DOM tree
// generated by node-htmlparser.
domutil.walk_dom(dom, visit);
console.log("The text was");
console.log(buffer);
```

*Stew* uses `DOMUtil.walk_dom` to traverse the DOM tree.

(`node_metadata` and `all_metadata` contain metadata about the current node, and all previously visited nodes, respectively. For example, `node_metadata.parent` contains the parent of the current node and `node_metadata.siblings` contains an array of all of `node_metadata.parent`'s children. See the comments within `dom-util.coffee` for more detail.)

See the [annotated source](./docco/dom-util.html) for more detail.

### PredicateFactory

[***PredicateFactory***](./docco/predicate-factory.html) generates predicate functions that test whether a given node matches a specific CSS selector.

For example, the "universal selector" (`*`) matches any and every "tag" node. Here's a predicate function that implements the `*` selector:

```javascript
// Matches every node that is a tag (as opposed to text, comments, etc.).
function universal_selector_predicate(node) {
  return node.type === 'tag';
}
```

Here's a predicate that implements a "tag" selector, selecting all tags with the type (name) `foo`:

```javascript
// Matches only tag nodes whose name is exactly `foo`.
function foo_tag_predicate(node) {
  return node.type === 'tag' && node.name === 'foo';
}
```

*PredicateFactory* methods generate functions like these (bound to particular input parameters such as tag or attribute names).

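
As a rough illustration of what "bound to particular input parameters" means, the sketch below shows generator functions that close over a tag name or an attribute name and return a predicate. The names `matches`, `by_tag_name` and `by_attribute` are hypothetical and are not taken from Stew's source, and the `attribs` property is assumed to follow node-htmlparser's node structure. Accepting either a string or a regular expression is meant only to echo Stew's regular-expression selectors, not to reproduce how Stew implements them.

```javascript
// Illustrative sketch only: these names are hypothetical, not Stew's actual API.

// Returns true when `value` matches `expected`, which may be a String or a RegExp.
function matches(expected, value) {
  if (expected instanceof RegExp) {
    return typeof value === 'string' && expected.test(value);
  }
  return expected === value;
}

// Generates a predicate bound to a particular tag name (String or RegExp).
function by_tag_name(name) {
  return function (node) {
    return node.type === 'tag' && matches(name, node.name);
  };
}

// Generates a predicate bound to an attribute name and, optionally, a value.
// Assumes attributes live in `node.attribs`, as in node-htmlparser output.
function by_attribute(attr_name, attr_value) {
  return function (node) {
    if (node.type !== 'tag' || !node.attribs) {
      return false;
    }
    if (!Object.prototype.hasOwnProperty.call(node.attribs, attr_name)) {
      return false;
    }
    // With no expected value, the attribute merely has to be present.
    return attr_value === undefined || matches(attr_value, node.attribs[attr_name]);
  };
}

// Usage: each call yields a new predicate bound to its arguments.
var is_heading = by_tag_name(/^h[1-6]$/);
var has_author_rel = by_attribute('rel', 'author');
var link = { type: 'tag', name: 'a', attribs: { rel: 'author', href: '/about' } };
console.log(is_heading(link));      // false
console.log(has_author_rel(link));  // true
```

Because each generated predicate takes only a node and returns a boolean, predicates built this way can be passed around and combined without the caller knowing which selector they implement.
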
*PredicateFactory* includes generators for each of the core CSS selectors (tag, ID, class, attribute name and attribute value) as well as combinators such as "and" (no space), "or" (`,`), "descendant" (space), "direct descendant" (`>`) and "adjacent sibling" (`+`).

*Stew* uses these predicates to implement the CSS selection logic.

See the [annotated source](./docco/predicate-factory.html) for more detail.

### Stew

[***Stew***](./docco/stew.html) is the main entry point for the overall library.

*Stew* parses a `String` representation of a CSS Selector, generates the appropriate predicates (using *PredicateFactory*) and then processes the DOM tree (using *DOMUtil*) to select the matching nodes.

The CSS parsing is primarily accomplished via regular expressions. This is a multi-step process. For example, let's assume a complicated CSS expression such as:

    'div#main .sidebar ul.links li:first-child a[rel="author"][href]'

1. The expression is split into individual selectors by `_parse_selectors` using `_SPLIT_ON_WS_REGEXP`. Naively this is the same as splitting the expression on white-space characters, but we also need to take into account the use of spaces within `"quoted strings"` and `/regular expressions/` and non-whitespace delimiters like `,` or `+`. In our example, we obtain these five tokens:

        [ 'div#main', '.sidebar', 'ul.links', 'li:first-child', 'a[rel="author"][href]' ]

2. Each of these tokens is then parsed into one or more CSS specific selectors by `_parse_selector` using `_CSS_SELECTOR_REGEXP` (and where needed, `_ATTRIBUTE_CLAUSE_REGEXP`). For example, from the first token (`div#main`) we identify two individual predicates, one that implements "tag name is `div`" and another that implements "node id is `main`". These two predicates are then joined by an "and" predicate. All together, these five tokens are converted into predicates (something) like these:

    a. `div#main` becomes `and( tag_name_is_div(), node_id_is_main() )`
    b. `.sidebar` becomes `class_name_is_sidebar()`
    c. `ul.links` becomes `and( tag_name_is_ul(), class_name_is_links() )`
    d. `li:first-child` becomes `and( tag_name_is_li(), tag_is_parents_first_child() )`
    e. `a[rel="author"][href]` becomes `and( tag_name_is_a(), rel_attr_is_author(), has_href_attr() )`

3. Back in `_parse_selectors` these five predicates are joined into a "descendant selector" predicate, yielding a single predicate that returns `true` if and only if the current node matches the complete CSS expression.

With this CSS-selector-implementing predicate in hand, *Stew*'s `select` method then visits every node in the DOM tree, collecting each node that matches the predicate.

See the [annotated source](./docco/stew.html) for more detail.

### Unit Tests

The `./test` directory contains unit tests for each of these types. These tests can be executed by running

```console
make test
```

or

```console
npm test
```

The [test-coverage report](./coverage.html) identifies the lines of code^[The generated JavaScript code, not source CoffeeScript, for better or worse.] that are exercised by the test suite. This report can be generated by running:

```console
make coverage
```

## How you can help.

Your contributions, [bug reports](https://github.com/rodw/stew/issues) and [pull-requests](https://github.com/rodw/stew/pulls) are greatly appreciated.

### Areas that need work.

If you're looking for areas in which to contribute, here are a few ideas:

* Documentation and examples are *always* welcome. There are several Markdown-format files within [./docs/](https://github.com/rodw/stew/tree/master/docs) that are always in need of editing and improvement, and please feel free to plug any documentation gaps that you see.

* New and improved unit-tests are also always welcome.
  You could help us ensure we've tested all the relevant parts of the CSS selector specification, or review the [test coverage report](http://heyrod.com/stew/docs/coverage.html) to identify areas that aren't currently exercised by our unit test suite.

* Stew has a few known limitations we'd like to eliminate. See the "Limitations" section of the [README file](../README.html) for details.

* Browser-side Stew isn't yet supported, or at least not fully tested. This probably doesn't require substantial changes, but no one has gotten around to it just yet.

* Run the target `make todo` to see a list of `TODO`, `FIXME` and similar comments within the code and documentation.

### How to contribute.

We're happy to accept any help you can offer, but the following guidelines can help streamline the process for everyone.

* You can report any bugs at [github.com/rodw/stew/issues](https://github.com/rodw/stew/issues).

    - We'll be able to address the issue more easily if you can provide a demonstration of the problem you are encountering. The best format for this demonstration is a failing unit test (like those found in [./test/](https://github.com/rodw/stew/tree/master/test)), but your report is welcome with or without that.

* Our preferred channel for contributions or changes to Stew's source code and documentation is a Git "patch" or "pull-request".

    - If you've never submitted a pull-request, here's one way to go about it:

        1. Fork or clone the Stew repository.
        2. Create a local branch to contain your changes (`git checkout -b my-new-branch`).
        3. Make your changes and commit them to your local repository.
        4. Create a pull request [as described here](https://help.github.com/articles/creating-a-pull-request).

    - If you'd rather use a private (or just non-GitHub) repository, you might find [these generic instructions on creating a "patch" with Git](https://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git/) helpful.

* If you are making changes to the code, please ensure that the [unit test suite](#unit-tests) still passes.

* If you are making changes to the code to address a bug or introduce new features, we'd *greatly* appreciate it if you can provide one or more [unit tests](#unit-tests) that demonstrate the bug or exercise the new feature.

**Please Note:** We'd rather have a contribution that doesn't follow these guidelines than no contribution at all. If you are confused or put off by any of the above, your contribution is still welcome. Feel free to contribute or comment in whatever channel works for you.

## Nuts and Bolts

### Run-time Dependencies

Technically Stew doesn't have any run-time dependencies. No external libraries are required.

Practically speaking, Stew depends upon [Chris Winberry's node-htmlparser](https://github.com/tautologistics/node-htmlparser). Stew assumes the structure of the DOM object passed to `select` and similar methods is compatible with that generated by node-htmlparser. If `node-htmlparser` is available (via a `require` call) then some (optional) `DOMUtil` methods will make use of it.

Stew makes use of several libraries to support development, documentation and testing. These are enumerated in the `package.json` file.

### Building and Testing

**Downloading**

You can clone [Stew's Git repository](https://github.com/rodw/stew) via:

```console
git clone git@github.com:rodw/stew.git
```

You can also [download a ZIP archive of the latest source](https://github.com/rodw/stew/archive/master.zip).

**Installing**

Once you have Stew cloned into a local working directory, you can use [npm](https://npmjs.org/) to install any build-time dependencies, as follows:

```console
npm install
```

(This may take a few minutes, as some external libraries may need to be downloaded and natively compiled.)

**Testing**

Once installed, you can also run Stew's unit test suite using npm:

```console
npm test
```

If everything is working properly, you should expect to see a message like `68 tests complete (633 ms)` (although the specific numbers might be different, of course).

**Compiling the CoffeeScript files into JavaScript**

You can run

```console
npm run-script compile
```

to generate JavaScript files from the CoffeeScript files in `./lib`.

### Using Make

If you have [GNU Make](http://www.gnu.org/software/make/) installed, the best and easiest way to work with Stew's source code is using the provided makefile.

#### Installing and Testing

You can use:

```console
make install
```

and:

```console
make test
```

and:

```console
make js
```

in place of the npm equivalents above, but the makefile can help you to do much more than that.

#### Generating Documentation

**`make markdown`** will generate Stew's HTML documentation from various [Markdown](http://daringfireball.net/projects/markdown/) files in the repository. Most of these files will be written to the `./docs` directory. Note that the Makefile uses [Pandoc](http://johnmacfarlane.net/pandoc/) to generate HTML from the Markdown sources, but in theory other Markdown processors could be used.

**`make docco`** will generate an annotated version of Stew's source code using the nifty [Docco](http://jashkenas.github.io/docco/) documentation generator. These files will be written to `./docs/docco/`.

**`make docs`** will do both of these at once.

#### Test Coverage

**`make coverage`** will generate a report that shows which source code lines are touched (and not touched) by the test suite. This runs the same unit tests as `make test`, but uses [JSCoverage](http://siliconforks.com/jscoverage/) to evaluate the test coverage. The coverage report is written to `./docs/coverage.html`.

#### npm Packaging

**`make module`** will generate a package suitable for distribution via npm (into a directory called `./module`).

**`make test-module-install`** will generate the `./module` directory and then validate it by trying to install it into a temporary directory. You should expect to see `It worked!` as the last line of output.

#### Some other targets

**`make clean`** will remove various generated files.

**`make todo`** will display a list of "TODO" and related comments found in the source code.

**`make targets`** will list all available targets.