# Hacking Stew
*([Follow this link to go back to the README file.](../README.html))*
[Stew](https://github.com/rodw/stew) is a JavaScript library, implemented in [CoffeeScript](http://coffeescript.org/). It is primarily intended to be used in a [Node.js](http://nodejs.org/) environment.^[Although it probably wouldn't be difficult to make Stew work in a browser context, we haven't had any need for that, and so we haven't (yet) attempted to do it. Drop us a [note](https://github.com/rodw/stew/issues) if this is something you'd like to see Stew support.]
Both the (original) CoffeeScript and (generated) JavaScript files are included in the [binary distribution](https://npmjs.org/package/stew-select), so clients can use whichever they prefer.
Stew's source code is hosted at [github.com/rodw/stew](https://github.com/rodw/stew). Any [issues](https://github.com/rodw/stew/issues) or [pull-requests](https://github.com/rodw/stew/pulls) you'd like to submit are appreciated.
Stew is published under [an MIT license](../MIT-LICENSE.txt).
This document provides information that is primarily of interest to those who want to *make changes* to Stew. Most clients (users) of the Stew library will be more interested in the [README](../README.html) file.
## How it Works
Stew is partitioned into three classes: *DOMUtil*, *PredicateFactory* and *Stew*.
[***Stew***](#stew) is the real driver behind the API, parsing CSS selector expressions and collecting matching nodes from the DOM tree.
[***PredicateFactory***](#predicatefactory) defines methods that implement individual CSS selection rules.
[***DOMUtil***](#domutil) provides fairly generic utility methods for working with DOM structures.
We'll cover those bottom-up, from the most generic to the most specific.
### DOMUtil
[***DOMUtil***](./docco/dom-util.html) provides generic utilities for working with the DOM (Document Object Model) structure generated by [node-htmlparser](https://github.com/tautologistics/node-htmlparser).
For our purposes, the most important of these utilities is the `walk_dom` method, which implements a depth-first walk of a given DOM tree. `walk_dom` will invoke the given `visit` (callback) method for every node in the DOM.
For example, to convert a DOM structure into text, we might create a `visit` method like this:
```javascript
var buffer = "";
// Append the raw content of every text node to `buffer`.
var visit = function(node, node_metadata, all_metadata) {
  if (node.type === 'text') {
    buffer = buffer + node.raw;
  }
  return true;
};
```
and invoke it like this:
```javascript
domutil.walk_dom(dom,visit);
console.log("The text was");
console.log(buffer);
```
*Stew* uses `DOMUtil.walk_dom` to traverse the DOM tree.
(`node_metadata` and `all_metadata` contain metadata about the current node, and all previously visited nodes, respectively. For example, `node_metadata.parent` contains the parent of the current node and `node_metadata.siblings` contains an array of all of `node_metadata.parent`'s children. See the comments in `dom-util.coffee` for more detail.)
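To make the traversal concrete, here is a minimal sketch of a depth-first walker in the spirit of `walk_dom`. It is not Stew's actual implementation: the name `walkDomSketch`, the `index` metadata field, and the convention that a strictly `false` return value skips a node's children are all assumptions made for illustration. The node shape (`type`, `name`, `raw`, `children`) follows the node-htmlparser convention used in the example above.

```javascript
// Hypothetical depth-first walker (illustrative only, not Stew's walk_dom).
// `dom` may be a single node or an array of root nodes.
function walkDomSketch(dom, visit) {
  var allMetadata = []; // metadata for every node visited so far
  function walk(node, parent, siblings, index) {
    var nodeMetadata = { parent: parent, siblings: siblings, index: index };
    allMetadata.push(nodeMetadata);
    // Assumption: a strictly `false` return value skips this node's children.
    var descend = visit(node, nodeMetadata, allMetadata);
    if (descend !== false && Array.isArray(node.children)) {
      node.children.forEach(function (child, i) {
        walk(child, node, node.children, i);
      });
    }
  }
  var roots = Array.isArray(dom) ? dom : [dom];
  roots.forEach(function (root, i) { walk(root, null, roots, i); });
}

// A tiny hand-built DOM in the node-htmlparser style.
var sampleDom = {
  type: 'tag', name: 'p', children: [
    { type: 'text', raw: 'Hello, ' },
    { type: 'tag', name: 'b', children: [{ type: 'text', raw: 'world' }] }
  ]
};

var collected = '';
walkDomSketch(sampleDom, function (node) {
  if (node.type === 'text') { collected += node.raw; }
  return true;
});
console.log(collected);
```

Because children are visited before later siblings, the text fragments are collected in document order.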
See the [annotated source](./docco/dom-util.html) for more detail.
### PredicateFactory
[***PredicateFactory***](./docco/predicate-factory.html) generates predicate functions that test whether a given node matches a specific CSS selector.
For example, the "universal selector" (`*`) matches any and every "tag" node. Here's a predicate function that implements the `*` selector:
```javascript
function universal_selector_predicate(node) {
return node.type === 'tag';
}
```
Here's a predicate that implements a "tag" selector, selecting all tags with the type (name) `foo`:
```javascript
function foo_tag_predicate(node) {
return node.type === 'tag' && node.name === 'foo';
}
```
*PredicateFactory* methods generate functions like these (bound to particular input parameters such as tag or attribute names).
*PredicateFactory* includes generators for each of the core CSS selectors (tag, ID, class, attribute name and attribute value) as well as combinators such as "and" (no space), "or" (`,`), "descendant" (space), "direct descendant" (`>`) and "adjacent sibling" (`+`).
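The idea of generating predicates bound to particular parameters can be sketched with a few small higher-order functions. The function names below (`byTagSketch`, `byAttrValueSketch`, `andSketch`, `orSketch`) are hypothetical and are not *PredicateFactory*'s real method names, and the `attribs` property is assumed to follow the node-htmlparser convention. Accepting either a string or a `RegExp` is meant to echo Stew's support for regular expressions in selectors.

```javascript
// Hypothetical predicate generators (illustrative names, not Stew's API).

// Compare a node value against a plain string or a regular expression.
function matchesSketch(expected, actual) {
  if (typeof actual !== 'string') { return false; }
  return expected instanceof RegExp ? expected.test(actual) : expected === actual;
}

// Generate a predicate bound to a tag name.
function byTagSketch(name) {
  return function (node) {
    return node.type === 'tag' && matchesSketch(name, node.name);
  };
}

// Generate a predicate bound to an attribute name and value.
function byAttrValueSketch(attr, value) {
  return function (node) {
    return node.type === 'tag' && !!node.attribs &&
      matchesSketch(value, node.attribs[attr]);
  };
}

// "and" combinator: every predicate must accept the node.
function andSketch() {
  var predicates = Array.prototype.slice.call(arguments);
  return function (node) {
    return predicates.every(function (p) { return p(node); });
  };
}

// "or" combinator: at least one predicate must accept the node.
function orSketch() {
  var predicates = Array.prototype.slice.call(arguments);
  return function (node) {
    return predicates.some(function (p) { return p(node); });
  };
}

var isAuthorLink = andSketch(byTagSketch('a'), byAttrValueSketch('rel', /^auth/));
var link = { type: 'tag', name: 'a', attribs: { rel: 'author', href: '/about' } };
console.log(isAuthorLink(link));
```

Each generator returns a closure, so the combinators can treat simple and compound predicates uniformly.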
*Stew* uses these predicates to implement the CSS selection logic.
See the [annotated source](./docco/predicate-factory.html) for more detail.
### Stew
[***Stew***](./docco/stew.html) is the main entry point for the overall library. *Stew* parses a `String` representation of a CSS selector, generates the appropriate predicates (using *PredicateFactory*) and then processes the DOM tree (using *DOMUtil*) to select the matching nodes.
The CSS parsing is primarily accomplished via regular expressions. This is a multi-step process.
For example, let's assume a complicated CSS expression such as:
'div#main .sidebar ul.links li:first-child a[rel="author"][href]'
1. The expression is split into individual selectors by `_parse_selectors` using `_SPLIT_ON_WS_REGEXP`. Naively this is the same as splitting the expression on white-space characters, but we also need to take into account the use of spaces within `"quoted strings"` and `/regular expressions/`, as well as non-whitespace delimiters like `,` or `+`. In our example, we obtain these five tokens:
[ 'div#main', '.sidebar', 'ul.links', 'li:first-child', 'a[rel="author"][href]' ]
2. Each of these tokens is then parsed into one or more CSS specific selectors by `_parse_selector` using `_CSS_SELECTOR_REGEXP` (and where needed, `_ATTRIBUTE_CLAUSE_REGEXP`). For example, from the first token (`div#main`) we identify two individual predicates, one that implements "tag name is `div`" and another that implements "node id is `main`". These two predicates are then joined by an "and" predicate. All together, these five tokens are converted into predicates (something) like these:
a. `div#main` becomes `and( tag_name_is_div(), node_id_is_main() )`
b. `.sidebar` becomes `class_name_is_sidebar()`
c. `ul.links` becomes `and( tag_name_is_ul(), class_name_is_links() )`
d. `li:first-child` becomes `and( tag_name_is_li(), tag_is_parents_first_child() )`
e. `a[rel="author"][href]` becomes `and( tag_name_is_a(), rel_attr_is_author(), has_href_attr() )`
3. Back in `_parse_selectors` these five predicates are joined into a "descendant selector" predicate, yielding a single predicate that returns `true` if and only if the current node matches the complete CSS expression.
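The first two steps above can be approximated as follows. This sketch does not use Stew's actual `_SPLIT_ON_WS_REGEXP` or `_CSS_SELECTOR_REGEXP`; the regular expressions and the function names `splitSelectorSketch` and `parseTokenSketch` are simplified stand-ins that ignore `/regular expressions/`, `*`, `,`, `>` and `+`. Instead of building predicates, step 2 here returns plain descriptions of the parts found in each token.

```javascript
// Step 1 (simplified): split on whitespace, keeping "quoted strings" intact.
function splitSelectorSketch(expression) {
  return expression.match(/(?:"[^"]*"|[^\s"])+/g) || [];
}

// Step 2 (simplified): break one token into its tag, id, class,
// pseudo-class and attribute parts.
function parseTokenSketch(token) {
  var parts = [];
  var re = /([#.:]?)([\w-]+)|\[([\w-]+)(?:="([^"]*)")?\]/g;
  var m;
  while ((m = re.exec(token)) !== null) {
    if (m[3] !== undefined) {
      // Attribute clause, with or without a quoted value.
      parts.push(m[4] !== undefined ? { attr: m[3], value: m[4] } : { attr: m[3] });
    } else if (m[1] === '#') {
      parts.push({ id: m[2] });
    } else if (m[1] === '.') {
      parts.push({ className: m[2] });
    } else if (m[1] === ':') {
      parts.push({ pseudo: m[2] });
    } else {
      parts.push({ tag: m[2] });
    }
  }
  return parts;
}

var expression = 'div#main .sidebar ul.links li:first-child a[rel="author"][href]';
var tokens = splitSelectorSketch(expression);
tokens.forEach(function (token) {
  console.log(token, '=>', JSON.stringify(parseTokenSketch(token)));
});
```

Each list of parts corresponds to one "and" predicate in the walkthrough above; a real implementation would map every part to a predicate rather than a plain object.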
CSS-Selector-implementing predicate in hand, *Stew*'s `select` method then visits every node in the DOM tree, collecting each node that matches the predicate.
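The "descendant selector" join and the final collection pass can be illustrated together. The sketch below is a stand-in built on assumptions, not Stew's code: predicates here receive the node plus an array of its ancestors (a hypothetical signature), `descendantSketch` checks the per-token predicates from right to left, and `selectSketch` performs the depth-first collection.

```javascript
// Hypothetical: each predicate takes (node, ancestors), where `ancestors`
// lists the node's parents from the root down to its immediate parent.
function descendantSketch(predicates) {
  return function (node, ancestors) {
    var last = predicates.length - 1;
    // The right-most predicate must match the node itself.
    if (!predicates[last](node, ancestors)) { return false; }
    // The remaining predicates must match ancestors, nearest first.
    var p = last - 1;
    for (var i = ancestors.length - 1; i >= 0 && p >= 0; i--) {
      if (predicates[p](ancestors[i], ancestors.slice(0, i))) { p--; }
    }
    return p < 0; // true only if every predicate found a match
  };
}

// Visit every node depth-first and collect those the predicate accepts.
function selectSketch(dom, predicate) {
  var results = [];
  (function walk(node, ancestors) {
    if (predicate(node, ancestors)) { results.push(node); }
    (node.children || []).forEach(function (child) {
      walk(child, ancestors.concat([node]));
    });
  })(dom, []);
  return results;
}

// Simple tag predicate used only for this demonstration.
function isTag(name) {
  return function (node) { return node.type === 'tag' && node.name === name; };
}

var dom = {
  type: 'tag', name: 'div', children: [
    { type: 'tag', name: 'ul', children: [
      { type: 'tag', name: 'li', children: [
        { type: 'tag', name: 'a', children: [] }
      ] }
    ] },
    { type: 'tag', name: 'a', children: [] }
  ]
};

// Equivalent in spirit to the selector 'ul a'.
var found = selectSketch(dom, descendantSketch([isTag('ul'), isTag('a')]));
console.log(found.length);
```

Only the `a` nested under the `ul` is collected; the `a` that is a direct child of the `div` has no `ul` ancestor and is rejected.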
See the [annotated source](./docco/stew.html) for more detail.
### Unit Tests
The `./test` directory contains unit tests for each of these types. These tests can be executed by running
```console
make test
```
or
```console
npm test
```
The [test-coverage report](./coverage.html) identifies the lines of code^[The generated JavaScript code, not source CoffeeScript, for better or worse.] that are exercised by the test suite. This report can be generated by running:
```console
make coverage
```
## How you can help.
Your contributions, [bug reports](https://github.com/rodw/stew/issues) and [pull-requests](https://github.com/rodw/stew/pulls) are greatly appreciated.
### Areas that need work.
If you're looking for areas in which to contribute, here are a few ideas:
* Documentation and examples are *always* welcome. There are several Markdown-format files within [./docs/](https://github.com/rodw/stew/tree/master/docs) that are always in need of editing and improvement, and please feel free to plug any documentation gaps that you see.
* New and improved unit-tests are also always welcome. You could help us ensure we've tested all the relevant parts of the CSS selector specification, or review the [test coverage report](http://heyrod.com/stew/docs/coverage.html) to identify areas that aren't currently exercised by our unit test suite.
* Stew has a few known limitations we'd like to eliminate. See the "Limitations" section of the [README file](../README.html) for details.
* Browser-side Stew isn't yet supported, or at least not fully tested. This probably doesn't require substantial changes, but no one has gotten around to it just yet.
* Run the target `make todo` to see a list of `TODO`, `FIXME` and similar comments within the code and documentation.
### How to contribute.
We're happy to accept any help you can offer, but the following guidelines can help streamline the process for everyone.
* You can report any bugs at [github.com/rodw/stew/issues](https://github.com/rodw/stew/issues).
- We'll be able to address the issue more easily if you can provide a demonstration of the problem you are encountering. The best format for this demonstration is a failing unit test (like those found in [./test/](https://github.com/rodw/stew/tree/master/test)), but your report is welcome with or without that.
* Our preferred channel for contributions or changes to Stew's source code and documentation is a Git "patch" or "pull-request".
- If you've never submitted a pull-request, here's one way to go about it:
1. Fork or clone the Stew repository.
2. Create a local branch to contain your changes (`git checkout -b my-new-branch`).
3. Make your changes and commit them to your local repository.
4. Create a pull request [as described here](https://help.github.com/articles/creating-a-pull-request).
- If you'd rather use a private (or just non-GitHub) repository, you might find [these generic instructions on creating a "patch" with Git](https://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git/) helpful.
* If you are making changes to the code please ensure that the [unit test suite](#unit-tests) still passes.
* If you are making changes to the code to address a bug or introduce new features, we'd *greatly* appreciate it if you can provide one or more [unit tests](#unit-tests) that demonstrate the bug or exercise the new feature.
**Please Note:** We'd rather have a contribution that doesn't follow these guidelines than no contribution at all. If you are confused or put-off by any of the above, your contribution is still welcome. Feel free to contribute or comment in whatever channel works for you.
## Nuts and Bolts
### Run-time Dependencies
Technically Stew doesn't have any run-time dependencies. No external libraries are required.
Practically speaking, Stew depends upon [Chris Winberry's node-htmlparser](https://github.com/tautologistics/node-htmlparser). Stew assumes the structure of the DOM object passed to `select` and similar methods is compatible with that generated by node-htmlparser.
If `node-htmlparser` is available (via a `require` call) then some (optional) `DOMUtil` methods will make use of it.
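One common way to treat a module as an optional dependency is to wrap the `require` call in a `try`/`catch`. The helper name `optionalRequireSketch` is hypothetical, and this is only a sketch of the general pattern, not necessarily how `DOMUtil` does it. The module name `htmlparser` is assumed here to be the name under which node-htmlparser is installed.

```javascript
// Hypothetical helper: return the module if it can be loaded, else null.
function optionalRequireSketch(moduleName) {
  try {
    return require(moduleName);
  } catch (err) {
    // Any failure to load is treated as "not available".
    return null;
  }
}

// Assumption: node-htmlparser is installed under the name 'htmlparser'.
var htmlparser = optionalRequireSketch('htmlparser');
console.log(htmlparser ? 'htmlparser is available' : 'htmlparser is not installed');
```

Code that relies on the optional module can then check for `null` and either fall back to other behavior or report a clear error.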
Stew makes use of several libraries to support development, documentation and testing. These are enumerated in the `package.json` file.
### Building and Testing
**Downloading**
You can clone [Stew's Git repository](https://github.com/rodw/stew) via:
```console
git clone git@github.com:rodw/stew.git
```
You can also [download a ZIP archive of the latest source](https://github.com/rodw/stew/archive/master.zip).
**Installing**
Once you have Stew cloned into a local working directory, you can use [npm](https://npmjs.org/) to install any build-time dependencies, as follows:
```console
npm install
```
(This may take a few minutes, as some external libraries may need to be downloaded and natively compiled.)
**Testing**
Once installed, you can also run Stew's unit test suite using npm:
```console
npm test
```
If everything is working properly, you should expect to see a message like `68 tests complete (633 ms)` (although the specific numbers might be different, of course).
**Compiling the CoffeeScript files into JavaScript**
You can run
```console
npm run-script compile
```
to generate JavaScript files from the CoffeeScript files in `./lib`.
### Using Make
If you have [GNU Make](http://www.gnu.org/software/make/) installed, the best and easiest way to work with Stew's source code is using the provided makefile.
#### Installing and Testing
You can use:
```console
make install
```
and:
```console
make test
```
and:
```console
make js
```
in place of the npm equivalents above, but the makefile can help you to do much more than that.
#### Generating Documentation
**`make markdown`** will generate Stew's HTML documentation from various [Markdown](http://daringfireball.net/projects/markdown/) files in the repository. Most of these files will be written to the `./docs` directory. Note that the makefile uses [Pandoc](http://johnmacfarlane.net/pandoc/) to generate HTML from the Markdown sources, but in theory other Markdown processors could be used.
**`make docco`** will generate an annotated version of Stew's source code using the nifty [Docco](http://jashkenas.github.io/docco/) documentation generator. These files will be written to `./docs/docco/`.
**`make docs`** will do both of these at once.
#### Test Coverage
**`make coverage`** will generate a report that shows which source code lines are touched (and not touched) by the test suite. This runs the same unit tests as `make test`, but uses [JSCoverage](http://siliconforks.com/jscoverage/) to evaluate the test coverage. The coverage report is written to `./docs/coverage.html`.
#### npm Packaging
**`make module`** will generate a package suitable for distribution via npm (into a directory called `./module`).
**`make test-module-install`** will generate the `./module` directory and then validate it by trying to install it into a temporary directory. You should expect to see `It worked!` as the last line of output.
#### Some other targets
**`make clean`** will remove various generated files.
**`make todo`** will display a list of "TODO" and related comments found in the source code.
**`make targets`** will list all available targets.