text-diff
Version:
Compute a diff between two pieces of text with support for same-line diffs, and optionally display a visual diff as HTML. Module made from the text-diff-match-patch library.
67 lines (37 loc) • 4.51 kB
Markdown
# TextDiff
## JavaScript diff library with support for visual, HTML-formatted output
This repository contains the diff functionality of the [google-diff-match-patch library](http://code.google.com/p/google-diff-match-patch/) by Neil Fraser, turned into a node module which is suitable for `require`ing into projects.
## [Demo](http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html)
## Example usage
```javascript
var Diff = require('text-diff');
var diff = new Diff(); // options may be passed to constructor; see below
var textDiff = diff.main('text1', 'text2'); // produces diff array
diff.prettyHtml(textDiff); // produces a formatted HTML string
```
### Initialization options
Arguments may be passed into the `Diff` constructor in the form of an object:
- `timeout`: Number of seconds to map a diff before giving up (0 for infinity).
- `editCost`: Cost of an empty edit operation in terms of edit characters.
Example initialization with arguments: `var diff = new Diff({ timeout: 2, editCost: 6 });`
## Documentation
The API documentation below has been modified from the [original API documentation](https://code.google.com/p/google-diff-match-patch/wiki/API).
### Initialization
The first step is to create a new diff object (see example above). This object contains various properties which set the behaviour of the algorithms, as well as the following methods/functions:
### main(text1, text2) => diffs
An array of differences is computed which describe the transformation of text1 into text2. Each difference is an array. The first element specifies if it is an insertion (1), a deletion (-1) or an equality (0). The second element specifies the affected text.
```javascript
main("Good dog", "Bad dog") => [(-1, "Goo"), (1, "Ba"), (0, "d dog")]
```
Despite the large number of optimisations used in this function, diff can take a while to compute. The `timeout` setting is available to set how many seconds any diff's exploration phase may take (see "Initialization options" section above). The default value is 1.0. A value of 0 disables the timeout and lets diff run until completion. Should diff time out, the return value will still be a valid difference, though probably non-optimal.
### cleanupSemantic(diffs) => null
A diff of two unrelated texts can be filled with coincidental matches. For example, the diff of "mouse" and "sofas" is `[(-1, "m"), (1, "s"), (0, "o"), (-1, "u"), (1, "fa"), (0, "s"), (-1, "e")]`. While this is the optimum diff, it is difficult for humans to understand. Semantic cleanup rewrites the diff, expanding it into a more intelligible format. The above example would become: `[(-1, "mouse"), (1, "sofas")]`. If a diff is to be human-readable, it should be passed to cleanupSemantic.
### cleanupEfficiency(diffs) => null
This function is similar to `cleanupSemantic`, except that instead of optimising a diff to be human-readable, it optimises the diff to be efficient for machine processing. The results of both cleanup types are often the same.
The efficiency cleanup is based on the observation that a diff made up of large numbers of small diffs edits may take longer to process (in downstream applications) or take more capacity to store or transmit than a smaller number of larger diffs. The `diff.EditCost` property sets what the cost of handling a new edit is in terms of handling extra characters in an existing edit. The default value is 4, which means if expanding the length of a diff by three characters can eliminate one edit, then that optimisation will reduce the total costs.
### levenshtein(diffs) => int
Given a diff, measure its Levenshtein distance in terms of the number of inserted, deleted or substituted characters. The minimum distance is 0 which means equality, the maximum distance is the length of the longer string.
### prettyHtml(diffs) => html
Takes a diff array and returns a string of pretty HTML. Deletions are wrapped in `<del></del>` tags, and insertions are wrapped in `<ins></ins>` tags. Use CSS to apply styling to these tags.
## Tests
Tests have not been ported to this library fork, however tests are [available in the original library](https://github.com/liddiard/google-diff-match-patch/tree/master/javascript). If you would like to port tests over, you will need to do some function call renaming (viz. the `diff_` prefix has been removed from functions in the fork) and remove tests specific to the "patch" and "match" functionalities of the original library.