ccgjs
Version:
A combinatory categorial grammar (CCG) library for the web.
247 lines (187 loc) • 5.58 kB
Markdown
A combinatory categorial grammar (CCG) library for the web.
**NOTE**: Work-in-progress could be found on the [development][1] branch.
## Requirements
- Node.js ^= 12.18.2
- NPM ^= 6.14.7
Run `npm install` and we are all set.
Include this library on your HTML file.
```html
<script type="text/javascript" src="/path/to/ccgjs"></script>
```
Replace `/path/to/ccgjs` with the CCGjs library URL.
Then, use it:
```html
<script type="text/javascript">
const { CCG } = ccgjs;
// do something
</script>
```
See `examples/index.html` as reference.
Read and then parse machine-readable CCG derivation into a JavaScript object.
Usage:
```typescript
const str = '(<T S 0 2> (<L S/NP PSP PSP Hi S/NP>) (<L NP NNP NNP Wisnu NP>))';
const reader = new CCG.Reader(str);
if (reader.read()) {
console.log(reader.result);
}
```
The returned object looks like this:
```
{
node: {
type: 'T',
ccgCat: 'S',
head: 0,
dtrs: 2,
},
left: {
node: {
type: 'L',
ccgCat: 'S/NP',
modPOSTag: 'PSP',
origPOSTag: 'PSP',
word: 'Hi',
predArgCat: 'S/NP',
},
},
right: {
node: {
type: 'L',
ccgCat: 'NP',
modPOSTag: 'NNP',
origPOSTag: 'NNP',
word: 'Wisnu',
predArgCat: 'NP',
},
},
}
```
We uses PEG.js to build the parser.
The parsing expression grammar could be found on the `src/ccg.pegjs` file.
As for the generated parser, it could be found on the `src/generated.pegjs.ts`
file.
Run `npm run pegjs` to generate the `.pegjs` file into `.ts` file.
Construct a JavaScript tree object based on the parsed machine-readable CCG
derivation via `CCG.Reader`. It will also building useful metadata for later
use.
Usage:
```typescript
const str = '(<T S 0 2> (<L S/NP PSP PSP Hi S/NP>) (<L NP NNP NNP Wisnu NP>))';
const tree = new CCG.Tree(str);
console.log(tree);
```
The returned object looks like this:
```
Tree {
metadata: {
isParsed: true,
sentence: 'Hi Wisnu',
words: [ 'Hi', 'Wisnu' ],
ccgCats: [ 'S/NP', 'NP' ],
height: 2,
nodes: [ [Object], [Object], [Object] ]
},
mappedIndexedWords: { '0': { value: [Object] }, '1': { value: [Object] } },
root: {
value: { type: 'T', ccgCat: 'S', head: 0, dtrs: 2 },
left: { value: [Object] },
right: { value: [Object] }
}
}
```
For more information about the omitted `[Object]`,
see `CCG.TreeTypes.Metadata`, `CCG.TreeTypes.IndexedWordMapper`, and
`CCG.TreeTypes.Node`.
We can also turn the tree back into machine-readable CCG derivation by doing
`tree.toString()`. The returned `string` will be:
```
(<T S 0 2> (<L S/NP PSP PSP Hi S/NP>) (<L NP NNP NNP Wisnu NP>))
```
It is possible to get the structured CCG derivation based on the
`CCG.TreeTypes.Node` simply by doing `tree.buildDerivations()`.
The returned `Array<Array<CCG.TreeTypes.Derivation>>` will be:
```
[
[
{ from: 0, to: 0, ccgCat: 'S/NP' },
{ from: 1, to: 1, ccgCat: 'NP' }
],
[ { from: 0, to: 1, ccgCat: 'S', opr: '>' } ]
]
```
How to read?
In the `CCG.TreeTypes.Metadata`, we may find `words` key. In this example,
it will be `['Hi', 'Wisnu']`. Meaning that word `Hi` is at `0` index and
word `Wisnu` is at `1` index. We may read it as:
```
Hi Wisnu
------ ---------
S/NP NP
--------------->
S
```
Render and manipulate `CCG.Tree` as a DOM (document object model) directly on
the browser. Currently, there is only one method available.
Usage:
```typescript
const str = [
'(<T Sf 1 2>',
'(<T NP 0 2>',
'(<L NP NNP NNP raam NP>)',
'(<L NP\\NP PSP PSP ne NP\\NP>))',
'(<T Sf\\NP 1 2>',
'(<T NP 0 2>',
'(<L NP NNP NNP mohan NP>)',
'(<L NP\\NP PSP PSP ko NP\\NP>))',
'(<T (Sf\\NP)\\NP 1 2>',
'(<T NP 1 2>',
'(<L NP/NP JJ JJ niilii NP/NP>)',
'(<L NP NN NN kitaab NP>))',
'(<L ((Sf\\NP)\\NP)\\NP VM VM dii ((Sf\\NP)\\NP)\\NP>))))',
].join(' ');
const dom = new CCG.DOM(str);
const table = dom.createTable();
// apply it directly
document.body.appendChild(table);
// or take the HTML string
console.log(table.outerHTML);
```
How it looks like?
Take a look on this [JS Bin](https://jsbin.com/woqazik/3/edit?html,output)!
There are a lot of things to do.
The goal of this project is to enable interactive CCG derivation manipulation
directly on the web browser. By _manipulation_ we meant, as an example, the
ability to (interactively) create, edit, and delete a CCG node from its tree.
The direction of this project should be clear by now.
## Contributing
Please refrain to contribute for the time being until this project officially
released. We will add `CONTRIBUTING.md` after we are ready.
Both issues and pull requests will be ignored.
## Acknowledgements
This JavaScript library is part of my undergraduate thesis.
Hence, I would like to thank my supervisors
([@aromadhony][2] and [@saidalfaraby][3]) for their advice and guidance.
## References
Hockenmaier, J., & Steedman, M. (2007). CCGbank: A corpus of CCG derivations
and dependency structures extracted from the Penn Treebank.
_Computational Linguistics_, _33_(3), 355–396.
Ambati, B.R., Deoskar, T. & Steedman, M. Hindi CCGbank: A CCG treebank
from the Hindi dependency treebank.
_Lang Resources & Evaluation_ **52**, 67–100 (2018).
## License
Licensed under the [MIT License](LICENSE).
[1]: https://github.com/wisn/ccgjs/tree/development
[2]: https://github.com/aromadhony
[3]: https://github.com/saidalfaraby