# `@sugarcube/plugin-tika`
Use the [Apache Tika](https://tika.apache.org/) toolkit to detect and extract
metadata and text from over a thousand different file types.
## Installation
```shell
npm install --save @sugarcube/plugin-tika
```
This plugin also requires Java to be installed.
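A quick way to confirm the Java requirement is met is to look for a `java` executable on the `PATH`. This is only a minimal sketch: the `java_status` variable is a name chosen for illustration, and this README does not state which Java version is needed.

```shell
# Check whether a Java runtime is reachable on the PATH.
# `java_status` is an illustrative variable, not something the plugin reads.
if command -v java >/dev/null 2>&1; then
  java_status="found"
  # Print the installed version (Java writes this to stderr).
  java -version
else
  java_status="missing"
  echo "Java not found: install a Java runtime before using this plugin" >&2
fi
```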
## Plugins
### `tika_parse`
Parse a list of files specified by the query type `glob_pattern`.
```shell
sugarcube -Q 'glob_pattern:files/**/*.pdf' -p tika_parse
```
### `tika_links`
This plugin iterates over all links in `_sc_media` and fetches the text and
metadata for each link. Any errors thrown while fetching a link are
ignored.
### `tika_location`
This plugin parses any location specified using the `tika_location_field`
query type. It fetches the text and metadata of, for example, a URL stored in
a field of the unit.
```shell
sugarcube -Q google_search:Keith\ Johnstone \
-Q tika_location_field:href \
-p google_search,tika_location
```
The text and metadata are added to the `_sc_media` collection and are also
placed directly on the unit. For example, if the location field is `href`, the
`href_text` and `href_meta` fields are added to the unit.
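The naming pattern can be illustrated for another field. The sketch below assumes the `href` example generalizes to any location field by appending `_text` and `_meta`; the field name `url` and the shell variables are hypothetical and are not part of the plugin.

```shell
# Illustration of the assumed naming convention only; this is not plugin code.
location_field="url"                 # hypothetical value for tika_location_field
text_field="${location_field}_text"  # field that would hold the extracted text
meta_field="${location_field}_meta"  # field that would hold the extracted metadata
echo "$text_field $meta_field"
```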
### `tika_export`
Export the text and metadata that `tika_location` parses to a file.
```shell
sugarcube -Q google_search:Keith\ Johnstone \
-p google_search,tika_location,tika_export \
--tika.location_field href
```
**Configuration Options**:
- `tika.data_dir`: The target directory in which to store all exported
files. Defaults to `./data/tika_location`.
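
To write the export somewhere other than the default, the option can presumably be passed on the command line in the same way as `--tika.location_field` above. This is a sketch under that assumption, and the directory `./exports/tika` is only an example path.

```shell
# Assumes tika.data_dir is accepted as a command line flag,
# like --tika.location_field in the example above.
sugarcube -Q google_search:Keith\ Johnstone \
  -p google_search,tika_location,tika_export \
  --tika.location_field href \
  --tika.data_dir ./exports/tika
```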
## License
[GPL3](./LICENSE) @ [Christo](mailto:christo@cryptodrunks.net)