bquery
bquery is a node module for fetching web pages. It uses CSS selectors to extract and structure the content of an HTML page.
---
category: reference
heading: 'Web service'
---
This code reference aims to document the use of noodle as both a web service
and a node module.
With both web service and node module, the same key/value object maps are
used as queries to fetch and extract data from web documents.
noodle currently supports multiple web documents with an almost uniform
query syntax for grabbing data from the different types (html, json, feeds, xml).
noodle is ready to run as a web service from `bin/noodle-server`.
## Quick setup
Git method:

    $ git clone https://github.com/dharmafly/noodle.git
    $ cd noodle
    $ npm install
    $ bin/noodle-server
    Server running on port 8888

NPM method:

    $ npm install noodlejs
    $ cd node_modules/noodlejs/
    $ bin/noodle-server
    Server running on port 8888

## GET or POST
The server supports several ways of querying the service.
### JSONP
noodle supports JSONP if a `callback` parameter is supplied.
The query itself can be sent in the `q` parameter either as a URL-encoded
JSON blob or as a querystring-serialised representation (`jQuery.param()`).

    GET http://example.noodlejs.com?q={JSONBLOB}&callback=foo

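As a rough illustration of the JSONP form, the sketch below builds such a request URL with Node's built-in `URL` class. The helper name `buildJsonpUrl` and the fields of `exampleQuery` are assumptions made for this example; only the `q` and `callback` parameters come from the description above.

```javascript
// Build a JSONP request URL for a noodle web service.
// buildJsonpUrl is a hypothetical helper, not part of noodle.
function buildJsonpUrl(baseUrl, query, callbackName) {
  const url = new URL(baseUrl);
  // searchParams.set percent-encodes the JSON blob for us.
  url.searchParams.set('q', JSON.stringify(query));
  url.searchParams.set('callback', callbackName);
  return url.toString();
}

// Hypothetical query object; the field names are illustrative only.
const exampleQuery = { url: 'http://example.com', selector: 'h1' };

const requestUrl = buildJsonpUrl('http://example.noodlejs.com', exampleQuery, 'foo');
console.log(requestUrl);
```

Decoding the `q` parameter on the receiving side should give back the same JSON string that was passed in.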
### POST
noodle also supports a query sent as JSON in the POST body.

    POST http://example.noodlejs.com

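A minimal sketch of the POST form follows, assuming a fetch-compatible function is available (global `fetch` exists in Node 18 and later). The helper names, the `Content-Type` header, and the expectation of a JSON response are assumptions for illustration; the text above only states that the query is sent as JSON in the POST body.

```javascript
// Assemble the options for a POST request whose body is the JSON query.
// The Content-Type header is an assumption, not something the docs specify.
function buildPostRequest(query) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(query)
  };
}

// Send the query using any fetch-compatible function.
// postQuery is a hypothetical helper, not part of noodle.
async function postQuery(serviceUrl, query, fetchImpl = fetch) {
  const response = await fetchImpl(serviceUrl, buildPostRequest(query));
  if (!response.ok) {
    throw new Error(`noodle request failed with status ${response.status}`);
  }
  return response.json(); // assumes the service replies with JSON
}
```

It could be called as `postQuery('http://example.noodlejs.com', query).then(console.log)`, where `query` is whatever query object you would otherwise place in the `q` parameter.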
## Ratelimiting
The web service also provides rate limiting out of the box with
[connect-ratelimit](https://github.com/dharmafly/connect-ratelimit).
## Configuration
### Server port
To specify which port the noodle web service listens on, pass it as the
first argument to the binary.

    $ bin/noodle-server 9000
    Server running on port 9000

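For readers wrapping the server in their own script, reading a port from the first argument could be handled along these lines. This is not the source of `bin/noodle-server`; `resolvePort` is a hypothetical helper, and the fallback of 8888 is simply the port shown in the quick setup output.

```javascript
// Pick the port from the first CLI argument (argv[2] in Node),
// falling back to a default when it is missing or not a valid port.
function resolvePort(argv, fallback = 8888) {
  const candidate = Number.parseInt(argv[2], 10);
  const valid = Number.isInteger(candidate) && candidate > 0 && candidate < 65536;
  return valid ? candidate : fallback;
}

console.log(resolvePort(['node', 'bin/noodle-server', '9000'])); // 9000
console.log(resolvePort(['node', 'bin/noodle-server']));         // 8888
```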
### Behaviour settings
Various noodle settings like cache and ratelimit settings are exposed
and can be edited in `lib/config.json`.

    {
      // Setting to true will log out information to the
      // terminal

      "debug": true,

      "resultsCacheMaxTime": 3600000,
      "resultsCachePurgeTime": 60480000, // -1 will turn purging off
      "resultsCacheMaxSize": 124,

      "pageCacheMaxTime": 3600000,
      "pageCachePurgeTime": 60480000, // -1 will turn purging off
      "pageCacheMaxSize": 32,

      // If no query type option is supplied then
      // what should noodle assume

      "defaultDocumentType": "html",

      // How the noodle scraper identifies itself
      // to scrape targets

      "userAgent": "",

      // Rate limit settings
      // https://npmjs.org/package/connect-ratelimit#readme

      "rateLimit": {
        "whitelist": ["127.0.0.1", "localhost"],
        "blacklist": [],
        "catagories": {
          "normal": {
            "totalRequests": 1000,
            "every": 3600000000
          },
          "whitelist": {
            "totalRequests": 10000,
            "every": 60000000
          },
          "blacklist": {
            "totalRequests": 0,
            "every": 0
          }
        }
      }
    }
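
To make the `rateLimit` settings more concrete, here is a small fixed-window limiter driven by an object of the same shape. It is a sketch of one plausible reading of `totalRequests` and `every` (assumed here to be milliseconds), not connect-ratelimit's actual implementation; `createLimiter` and the example addresses are hypothetical.

```javascript
// Fixed-window limiter keyed by client id. NOT connect-ratelimit's code;
// it only illustrates how the three categories might be applied.
function createLimiter(settings) {
  const windows = new Map(); // client id -> { start, count }

  function categoryFor(client) {
    if (settings.blacklist.includes(client)) return 'blacklist';
    if (settings.whitelist.includes(client)) return 'whitelist';
    return 'normal';
  }

  // Returns true if a request from `client` at time `now` (ms) is allowed.
  return function allow(client, now = Date.now()) {
    const { totalRequests, every } = settings.catagories[categoryFor(client)];
    let win = windows.get(client);
    if (!win || now - win.start >= every) {
      win = { start: now, count: 0 }; // start a fresh window
      windows.set(client, win);
    }
    if (win.count >= totalRequests) return false;
    win.count += 1;
    return true;
  };
}

// Small numbers so the behaviour is easy to follow.
const allow = createLimiter({
  whitelist: ['127.0.0.1'],
  blacklist: ['10.0.0.9'],
  catagories: {
    normal: { totalRequests: 2, every: 1000 },
    whitelist: { totalRequests: 5, every: 1000 },
    blacklist: { totalRequests: 0, every: 0 }
  }
});

console.log(allow('1.2.3.4', 0));    // true
console.log(allow('1.2.3.4', 10));   // true
console.log(allow('1.2.3.4', 20));   // false (limit of 2 reached)
console.log(allow('1.2.3.4', 1000)); // true (new window)
console.log(allow('10.0.0.9', 0));   // false (blacklisted)
```

With `totalRequests` set to 0, as in the `blacklist` category, every request is refused regardless of the window length.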