elasticsearch-watchdog [![NPM version](https://badge.fury.io/js/elasticsearch-watchdog.svg)](http://badge.fury.io/js/elasticsearch-watchdog) [![Build Status](https://travis-ci.org/Tjatse/elasticsearch-watchdog.svg?branch=master)](https://travis-ci.org/Tjatse/elasticsearch-watchdog)
======================

A watchdog of elasticsearch - cluster nodes' statuses monitor, auto restart, keep PRIMARY node unique.

In my situation, millions of records are indexed into ElasticSearch every day and our cluster has a lot of nodes. We spent a lot of time making it stable and reliable, but unfortunately it still crashes every few months due to:

- Status changes to `red` or `grey`.
- More than one primary node instead of a single unique one (like autocephaly).
- Unresponsive nodes (HTTP timeout, handshake failed and all that stuff).
- Other issues.

# What Can Watchdog Do

- Monitor statuses/healths/states of an ElasticSearch cluster/node.
- Auto restart ElasticSearch through OpenSSH.
- Take a quick look at Watchdog statuses anywhere, especially on a mobile device.
- Make every day a Sunday.

# Installation

```
$ npm install elasticsearch-watchdog -g
```

# Usage

**watchdog**

```bash
  Usage: watchdog [cmd] [file|name]

  Commands:

    pwd <password>            encrypt the password
    encrypt [options] <file>  encrypt the configuration file and save it to disk
    tmpl <name>               render a configuration template
    start [options] <file>    start watching on an ElasticSearch cluster
    stop <uid>                stop watching by `uid`, all the watchdogs will be killed if `uid` is `all`
    restart <uid>             restart watching by `uid`, call the watchdogs back and then send them out for watching again if `uid` is `all`
    ls [options]              list all the watchdogs we have
    web [port]                launch a web GUI, port default by 8088

  Options:

    -h, --help     output usage information
    -v, --version  output the version number
    -r, --root     the root location, you can find all logs here.

  Basic Examples:

    Start a watchdog, by file:
    $ watchdog start watchdog.yml

    Restart the alive watchdog, by uid:
    $ watchdog restart 1001

    Restart all watchdogs:
    $ watchdog restart all

    Stop the watchdog, by uid:
    $ watchdog stop 1001

    Stop all the watchdogs:
    $ watchdog stop all
```

**encrypt**

```bash
  Usage: encrypt [options] <file>

  Options:

    -h, --help  output usage information
    --no-blank  remove the blank line if this option is provided
```

**tmpl**

```bash
$ watchdog tmpl <file>
```

`<file>` is the name of the configuration file. The `.yml` extension is optional, i.e. `$ watchdog tmpl es-server` and `$ watchdog tmpl es-server.yml` are both fine.

**start**

```bash
  Usage: start [options] <file>

  Options:

    -h, --help          output usage information
    --no-daemon         running watchdog as a service, otherwise in the terminal
    -m, --max <number>  maximize retry count when dog has died
```

**stop**

```bash
$ watchdog stop <uid>
```

All the watchdogs will be killed if uid is `all`. Head over to [Printf](#printf) for more information about uid.

**restart**

```bash
$ watchdog restart <uid>
```

All the watchdogs will be called back and then sent out for watching again if uid is `all`. Head over to [Printf](#printf) for more information about uid.

**ls**

```bash
  Usage: ls [options]

  Options:

    -h, --help   output usage information
    --no-format  print list as JSON without formatting
```

**web**

```bash
# simple
$ watchdog web [port]

# daemonic
# start
$ nohup watchdog web > /dev/null 2>&1 & echo $! > /path/to/watchdog.pid
# stop
$ kill -9 `cat /path/to/watchdog.pid`
```

The port of the web interface is optional (8088 by default). For the best viewport on a mobile device, use landscape mode rather than portrait.

GUI:

![image](screenshots/web.png)

A RESTful interface is also provided, i.e. `http://[domain|ip]:[port]/json`.

## Printf

Taking `$ watchdog ls` as an example, the output is formatted as follows.

![image](screenshots/output.png)

- **name**
  `CLUSTER-SERVER` and `PERCOLATOR-SERVER` are the names of the Watchdogs.
- **uid**
  `7707` and `6384` are the uids of the Watchdogs. Run `$ watchdog stop 7707` or `$ watchdog restart 7707` to do a `stop/restart` operation.
- **colors**
  `red`, `yellow`, `grey` and `green` are the statuses of ElasticSearch.
- **symbols**
  `★` means primary node, `✩` means leaves (not master nodes).
- **dim style**
  - `UNKNOWN [missing status]` / `192.168.100.112 [unknown]`
    The primary node is unknown and the status cannot be fetched through the `_cluster/health` / `_cluster/state` API.
  - `192.168.100.166 [error]`
    The server cannot be reached through OpenSSH; you'd better check the logs (`~/.watchdog/logs/`).

## Programmatic

```javascript
var Watchdog = require('watchdog');

// load configuration.
var monit = Watchdog({
  conf: '/path/to/conf.yml',
  uid: false
});

// listen events.
monit.on('info', function(msg){
  console.log('[INFO]', msg.type, msg.message);
});

// start watching.
monit.watching();

// end it.
// monit.end();
```

# Configuration

Run `$ watchdog tmpl my-es` to render a copy of the template, then edit it to meet your requirements. BTW, it supports almost all YAML syntax.

- [Fully](tmpl/watchdog.yml)
- [Local](example/confs/local.yml)
- [Single](example/confs/single.yml)
- [Cluster1](example/confs/cluster1.yml)
- [Cluster2](example/confs/cluster2.yml)

In order to restart ElasticSearch smoothly, if ElasticSearch is already running, stop the process and start it again using:

```bash
$ elasticsearch -d -p /path/to/es.pid [options]
```

**Local environment**

If you're running Watchdog and ElasticSearch on the same server, get the IP address by visiting:

```
http://localhost:9200/_cluster/state
```

The `transport_address` of the current server is the address ElasticSearch is bound to, and there is no need to provide `nodes.ssh.password` in the configuration for it.

# Examples

Head over to the `example` or `test` directories.

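
As a rough illustration of the `_cluster/state` lookup described under **Local environment**, the sketch below reads a trimmed-down state document and reports each node's host and whether it is the primary (master) node. It is not part of Watchdog: the helper names `hostOf` and `describeNodes` and the sample values are hypothetical, and it assumes the response exposes `master_node` plus a `nodes` map whose entries carry `name` and `transport_address`, which may differ between ElasticSearch versions.

```javascript
// Sketch only: these helpers are hypothetical and not part of Watchdog.

// Extract the host from a transport address. Older ElasticSearch releases
// report "inet[/192.168.100.112:9300]", newer ones "192.168.100.112:9300".
// Handles IPv4 addresses and host names only; returns null otherwise.
function hostOf(transportAddress) {
  var match = /([^\/\[\]]+):\d+\]?$/.exec(transportAddress);
  return match ? match[1] : null;
}

// Turn a `_cluster/state` document into a flat list of node summaries,
// marking the primary node with the same symbols `watchdog ls` prints.
function describeNodes(state) {
  var nodes = state.nodes || {};
  return Object.keys(nodes).map(function (id) {
    var primary = id === state.master_node;
    return {
      name: nodes[id].name,
      host: hostOf(nodes[id].transport_address),
      primary: primary,
      symbol: primary ? '★' : '✩'
    };
  });
}

// Made-up sample shaped like a trimmed `_cluster/state` response.
var sampleState = {
  master_node: 'node-a',
  nodes: {
    'node-a': { name: 'es-1', transport_address: 'inet[/192.168.100.112:9300]' },
    'node-b': { name: 'es-2', transport_address: '192.168.100.166:9300' }
  }
};

describeNodes(sampleState).forEach(function (node) {
  console.log(node.symbol, node.name, node.host);
});
```

In practice the state object would come from parsing the JSON body returned by `http://localhost:9200/_cluster/state`; the host found for the local node is the IP address to put in the configuration.
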
# Test

```bash
$ npm test
```

# License

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.