nsyslog

Modular new generation log agent. Reads, transforms, aggregates, correlates and sends logs from sources to destinations.

## Multilang Processor

Multilang processors allow the use of the [Apache Storm Multilang](http://storm.apache.org/releases/1.1.2/Multilang-protocol.html) protocol to call external components for data processing (Apache Storm Bolts). This way, it's possible to create external scripts in any language that process the data in an asynchronous, parallel and/or multi-core way.

## Examples

Use of an external script written in NodeJS. Will spawn 4 processes that process data in a round-robin fashion (shuffle).

```json
"processors" : {
	"tokenize" : {
		"type" : "multilang",
		"config" : {
			"path" : "node multilang/js/tokenize.js",
			"cores" : 4,
			"wire" : "shuffle",
			"module" : false,
			"input" : "${tuple}",
			"output" : "tuple",
			"options" : {
				"max" : 10
			}
		}
	}
}
```

Use of a module script written in NodeJS. Will create 2 bolt instances in the main process that process data grouped by the *filename* property.

```json
"processors" : {
	"tokenize" : {
		"type" : "multilang",
		"config" : {
			"path" : "multilang/js/tokenize.js",
			"cores" : 2,
			"wire" : "group",
			"module" : true,
			"input" : "${tuple}",
			"output" : "tuple",
			"field" : "${filename}",
			"options" : {
				"max" : 10
			}
		}
	}
}
```

## Configuration parameters

* **path** : Command line of the process to execute, or file path if *module* mode is used.
* **cores** : Number of parallel instances to run.
* **wire** : Can be either *shuffle* or *group*. When *shuffle* is used, each data object will be sent randomly to one of the instanced processes. Alternatively, when *group* is used, all objects with the same *field* value will be sent to the same process instance.
* **module** : Only available if the script is written in NodeJS and exports a Bolt component. When *true*, the *path* parameter only specifies the script path and, instead of spawning new processes, multiple bolt instances are created in the main process.
* **input** : Expression used to access a tuple array in the entry data. Input data for multilang components must be a flat array of values.
* **output** : Output field for the multilang component.
* **field** : Expression that selects the grouping value when *group* mode is used.
* **options** : JSON object passed to configure the multilang component.

## Multilang component examples:

```javascript
const {BasicBolt} = require('./storm');

class SplitBolt extends BasicBolt {
	constructor() {
		super();
	}

	initialize(conf, context, callback) {
		// Configuration received from the config file ("options")
		this.max = conf.max;
		callback();
	}

	process(tup, done) {
		// Split the first tuple value and keep at most "max" words.
		// slice() leaves the array intact and returns every word
		// when "max" is undefined.
		const words = tup.values[0].split(" ").slice(0, this.max);

		// For each split word, emit a new tuple anchored to the input tuple
		words.forEach((word) => {
			this.emit(
				{tuple: [word], anchorTupleId: tup.id},
				(taskIds) => {
					this.log(word + ' sent to task ids - ' + taskIds);
				}
			);
		});

		// Ack the tuple without errors
		done();
	}
}

// Export the class so it can be used internally (module mode)
// without needing to spawn a new process
if(module.parent) {
	module.exports = SplitBolt;
}
else {
	new SplitBolt().run();
}
```

```python
import storm

class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        # Split the first tuple value and emit one tuple per word
        words = tup.values[0].split(" ")
        for word in words:
            storm.emit([word])

SplitSentenceBolt().run()
```

You can see more examples on [github](https://github.com/solzimer/nsyslog/tree/master/multilang)
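
To make the difference between the *shuffle* and *group* values of **wire** more concrete, the following is a minimal sketch of how an entry could be routed to one of the `cores` instances. It is an illustration only, not nsyslog's actual routing code: `hashString`, `pickShuffle` and `pickGroup` are hypothetical helper names, *shuffle* is modeled as a random choice (as stated in the parameter description), and *group* is assumed to hash the resolved *field* value.

```javascript
// Illustrative sketch only: this is NOT nsyslog's routing implementation.
// hashString, pickShuffle and pickGroup are hypothetical helpers.

// Simple deterministic string hash (djb2 variant), kept as unsigned 32-bit
function hashString(str) {
	let hash = 5381;
	for (let i = 0; i < str.length; i++) {
		hash = ((hash * 33) ^ str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

// "shuffle": any of the instances may receive the entry
function pickShuffle(cores) {
	return Math.floor(Math.random() * cores);
}

// "group": entries with the same field value always map to the same instance
function pickGroup(cores, fieldValue) {
	return hashString(String(fieldValue)) % cores;
}

// Sample entries (made up for this sketch); "filename" plays the role
// of the "field" expression ${filename}
const entries = [
	{filename: '/var/log/a.log', tuple: ['first line of a']},
	{filename: '/var/log/b.log', tuple: ['first line of b']},
	{filename: '/var/log/a.log', tuple: ['second line of a']}
];

for (const entry of entries) {
	console.log(
		entry.filename,
		'shuffle ->', pickShuffle(2),
		'group ->', pickGroup(2, entry.filename)
	);
}
```

With *group*, both entries from the same file land on the same instance index, which matters when a component keeps per-file state. With *shuffle*, the index can differ on every call.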