# RBQL: Rainbow Query Language
RBQL is both a library and a command-line tool that provides an SQL-like language with JavaScript expressions.
## Table of Contents
1. [RBQL as browser library](#using-rbql-as-a-browser-library)
2. [RBQL as Node library](#using-rbql-as-node-library)
3. [RBQL as command line tool](#using-rbql-as-command-line-tool)
4. [RBQL language description](#language-description)
# Using RBQL as a browser library
## Installation:
To use RBQL as a library in a browser app, you need only a single file: `rbql.js`  
You can get it with npm:  
```
$ npm install rbql
```
Now you can load `rbql.js` with a script tag and it will work:  
```
<script src="rbql.js"></script>
```
## API description
The following two functions are available in the browser version:  
1. [rbql.query_table(...)](#rbqlquery_table)  
2. [rbql.query(...)](#rbqlquery)  
### rbql.query_table(...)
Runs a user query against an input array of records and puts the result set into the output array:  
```
async function query_table(user_query, input_table, output_table, output_warnings, join_table=null, input_column_names=null, join_column_names=null, output_column_names=null, normalize_column_names=true)
```
#### Parameters:  
* _user_query_: **string**  
  the query that a user of your app enters manually, e.g. in an input field.  
* _input_table_: **array**  
  an array of input records  
* _output_table_: **array**  
  an array to which the output records will be pushed  
* _output_warnings_: **array**  
  Warnings will be stored here after the query completes. If there are no warnings, the array will be empty.
* _join_table_: **array**  
  an array of join table records, so that users can refer to join table B in their queries  
* _input_column_names_: **array**  
  Names of _input_table_ columns which users of the app can use in their queries
* _join_column_names_: **array**  
  Names of _join_table_ columns which users of the app can use in their queries
* _output_column_names_: **array**  
  Output column names will be stored in this array after the query completion.
* _normalize_column_names_: **boolean**  
  If set to true, the column names provided in _input_column_names_ and _join_column_names_ are normalized to the "a" and "b" prefix forms, e.g. "Age" -> "a.Age", "Sale price" -> "b['Sale price']".  
  If set to false, the column names can be used in user queries "as is".  
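
The two prefix forms mentioned for _normalize_column_names_ can be illustrated with a small sketch. `to_query_variable` is a hypothetical helper written for this example, not part of the RBQL API, and the exact rule RBQL applies internally may differ (for instance in the quote style it uses):

```javascript
// Hypothetical helper (not an RBQL function): builds the variable form
// that a column name could take in a query.
function to_query_variable(table_prefix, column_name) {
    // "Good" names (letters, digits, underscores, not starting with a digit)
    // can use dot notation.
    if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(column_name))
        return `${table_prefix}.${column_name}`;
    // Any other name needs bracket notation with the name quoted as a string.
    return `${table_prefix}[${JSON.stringify(column_name)}]`;
}

console.log(to_query_variable('a', 'Age'));        // a.Age
console.log(to_query_variable('b', 'Sale price')); // b["Sale price"]
```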
### rbql.query(...)
Runs queries against any kind of structured data.  
To use it, implement wrapper classes for your custom data structures and pass them to the `rbql.query(...)` function.  
```
async function query(user_query, input_iterator, output_writer, output_warnings, join_tables_registry=null)
```
#### Parameters:  
* _user_query_: **string**  
  the query that a user of your app enters manually, e.g. in an input field.  
* _input_iterator_:  **RBQLInputIterator**  
  a special object that iterates over input records, e.g. over a remote table.  
  Examples of classes that support the **RBQLInputIterator** interface: **TableIterator**, **CSVRecordIterator** (these classes can be found in the RBQL source code)
* _output_writer_:  **RBQLOutputWriter**  
  a special object that stores output records somewhere, e.g. in an array.  
  Examples of classes that support the **RBQLOutputWriter** interface: **TableWriter**, **CSVWriter** (these classes can be found in the RBQL source code)
* _output_warnings_: **array**  
  Warnings will be stored here after the query completes. If there are no warnings, the array will be empty.
* _join_tables_registry_: **RBQLJoinTableRegistry**  
  a special object that provides **RBQLInputIterator** iterators for join tables (e.g. table "B") that users can refer to in their queries.  
  Examples of classes that support the **RBQLJoinTableRegistry** interface: **SingleTableRegistry**, **FileSystemCSVRegistry** (these classes can be found in the RBQL source code)
## Usage:
#### "Hello world" web test in RBQL  
A very simple test to make sure that the RBQL library works:  
```
<!DOCTYPE html>
<html><head>
<script src="rbql.js"></script>
<script>
    let output_table = [];
    let warnings = [];
    let error_handler = function(exception) {
        console.log('RBQL finished with error: ' + String(exception));
    }
    let success_handler = function() {
        console.log('warnings: ' + JSON.stringify(warnings));
        console.log('output table: ' + JSON.stringify(output_table));
    }
    rbql.query_table('select a2 + " test", a1 limit 2', [[1, 'foo'], [2, 'bar'], [3, 'hello']], output_table, warnings).then(success_handler).catch(error_handler);
</script>
<title>RBQL Generic Test</title>
</head><body>
<div><span>Open browser console</span></div>
</body></html>
```
Save the code above as `rbql_test.html`, put `rbql.js` in the same folder, open `rbql_test.html` in your browser, and check that the console output contains the expected result.  
#### "JSFiddle" demo test  
A slightly more advanced, but still very simple, demo test is available on [JSFiddle](https://jsfiddle.net/mechatroner/kpuwc83x/).  
It uses the same `rbql.js` script file.
# Using RBQL as Node library
## Installation:
```
$ npm install rbql
```
## API description
The following 3 functions are available in the Node version:  
1. [rbql.query_csv(...)](#rbqlquery_csv)  
2. [rbql.query_table(...)](#rbqlquery_table) - identical to browser version
3. [rbql.query(...)](#rbqlquery) - identical to browser version
### rbql.query_csv(...)
Runs a user query against the CSV file at input_path and saves the result as a CSV file at output_path.  
```
async function query_csv(user_query, input_path, input_delim, input_policy, output_path, output_delim, output_policy, csv_encoding, output_warnings, with_headers=false, comment_prefix=null)
```
#### Parameters:
* _user_query_: **string**  
  the query that a user of your application enters manually, e.g. in an input field.  
* _input_path_: **string**  
  path of the input csv table  
* _input_delim_: **string**  
  field separator character in input table  
* _input_policy_: **string**  
  allowed values: `'simple'`, `'quoted'`  
  together with input_delim, defines the CSV dialect of the input table. "quoted" means that the separator can be escaped inside double-quoted fields  
* _output_path_: **string**  
  path of the output csv table  
* _output_delim_: **string**  
  same as input_delim but for output table  
* _output_policy_: **string**  
  same as input_policy but for output table  
* _csv_encoding_: **string**  
  allowed values: `'binary'`, `'utf-8'`  
  encoding of input, output and join tables (join table can be defined inside the user query)  
* _output_warnings_: **array**  
  Warnings will be stored here after the query completes. If there are no warnings, the array will be empty.
* _with_headers_: **boolean**  
  If set to `true`, treat the first record in the input (and join) file as a header.
* _comment_prefix_: **string**  
  Treat lines starting with the prefix as comments and skip them.
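
To make the difference between the `'simple'` and `'quoted'` policies concrete, here is a rough sketch of how one line could be split under each of them. `split_simple` and `split_quoted` are hypothetical functions written for this illustration; RBQL uses its own CSV parser, which may handle edge cases differently:

```javascript
// 'simple' policy: every occurrence of the delimiter separates two fields.
function split_simple(line, delim) {
    return line.split(delim);
}

// 'quoted' policy: a delimiter inside a double-quoted field is part of the field.
// Assumes a single-character delimiter.
function split_quoted(line, delim) {
    const fields = [];
    let current = '';
    let in_quotes = false;
    for (let i = 0; i < line.length; i++) {
        const ch = line[i];
        if (in_quotes) {
            if (ch === '"' && line[i + 1] === '"') {
                current += '"'; // a doubled quote inside a quoted field is a literal quote
                i++;
            } else if (ch === '"') {
                in_quotes = false; // closing quote
            } else {
                current += ch;
            }
        } else if (ch === '"') {
            in_quotes = true; // opening quote
        } else if (ch === delim) {
            fields.push(current);
            current = '';
        } else {
            current += ch;
        }
    }
    fields.push(current);
    return fields;
}

const line = 'Austen,"Bath, England",1775';
console.log(split_simple(line, ',')); // 4 fields: the comma inside the quotes splits the field
console.log(split_quoted(line, ',')); // 3 fields: 'Austen', 'Bath, England', '1775'
```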
## Usage:
#### Example of query_table() usage:  
```
const rbql = require('rbql')
let input_table = [
    ['Roosevelt',1858,'USA'],
    ['Napoleon',1769,'France'],
    ['Dmitri Mendeleev',1834,'Russia'],
    ['Jane Austen',1775,'England'],
    ['Hayao Miyazaki',1941,'Japan'],
];
let user_query = 'SELECT a1, a2 % 1000 WHERE a3 != "USA" LIMIT 3';
let output_table = [];
let warnings = [];
let error_handler = function(exception) {
    console.log('Error: ' + String(exception));
}
let success_handler = function() {
    console.log('warnings: ' + JSON.stringify(warnings));
    console.log('output table: ' + JSON.stringify(output_table));
}
rbql.query_table(user_query, input_table, output_table, warnings).then(success_handler).catch(error_handler);
```
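For comparison, the same selection can be written by hand in plain JavaScript. This sketch does not use RBQL at all; it only shows what the query `SELECT a1, a2 % 1000 WHERE a3 != "USA" LIMIT 3` is expected to compute for this input (`expected_output` is a name chosen for the example):

```javascript
const input_table = [
    ['Roosevelt', 1858, 'USA'],
    ['Napoleon', 1769, 'France'],
    ['Dmitri Mendeleev', 1834, 'Russia'],
    ['Jane Austen', 1775, 'England'],
    ['Hayao Miyazaki', 1941, 'Japan'],
];

const expected_output = input_table
    .filter(([a1, a2, a3]) => a3 != 'USA')   // WHERE a3 != "USA"
    .map(([a1, a2, a3]) => [a1, a2 % 1000])  // SELECT a1, a2 % 1000
    .slice(0, 3);                            // LIMIT 3

console.log(JSON.stringify(expected_output));
// [["Napoleon",769],["Dmitri Mendeleev",834],["Jane Austen",775]]
```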
#### Example of query_csv() usage:  
```
const rbql = require('rbql');
let user_query = 'SELECT a1, parseInt(a2) % 1000 WHERE a3 != "USA" LIMIT 5';
let error_handler = function(exception) {
    console.log('Error: ' + String(exception));
}
let warnings = [];
let success_handler = function() {
    if (warnings.length)
        console.log('warnings: ' + JSON.stringify(warnings));
    console.log('output table: output.csv');
}
rbql.query_csv(user_query, 'input.csv', ',', 'quoted', 'output.csv', ',', 'quoted', 'utf-8', warnings).then(success_handler).catch(error_handler);
```
You can also use the rbql-js CLI app code as a usage example: [rbql-js cli source code](https://github.com/mechatroner/RBQL/blob/master/rbql-js/cli_rbql.js)  
# Using RBQL as command line tool
### Installation:
To use RBQL as a CLI app, install it in global (`-g`) mode:  
```
$ npm install -g rbql
```
### Usage (non-interactive mode):
```
$ rbql-js --query "select a1, a2 order by a1" < input.tsv
```
### Usage (interactive mode):
In interactive mode rbql-js shows a preview of the input table, which makes it easier to type an SQL-like query.  
```
$ rbql-js --input input.csv --output result.csv
```
# Language description
### Main Features
* Use JavaScript expressions inside _SELECT_, _UPDATE_, _WHERE_ and _ORDER BY_ statements
* Supports multiple input formats
* Result set of any query immediately becomes a first-class table on its own
* No need to provide FROM statement in the query when the input table is defined by the current context.
* Supports all main SQL keywords
* Supports aggregate functions and GROUP BY queries
* Supports user-defined functions (UDF)
* Provides some new useful query modes which traditional SQL engines do not have
* Lightweight, dependency-free, works out of the box
#### Limitations:
* RBQL doesn't support nested queries, but they can be emulated with consecutive queries
* The number of tables in any JOIN query is always 2 (input table and join table); use consecutive queries to join 3 or more tables
### Supported SQL Keywords (Keywords are case insensitive)
* SELECT
* UPDATE
* WHERE
* ORDER BY ... [ DESC | ASC ]
* [ LEFT | INNER ] JOIN
* DISTINCT
* GROUP BY
* TOP _N_
* LIMIT _N_
* AS
All keywords have the same meaning as in SQL queries. You can check them [online](https://www.w3schools.com/sql/default.asp)  
### RBQL variables
RBQL for CSV files provides the following variables which you can use in your queries:
* _a1_, _a2_,..., _a{N}_  
   Variable type: **string**  
   Description: value of i-th field in the current record in input table  
* _b1_, _b2_,..., _b{N}_  
   Variable type: **string**  
   Description: value of i-th field in the current record in join table B  
* _NR_  
   Variable type: **integer**  
   Description: Record number (1-based)  
* _NF_  
   Variable type: **integer**  
   Description: Number of fields in the current record  
* _a.name_, _b.Person_age_, ... _a.{Good_alphanumeric_column_name}_  
   Variable type: **string**  
   Description: Value of the field referenced by its "name". You can use this notation if the field in the header has a "good" alphanumeric name  
* _a["object id"]_, _a['9.12341234']_, _b["%$ !! 10 20"]_ ... _a["Arbitrary column name!"]_  
   Variable type: **string**  
   Description: Value of the field referenced by its "name". You can use this notation to reference fields by arbitrary values in the header
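
As a rough illustration of how these variables relate to a record, the sketch below builds them by hand. `record_variables` is a hypothetical helper, not an RBQL function; RBQL sets these variables up internally. It also shows why string-typed fields usually need `parseInt()` or `parseFloat()` before arithmetic:

```javascript
// Hypothetical helper: maps one CSV record to the variables a query would see.
function record_variables(record, record_number) {
    const variables = { NR: record_number, NF: record.length };
    record.forEach((value, i) => { variables[`a${i + 1}`] = value; });
    return variables;
}

const vars = record_variables(['Napoleon', '1769', 'France'], 1);
console.log(vars);                  // NR is 1, NF is 3, a1..a3 hold the fields
console.log(vars.a2 + 1);           // "17691": string concatenation
console.log(parseInt(vars.a2) + 1); // 1770: numeric addition
```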
### UPDATE statement
_UPDATE_ query produces a new table where original values are replaced according to the UPDATE expression, so it can also be considered a special type of SELECT query.
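
In plain JavaScript terms, an UPDATE query behaves roughly like a `map()` that copies every record and changes only the matching ones. The sketch below is a hand-written equivalent of `UPDATE SET a3 = 'NPC' WHERE a3.indexOf('Non-playable character') != -1`; the sample data is made up and RBQL is not involved:

```javascript
const input_table = [
    ['Goblin', '12', 'Non-playable character (hostile)'],
    ['Alice', '30', 'Player'],
];

const output_table = input_table.map(record => {
    const updated = record.slice(); // copy, so the input table stays untouched
    if (updated[2].indexOf('Non-playable character') != -1) // WHERE condition
        updated[2] = 'NPC';                                 // SET a3 = 'NPC'
    return updated;
});

console.log(JSON.stringify(output_table));
// [["Goblin","12","NPC"],["Alice","30","Player"]]
```

Records that do not match the WHERE condition are still present in the output, unchanged.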
### Aggregate functions and queries
RBQL supports the following aggregate functions, which can also be used with _GROUP BY_ keyword:  
_COUNT_, _ARRAY_AGG_, _MIN_, _MAX_, _ANY_VALUE_, _SUM_, _AVG_, _VARIANCE_, _MEDIAN_  
Limitation: aggregate functions inside JavaScript expressions are not supported, although you can use expressions inside aggregate functions.  
E.g. `MAX(parseFloat(a1) / 1000)` - valid; `MAX(a1) / 1000` - invalid.  
For _ARRAY_AGG_ there is a workaround for this limitation: the function accepts an optional second parameter, a callback that can post-process the aggregated array. Example:  
`SELECT a2, ARRAY_AGG(a1, v => v.sort().slice(0, 5)) GROUP BY a2`
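
To show what a grouped aggregate computes, here is a hand-written plain JavaScript equivalent of a query like `SELECT a2, ARRAY_AGG(a1, v => v.sort()), MAX(parseFloat(a3)) GROUP BY a2`. The sample data and variable names are made up for the example, and the order in which RBQL itself emits groups is not something this sketch claims to reproduce:

```javascript
const input_table = [
    ['pear', 'fruit', '3.5'],
    ['carrot', 'vegetable', '1.2'],
    ['apple', 'fruit', '2.0'],
];

// One accumulator per distinct GROUP BY key (a2).
const groups = new Map();
for (const [a1, a2, a3] of input_table) {
    if (!groups.has(a2))
        groups.set(a2, { names: [], max_value: -Infinity });
    const group = groups.get(a2);
    group.names.push(a1);                                        // ARRAY_AGG(a1, ...)
    group.max_value = Math.max(group.max_value, parseFloat(a3)); // MAX(parseFloat(a3))
}

// The callback (v => v.sort()) runs once per group on the aggregated array.
const output_table = [...groups].map(([key, g]) => [key, g.names.sort(), g.max_value]);
console.log(JSON.stringify(output_table));
// [["fruit",["apple","pear"],3.5],["vegetable",["carrot"],1.2]]
```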
### JOIN statements
Join table B can be referenced either by its file path or by its name - an arbitrary string which the user should provide before executing the JOIN query.  
RBQL supports _STRICT LEFT JOIN_ which is like _LEFT JOIN_, but generates an error if any key in the left table "A" doesn't have exactly one matching key in the right table "B".  
Table B path can be either relative to the working dir, relative to the main table or absolute.  
Limitation: _JOIN_ statements can't contain JavaScript expressions and must have the following form: `<JOIN_KEYWORD> (/path/to/table.tsv | table_name) ON a... == b... [AND a... == b... [AND ... ]]`
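
The difference between the join modes can be sketched over in-memory arrays. `join_tables` and its `join_kind` parameter are hypothetical names used only in this illustration, and padding unmatched records with `null` in the LEFT JOIN case is an assumption of the sketch rather than documented RBQL behavior:

```javascript
// join_kind is one of 'inner', 'left', 'strict_left' (names made up for this sketch).
function join_tables(table_a, table_b, key_index_a, key_index_b, join_kind) {
    // Index table B by its join key for fast lookups.
    const index_b = new Map();
    for (const record_b of table_b) {
        const key = record_b[key_index_b];
        if (!index_b.has(key))
            index_b.set(key, []);
        index_b.get(key).push(record_b);
    }
    const width_b = table_b.length ? table_b[0].length : 0;
    const result = [];
    for (const record_a of table_a) {
        const matches = index_b.get(record_a[key_index_a]) || [];
        // STRICT LEFT JOIN: every key in A must match exactly one record in B.
        if (join_kind === 'strict_left' && matches.length !== 1)
            throw new Error(`Key "${record_a[key_index_a]}" has ${matches.length} matches in B, expected exactly 1`);
        // LEFT JOIN: keep unmatched records from A, padded with nulls.
        if (join_kind === 'left' && matches.length === 0)
            result.push([...record_a, ...new Array(width_b).fill(null)]);
        for (const record_b of matches)
            result.push([...record_a, ...record_b]);
    }
    return result;
}

const people = [['Napoleon', 'FR'], ['Jane Austen', 'GB'], ['Unknown', 'XX']];
const countries = [['FR', 'France'], ['GB', 'United Kingdom']];
console.log(join_tables(people, countries, 1, 0, 'inner')); // 2 records, "Unknown" is dropped
console.log(join_tables(people, countries, 1, 0, 'left'));  // 3 records, "Unknown" is padded with nulls
// join_tables(people, countries, 1, 0, 'strict_left') throws: "XX" has no match in B
```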
### SELECT EXCEPT statement
SELECT EXCEPT can be used to select everything except specific columns. E.g. to select everything but columns 2 and 4, run: `SELECT * EXCEPT a2, a4`  
Most traditional SQL engines do not support this query mode.
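
A hand-written plain JavaScript equivalent of `SELECT * EXCEPT a2, a4` might look like the sketch below (sample data made up, RBQL not involved):

```javascript
const input_table = [
    ['id1', 'secret', 'Alice', 'tmp', '42'],
    ['id2', 'hidden', 'Bob', 'tmp', '17'],
];

// Column numbers are 1-based, like a1, a2, ... in RBQL queries.
const excluded_columns = new Set([2, 4]);
const output_table = input_table.map(
    record => record.filter((value, i) => !excluded_columns.has(i + 1))
);

console.log(JSON.stringify(output_table));
// [["id1","Alice","42"],["id2","Bob","17"]]
```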
### UNNEST() operator
UNNEST(list) takes a list/array as an argument and repeats the output record multiple times - one time for each value from the list argument.  
Example: `SELECT a1, UNNEST(a2.split(';'))`  
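
The effect of the example query can be reproduced by hand with `flatMap()`. This is only a plain JavaScript illustration of the result with made-up data, not how RBQL implements UNNEST:

```javascript
const input_table = [
    ['fruit', 'apple;pear'],
    ['vegetable', 'carrot'],
];

// SELECT a1, UNNEST(a2.split(';')): one output record per list element.
const output_table = input_table.flatMap(
    ([a1, a2]) => a2.split(';').map(item => [a1, item])
);

console.log(JSON.stringify(output_table));
// [["fruit","apple"],["fruit","pear"],["vegetable","carrot"]]
```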
### LIKE() function
RBQL does not support the LIKE operator. Instead, it provides a "like()" function which can be used like this:
`SELECT * where like(a1, 'foo%bar')`
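
The matching semantics of an SQL LIKE pattern can be approximated with a regular expression. `like_to_regexp` below is a hypothetical re-implementation for illustration only: it assumes the standard SQL meaning of `%` (any sequence of characters) and `_` (any single character), and RBQL's own `like()` may differ in details:

```javascript
// Hypothetical helper: converts an SQL LIKE pattern to an anchored RegExp.
function like_to_regexp(pattern) {
    // Escape regex metacharacters first ('%' and '_' are not among them).
    const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Then translate the two LIKE wildcards.
    return new RegExp('^' + escaped.replace(/%/g, '.*').replace(/_/g, '.') + '$');
}

const matcher = like_to_regexp('foo%bar');
console.log(matcher.test('foo123bar')); // true
console.log(matcher.test('foobar'));    // true
console.log(matcher.test('xfoobar'));   // false
```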
### WITH (header) and WITH (noheader) statements
Whether the input (and join) CSV file has a header is normally set through the environment configuration, e.g. the `--with_headers` CLI flag or a GUI checkbox.
You can also override this setting directly in the query by adding either `WITH (header)` or `WITH (noheader)` at the end of the query.
Example: `select top 5 NR, * with (header)`
### User Defined Functions (UDF)
RBQL supports User Defined Functions (UDF).  
You can define custom functions and/or import libraries in a special file: `~/.rbql_init_source.js`
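
As a sketch of what such a file could contain, the example below defines two ordinary JavaScript functions. The function names `last_name` and `to_kg` are made up for this illustration, and the assumption is that functions defined in the init file can be called from queries like any other JavaScript function:

```javascript
// Possible contents of ~/.rbql_init_source.js (hypothetical example functions).

// Returns the last whitespace-separated word of a full name.
function last_name(full_name) {
    const parts = String(full_name).trim().split(/\s+/);
    return parts[parts.length - 1];
}

// Converts a weight in grams (given as a string or number) to kilograms.
function to_kg(grams) {
    return parseFloat(grams) / 1000;
}

// A query could then use them, e.g.:
//   SELECT last_name(a1), to_kg(a2)
```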
## Examples of RBQL queries
* `SELECT TOP 100 a1, a2 * 10, a4.length WHERE a1 == "Buy" ORDER BY parseInt(a2) DESC`
* `SELECT a.id, a.weight / 1000 AS weight_kg`
* `SELECT * ORDER BY Math.random()` - random sort
* `SELECT TOP 20 a.vehicle_price.length / 10, a2 WHERE parseInt(a.vehicle_price) < 500 && ["car", "plane", "boat"].indexOf(a['Vehicle type']) > -1 limit 20` - referencing columns by names from header
* `UPDATE SET a3 = 'NPC' WHERE a3.indexOf('Non-playable character') != -1`
* `SELECT NR, *` - enumerate records, NR is 1-based
* `SELECT a1, b1, b2 INNER JOIN ./countries.txt ON a2 == b1 ORDER BY a1, a3` - example of join query
* `SELECT MAX(a1), MIN(a1) WHERE a.Name != 'John' GROUP BY a2, a3` - example of aggregate query
* `SELECT ...a1.split(':')` - Using JS spread syntax (`...`) to split one column into many. Do not try this with other SQL engines!
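
The last example relies on the JavaScript spread syntax (`...`). In plain JavaScript, outside of RBQL, the same idea looks like the sketch below, which mirrors a query such as `SELECT a1, ...a2.split(':')` on made-up data:

```javascript
const input_table = [
    ['x', '1:2:3'],
    ['y', '4:5'],
];

// Each element of the split array becomes its own output column.
const output_table = input_table.map(([a1, a2]) => [a1, ...a2.split(':')]);

console.log(JSON.stringify(output_table));
// [["x","1","2","3"],["y","4","5"]]
```

Note that output records can end up with different numbers of fields when the split column has a varying number of parts.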
### References
* [RBQL: Official Site](https://rbql.org/)
* RBQL is integrated with Rainbow CSV extensions in [Vim](https://github.com/mechatroner/rainbow_csv), [VSCode](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv), [Sublime Text](https://packagecontrol.io/packages/rainbow_csv) editors.
* [RBQL in PyPI](https://pypi.org/project/rbql/): `$ pip install rbql`
* Rainbow CSV extension with integrated RBQL in [Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv)  
* Rainbow CSV extension with integrated RBQL in [Vim](https://github.com/mechatroner/rainbow_csv)  
* Rainbow CSV extension with integrated RBQL in [Sublime Text 3](https://packagecontrol.io/packages/rainbow_csv)