UNPKG

supergroup

Version:

Nested groups on arrays of objects where groups are Strings that know what you want them to know about themselves and their relatives.

342 lines (276 loc) 16.3 kB
<!DOCTYPE html> <html> <head> <meta charset="utf-8"/> <meta name="layout" content="post"/> <title>&quot;supergroup&quot;</title> <meta name="comments" content="true"/> <meta name="categories" content="[repo]"/> <meta name="source" content="https://github.com/Sigfried/supergroup"/> <link type="text/css" rel="stylesheet" href="./style.css"/> <link type="text/css" rel="stylesheet" href="./assets/prism.css"/> </head> <body> <script src="https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.2/underscore.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.js"></script> <script src="https://rawgit.com/Sigfried/supergroup/master/supergroup.js"></script> <!-- more --> <div> <section> <h1 id="supergroup.js">Supergroup.js</h1> <p>Supergroup brings extreme convenience and understandability to the manipulation of Javascript data collections, especially in the context of D3.js visualization programming.</p> <p>As if in submission to the great programmers commandment&#8211;<em>Don&#8217;t Repeat Yourself</em>&#8211;every time I find myself writing a piece of code that solves basically the same problem I&#8217;ve solved a dozen times before, a little piece of my soul dies.</p> <p>Utilities for grouping record collections into maps or nests abound: <a href="https://github.com/mbostock/d3/wiki/Arrays#-nest">d3.nest</a>, <a href="https://github.com/mbostock/d3/wiki/Arrays#associative-arrays">d3.map</a>, <a href="http://underscorejs.org/#groupBy">Underscore.groupBy</a>, <a href="https://github.com/iros/underscore.nest">Underscore.Nest</a>, to name a few. But after these tools relieve us of a certain amount of repetitive stress, we&#8217;re often left with a tangle of hairy details that fill us with a dreadful sense of deja vu. Supergroup may seem like the kind of tacky wonder gadget you&#8217;d find on a late-night Ronco ad, but, for the low, low price of free, it makes data-centric Javascript programming fun again. <strong>And</strong>, when you find yourself in a D3.js callback routine holding a datum object that might have come from anywhere&#8211;for instance, with a tooltip callback used on disparate object types&#8211;everything you want to know about your object and its associated metadata and records is right there at your fingertips.</p> <p>Just to be clear about the problem&mdash;you start with tabular data from a CSV file, a SQL query, or some AJAX call: <p><span class="iframe">Some very fake hospital data in a CSV file&#8230;</span> <iframe width="100%" height="70px" src="examples/examples.html?data"> </iframe></p></p> <p><span class="iframe">...turned into canonical array of Objects (using d3.csv, for instance)</span> <iframe width="100%" height="80px" src="examples/examples.html?json"> </iframe></p> <p>Without Supergroup, you&#8217;d group the records on the values of one or more fields with a standard grouping function, giving you data like:</p> <p><span class="iframe">d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).map(data)</span> <iframe width="100%" height="150px" src="examples/examples.html?d3map"> </iframe></p> <p><span class="iframe">d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).entries(data)</span> <iframe width="100%" height="150px" src="examples/examples.html?d3nest"> </iframe></p> <p>To my mind, these are awkward data structures (not to mention the awkwardness of the calling functions.) The <code>map</code> version looks ok in the console, but D3 wants data in arrays, not as objects. The <code>entries</code> version gives us arrays of key/value pairs, but on upper levels <code>values</code> is another array of key/value pairs while on the bottom level <code>values</code> is an array of records. In both <code>entries</code> and <code>map</code>, you can&#8217;t tell from a node at any level what dimension was being grouped at that level. </p> <p>Supergroup gives you almost everything you&#8217;d want for every item in your nest (or in your single array if you have a one-level grouping):</p> <ul> <li>An array of the values grouped on (could be strings, numbers, or dates) (<a href="#sgphysunit">example</a>)</li> <li>The records associated with each group (<a href="#records">example</a>)</li> <li>Information about the values at any level <ul> <li>Parent group if any</li> <li>Immediate child groups if any</li> <li>All descendant groups</li> <li>Only descendant groups at the leaf level</li> <li>Aggregate calculations on records for that group and its descendants</li> <li>Path of group names from root to current group</li> <li>Path of group dimension names from root to current group</li> </ul></li> <li>Information about the groupings at any level</li> <li>For a group at any level, the name of the dimension (attribute, column, property, etc.) grouped on</li> <li>Any of these in a format D3 or some other tool expects</li> </ul> <h2 id="supergroup">Supergroup</h2> <p><code class="language-javascript"> var foo = bar; </code></p> <p>Works as an Underscore (or Lo-Dash) mixin: </p> <pre class="language-markup" data-src="mixin_example.html"></pre> <h2 id="aplainarrayofstringsenhancedwithchildrenandrecords">A plain Array of Strings, enhanced with children and records</h2> <p><code>_.supergroup(data, fieldname)</code> returns an array whose elements are the distinct values of <code>&lt;fieldname&gt;</code> in the original data records. These elements, or Values can be String or Number objects (Dates to be implemented eventually). Each Value holds a <code>.records</code> property which is an array containing the subset of original records matching that Value.</p> <p>In the example below we do a multi-level grouping by Physician and Unit. So <code>sg = _.supergroup(data,['Physician','Unit'])</code> returns a list of physicians (the top-level grouping). The first item in this list, <code>sg[0]</code>, is &#8220;Adams&#8221;, a String object. <code>sg[0].records</code> is an array containing the records where Physician=&#8220;Adams&#8221;. <code>sg[0].children</code> is a list of the Units (our second-level grouping) in the records where Physician=&#8220;Adams&#8221;. <code>sg[0].children[0].records</code> would be the subset of records where Physician=&#8220;Adams&#8221; and Unit=&#8220;preop&#8221;.</p> <p><a id='sgphysunit'></a> <p><span class="iframe">Supergroup on physician and unit</span> <iframe width="100%" height="400px" src="examples/examples.html?sgphysunit"> </iframe></p></p> <p>It does a bunch more I still need to document.</p> <hr/> <h2 id="everythingbelowisolddocumentationimtryingtoreplace">Everything below is old documentation I&#8217;m trying to replace</h2> <pre><code class="json Some records loaded from a CSV or JSON file">var gradeBook = [ {lastName: &quot;Gold&quot;, firstName: &quot;Sigfried&quot;, class: &quot;Remedial Programming&quot;, grade: &quot;C&quot;, num: 2}, {lastName: &quot;Gold&quot;, firstName: &quot;Sigfried&quot;, class: &quot;Literary Posturing&quot;, grade: &quot;B&quot;, num: 3}, {lastName: &quot;Gold&quot;, firstName: &quot;Sigfried&quot;, class: &quot;Documenting with Pretty Colors&quot;, grade: &quot;B&quot;, num: 3}, {lastName: &quot;Sassoon&quot;, firstName: &quot;Sigfried&quot;, class: &quot;Remedial Programming&quot;, grade: &quot;A&quot;, num: 3}, {lastName: &quot;Androy&quot;, firstName: &quot;Sigfried&quot;, class: &quot;Remedial Programming&quot;, grade: &quot;B&quot;, num: 3} ]; </code></pre> <pre><code class="javascript Grouping on one dimension">var byLastName = _.supergroup(gradeBook, &quot;lastName&quot;); // an Array of Strings: [&quot;Gold&quot;,&quot;Sassoon&quot;,&quot;Androy&quot;] byLastName[0].records; // Array of Sigfried Gold's original 3 records byLastName.rawValues(); // Array of native strings (easier to look at or use in contexts where you need a plain string) </code></pre> <pre><code class="javascript Grouping by a calculated value">var byName = _.supergroup(gradeBook, function(d) { return d.firstName + ' ' + d.lastName; }); // an Array of Strings: [&quot;Sigfried Gold&quot;,&quot;Sigfried Sassoon&quot;,&quot;Sigfried Androy&quot;] </code></pre> <pre><code class="javascript It's a native Array, but you can treat it as map, and then do cool stuff. Here's a GPA:">byName.lookup(&quot;Sigfried Gold&quot;).records.pluck(&quot;num&quot;).mean(); // 2.6666666666666665 </code></pre> <p>The above example shows how Supergroup can chain Underscore methods (and mixins), functionality it gets from <a href="../underscore-unchained">underscore-unchained</a>.</p> <pre><code class="javascript Grouping hierarchically">var byClassGrade = _.supergroup(gradeBook, [&quot;class&quot;, &quot;grade&quot;]); // Array of top-level groups: [&quot;Remedial Programming&quot;, &quot;Literary Posturing&quot;, &quot;Documenting with Pretty Colors&quot;] byClassGrade[0].children; // Children of a single group: [&quot;C&quot;, &quot;B&quot;] byClassGrade[0].records; // Array original records for a single group byClassGrade.lookup(&quot;Remedial Programming&quot;); // lookup a top-level group by name byClassGrade.lookup([&quot;Remedial Programming&quot;,&quot;B&quot;]); // lookup a second-level group by name path byClassGrade.lookup([&quot;Remedial Programming&quot;,&quot;B&quot;]).namePath(' -&gt; '); // &quot;Remedial Programming -&gt; B&quot; byClassGrade.lookup([&quot;Remedial Programming&quot;,&quot;B&quot;]).dimPath() // &quot;class/grade&quot; </code></pre> <p>Supergroup can flatten a tree into an array of nodes much like D3&#8217;s hierarchy layout, but in a way that&#8217;s easier to use IMHO. <code>javascript byClassGrade.flattenTree(); // [&quot;Remedial Programming&quot;, &quot;C&quot;, &quot;A&quot;, &quot;B&quot;, &quot;Literary Posturing&quot;, &quot;B&quot;, &quot;Documenting with Pretty Colors&quot;, &quot;B&quot;] byClassGrade.flattenTree().invoke('namePath'); // [&quot;Remedial Programming&quot;, &quot;Remedial Programming/C&quot;, &quot;Remedial Programming/A&quot;, &quot;Remedial Programming/B&quot;, &quot;Literary Posturing&quot;, &quot;Literary Posturing/B&quot;, &quot;Documenting with Pretty Colors&quot;, &quot;Documenting with Pretty Colors/B&quot;] // only want leaf nodes? byClassGrade.leafNodes().invoke('namePath'); // [&quot;Remedial Programming/C&quot;, &quot;Remedial Programming/A&quot;, &quot;Remedial Programming/B&quot;, &quot;Literary Posturing/B&quot;, &quot;Documenting with Pretty Colors/B&quot;] </code></p> <!-- { old stuff % jsfiddle us9k9/2 % } In a SQL group by query you get one record for each resulting group and you can calculate values based on the aggregate of the rows comprised by each group. Another step is needed to go back from the group to the individual rows in that group. D3's nest and Underscore's groupBy do slightly better in that they offer a collection of groups where each group is tied to its associated records. To explain the advantages of supergroup over Underscore and D3's versions, let's compare the results: ``` javascript Underscore's groupBy _.groupBy(gradeBook,'lastName') => { Gold: [ { firstName: "Sigfried", lastName: "Gold", class: "Remedial Programming", grade: "C", num: 2 }, { firstName: "Sigfried", lastName: "Gold", class: "Literary Posturing", grade: "B", num: 3 }, { firstName: "Sigfried", lastName: "Gold", class: "Documenting with Pretty Colors", grade: "B", num: 3 } ], Else: [ { firstName: "Someone", lastName: "Else", class: "Remedial Programming", grade: "B", num: 3 } ] } ``` ``` javascript D3's nest and map d3.nest().key(function(d) { return d.lastName }).map(gradeBook) // same result as Underscore. ``` Because D3 visualizations depend so completely on arrays, D3 provides a way of structuring groups as arrays: ``` javascript D3's nest and entries d3.nest().key(function(d) { return d.lastName }).entries(gradeBook) => [ { key: "Gold", values: [ { firstName: "Sigfried", lastName: "Gold", class: "Remedial Programming", grade: "C", num: 2 }, { firstName: "Sigfried", lastName: "Gold", class: "Literary Posturing", grade: "B", num: 3 }, { firstName: "Sigfried", lastName: "Gold", class: "Documenting with Pretty Colors", grade: "B", num: 3 } ] }, { key: "Else", values: [ { firstName: "Someone", lastName: "Else", class: "Remedial Programming", grade: "B", num: 3 } ] } ] // making a list with this data in D3 might look like this: gradeBookEntries = d3.nest() .key(function(d) { return d.lastName }) .key(function(d) { return d.grade }) .entries(gradeBook) _.rebind(console, 'log') // so console.log can be used as a callback d3.select('div#main').append('ul').selectAll('li') .data(gradeBookEntries) .enter() .append('li') .text(function(d) { return d.key }) .on('click', console.log) .append('ul').selectAll('li') .data(function(d) { return d.values}) .enter() .append('li') .text(function(d) { return d.key + ': ' + d.values.map(function(r) { return r.class }).join(', ') }) .on('click', console.log) gradeBookNames = _.supergroup(gradeBook,['lastName','grade']); d3.select('div#main').append('ul').selectAll('li') .data(gradeBookNames) .enter() .append('li') .text(_.identity) .on('click', console.log) .append('ul').selectAll('li') .data(function(d) { return d.children}) .enter() .append('li') .text(function(d) { return d + ': ' + d.records.pluck('class').join(', ') }) .on('click', console.log) ``` These produce identical results with fairly similar syntax, but when the visualization becomes more complex, the supergroup nodes are much more useful. A common use case is providing information about a node on mouseover. One drawback of d3.nest above is a difference in datum types between parent and leaf nodes: datum.values at a parent node is an array of {key:'...',values:[...]}, but at the leaf node it's an array of raw records. Supergroup does not mix up raw records and hierarchy children in this way. At every level 'records' refers to raw records (which you can only access as leaf nodes in d3.nest) and 'children' refers to nested children if there are any at that node. gradeBookNames = _.supergroup(gradeBook,['lastName','grade']); d3.select('div#main').append('ul').selectAll('li') .data(gradeBookEntries) .enter() .append('li') .text(_.identity) .append('ul').selectAll('li') .data(function(d) { return d.records}) .enter() .append('li') .text(function(d) { return d.namePath() }) d3.select('body').append('ul').selectAll('li') .data(gradeBookEntries) .enter() .append('li') .text(function(d) { return d.key }) .append('p') .text(function(d) { return d.values.length + ' records in group ' + this.parentNode.__data__.key }) ``` has the exact same result (with less pleasant syn ``` javascript var gradeBook = [ {firstName: 'Sigfried', lastName: 'Gold', class: 'Remedial Programming', grade: 'C+', num: 2.2}, {firstName: 'Sigfried', lastName: 'Gold', class: 'Literary Posturing', grade: 'B', num: 3}, {firstName: 'Sigfried', lastName: 'Gold', class: 'Documenting with Pretty Colors', grade: 'B-', num: 2.7}, {firstName: 'Someone', lastName: 'Else', class: 'Remedial Programming', grade: 'A'}]; var gradesByLastName = enlightenedData.group(gradeBook, 'lastName') ``` ``` javascript var gradesByName = enlightenedData.group(gradeBook, function(d) { return d.lastName + ', ' + d.firstName }, {dimName: 'fullName'}) var sigfried = gradesByName.lookup('Gold, Sigfried'); sigfried.records.length; // 3 var sigfriedGPA = sigfried.records.reduce(function(result,rec) { return result+rec.num }, 0) / sigfried.records.length; (it does much much more, will explain below) ``` { old % include_code supergroup-test.js % } --> </section> </div> <script src="assets/prism.js"></script> </body> </html>