supergroup
Version:
Nested groups on arrays of objects where groups are Strings that know what you want them to know about themselves and their relatives.
342 lines (276 loc) • 16.3 kB
HTML
<html>
<head>
<meta charset="utf-8"/>
<meta name="layout" content="post"/>
<title>"supergroup"</title>
<meta name="comments" content="true"/>
<meta name="categories" content="[repo]"/>
<meta name="source" content="https://github.com/Sigfried/supergroup"/>
<link type="text/css" rel="stylesheet" href="./style.css"/>
<link type="text/css" rel="stylesheet" href="./assets/prism.css"/>
</head>
<body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.2/underscore.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.js"></script>
<script src="https://rawgit.com/Sigfried/supergroup/master/supergroup.js"></script>
<!-- more -->
<div>
<section>
<h1 id="supergroup.js">Supergroup.js</h1>
<p>Supergroup brings extreme convenience and understandability to the manipulation of
Javascript data collections, especially in the context of D3.js visualization
programming.</p>
<p>As if in submission to the great programmers commandment–<em>Don’t
Repeat Yourself</em>–every time I find myself writing a piece of code
that solves basically the same problem I’ve solved a dozen times
before, a little piece of my soul dies.</p>
<p>Utilities for grouping record collections into maps or nests abound:
<a href="https://github.com/mbostock/d3/wiki/Arrays#-nest">d3.nest</a>,
<a href="https://github.com/mbostock/d3/wiki/Arrays#associative-arrays">d3.map</a>,
<a href="http://underscorejs.org/#groupBy">Underscore.groupBy</a>,
<a href="https://github.com/iros/underscore.nest">Underscore.Nest</a>, to name
a few. But after these tools relieve us of a certain amount of
repetitive stress, we’re often left with a tangle of hairy details
that fill us with a dreadful sense of deja vu. Supergroup may seem
like the kind of tacky wonder gadget you’d find on a late-night
Ronco ad, but, for the low, low price of free, it makes data-centric
Javascript programming fun again. <strong>And</strong>, when you find yourself
in a D3.js callback routine holding a datum object that might have
come from anywhere–for instance, with a tooltip callback used on
disparate object types–everything you want to know about your
object and its associated metadata and records is right there at
your fingertips.</p>
<p>Just to be clear about the problem—you start with tabular data from a CSV
file, a SQL query, or some AJAX call:
<p><span class="iframe">Some very fake hospital data in a CSV file…</span>
<iframe width="100%" height="70px" src="examples/examples.html?data">
</iframe></p></p>
<p><span class="iframe">...turned into canonical array of Objects (using d3.csv, for instance)</span>
<iframe width="100%" height="80px" src="examples/examples.html?json">
</iframe></p>
<p>Without Supergroup, you’d group the records on the values of one or more fields
with a standard grouping function, giving you data like:</p>
<p><span class="iframe">d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).map(data)</span>
<iframe width="100%" height="150px" src="examples/examples.html?d3map">
</iframe></p>
<p><span class="iframe">d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).entries(data)</span>
<iframe width="100%" height="150px" src="examples/examples.html?d3nest">
</iframe></p>
<p>To my mind, these are awkward data structures (not to mention the awkwardness
of the calling functions.) The <code>map</code> version looks ok in the console, but
D3 wants data in arrays, not as objects. The <code>entries</code> version gives us
arrays of key/value pairs, but on upper levels <code>values</code> is another array of
key/value pairs while on the bottom level <code>values</code> is an array of records. In
both <code>entries</code> and <code>map</code>, you can’t tell from a node at any level what
dimension was being grouped at that level. </p>
<p>Supergroup gives you almost everything you’d want for every item in your nest
(or in your single array if you have a one-level grouping):</p>
<ul>
<li>An array of the values grouped on (could be strings, numbers, or dates) (<a href="#sgphysunit">example</a>)</li>
<li>The records associated with each group (<a href="#records">example</a>)</li>
<li>Information about the values at any level
<ul>
<li>Parent group if any</li>
<li>Immediate child groups if any</li>
<li>All descendant groups</li>
<li>Only descendant groups at the leaf level</li>
<li>Aggregate calculations on records for that group and its descendants</li>
<li>Path of group names from root to current group</li>
<li>Path of group dimension names from root to current group</li>
</ul></li>
<li>Information about the groupings at any level</li>
<li>For a group at any level, the name of the dimension (attribute, column, property, etc.) grouped on</li>
<li>Any of these in a format D3 or some other tool expects</li>
</ul>
<h2 id="supergroup">Supergroup</h2>
<p><code class="language-javascript">
var foo = bar;
</code></p>
<p>Works as an Underscore (or Lo-Dash) mixin: </p>
<pre class="language-markup" data-src="mixin_example.html"></pre>
<h2 id="aplainarrayofstringsenhancedwithchildrenandrecords">A plain Array of Strings, enhanced with children and records</h2>
<p><code>_.supergroup(data, fieldname)</code> returns an array whose elements are the
distinct values of <code><fieldname></code> in the original data records. These elements,
or Values can be String or Number objects (Dates to be implemented eventually).
Each Value holds a <code>.records</code> property which is an array containing the subset of
original records matching that Value.</p>
<p>In the example below we do a multi-level grouping by Physician and Unit. So
<code>sg = _.supergroup(data,['Physician','Unit'])</code> returns a list of
physicians (the top-level grouping). The first item in this list,
<code>sg[0]</code>, is “Adams”, a String object. <code>sg[0].records</code> is an array
containing the records where Physician=“Adams”. <code>sg[0].children</code> is a
list of the Units (our second-level grouping) in the records where
Physician=“Adams”. <code>sg[0].children[0].records</code> would be the subset of
records where Physician=“Adams” and Unit=“preop”.</p>
<p><a id='sgphysunit'></a>
<p><span class="iframe">Supergroup on physician and unit</span>
<iframe width="100%" height="400px" src="examples/examples.html?sgphysunit">
</iframe></p></p>
<p>It does a bunch more I still need to document.</p>
<hr/>
<h2 id="everythingbelowisolddocumentationimtryingtoreplace">Everything below is old documentation I’m trying to replace</h2>
<pre><code class="json Some records loaded from a CSV or JSON file">var gradeBook = [
{lastName: "Gold", firstName: "Sigfried", class: "Remedial Programming", grade: "C", num: 2},
{lastName: "Gold", firstName: "Sigfried", class: "Literary Posturing", grade: "B", num: 3},
{lastName: "Gold", firstName: "Sigfried", class: "Documenting with Pretty Colors", grade: "B", num: 3},
{lastName: "Sassoon", firstName: "Sigfried", class: "Remedial Programming", grade: "A", num: 3},
{lastName: "Androy", firstName: "Sigfried", class: "Remedial Programming", grade: "B", num: 3}
];
</code></pre>
<pre><code class="javascript Grouping on one dimension">var byLastName = _.supergroup(gradeBook, "lastName"); // an Array of Strings: ["Gold","Sassoon","Androy"]
byLastName[0].records; // Array of Sigfried Gold's original 3 records
byLastName.rawValues(); // Array of native strings (easier to look at or use in contexts where you need a plain string)
</code></pre>
<pre><code class="javascript Grouping by a calculated value">var byName = _.supergroup(gradeBook, function(d) { return d.firstName + ' ' + d.lastName; });
// an Array of Strings: ["Sigfried Gold","Sigfried Sassoon","Sigfried Androy"]
</code></pre>
<pre><code class="javascript It's a native Array, but you can treat it as map, and then do cool stuff. Here's a GPA:">byName.lookup("Sigfried Gold").records.pluck("num").mean(); // 2.6666666666666665
</code></pre>
<p>The above example shows how Supergroup can chain Underscore methods (and mixins), functionality
it gets from <a href="../underscore-unchained">underscore-unchained</a>.</p>
<pre><code class="javascript Grouping hierarchically">var byClassGrade = _.supergroup(gradeBook, ["class", "grade"]); // Array of top-level groups: ["Remedial Programming", "Literary Posturing", "Documenting with Pretty Colors"]
byClassGrade[0].children; // Children of a single group: ["C", "B"]
byClassGrade[0].records; // Array original records for a single group
byClassGrade.lookup("Remedial Programming"); // lookup a top-level group by name
byClassGrade.lookup(["Remedial Programming","B"]); // lookup a second-level group by name path
byClassGrade.lookup(["Remedial Programming","B"]).namePath(' -> '); // "Remedial Programming -> B"
byClassGrade.lookup(["Remedial Programming","B"]).dimPath() // "class/grade"
</code></pre>
<p>Supergroup can flatten a tree into an array of nodes much like D3’s hierarchy layout, but in a way
that’s easier to use IMHO.
<code>javascript
byClassGrade.flattenTree(); // ["Remedial Programming", "C", "A", "B", "Literary Posturing", "B", "Documenting with Pretty Colors", "B"]
byClassGrade.flattenTree().invoke('namePath'); // ["Remedial Programming", "Remedial Programming/C", "Remedial Programming/A", "Remedial Programming/B", "Literary Posturing", "Literary Posturing/B", "Documenting with Pretty Colors", "Documenting with Pretty Colors/B"]
// only want leaf nodes?
byClassGrade.leafNodes().invoke('namePath'); // ["Remedial Programming/C", "Remedial Programming/A", "Remedial Programming/B", "Literary Posturing/B", "Documenting with Pretty Colors/B"]
</code></p>
<!--
{ old stuff % jsfiddle us9k9/2 %
}
In a SQL group by query you get one record for each resulting group and
you can calculate values based on the aggregate of the rows comprised by
each group. Another step is needed to go back from the group to
the individual rows in that group. D3's nest and Underscore's groupBy do
slightly better in that they offer a collection of groups where each group
is tied to its associated records.
To explain the advantages of supergroup over Underscore and D3's versions, let's compare the results:
``` javascript Underscore's groupBy
_.groupBy(gradeBook,'lastName')
=> {
Gold: [
{ firstName: "Sigfried", lastName: "Gold", class: "Remedial Programming", grade: "C", num: 2 },
{ firstName: "Sigfried", lastName: "Gold", class: "Literary Posturing", grade: "B", num: 3 },
{ firstName: "Sigfried", lastName: "Gold", class: "Documenting with Pretty Colors", grade: "B", num: 3 }
],
Else: [
{ firstName: "Someone", lastName: "Else", class: "Remedial Programming", grade: "B", num: 3 }
]
}
```
``` javascript D3's nest and map
d3.nest().key(function(d) { return d.lastName }).map(gradeBook) // same result as Underscore.
```
Because D3 visualizations depend so completely on arrays, D3 provides a way of structuring groups as arrays:
``` javascript D3's nest and entries
d3.nest().key(function(d) { return d.lastName }).entries(gradeBook)
=> [
{ key: "Gold",
values: [
{ firstName: "Sigfried", lastName: "Gold", class: "Remedial Programming", grade: "C", num: 2 },
{ firstName: "Sigfried", lastName: "Gold", class: "Literary Posturing", grade: "B", num: 3 },
{ firstName: "Sigfried", lastName: "Gold", class: "Documenting with Pretty Colors", grade: "B", num: 3 }
]
},
{ key: "Else",
values: [
{ firstName: "Someone", lastName: "Else", class: "Remedial Programming", grade: "B", num: 3 }
]
}
]
// making a list with this data in D3 might look like this:
gradeBookEntries = d3.nest()
.key(function(d) { return d.lastName })
.key(function(d) { return d.grade })
.entries(gradeBook)
_.rebind(console, 'log') // so console.log can be used as a callback
d3.select('div#main').append('ul').selectAll('li')
.data(gradeBookEntries)
.enter()
.append('li')
.text(function(d) { return d.key })
.on('click', console.log)
.append('ul').selectAll('li')
.data(function(d) { return d.values})
.enter()
.append('li')
.text(function(d) { return d.key + ': ' + d.values.map(function(r) { return r.class }).join(', ') })
.on('click', console.log)
gradeBookNames = _.supergroup(gradeBook,['lastName','grade']);
d3.select('div#main').append('ul').selectAll('li')
.data(gradeBookNames)
.enter()
.append('li')
.text(_.identity)
.on('click', console.log)
.append('ul').selectAll('li')
.data(function(d) { return d.children})
.enter()
.append('li')
.text(function(d) { return d + ': ' + d.records.pluck('class').join(', ') })
.on('click', console.log)
```
These produce identical results with fairly similar syntax, but when the visualization
becomes more complex, the supergroup nodes are much more useful. A common use case
is providing information about a node on mouseover.
One drawback of d3.nest above is a difference in datum types between parent and leaf
nodes: datum.values at a parent node is an array of {key:'...',values:[...]}, but at
the leaf node it's an array of raw records.
Supergroup does not mix up raw records and hierarchy children in this way. At every
level 'records' refers to raw records (which you can only access as leaf nodes in
d3.nest) and 'children' refers to nested children if there are any at that node.
gradeBookNames = _.supergroup(gradeBook,['lastName','grade']);
d3.select('div#main').append('ul').selectAll('li')
.data(gradeBookEntries)
.enter()
.append('li')
.text(_.identity)
.append('ul').selectAll('li')
.data(function(d) { return d.records})
.enter()
.append('li')
.text(function(d) { return d.namePath() })
d3.select('body').append('ul').selectAll('li')
.data(gradeBookEntries)
.enter()
.append('li')
.text(function(d) { return d.key })
.append('p')
.text(function(d) { return d.values.length + ' records in group ' + this.parentNode.__data__.key })
```
has the exact same result (with less pleasant syn
``` javascript
var gradeBook = [
{firstName: 'Sigfried', lastName: 'Gold', class: 'Remedial Programming', grade: 'C+', num: 2.2},
{firstName: 'Sigfried', lastName: 'Gold', class: 'Literary Posturing', grade: 'B', num: 3},
{firstName: 'Sigfried', lastName: 'Gold', class: 'Documenting with Pretty Colors', grade: 'B-', num: 2.7},
{firstName: 'Someone', lastName: 'Else', class: 'Remedial Programming', grade: 'A'}];
var gradesByLastName = enlightenedData.group(gradeBook, 'lastName')
```
``` javascript
var gradesByName = enlightenedData.group(gradeBook,
function(d) { return d.lastName + ', ' + d.firstName },
{dimName: 'fullName'})
var sigfried = gradesByName.lookup('Gold, Sigfried');
sigfried.records.length; // 3
var sigfriedGPA = sigfried.records.reduce(function(result,rec) { return result+rec.num }, 0) / sigfried.records.length;
(it does much much more, will explain below)
```
{ old % include_code supergroup-test.js %
}
-->
</section>
</div>
<script src="assets/prism.js"></script>
</body>
</html>